Hi, I’m a specialist on density matrices. Sorry for the delay.
There are two different things we call a “reduced density matrix.” They’re related, but they don’t have the same properties.
The first is going to look something like D_pq = <Psi | a^p a_q | Psi>. This formulation is great for methods that give you a specific wavefunction that you can take an expectation value of. This is the form you need to get D(r, r’) = \sum D_pq phi_p(r) phi_q(r’). One of the other great things about this formulation is that computing one-electron properties is, naively, straightforward. The expectation value of a one-electron property is computed simply as I_pq D_pq, where I is your integrals. There are other ways of computing properties, of course, and those lead to variations of the density matrix. This is important for specialists but is beyond the scope of what this question needs.
Implicit in all of this is the assumption that we’re working with orthonormal orbitals. This assumes that your property integrals are expressed in terms of orthonormal orbitals, and this is often an inconvenient thing to assume. What we do in those cases is write I_pq D_pq = I_mn (C_mp C_nq D_pq). This gives us an AO “density matrix” that we can just contract against AO-basis integrals and get the correct numbers. This C_mp C_nq D_pq is also called the AO density matrix. But this isn’t really a density matrix. It’s something that acts like a density matrix for one very specific purpose : contracting against AO-basis integrals to get properties.
Now for Hartree-Fock computations, this “AO density matrix” is the only computationally important property. Only people deriving equations need to worry about the actual density matrix. For every actual computation, the AO density matrix is more convenient to work with because it saves you the trouble of transforming integrals into the MO basis.
So, what does this have to do with electron number? The MO density matrix has the nice property that trace(D) is the number of electrons. What about the AO density matrix?
num_electrons = \sum_{i occupied} <phi_i, phi_i>
= \sum_{i occupied, mu, nu} <C_mu,i chi_mu, C_nu,i chi_nu>
= \sum_{mu, nu} D_mu,nu <chi_mu, chi_nu>
= \sum_{mu, nu} D_mu,nu S_mu, nu
= trace(D * S)
From there, we can use the cyclic property of the trace to rewrite as trace(S^{1/2} * D * S^{1/2}).
Line 1 used the assumption of orthonormal orbitals, and the jump between lines 2 and 3 used the assumption of a single-determinant wavefunction. I’d need to think more to know whether the trace(D*S) formula is still valid when you don’t have the single-determinant assumption, but in those cases, you probably have the MO density matrix anyways and can just compute the electron number from that.
EDIT: Yes, the tr(D * S) should be valid more generally. C * S * C.T is just going to be the overlap matrix in MOs, which had better be the identity matrix by the orthonormality assumption. In MOs, we already know tr(D) = num. el. by other arguments.