For any binary classification dataset, let \(S_B \in \mathbb{R}^{d \times d}\) and \(S_W \in \mathbb{R}^{d \times d}\) be…
2024
For any binary classification dataset, let \(S_B \in \mathbb{R}^{d \times d}\) and \(S_W \in \mathbb{R}^{d \times d}\) be the between-class and within-class scatter (covariance) matrices, respectively. The Fisher linear discriminant is defined by the vector \(u^* \in \mathbb{R}^{d}\) that maximizes
\(J(u) = \frac{u^T S_B u}{u^T S_W u} \)
If \(\lambda = J(u^*)\), \(S_W\) is non-singular and \(S_B \neq 0\), then \((u^*, \lambda)\) must satisfy which ONE of the following equations?
Note: \(\mathbb{R}\) denotes the set of real numbers.
- A. \(S_W^{-1} S_B u^* = \lambda u^* \)
- B. \(S_W u^* = \lambda S_B u^* \)
- C. \(S_B S_W u^* = \lambda u^* \)
- D. \(u^{*T} u^* = \lambda^2 \)
Correct answer: A
Key result: At a stationary point u* of J(u) we have S_B u* = λ S_W u*. Because S_W is invertible, this is equivalent to S_W^{-1} S_B u* = λ u*, so u* is an eigenvector of S_W^{-1} S_B with eigenvalue λ = J(u*).
Derivation (sketch):
Consider the Rayleigh quotient J(u) = (u^T S_B u)/(u^T S_W u). To find stationary points, use a Lagrange multiplier or take the derivative of J(u).
Setting the derivative to zero yields (S_B - λ S_W) u = 0, i.e. S_B u = λ S_W u, where λ equals J(u) at the stationary point.
Since S_W is non-singular, multiply on the left by S_W^{-1} to get S_W^{-1} S_B u = λ u. This turns the generalized eigenvalue problem S_B u = λ S_W u into a standard eigenvalue problem for S_W^{-1} S_B.
The maximizing direction u* is the eigenvector of S_W^{-1} S_B corresponding to the largest eigenvalue λ.
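The derivative step in the sketch above can be written out explicitly. This is a worked version under the usual assumptions that S_B and S_W are symmetric (true of scatter matrices) and that \(u^T S_W u > 0\) for \(u \neq 0\) (which holds because S_W is positive semi-definite and non-singular). Applying the quotient rule to \(J(u)\):

\[
\nabla J(u) = \frac{2\,(u^T S_W u)\, S_B u \;-\; 2\,(u^T S_B u)\, S_W u}{(u^T S_W u)^2}
= \frac{2}{u^T S_W u}\,\bigl(S_B u - J(u)\, S_W u\bigr)
\]

Setting \(\nabla J(u^*) = 0\) and writing \(\lambda = J(u^*)\) gives

\[
S_B u^* = \lambda\, S_W u^* \quad\Longrightarrow\quad S_W^{-1} S_B u^* = \lambda\, u^*
\]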
Why the other proposed equations are incorrect:
An equation that places S_W on the left and S_B on the right (e.g. S_W u = λ S_B u) swaps the roles of the two matrices. Left-multiplying by u^T gives u^T S_W u = λ u^T S_B u, so J(u) = 1/λ rather than λ, which contradicts λ = J(u*) except in the special case λ = 1.
A product like S_B S_W u = Ξ» u changes the matrix order and is not produced by the stationary condition; matrix multiplication order matters.
An identity relating the squared norm of u* to λ^2 is unrelated to the Rayleigh quotient optimization and does not follow from the derivation. J(u) is unchanged when u is rescaled, so the norm of u* is arbitrary and cannot be tied to λ.
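The conclusions above can be checked numerically. The following is a minimal sketch, not part of the original question: it assumes synthetic Gaussian data, NumPy, and unnormalized scatter matrices, and the helper names (`scatter_matrices`, `fisher_direction`, `J`) are hypothetical. It computes the top eigenvector of S_W^{-1} S_B and measures how far each of the four candidate equations is from holding.

```python
import numpy as np

def scatter_matrices(X0, X1):
    """Between-class and within-class scatter for two classes (rows are samples)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    diff = m1 - m0
    S_B = np.outer(diff, diff)  # rank-1 in the two-class case
    S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    return S_B, S_W

def J(u, S_B, S_W):
    """Fisher criterion (Rayleigh quotient)."""
    return (u @ S_B @ u) / (u @ S_W @ u)

def fisher_direction(S_B, S_W):
    """Top eigenpair of S_W^{-1} S_B; solve() avoids forming the inverse."""
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    k = np.argmax(evals.real)
    return evecs[:, k].real, evals[k].real

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
X0 = rng.normal(size=(50, 3))
X1 = rng.normal(size=(60, 3)) + np.array([2.0, -1.0, 0.5])

S_B, S_W = scatter_matrices(X0, X1)
u, lam = fisher_direction(S_B, S_W)

# Residual of each candidate equation at (u*, lambda)
res_A = np.linalg.norm(np.linalg.solve(S_W, S_B @ u) - lam * u)
res_B = np.linalg.norm(S_W @ u - lam * (S_B @ u))
res_C = np.linalg.norm(S_B @ (S_W @ u) - lam * u)
res_D = abs(u @ u - lam**2)

print("J(u*) - lambda:", J(u, S_B, S_W) - lam)
print("residuals A, B, C, D:", res_A, res_B, res_C, res_D)
```

On generic data only the residual for option A should be at the level of floating-point round-off. Evaluating `J(3.0 * u, S_B, S_W)` also returns the same value as `J(u, S_B, S_W)`, which illustrates why the norm of u* carries no information about λ.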