Take the SDSS spectra (which are beautifully calibrated spectrophotometrically) and interpolate them onto a common rest-frame (de-redshifted) wavelength grid. Do something clever to interpolate over missing and corrupted data where necessary; this might involve performing a PCA, using the PCA to patch the gaps, re-doing the PCA on the patched data, and iterating. Then re-normalize the spectra so that they all have the same amplitude; I am being vague here because I don't know the best definition of amplitude. All of this is pre-conditioning of the data; in principle the recommendation here could be applied to any data set; I am just proposing the SDSS spectra.
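The pre-conditioning step might look something like the following sketch. The function name is mine, and taking the mean flux as the "amplitude" is just one arbitrary choice among the options left open above:

```python
import numpy as np

def precondition(spectra, wavelengths, redshifts, rest_grid):
    """De-redshift each spectrum, interpolate onto a common rest-frame
    wavelength grid, and normalize to common amplitude (here: unit mean
    flux, which is only one possible definition of amplitude).

    spectra:      (N, M) observed-frame fluxes
    wavelengths:  (M,) observed-frame wavelength grid (same for all)
    redshifts:    (N,) redshifts
    rest_grid:    (K,) common rest-frame wavelength grid
    """
    out = np.empty((len(spectra), len(rest_grid)))
    for i, (flux, z) in enumerate(zip(spectra, redshifts)):
        rest_wave = wavelengths / (1.0 + z)          # de-redshift
        interp = np.interp(rest_grid, rest_wave, flux)
        out[i] = interp / np.mean(interp)            # normalize amplitude
    return out
```

The PCA-based patching of bad pixels would happen before this step; here I assume it has already been done.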
Now search for a unit-norm (or otherwise normalized) eigenspectrum such that when you dot all of the pre-conditioned SDSS spectra onto it, you obtain a distribution of coefficients (dot products) that has minimum kurtosis. That is, instead of finding the principal components—the components with maximum variance—we will look for the platykurtic components—the components with minimum kurtosis. If you are stoked, search the orthogonal subspace for the next-to-minimum-kurtosis direction, and so on.
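The objective is simple to write down. Here is a minimal sketch (function names are mine) of the quantity to be minimized over unit vectors, using the excess kurtosis of the projection coefficients:

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a Gaussian, negative for
    platykurtic (flat or two-bump) distributions."""
    x = x - np.mean(x)
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

def projection_kurtosis(spectra, v):
    """Kurtosis of the coefficients obtained by dotting each
    pre-conditioned spectrum (rows of `spectra`) onto direction `v`."""
    v = v / np.linalg.norm(v)        # enforce unit-norm eigenspectrum
    coeffs = spectra @ v             # one dot product per spectrum
    return excess_kurtosis(coeffs)
```

A strongly bimodal coefficient distribution (two well-separated clumps) drives this toward the two-point-distribution limit of -2, whereas a Gaussian direction sits near 0; that is why minimizing it hunts for bimodality.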
Why, you ask? Because low kurtosis is a signature of bimodality: platykurtic distributions tend to have two bumps rather than one. Indeed, early experiments (performed by Vivi Tsalmantza (MPIA) and myself back in 2008) indicate that this will identify the eigenspectra that best separate the red-sequence galaxies from the blue cloud. If you really want to go to town, invent a bimodality scalar that is better than kurtosis.
One note: optimization is a challenge. This sure ain't convex. My approach back in the day was to throw down randomly generated spectra, keep the ones that happened to hit fairly low kurtosis, and optimize locally from those.
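That random-restart strategy can be sketched as follows. The function name and the perturb-and-accept local refinement are my choices for illustration, not the original 2008 implementation; any local optimizer on the unit sphere would do:

```python
import numpy as np

def min_kurtosis_direction(spectra, n_starts=100, n_iter=200,
                           step=0.05, seed=0):
    """Random-restart local search for the minimum-kurtosis direction:
    draw random unit vectors, take the lowest-kurtosis one, then refine
    it by accepting random perturbations that lower the kurtosis."""
    rng = np.random.default_rng(seed)
    dim = spectra.shape[1]

    def kurt(v):
        c = spectra @ (v / np.linalg.norm(v))
        c = c - c.mean()
        return np.mean(c**4) / np.mean(c**2)**2 - 3.0

    # throw down random unit vectors and keep the best one
    starts = rng.normal(size=(n_starts, dim))
    starts /= np.linalg.norm(starts, axis=1, keepdims=True)
    best = min(starts, key=kurt)
    best_k = kurt(best)

    # crude local refinement: perturb, renormalize, accept improvements
    for _ in range(n_iter):
        trial = best + step * rng.normal(size=dim)
        trial /= np.linalg.norm(trial)
        k = kurt(trial)
        if k < best_k:
            best, best_k = trial, k
    return best, best_k
```

For the next component, one would repeat the search in the subspace orthogonal to the directions already found.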