impute missing data in spectra

Let me say at the outset that I don't think that imputing missing data is a good idea in general. However, missing-data imputation is a form of cross-validation that provides a very good test of models or methods. My suggestion would be to take a large number of spectra (say stars or galaxies in SDSS) and censor patches (multi-pixel segments) of them at random, saving the censored patches. Build data-driven models from the uncensored data by means of PCA, HMF, mixture-of-Gaussians EM, and XD, at different levels of complexity (different numbers of components). Compare the methods in their ability to reconstruct the censored data. Then use the best of the methods as your spectral model for, for example, redshift identification! Now that I type that, I realize the best target data are the LRGs in SDSS-III BOSS, where the (low) redshift failure rate could be pushed lower with a better model. Advanced goal: Go hierarchical and infer/understand priors too.
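To make the censor-and-reconstruct test concrete, here is a minimal sketch of it for PCA only, in Python with NumPy. Everything specific in it is an assumption for illustration: the spectra are synthetic toy data rather than SDSS spectra, the PCA is trained on a separate set of complete spectra (a simplification; HMF and XD can be trained directly on data with missing pixels), and the function names, patch length, and component counts are hypothetical choices. The same scoring loop could be pointed at HMF, mixture-of-Gaussians, or XD models.

```python
import numpy as np

def censor_patches(n_spectra, n_pix, patch_len, rng):
    """Boolean mask, True where a pixel is censored; one random patch per spectrum."""
    mask = np.zeros((n_spectra, n_pix), dtype=bool)
    starts = rng.integers(0, n_pix - patch_len + 1, size=n_spectra)
    for i, s in enumerate(starts):
        mask[i, s:s + patch_len] = True
    return mask

def fit_pca(train, k):
    """Mean spectrum and top-k principal components (as rows) from complete spectra."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruct(spectrum, censored, mean, comps):
    """Fit coefficients using the uncensored pixels only, then predict every pixel."""
    keep = ~censored
    coeffs, *_ = np.linalg.lstsq(
        comps[:, keep].T, spectrum[keep] - mean[keep], rcond=None
    )
    return mean + coeffs @ comps

def heldout_rmse(train, test, mask, k):
    """RMS error of the k-component model on the censored pixels alone."""
    mean, comps = fit_pca(train, k)
    sq_err = []
    for spec, cens in zip(test, mask):
        model = reconstruct(spec, cens, mean, comps)
        sq_err.append((model[cens] - spec[cens]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))

def make_toy_spectra(n, n_pix, rng, noise=0.01):
    """Synthetic rank-3 'spectra' plus Gaussian noise; a stand-in for real data."""
    x = np.linspace(0.0, 1.0, n_pix)
    basis = np.stack([
        np.sin(2 * np.pi * x),
        np.cos(4 * np.pi * x),
        np.exp(-((x - 0.5) / 0.05) ** 2),
    ])
    amps = rng.normal(size=(n, 3))
    return 1.0 + amps @ basis + noise * rng.normal(size=(n, n_pix))

rng = np.random.default_rng(42)
train = make_toy_spectra(500, 200, rng)   # complete spectra used to build the model
test = make_toy_spectra(100, 200, rng)    # spectra that get patches censored
mask = censor_patches(len(test), test.shape[1], patch_len=20, rng=rng)

# Compare model complexities by how well each reconstructs the censored patches.
scores = {k: heldout_rmse(train, test, mask, k) for k in (1, 2, 3, 5, 10)}
best_k = min(scores, key=scores.get)
for k, rmse in scores.items():
    print(f"k={k:2d}  held-out RMSE={rmse:.4f}")
print("best number of components:", best_k)
```

The important design choice is that the score is computed on the censored pixels only, which the coefficient fit never sees; that is what makes the comparison a cross-validation test rather than a goodness-of-fit test. For real spectra one would presumably also want to weight both the fit and the score by the per-pixel inverse variances.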
