2012-08-27

emission-line clustering and classification

The BPT diagram has been incredibly productive in classifying galaxies into star-forming and AGN-powered classes. However, the diagram shows only two ratios of nearby lines: nearby so that dust and spectrograph calibration don't mess up the ratios, and two because it is a single two-dimensional plot. There might be many features in emission-line space sitting undiscovered in the data; there might be many sub-classes and rich structure within the star-forming and AGN groups.

From a data perspective, times have really changed since BPT: (1) There are dozens (well, a dozen) of visible lines in hundreds of thousands of spectra. (2) We have good noise models for the line measurements, and this is especially important when they get low in signal-to-noise (as they do if you want to use many lines). (3) We have very well-calibrated spectra now, even spectrophotometrically good to a few percent in the SDSS. (4) The effects of dust attenuation are pretty well understood in the optical. So let's go high dimensional and find all the complex structure that must be there!

The first step is to measure all the lines in a long list, and measure them even when the signal-to-noise is low. We don't care about detections; we care about measurements with well-understood noise. The second step is to develop dust-insensitive metrics: What is the distance in data space between two sets of emission-line measurements, as a function of the noise, but marginalizing out the dust affecting each spectrum? Now in that space, let's do some clustering.
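Here is a minimal sketch of what such a dust-insensitive metric could look like. It is not a worked-out method. It assumes (my assumptions, not anything established above) that each spectrum is summarized by a vector of log line fluxes with a Gaussian noise covariance, and that dust moves a spectrum along a single known direction `k` in that space (the attenuation curve evaluated at the line wavelengths). Marginalizing the unknown dust amount with a flat prior then amounts to projecting `k` out of a chi-squared distance. The function name and the numbers in `k` are hypothetical. The Gaussian-in-log-flux assumption breaks down at very low signal-to-noise, where fluxes can scatter negative.

```python
import numpy as np

def dust_marginalized_chi2(x1, C1, x2, C2, K):
    """Chi-squared distance between two log-flux vectors, nuisance directions removed.

    x1, x2 : (n_lines,) log line fluxes (assumed Gaussian noise).
    C1, C2 : (n_lines, n_lines) noise covariances of x1 and x2.
    K      : (n_lines,) or (n_lines, n_nuisance) directions to marginalize out
             with a flat prior, e.g. a dust vector, or dust plus overall scale.
    """
    d = np.asarray(x1, float) - np.asarray(x2, float)
    C = np.asarray(C1, float) + np.asarray(C2, float)   # covariance of the difference
    K = np.asarray(K, float).reshape(d.size, -1)         # one column per nuisance direction

    Cinv_d = np.linalg.solve(C, d)
    Cinv_K = np.linalg.solve(C, K)
    A = K.T @ Cinv_K          # (n_nuisance, n_nuisance)
    b = K.T @ Cinv_d          # (n_nuisance,)

    # Full chi-squared minus the part that any shift along K can absorb.
    return float(d @ Cinv_d - b @ np.linalg.solve(A, b))

# Illustrative, made-up inputs: four lines, a hypothetical dust vector k.
k = np.array([1.6, 1.3, 1.0, 0.9])
x_a = np.array([0.2, -0.5, 1.0, 0.3])
C = 0.01 * np.eye(4)
x_b = x_a - 0.7 * k                             # same spectrum, more dust
x_c = x_a + np.array([0.3, -0.3, 0.0, 0.0])     # different line ratios
print(dust_marginalized_chi2(x_a, C, x_b, C, k))
print(dust_marginalized_chi2(x_a, C, x_c, C, k))
```

Two spectra that differ only by dust come out at essentially zero distance, while a change in line ratios that dust cannot mimic does not. Passing a second column of ones in `K` would also remove the overall flux normalization, if that is wanted.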

I have done nothing on this except discuss it, years ago, with John Moustakas (Siena College). At that time, we were thinking in terms of generating archetypes with an integer program (with my now-deceased guru Sam Roweis). You could use things like support vector machines (great for these kinds of tasks), but we have no labels to classify on. The idea is to find classes not yet discovered! Also, SVMs take no account of the uncertainties in the data. I would recommend something like extreme deconvolution, which does density estimation of the noise-deconvolved distribution. It can deal with very low signal-to-noise data gracefully. It would have to be modified, however, to project out (marginalize out) the dust-extinction direction in line space. Not impossible but not trivial either.
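To make the extreme-deconvolution idea concrete, here is a minimal sketch of one EM update for a mixture of Gaussians fit to data with per-object noise covariances. It assumes every object has all lines measured (identity projection), and it does not include the dust marginalization, which is the hard part. The function name `xd_em_step` and the array layout are my own choices for illustration, not an existing library's API.

```python
import numpy as np

def xd_em_step(w, S, alpha, m, V):
    """One EM update of extreme deconvolution (identity projections).

    w     : (N, D) noisy measurements.
    S     : (N, D, D) noise covariance of each measurement.
    alpha : (K,) mixture weights; m : (K, D) means; V : (K, D, D) covariances
            of the underlying, noise-free distribution.
    Returns updated (alpha, m, V) and the log-likelihood of the input parameters.
    """
    N, D = w.shape
    K = alpha.size
    logr = np.empty((N, K))
    b = np.empty((N, K, D))        # conditional mean of the true value
    B = np.empty((N, K, D, D))     # conditional covariance of the true value

    for j in range(K):
        T = V[j] + S                                   # (N, D, D): model plus noise
        delta = w - m[j]
        Tinv_delta = np.linalg.solve(T, delta[:, :, None])[:, :, 0]
        Tinv_V = np.linalg.solve(T, np.broadcast_to(V[j], T.shape))
        _, logdet = np.linalg.slogdet(T)
        maha = np.einsum("nd,nd->n", delta, Tinv_delta)
        logr[:, j] = np.log(alpha[j]) - 0.5 * (maha + logdet + D * np.log(2 * np.pi))
        b[:, j] = m[j] + Tinv_delta @ V[j]             # V is symmetric
        B[:, j] = V[j] - V[j] @ Tinv_V

    # E-step: responsibilities, computed stably in log space.
    mx = logr.max(axis=1)
    logsum = mx + np.log(np.exp(logr - mx[:, None]).sum(axis=1))
    q = np.exp(logr - logsum[:, None])

    # M-step: weights, means, and deconvolved covariances.
    qsum = q.sum(axis=0)
    new_alpha = qsum / N
    new_m = np.einsum("nk,nkd->kd", q, b) / qsum[:, None]
    diff = new_m[None, :, :] - b
    outer = np.einsum("nkd,nke->nkde", diff, diff)
    new_V = np.einsum("nk,nkde->kde", q, outer + B) / qsum[:, None, None]
    return new_alpha, new_m, new_V, logsum.sum()
```

Iterating this until the log-likelihood stops improving gives covariances `V` that describe the underlying distribution rather than the noise-broadened one, which is why noisy measurements can be fed in without a signal-to-noise cut.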
