Data-driven models tend to be very naive about noise. Jo Bovy (IAS) built a great data-driven model of the quasar population that makes use of our highly vetted photometric noise model to produce the best-performing photometric-redshift system for quasars that I know of. This has been one of the great successes of Bovy's extreme-deconvolution (XD) code for hierarchical modeling of distributions. Let's do this again, but for galaxies!
We know more about galaxies than we do about quasars, so maybe a data-driven model doesn't make as much sense. But we also know that data-driven models (even ones that don't take account of the noise) perform about as well as theory-driven models when it comes to galaxy photometric-redshift prediction. So a data-driven model that does take account of the noise might kick ass. This was strongly recommended to me by Emmanuel Bertin (IAP). In other news, Bernhard Schölkopf (MPI-IS) opined to me that it might be the causal nature of the XD model that makes it so effective. I guess that's a non-sequitur.
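To make "takes account of the noise" concrete, here is a minimal sketch of the XD idea under heavily simplifying assumptions: one-dimensional data, a Gaussian mixture for the underlying (noise-free) distribution, and a known Gaussian noise variance for each observation. The function `xd_1d` and the demo are hypothetical illustrations, not Bovy's code. They are a pure-Python transcription of the XD expectation-maximization updates: in the E-step, each component's variance is convolved with each point's own noise variance; in the M-step, the mixture is refit to the deconvolved estimates of the true values. Real photometric data are multi-dimensional, so a real implementation works with covariance matrices instead of scalars.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a 1D Gaussian."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def xd_1d(x, noise_var, amps, means, variances, n_iter=100):
    """Hypothetical 1D extreme-deconvolution EM (illustrative sketch).

    x         -- observed values
    noise_var -- known noise variance of each observation (can differ per point)
    amps, means, variances -- initial mixture parameters of the
        *underlying* (noise-free) distribution
    Returns the fitted (amps, means, variances).
    """
    n, k = len(x), len(amps)
    amps, means, variances = list(amps), list(means), list(variances)
    for _ in range(n_iter):
        s_q = [0.0] * k    # sum_i q_ij
        s_qb = [0.0] * k   # sum_i q_ij * b_ij
        s_qbb = [0.0] * k  # sum_i q_ij * (b_ij^2 + B_ij)
        for xi, si in zip(x, noise_var):
            # E-step: component j convolved with this point's noise
            # has variance t_j = V_j + S_i.
            t = [v + si for v in variances]
            logw = [math.log(a) + log_normal(xi, m, tj)
                    for a, m, tj in zip(amps, means, t)]
            top = max(logw)  # log-sum-exp trick for numerical stability
            w = [math.exp(lw - top) for lw in logw]
            total = sum(w)
            for j in range(k):
                q = w[j] / total  # responsibility of component j for point i
                # Posterior mean and variance of the point's true value.
                b = means[j] + variances[j] / t[j] * (xi - means[j])
                B = variances[j] * si / t[j]
                s_q[j] += q
                s_qb[j] += q * b
                s_qbb[j] += q * (b * b + B)
        # M-step: refit the underlying mixture to the deconvolved estimates.
        for j in range(k):
            amps[j] = s_q[j] / n
            means[j] = s_qb[j] / s_q[j]
            variances[j] = s_qbb[j] / s_q[j] - means[j] ** 2
    return amps, means, variances


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    truth = [random.gauss(0.0, 1.0) for _ in range(n)]        # true variance 1
    noise_var = [random.uniform(0.5, 4.0) for _ in range(n)]  # known, per point
    obs = [t + random.gauss(0.0, math.sqrt(s)) for t, s in zip(truth, noise_var)]
    naive = sum(o * o for o in obs) / n - (sum(obs) / n) ** 2
    _, means, variances = xd_1d(obs, noise_var, [1.0], [0.0], [5.0])
    print(f"naive variance {naive:.2f}, XD variance {variances[0]:.2f}")
```

Because each point's noise variance is added to each component's variance before anything is compared, noisy points are automatically down-weighted, and the fitted variances describe the underlying population rather than the scattered observations. A naive single-Gaussian fit to the same observations reports roughly the true variance plus the mean noise variance.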