Principal curves in the space of probability measures
Given a data distribution that is concentrated around a one-dimensional structure, can we infer that structure? We consider a variant of this inference problem in which the data are themselves probability-measure-valued: each "data point" is a probability measure drawn at random from some distribution over the space of probability measures.
To this end, we introduce principal curves in the Wasserstein space of probability measures, as a nonlinear variant of measure-valued principal components. Our motivation comes from the problem of "trajectory inference" in computational biology, where a developing population of cells is observed via a time-dependent distribution of gene expression data. In certain experimental situations, however, the time of collection is unknown, so the ordering of the batches of data must be inferred from the geometry of the data itself. We propose an estimator based on Wasserstein principal curves and prove that it is consistent for recovering a curve of probability measures from empirical samples; the consistency of the estimated ordering follows as a corollary. Our consistency theorem is obtained via a series of results on principal curves in general compact metric spaces.
Bio: Andrew Warren is a Canadian mathematician and statistician. He earned his doctorate from Carnegie Mellon University in 2022 under the direction of Dejan Slepčev. He was a visitor at the Institut des Hautes Études Scientifiques during 2022-23, and since 2023 he has been a postdoctoral researcher at the University of British Columbia. His research interests belong to the union of optimal transport and calculus of variations, statistics and machine learning, and partial differential equations.

