Distance Geometry and Data Science
Many problems in data science are addressed by mapping entities of various kinds to vectors in a Euclidean space of some dimension. Most of these methods (e.g. Multidimensional Scaling, Principal Component Analysis, K-means clustering, random projections) are based on the proximity of pairs of vectors. For the results of these methods to make sense, the proximity of entities in the original problem must be well approximated by the proximity of the corresponding vectors in the Euclidean space. If the proximity of every pair of original entities were known exactly, this mapping would be an instance of an isometric embedding. Usually, however, this is not the case, as data are partial, noisy, and sometimes wrong. I shall survey some of the methods above from the point of view of Distance Geometry. The reference text is the invited survey https://link.springer.com/article/10.1007/s11750-020-00563-0.
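As a minimal illustration of the idea (not part of the survey itself), the following Python sketch shows classical Multidimensional Scaling recovering an isometric embedding when the full, exact pairwise distance matrix is available; the function name and the toy data are my own assumptions for the example.

```python
import numpy as np

def classical_mds(D, k=2):
    """Recover a k-dimensional Euclidean embedding from a pairwise
    distance matrix D via classical Multidimensional Scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]      # keep the k largest eigenvalues
    L = np.maximum(eigvals[idx], 0.0)        # clip small negatives due to noise
    return eigvecs[:, idx] * np.sqrt(L)      # coordinates of the embedded points

# Toy example: four points on a unit square; MDS recovers their
# configuration up to rotation and translation.
X = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_rec))  # True: the embedding preserves all pairwise distances
```

With partial, noisy, or inconsistent distance data this exact recovery no longer holds, which is precisely the setting the Distance Geometry viewpoint addresses.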