Distance geometry for word embeddings and applications
Many methods for the treatment of sequential data rely on the construction of intermediate representations of unitary entities (e.g., words in natural language, or k-mers in bioinformatics). Traditionally, these representations are constructed with optimization formulations arising either from co-occurrence-based models or from encoders combined with attention mechanisms. In this work, we propose a new formulation to embed these entities based on the Distance Geometry Problem (DGP): given a subset of the pairwise distances between objects, find positions for the objects in a Euclidean space. Considering the empirical Pointwise Mutual Information (PMI) as an approximation of the inner products between embeddings, we discuss two algorithms to obtain approximate solutions of the underlying Euclidean DGP on large instances. The main advantage of our approach is its significantly lower computational complexity, which allows us to train representations much faster with negligible quality loss. Furthermore, our experiments demonstrate that these properties are valuable in settings where pre-trained models are not available, for example in low-resource natural language processing or for protein sequence embeddings in bioinformatics.
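To make the PMI-as-inner-product idea concrete, the sketch below builds a (positive) PMI matrix from co-occurrence counts and recovers low-dimensional embeddings by factorizing it as a Gram matrix via a truncated eigendecomposition. This is a minimal illustrative sketch of the underlying principle, assuming a dense symmetric co-occurrence matrix; the helper names (`ppmi_matrix`, `gram_embeddings`) and the toy data are hypothetical, and the classical eigendecomposition stands in for the two large-scale DGP algorithms, which the abstract names but does not detail.

```python
import numpy as np

def ppmi_matrix(cooc, eps=1e-12):
    """Positive PMI from a symmetric co-occurrence count matrix (illustrative)."""
    total = cooc.sum()
    p_xy = cooc / total                       # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal probabilities
    pmi = np.log((p_xy + eps) / (p_x @ p_x.T + eps))
    return np.maximum(pmi, 0.0)               # clip negatives (PPMI)

def gram_embeddings(pmi, dim):
    """Rank-`dim` embeddings X such that X @ X.T approximates the PMI matrix."""
    vals, vecs = np.linalg.eigh(pmi)          # symmetric eigendecomposition
    idx = np.argsort(vals)[::-1][:dim]        # indices of the top eigenvalues
    vals = np.maximum(vals[idx], 0.0)         # keep the positive-semidefinite part
    return vecs[:, idx] * np.sqrt(vals)

# Toy usage: 4 "words" with synthetic, symmetrized co-occurrence counts.
rng = np.random.default_rng(0)
counts = rng.integers(1, 20, size=(4, 4))
counts = (counts + counts.T) // 2
X = gram_embeddings(ppmi_matrix(counts), dim=2)
print(X)                                       # one 2-d vector per "word"
```

The Gram-matrix factorization shown here costs a full eigendecomposition, which is what DGP-based approximations such as those discussed in the paper are designed to avoid on large instances.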