Tuesday Mar 12, 2024

arXiv preprint - Is Cosine-Similarity of Embeddings Really About Similarity?

In this episode, we discuss "Is Cosine-Similarity of Embeddings Really About Similarity?" by Harald Steck, Chaitanya Ekanadham, and Nathan Kallus. The paper examines cosine-similarity as a measure of semantic similarity between high-dimensional embedding vectors and identifies problems that can arise when it is applied to embeddings learned by regularized linear models. The authors' analytical study of these models shows that cosine-similarity can yield similarities that are meaningless or not unique, and that the result is often determined implicitly by the regularization used in training. Based on these findings, the authors warn against the uncritical use of cosine-similarity in deep learning models and suggest considering alternative methods so that semantic similarity assessments stay valid and interpretable.
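To make the non-uniqueness point concrete, here is a minimal sketch of one way it can arise in a factorization-style model: if predictions depend only on dot products between user and item embeddings, the latent dimensions can be rescaled without changing any prediction, yet the cosine-similarity between two items can change a lot. The vectors and the scaling factors `d` below are made-up values chosen for illustration, and the code is not taken from the paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine-similarity: dot product of the length-normalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def rescale(vec, factors):
    """Multiply each latent dimension by its own factor."""
    return [x * s for x, s in zip(vec, factors)]

# Hypothetical 2-dimensional embeddings (illustrative values only).
user = [1.0, 2.0]
item_a = [1.0, 0.0]
item_b = [1.0, 1.0]

# Assumed per-dimension scaling: users are scaled by d, items by 1/d,
# so every user-item dot product stays the same.
d = [10.0, 0.1]
d_inv = [1.0 / s for s in d]

user_s = rescale(user, d)
item_a_s = rescale(item_a, d_inv)
item_b_s = rescale(item_b, d_inv)

# Model predictions (dot products) before and after rescaling.
score_a, score_a_s = dot(user, item_a), dot(user_s, item_a_s)
score_b, score_b_s = dot(user, item_b), dot(user_s, item_b_s)

# Item-item cosine-similarity before and after rescaling.
cos_before = cosine(item_a, item_b)
cos_after = cosine(item_a_s, item_b_s)

print(f"scores before: {score_a:.3f}, {score_b:.3f}")
print(f"scores after:  {score_a_s:.3f}, {score_b_s:.3f}")
print(f"cosine before: {cos_before:.3f}")
print(f"cosine after:  {cos_after:.3f}")
```

Both sets of embeddings describe the same predictions, so a training objective that cannot tell them apart leaves the item-item cosine-similarity undetermined. This is why the choice of regularization can end up deciding the result.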
