Deep Learning Embeddings for Data Series Similarity Search

Qitong Wang and Themis Palpanas, published in ACM SIGKDD 2021

Abstract

A key operation in the analysis of (increasingly large) data series collections is similarity search. According to recent studies, SAX-based indexes offer state-of-the-art performance for similarity search tasks. However, their performance lags on datasets with high-frequency, weakly correlated, or excessively noisy series, or with other dataset-specific properties. In this work, we propose Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks. Moreover, we describe SEAnet, a novel architecture especially designed for learning DEA, that introduces the Sum of Squares preservation property into the deep network design. Finally, we propose a new sampling strategy, SEASam, that allows SEAnet to effectively train on massive datasets. Comprehensive experiments on 7 diverse synthetic and real datasets verify the advantages of DEA learned using SEAnet, when compared to other state-of-the-art traditional and DEA solutions, in providing high-quality data series summarizations and similarity search results.
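
For readers new to the area, the role a summarization plays in similarity search can be sketched with a classical technique. The example below is not DEA, SEAnet, or SEASam, and it is not taken from the released code. It uses Piecewise Aggregate Approximation (PAA), the segment-mean summary that SAX discretizes, together with its standard lower-bounding distance. All function names are hypothetical, and the data is random noise generated only for illustration.

```python
import math
import random


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by the number of segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(summary_a, summary_b, series_length):
    """Scaled distance between PAA summaries; it lower-bounds the true distance."""
    scale = math.sqrt(series_length / len(summary_a))
    return scale * euclidean(summary_a, summary_b)


def nearest_neighbor(query, collection, segments):
    """Exact 1-NN search that skips candidates whose lower bound is too large."""
    n = len(query)
    query_summary = paa(query, segments)
    summaries = [paa(s, segments) for s in collection]  # computed once, up front
    best_index, best_distance, pruned = -1, math.inf, 0
    for i, (series, summary) in enumerate(zip(collection, summaries)):
        if paa_lower_bound(query_summary, summary, n) >= best_distance:
            pruned += 1  # the true distance cannot beat the current best
            continue
        distance = euclidean(query, series)
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index, best_distance, pruned


# Illustrative data only: 200 random series of length 64, summarized to 8 values.
rng = random.Random(0)
collection = [z_normalize([rng.gauss(0, 1) for _ in range(64)]) for _ in range(200)]
query = z_normalize([rng.gauss(0, 1) for _ in range(64)])
index, distance, pruned = nearest_neighbor(query, collection, segments=8)
print(index, round(distance, 3), pruned)
```

The tighter the summary-space distance tracks the true distance, the more candidates can be skipped. On noisy input such as the random series used here, segment means tend to discard most of the signal, so the bound is likely to be loose and little is pruned, which is the kind of weakness the abstract attributes to SAX-based summaries.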

Materials

Paper in KDD21, VLDB22 PhD Workshop
Slides at KDD21
Talk at KDD21, Tsinghua AI Time PhD Forum (in Chinese)

Code

SEAnet architecture
Indexing & query answering

Datasets

Astro (astrophysics)  
Deep1B (computer vision)  
SALD (neuroscience)  
Seismic (seismology)  

Cite this work

BibTeX: bib
ACM Ref: Qitong Wang and Themis Palpanas. 2021. Deep Learning Embeddings for Data Series Similarity Search. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD’21). Association for Computing Machinery, New York, NY, USA, 1708–1716. DOI: https://doi.org/10.1145/3447548.3467317