SPADE: Streaming PARAFAC2 Decomposition for Large Datasets

SIAM International Conference on Data Mining (SDM 2020)

Published May 7, 2020

Ekta Gurjal, Georgios Theocharous, Evangelos E. Papalexakis

In tensor mining, PARAFAC2 is a powerful multimodal factor analysis method that is ideally suited to batch processing of data that forms “irregular” tensors, e.g., user movie-viewing profiles, where each user’s timeline does not necessarily align with those of other users. Today, however, data changes dynamically, which hinders the use of this model on large datasets. Tracking the PARAFAC2 decomposition of dynamic tensors is a pivotal and challenging task because of the variability of incoming data and the lack of an online algorithm that is efficient in both time and memory. In this paper, we fill this gap by proposing SPADE, an efficient method for computing the PARAFAC2 decomposition of streaming large tensor datasets containing millions of entries. In terms of effectiveness, our proposed method shows results comparable to the prior work, PARAFAC2, while being computationally much more efficient. We evaluate SPADE on both synthetic and real datasets. Indicatively, our proposed method shows a 10 to 23× speedup and a 17 to 150× reduction in memory usage over the baseline methods, and it can also handle larger tensor streams (≈ 7 million users) on which the batch baseline was not able to operate. To the best of our knowledge, SPADE is the first approach to online PARAFAC2 decomposition that not only provides on-par accuracy but also offers better scalability and efficiency.
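To make the notion of an “irregular” tensor concrete, the sketch below builds synthetic slices that follow the standard PARAFAC2 model, X_k ≈ U_k S_k Vᵀ with U_k = Q_k H. It illustrates only the model structure, not the SPADE algorithm, whose update rules are not given in the abstract. The function name `make_parafac2_slices` and all of its parameters are hypothetical and chosen for illustration.

```python
import numpy as np

def make_parafac2_slices(row_counts, n_cols, rank, seed=0):
    """Generate slices X_k = U_k @ diag(s_k) @ V.T with U_k = Q_k @ H.

    Hypothetical helper for illustration. Each slice may have a different
    number of rows (e.g., a different timeline length per user), but all
    slices share the same columns. Assumes every row count is >= rank.
    """
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((rank, rank))      # shared across all slices
    V = rng.standard_normal((n_cols, rank))    # shared column factor
    slices, U_list = [], []
    for n_rows in row_counts:
        # Q_k has orthonormal columns, so U_k.T @ U_k == H.T @ H for every k.
        Q, _ = np.linalg.qr(rng.standard_normal((n_rows, rank)))
        U = Q @ H
        s = rng.uniform(0.5, 1.5, size=rank)   # slice-specific weights
        slices.append(U @ np.diag(s) @ V.T)
        U_list.append(U)
    return slices, U_list, H, V

# Three "users" with timelines of length 5, 8, and 6 over 4 shared features.
slices, U_list, H, V = make_parafac2_slices([5, 8, 6], n_cols=4, rank=3)
for k, (X, U) in enumerate(zip(slices, U_list)):
    # The PARAFAC2 constraint: the Gram matrix of U_k is the same for all k.
    print(k, X.shape, np.allclose(U.T @ U, H.T @ H))
```

In a streaming setting, new slices of this kind arrive over time; a batch method would refit all slices from scratch, which is the cost an online approach aims to avoid.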

Research Area: AI & Machine Learning