Corpus-based Automatic Text Expansion

18th International Conference on Intelligent Text Processing and Computational Linguistics

Publication date: April 17, 2017

Balaji Vasan Srinivasan, Rishiraj Saha Roy, Harsh Jhamtani, Natwar Modani, Niyati Chhaya

Algorithmically expanding a piece of text using an existing corpus can make authoring more efficient, and it is feasible when the desired additional material is already present in the corpus. We propose an algorithm that automatically expands a piece of text by identifying paragraphs in the repository as candidates for augmenting the original content. The method has four steps: extracting keywords, searching the corpus, selecting and ranking relevant text units while maintaining diversity in the overall information of the expanded content, and finally concatenating the selected units. We propose metrics for evaluating the diversity and relevance of the expanded content and compare them against manual annotations. The results indicate that the proposed approach is viable.
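
The four steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how keywords are extracted, how the corpus is searched, or how relevance and diversity are balanced. The sketch assumes frequency-based keywords, a keyword-overlap search, bag-of-words cosine similarity, and a greedy selection in the style of maximal marginal relevance. All names (`extract_keywords`, `expand`, `trade_off`, `n_units`) and the stopword list are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
             "with", "by", "that", "are", "as", "it", "be", "can", "from"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def extract_keywords(text, k=5):
    """Step 1 (assumed variant): the k most frequent non-stopword tokens."""
    return [word for word, _ in Counter(tokenize(text)).most_common(k)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(seed, corpus, n_units=2, trade_off=0.5, k=5):
    """Return (expanded_text, selected_paragraphs) for a seed text."""
    keywords = set(extract_keywords(seed, k))
    seed_vec = Counter(tokenize(seed))
    vecs = [Counter(tokenize(p)) for p in corpus]

    # Step 2 (assumed variant): keep paragraphs sharing at least one keyword.
    remaining = [i for i, v in enumerate(vecs) if keywords & set(v)]

    # Step 3: greedily pick units that are relevant to the seed but not
    # redundant with units already chosen.
    chosen = []
    while remaining and len(chosen) < n_units:
        def score(i):
            relevance = cosine(vecs[i], seed_vec)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in chosen),
                             default=0.0)
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)

    # Step 4: concatenate the seed and the selected units.
    selected = [corpus[i] for i in chosen]
    return "\n\n".join([seed] + selected), selected
```

In this sketch, `trade_off=1.0` ranks by relevance alone, while lower values increasingly penalize paragraphs that overlap with ones already selected, which is one simple way to keep the expanded content diverse.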
