Leveraging Positional Information to Automatically Emphasize Key Portions of a Text

AAAI 2022 Workshop on Scientific Document Understanding

Published March 1, 2022

Sebastian Gehrmann, Franck Dernoncourt

Emphasizing the key insights of a document, for example through highlights or by changing the font style to bold or italic, can help users understand the overall meaning of a text. Highlighting is one of the most common annotation methods and has been shown to improve information retention. An algorithm that automatically emphasizes key portions of a text would therefore be highly valuable for improving a reader's experience. Previous approaches focus on identifying the semantically most meaningful parts of the text but do not consider positional information. As a result, predicted highlights are redundant and cluster around a single location. This paper presents a method that leverages positional information to automatically emphasize key portions of a text. We evaluate our method on a subset of the DUC 2001 and Enron corpora in which human annotators highlighted the most important sentences. Compared to an extractive summarization algorithm, our method yields a 12.5-point improvement in top-4 recall and a 13.4-point improvement in top-20% recall in detecting the relevant sentences. The improvement demonstrates that information beyond contextual relevance needs to be considered when aiming to identify the sentences that are most interesting to the reader when presented within the text.
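
The abstract reports results as top 4 recall and top 20% recall but does not define these metrics. The sketch below shows one plausible reading, offered as an assumption rather than the authors' evaluation code: recall is taken as the fraction of human-highlighted sentences that appear among the k highest-scoring sentences, with k fixed at 4 or set to 20% of the document length (rounded up). The function names, the scores, and the set of highlighted sentence indices are all hypothetical.

```python
import math


def top_k_recall(scores, relevant, k):
    """Fraction of relevant sentence indices found among the k highest-scoring sentences.

    Assumed definition; the abstract does not spell out the metric.
    """
    if not relevant:
        return 0.0
    # Rank sentence indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return len(top & set(relevant)) / len(relevant)


def top_fraction_recall(scores, relevant, fraction=0.2):
    """Same metric, with k set to a fraction of the number of sentences (rounded up)."""
    k = max(1, math.ceil(fraction * len(scores)))
    return top_k_recall(scores, relevant, k)


# Hypothetical document with 10 sentences: one predicted score per sentence,
# and the indices of the sentences a human annotator highlighted.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.05, 0.4, 0.5]
relevant = {0, 4, 9}

print(top_k_recall(scores, relevant, 4))
print(top_fraction_recall(scores, relevant, 0.2))
```

Under this reading, a method whose predictions cluster around one location can score well on semantic relevance yet miss highlighted sentences elsewhere in the document, which is the kind of gap a recall-based comparison would expose.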