Multi-label Connectionist Temporal Classification

International Conference on Document Analysis and Recognition (ICDAR)

Publication date: September 20, 2019

Curtis Wigington, Brian Price, Scott Cohen

The Connectionist Temporal Classification (CTC) loss function [1] enables end-to-end training of a neural network for sequence-to-sequence tasks without the need for prior alignments between the input and output. CTC is traditionally used for training sequential, single-label problems, in which each element in the sequence has only one class. In this work, we show that CTC is not suitable for multi-label tasks and we present a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification. Multi-label classes can represent meaningful attributes of a single element; for example, in Optical Music Recognition (OMR), a music note can have separate duration and pitch attributes. Our approach achieves state-of-the-art results on Joint Handwritten Text Recognition and Named Entity Recognition, Asian Character Recognition, and OMR.
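
As a rough illustration of the single-label setting described in the abstract, the sketch below shows the standard CTC collapse rule (merge consecutive repeats, then drop blanks), under which many frame-level alignments map to the same output sequence. It also shows one naive way to force a multi-label element such as a note with pitch and duration into a single-label vocabulary by enumerating every combination. The blank symbol, the function name `ctc_collapse`, and the toy pitch and duration sets are assumptions made for this example; this is not the MCTC formulation from the paper.

```python
from itertools import groupby, product

BLANK = "-"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(path):
    """Apply the CTC collapse rule: merge consecutive repeats, then drop blanks."""
    return [symbol for symbol, _ in groupby(path) if symbol != BLANK]


# Two different frame-level alignments collapse to the same label sequence,
# which is why CTC training needs no prior alignment.
alignment_a = list("--cc-aa-t")
alignment_b = list("c-a--ttt-")
print(ctc_collapse(alignment_a) == ctc_collapse(alignment_b))

# A blank between repeats keeps both symbols; without it they merge.
print(ctc_collapse(list("l-l")), ctc_collapse(list("ll")))

# Naive single-label encoding of a multi-label element (toy values):
# every (pitch, duration) pair becomes its own class, so the vocabulary
# size is the product of the per-attribute class counts.
pitches = ["C4", "D4", "E4"]
durations = ["quarter", "half"]
flattened_classes = list(product(pitches, durations))
print(len(flattened_classes))
```

With realistic attribute sets the flattened vocabulary grows multiplicatively, which gives some intuition for why a loss defined directly over multiple label categories is attractive.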

Research Area: Document Intelligence