Dramatically Reducing Training Data Size Through Vocabulary Saturation

The field of machine translation has seen significant improvements in system quality over the past several years. The single biggest factor in this improvement has been the accumulation of ever larger stores of data. However, the authors now find themselves victims of their own success: it has become increasingly difficult to train on such large data sets because of limitations in memory, processing power, and, ultimately, speed (i.e., turning data into models takes an inordinate amount of time).

Provided by: Microsoft
Topic: Hardware
Date Added: Jul 2013
Format: PDF
