Dramatically Reducing Training Data Size Through Vocabulary Saturation

Provided by: Microsoft
Topic: Hardware
Format: PDF
The field of machine translation has seen significant improvements in system quality over the past several years. The single biggest factor in this improvement has been the accumulation of ever-larger stores of training data. However, the authors now find themselves the victims of their own success: it has become increasingly difficult to train on such large data sets because of limitations in memory, processing power, and, ultimately, speed (i.e., training models on the data takes an inordinate amount of time).
