Moving Text Analysis Tools to the Cloud
Text analysis is an important computational task: unstructured data, including text, abound and can yield useful information and knowledge in a wide range of domains. In their collaboration with digital humanists, the authors have begun to examine the opportunities the cloud offers for improving the response times of text-analysis tools, so that users can comparatively analyze large text corpora along a variety of dimensions. To that end, they have started migrating existing text-analysis tools to the cloud, beginning with TAPoR, the Text Analysis Portal for Research. In this paper, they describe their experience redesigning and re-implementing four basic TAPoR operations on Hadoop, and they report the performance improvements the migration enabled.