
How to use Amazon Mechanical Turk to perform tasks too complex to compute

Despite advances in machine learning and artificial intelligence, there is no substitute for human intuition for certain tasks. Learn how you can use Amazon Mechanical Turk for your data processing.


Computing has come a long way in the last decade. Optical character recognition (OCR) software is adept at reading printed text, and voice recognition software can understand normal speech. But there are still many tasks for which it is impractical to use computers, such as labeling objects in an image, separating safe-for-work images from indecent ones, or ensuring that product catalogs do not contain duplicate entries.

Microwork platforms like Amazon Mechanical Turk connect requesters who have tasks that are easy for humans to do with workers who can perform them.

What you can do with Mechanical Turk

With Mechanical Turk, requesters can create Human Intelligence Tasks (HITs) for workers, the people who respond to your requests, to perform. These tasks can be practically anything that can be done on a computer. Amazon provides project templates for common use cases, including data collection, sentiment analysis of text, surveys (including externally hosted surveys), transcription, and image categorization, moderation, and tagging.

Some HITs call for special labeling. You can use Mechanical Turk with data sets that contain indecent material, but such HITs should be labeled accordingly. That way, workers in a public space are not caught off guard by the link they click, and workers who do not wish to view this material know what the content is before they start.

Requesters can create "qualifications" to verify that certain workers are able to perform a given HIT, and then restrict the HIT to only those qualified workers. Additionally, Amazon grants "Masters" status to workers with a demonstrated record of accuracy in photo moderation or categorization. If you create a HIT that is restricted to qualified workers or Masters, its reward should be higher than that of a HIT open to everyone.
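As a sketch of how such restrictions look in practice, the function below builds a `QualificationRequirements` list in the shape accepted by the MTurk API's `create_hit` operation (for example, through boto3). It uses the built-in approval-rate qualification `000000000000000000L0` and, optionally, a custom qualification type you would have created yourself (for example with `create_qualification_type`). The function name, the default thresholds, and the placeholder ID in the usage example are hypothetical.

```python
from typing import Dict, List, Optional

# Built-in MTurk system qualification: the percentage of a worker's
# submitted assignments that requesters have approved.
PERCENT_ASSIGNMENTS_APPROVED = "000000000000000000L0"


def qualification_requirements(
    min_approval_rate: int = 95,
    custom_qualification_id: Optional[str] = None,
    min_custom_score: int = 80,
) -> List[Dict]:
    """Build a hypothetical QualificationRequirements list for create_hit."""
    reqs = [
        {
            # Only workers with a high approval rate may accept the HIT.
            "QualificationTypeId": PERCENT_ASSIGNMENTS_APPROVED,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_approval_rate],
            "ActionsGuarded": "Accept",
        }
    ]
    if custom_qualification_id is not None:
        # A requester-defined qualification (e.g. a scored test); workers
        # below the threshold cannot discover, preview, or accept the HIT.
        reqs.append(
            {
                "QualificationTypeId": custom_qualification_id,
                "Comparator": "GreaterThanOrEqualTo",
                "IntegerValues": [min_custom_score],
                "ActionsGuarded": "DiscoverPreviewAndAccept",
            }
        )
    return reqs


if __name__ == "__main__":
    # "HYPOTHETICAL_QUAL_ID" stands in for an ID returned by create_qualification_type.
    for req in qualification_requirements(custom_qualification_id="HYPOTHETICAL_QUAL_ID"):
        print(req)
```

Guarding the custom qualification with `DiscoverPreviewAndAccept` hides the HIT entirely from unqualified workers, while guarding only `Accept` still lets them see what kind of work is on offer.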

What to know when making your first HIT

The key to getting satisfactory results from workers is to be as precise as possible in your description and instructions, and to make your HIT easy to find when workers search. Concise, descriptive information in the HIT description helps workers understand the aim of your HIT.


When creating a HIT, be direct about its purpose and mindful of your audience. This is not a PR pitch, so language like "opportunity to participate" or "chance to provide opinion" is not productive. Including an estimated completion time in the title is often welcomed, as long as it is accurate. Workers often perform multiple HITs in a single session, so a HIT that can be finished quickly and pays a fair reward is likely to be picked up promptly.
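The advice on titles and descriptions can be captured in a small helper that assembles a HIT's listing fields before submission. This is a hypothetical sketch: it returns a dict whose keys match parameters of the MTurk `create_hit` operation (`Title`, `Description`, `Keywords`, `Reward`, `MaxAssignments`, `AssignmentDurationInSeconds`, `LifetimeInSeconds`), but the helper name, the three-times-the-estimate time limit, and the three-day listing lifetime are assumptions chosen for illustration.

```python
from typing import Dict, Sequence


def hit_metadata(
    task: str,
    minutes: int,
    reward: str,
    description: str,
    keywords: Sequence[str],
    max_assignments: int = 1,
) -> Dict:
    """Assemble hypothetical listing fields for a HIT."""
    # Put an honest time estimate directly in the title.
    unit = "minute" if minutes == 1 else "minutes"
    title = f"{task} (about {minutes} {unit})"
    return {
        "Title": title,
        "Description": description,
        # MTurk keywords are a single comma-separated string; they help
        # workers find the HIT when searching.
        "Keywords": ", ".join(keywords),
        # The API takes the reward as a string in US dollars, e.g. "0.10".
        "Reward": reward,
        "MaxAssignments": max_assignments,
        # Assumption: allow three times the estimate before the assignment expires.
        "AssignmentDurationInSeconds": minutes * 60 * 3,
        # Assumption: keep the HIT listed for three days.
        "LifetimeInSeconds": 3 * 24 * 60 * 60,
    }


if __name__ == "__main__":
    params = hit_metadata(
        task="Tag the objects in a photo",
        minutes=2,
        reward="0.10",
        description="List every object you can see in one photo.",
        keywords=["image", "tagging", "photo"],
    )
    print(params["Title"])
```

With a boto3 MTurk client, the result could be passed as `client.create_hit(Question=question_xml, **params)`, where `question_xml` stands for your task's question markup.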


It is important to keep your budget in mind. While you set the reward you pay workers, Amazon's fees on HITs can add up quickly, depending on the project type. By default, Amazon charges a 20% commission on the reward and bonus (if any) you pay workers. HITs with 10 or more assignments are subject to an additional 20% fee on the reward, which makes tasks such as surveys substantially more expensive than other use cases.


For photo moderation, the number of assignments per HIT can be set to 1 (or 2, if you want a second opinion), while surveys need a larger number of assignments to obtain a sufficiently large sample. Restricting HITs to workers with Masters status incurs an additional 5% fee.
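To make the fee arithmetic concrete, here is a minimal Python sketch that estimates the total cost of a batch of HITs using only the rates described above: a 20% base commission, an extra 20% for HITs with 10 or more assignments, and an extra 5% for Masters-only HITs. It ignores bonuses, and the function name and example prices are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_cost(
    reward_per_assignment: str,
    assignments_per_hit: int,
    num_hits: int = 1,
    masters: bool = False,
) -> Decimal:
    """Estimate total requester cost (rewards plus fees) for a batch of HITs."""
    reward = Decimal(reward_per_assignment)  # Decimal avoids float rounding on money
    rate = Decimal("0.20")                   # base commission on the reward
    if assignments_per_hit >= 10:
        rate += Decimal("0.20")              # surcharge for 10 or more assignments
    if masters:
        rate += Decimal("0.05")              # surcharge for Masters-only HITs
    total_reward = reward * assignments_per_hit * num_hits
    total = total_reward * (1 + rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 1,000 photos, one assignment each, $0.05 per assignment
    print(estimate_cost("0.05", assignments_per_hit=1, num_hits=1000))
    # One survey HIT with 100 assignments at $1.00 each
    print(estimate_cost("1.00", assignments_per_hit=100))
```

The first call prints `60.00` ($50 in rewards plus a 20% fee) and the second prints `140.00` ($100 in rewards plus a 40% fee). The survey's per-dollar overhead is twice that of the photo moderation batch even though both pay workers similar totals.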


The survey portion is HTML-formatted, and an example Bootstrap template is provided. Reducing vertical space in the layout (for example, using drop-down boxes rather than lists of radio buttons) reduces the need for scrolling and lets workers complete the HIT faster. If you replace the default template, keep extraneous visual elements to a minimum so that workers can focus on the content of the HIT.
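A minimal sketch of the drop-down approach follows. This hypothetical Python helper, which is not part of any MTurk tooling, renders one question as a Bootstrap-styled `<select>` element that could replace a column of radio buttons inside the template's existing form. It assumes the template loads Bootstrap's `form-group` and `form-control` classes and leaves form submission to the template.

```python
from html import escape
from typing import Sequence


def dropdown_question(name: str, label: str, options: Sequence[str]) -> str:
    """Render one survey question as a compact drop-down (<select>) fragment."""
    field = escape(name)  # escape() also escapes quotes, so it is safe in attributes
    option_lines = "\n".join(
        f'    <option value="{escape(o)}">{escape(o)}</option>' for o in options
    )
    return (
        '<div class="form-group">\n'
        f'  <label for="{field}">{escape(label)}</label>\n'
        # "required" plus a disabled placeholder forces workers to pick an answer.
        f'  <select class="form-control" id="{field}" name="{field}" required>\n'
        '    <option value="" selected disabled>Choose one</option>\n'
        f"{option_lines}\n"
        "  </select>\n"
        "</div>"
    )


if __name__ == "__main__":
    print(dropdown_question("age_range", "Your age range", ["18-24", "25-34", "35-44", "45+"]))
```

A single `<select>` occupies one line of the page no matter how many choices it holds, whereas radio buttons add a line per choice.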



About

James Sanders is a Java programmer specializing in software as a service and thin client design, and virtualizing legacy programs for modern hardware.
