Mechanical Turk Best Practices

Last night, Dolores Labs hosted what was billed as the first-ever Mechanical Turk meetup, and I was fortunate to squeeze into what turned out to be a great series of presentations. While Amazon was the pioneer and remains the largest provider in the space, other services like Dolores Labs and Nathan Eagle’s txteagle have emerged to expand the pool of users and turks.

In the past, we’ve turned to Dolores Labs when we needed (machine-learning) training sets and were unable to quickly find reliable ones. To increase the quality of the output we receive from turks, we try to get multiple turks to perform an individual task and aggregate their work into a single answer. (We jokingly refer to this as the wisdom of micro-crowds.) Working on problems quite different from the ones we tackle, the first set of speakers presented research results confirming that this form of aggregation actually works. Rion Snow of Stanford’s AI Lab presented results suggesting that, for a large set of tasks, the aggregate work of 4-6 turks compares favorably to the work of a single (domain) expert. Working primarily in NLP and computational linguistics, Bob Carpenter of alias-i presented similar results when evaluating turk-generated training sets against gold-standard ones. (It’s hard enough when turks disagree, but as Bob Carpenter highlighted, disagreements among experts make it difficult to arrive at a gold standard in the first place.) Bob has found that in certain situations an iterative approach works best (“code-a-little”, “learn-a-little”), and that tools that let you start suggesting “answers” to a new set of turks would help immensely. Coincidentally, one of the speakers presented a toolkit that allows users to do just that: Greg Little’s TurKit is a JavaScript API for running iterative tasks on mechanical turk.
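
To make the aggregation idea concrete, here is a minimal Python sketch of its simplest form: a plain majority vote over the labels several turks gave the same item, plus an agreement score against a gold-standard set in the spirit of the evaluations described above. The function names, item ids and labels are hypothetical, and majority vote is an assumption chosen for simplicity, not necessarily the scheme Dolores Labs or any of the speakers used.

```python
from collections import Counter

# Hypothetical sketch: several turks label the same item, and their labels
# are reduced to one answer by simple majority vote.


def majority_vote(labels):
    """Return the most common label for one item.

    Counter.most_common orders equal counts by first appearance, so a tie
    goes to whichever tied label came in first (an assumption; a real
    pipeline might send tied items to more turks instead).
    """
    return Counter(labels).most_common(1)[0][0]


def aggregate(responses):
    """Map {item_id: [label, ...]} to {item_id: single aggregated label}."""
    return {item: majority_vote(labels) for item, labels in responses.items()}


def agreement(answers, gold):
    """Fraction of gold-standard items whose aggregated answer matches gold."""
    shared = [item for item in gold if item in answers]
    if not shared:
        return 0.0
    return sum(answers[item] == gold[item] for item in shared) / len(shared)


# Made-up toy data for illustration only (not results from the meetup).
responses = {
    "img1": ["cat", "cat", "dog", "cat", "cat"],
    "img2": ["dog", "dog", "cat", "dog"],
    "img3": ["cat", "dog", "dog", "cat"],  # 2-2 tie, resolved to "cat"
}
gold = {"img1": "cat", "img2": "dog", "img3": "dog"}

answers = aggregate(responses)
print(answers)                             # {'img1': 'cat', 'img2': 'dog', 'img3': 'cat'}
print(round(agreement(answers, gold), 3))  # 0.667
```

Majority vote is only the simplest choice. More elaborate schemes weight each turk by an estimated accuracy, which matters most when, as noted above, even the gold standard is contested.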

Another set of speakers talked about the emergence of mechanical turks as a research tool. Social scientists Aaron Shaw and John Horton spoke favorably of their experience using turks for research experiments in economics and for paired surveys. Among other things, they’ve studied the turk labor market by testing demand for tasks of varying difficulty (something Bob Carpenter also talked about) and by evaluating demand for follow-on tasks at lower wages. Alexander Sorokin of UIUC presented work on using turks to annotate training sets for computer vision and robotics. For those interested in using turks to annotate images, Alex has a toolkit ready to go.

For most users of mechanical turk (us included), it has become an API call that fits smoothly within their workflow. (Or as someone at the meetup wryly suggested, turk is a Remote Person Call.) The last pair of speakers, Lilly Irani and Six Silberman, reminded us that behind mechanical turk lie thousands of workers (“the crowd in the cloud”) working without (health care) benefits, often at extremely low hourly wages. Irani and Silberman suggested that rather than abstracting mechanical turk services as mere API calls, users should start thinking about the plight of the turks (“Mechanical Turk Bill of Rights”) behind the service. As a first step, they have released a Firefox plugin that aims to narrow the information asymmetry between turks (those performing tasks) and requesters (those posting tasks). While requesters can see ratings for turks, requesters themselves aren’t rated: Turkopticon lets turks rate requesters. They need more turks to download and start using Turkopticon, so if you know any mechanical turks, please encourage them to do so.

(†) According to Amazon representatives in the audience, a majority of turks are in the U.S. That may change in the future, once Amazon is able to get approval for other payment systems. Because of the possibility of money-laundering, services like AMT are subject to strict KYC controls.

  • http://tim.oreilly.com/ Tim O'Reilly

    Great article, Ben.

    FWIW, I was struck by the point in this article about the Google search quality process: the mturk-like quality of how they do search evaluation, via a team of tens of thousands of distributed testers. It’s very much to your point about getting multiple turks to perform the same task and then creating a single synthetic result.

    This is an important technique, and more people ought to understand and use it.

  • Peter Czukor

    Ben,

    > For most users of mechanical turk (us included), it has become an API call that fits smoothly within
    > their workflow. Or as someone at the meetup wryly suggested, turk is a Remote Person Call.

    That someone’d be me. Always wanted to be a meme, I guess here’s my chance.

    So Ben, you can call me Wryly now (so how do you address Tim? Oh, never mind…)

    Peter Czukor
    A9

    P.S. Can we have a “preview” function before the final Submit?

  • https://www.livework.com/ Ross Dakin

    Keep an eye on LiveWork – it’s going to give mturk a run for their money.

  • http://pgaval.wordpress.com Petros

    Very nice article! Are there any ongoing efforts to “internationalize” such crowds and/or gather cross-cultural turk answers?

    Thanks

    • http://radar.oreilly.com/ben/ Ben Lorica

      Hi Petros,

      1. As I mentioned in my footnote:
      “According to Amazon representatives in the audience, a majority of turks are in the U.S. That may change in the future, once Amazon is able to get approval for other payment systems. Because of the possibility of money-laundering, services like AMT are subject to strict KYC controls.”
      2. There are other MT services like txteagle that are focused on harnessing workers from developing countries.
      3. Finally, Lilly Irani and Six Silberman conducted a survey of Amazon Mechanical Turk workers and did get responses from turks based overseas. You may want to contact them for detailed results of their survey.
      Regards,
      Ben

  • Saddik Murtala

    Can someone please explain to me what Mechanical Turk is? I am really lost with that term.

    And where does it fit into Web 2.0?

    Regards.

  • http://friendfeed.com/randulo randulo

    I figured this out when I tried it! Now doing a conference on Mturk at 1PM EDT Wednesday: http://tr.im/amtchat

  • http://www.linkedin.com/in/davidwewing David Ewing

    Hi Ben, I came across your post looking for best practices for using Mechanical Turk. Do you have any updates to this piece? Thanks – David