The CoderVerse is a multilingual group of graduate students and post-PhDs located around the world who perform "coding" (labeling, tagging, annotation) for customers of Texifter. In return for an hourly wage and unlimited access to DiscoverText, members of the CoderVerse learn to quickly and accurately add value to text analytics projects. They are not programmers; technical programming is not involved. Rather, they are users of our point & click graphical user interface for collaborative text analytics: DiscoverText.
While there is no typical project or guaranteed work, one thing is certain: we use measurement to incentivize coders to be efficient, responsive, and accurate. In the past, coders have trained machine-learning relevance, topic, ideology, and sentiment models. Sometimes coders simply label the whole item with one or more tags. Other times, they meticulously break down longer documents, generate new codes, or act as interpreters for clients with multilingual text collections.
Guide to Being a Good Coder
The term “coder” can create some confusion. For our purposes, a coder is not a computer programmer. A coder is someone who records observations using a variety of techniques, but primarily through keystrokes, using a fast connection to the Internet and a web browser. Keystroke coding is the key to accuracy and speed. Coders are valuable when they are quick and likely to make valid observations. Just being fast is not good enough. To thrive in the CoderVerse, you need to ask questions when confused, dig deeper into the data when necessary, write memos when asked to do so, and generally be a constructive critic. You may need to take a position about coding and justify that position in writing.
Familiarize Yourself with the DiscoverText Interface:
Help pages: https://texifter.zendesk.com/hc/en-us
Use this URL to log in: https://app.discovertext.com/login.aspx
You Must Submit a W-8 or W-9
All coders need to submit a hard copy of a completed W-8 or W-9 with an ink-on-paper signature.
Do your best when filling in the form, and if in doubt about any of the answers, leave that field blank. Please send it via regular mail to:
237 Shutesbury Road
Amherst, MA 01002
- Coders select code(s) from a predefined set of codes listed in the Code Book for each document.
- In a single code assignment, the set of codes that are available in the code book will be displayed as buttons so coders can pick only one code for each document.
- In a multiple code assignment, the set of codes that are available in the code book will be displayed as check-boxes so coders can select multiple codes for each document.
- Coders are not allowed to create new codes in this mode.
Open (“User-Defined”) Coding
- With Open Coding, there is no predefined set of codes upfront. If you are the first coder working on a dataset then you will most probably see a blank code book.
- Coders must discover the codes from within the data.
- Coders work through the first few documents using the navigation buttons, and try to identify themes from the data.
- Coders can create new codes as they identify themes.
- Coders should first try to test and use the codes that have already been created, but are free to create as many new codes as they deem relevant to the data if none of the existing codes fit the document.
- Usually about 20-25 preliminary codes are adequate.
- If you find multiple codes sharing the same theme (codes that are either created by you or other coders), you may want to merge the codes. Be careful when merging codes. You cannot undo a code merge.
Annotation & Memo Writing
- You may want to include an annotation or memo when you encounter an interesting example or want to record your own notes about a particular code.
- Annotations and memos can be based on the "trigger text", which contains all the text in a document or just a portion of the text in a document.
- Coders can selectively add annotations for a document.
How to Become a Good Coder
- WATCH THIS VIDEO: https://vimeo.com/69834903
- READ THIS DOCUMENT: https://texifter.zendesk.com/hc/en-us/articles/200712245-Intro-to-the-DiscoverText-interface
- Make sure you read and understand instructions clearly before you start coding a dataset.
- Ensure that you understand the definition and description of each code to avoid misusing the codes.
- Sometimes, referring to the metadata of the document is helpful when you are unsure of what code to assign to the document.
- Click the metadata icon to show/hide metadata.
- Do not randomly select codes for any document. Ask for help or clarification if you encounter any confusion.
- Try to be as consistent as possible when you are coding duplicate documents. If the content of the documents is identical, the codes assigned to those documents should be identical, too.
- Sometimes you will see a Classification bar at the top of the coding interface indicating the machine classifier’s code prediction. You do not need to agree with or follow the machine classifier’s prediction. Use your own judgment to decide on the best code for each document. Some of the most important human observations contradict the machine prediction.
- When you encounter a document in a foreign language, you can use external browser-based translators like Google Translate.
- Ideally, you want to be fast and accurate. Do not sacrifice accuracy for speed. There is usually more than one coder working on a dataset, and high inter-coder reliability with other coders is important.
- Datasets are sometimes provided on a first-come, first-served basis. If you see the message “You have completed coding for this dataset,” it means that the dataset is no longer available for coding because there are no uncoded documents remaining. Once enough coders have labeled a dataset, it will be closed and become inaccessible. Sometimes additional datasets for the same project will need to be coded within a few hours or days.
- When you are coding, please close down all other windows/tabs in your browser, except for those you may need to check a URL, search a term, or translate a document. Having lots of open tabs can slow coding down. Checking social media or doing personal activities on the clock is strictly prohibited.
- When you take breaks, please use the Stop button in the coding window to stop the built-in timer. The pay is a robust $15/hour and we would like 60 minutes of coding for that.
- The system keeps track of your total coding time. If you walk away from your computer without hitting Stop, or take a few seconds to reply to email, tweet, or post to Facebook, the timer keeps running and your average time per item goes up.
- When there are 200 or fewer items, please code them all, preferably in one sitting.
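The guide asks for high inter-coder reliability without saying how it is measured. As a rough illustration only, the Python sketch below computes two common agreement measures for a pair of coders: simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and the sample labels are hypothetical, and this is not a description of how DiscoverText itself scores coders.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of documents on which two coders chose the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of documents.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same code
    # if each chose independently according to their own code frequencies.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment codes from two coders on the same four documents.
coder_a = ["pos", "pos", "neg", "neg"]
coder_b = ["pos", "neg", "neg", "neg"]

print(percent_agreement(coder_a, coder_b))  # 0.75
print(cohens_kappa(coder_a, coder_b))       # 0.5
```

The gap between the two numbers shows why speed alone is not rewarded: two coders who click quickly and agree 75% of the time may be agreeing only moderately more often than chance would predict.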
General Availability and Time Commitments
You are expected to be generally available to code as projects and their datasets become available, but we understand that you may have other commitments. At times you may see a flurry of activity with lots of coding assignments and emails going back and forth; at other times there may be a lull and nothing going on. This is normal, and depends on the needs of the customers.
Please begin coding a dataset when you receive an email announcement, which will usually have a subject such as "Dataset: Test Set v1". Usually these assignments are time-sensitive and we need to return results to the customer. However, if you are not available don't fret - there may be several coders working simultaneously, and there may be other datasets to code at another time.
Submit your time using a timesheet (a Google Form) that will be provided for this purpose.
- You will need to enter your email address, the date, the number of minutes, and select from a list the type of task that you performed.
- Enter your time into the timesheet every day – do not wait until the last day of the month.
- All timesheets must be submitted by the last day of the month.
- Payments will be issued via PayPal.
- Tasks must be reported in minutes.
- Do not pad your hours; accurately report the number of minutes on a task.
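Because tasks are reported in minutes while pay is quoted per hour, a quick conversion can help you sanity-check your own records. The Python sketch below is a minimal example that assumes the $15/hour rate stated in this guide and rounds to the nearest cent. The function names, the rounding rule, and the sample dates are assumptions for illustration, not the official payment calculation.

```python
HOURLY_RATE_USD = 15.00  # rate stated in this guide


def pay_for_minutes(minutes, hourly_rate=HOURLY_RATE_USD):
    """Convert reported minutes into pay, rounded to the nearest cent (assumed rule)."""
    if minutes < 0:
        raise ValueError("Minutes cannot be negative.")
    return round(minutes / 60 * hourly_rate, 2)


def monthly_total(entries):
    """Sum (date, minutes) timesheet entries; return total minutes and pay."""
    total_minutes = sum(minutes for _, minutes in entries)
    return total_minutes, pay_for_minutes(total_minutes)


# Hypothetical daily entries, recorded in minutes as the timesheet requires.
entries = [("2024-01-02", 45), ("2024-01-03", 30)]

print(pay_for_minutes(45))     # 11.25
print(monthly_total(entries))  # (75, 18.75)
```

Keeping a personal log in this shape makes it easy to enter time daily and to confirm the month's total before the submission deadline.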