Introduction
Annotations can serve a variety of purposes, so the precise requirements you may want to meet will vary greatly. For example, if you are interested in assessing the reliability of the adult word count, you might extract and annotate solely the segments that have been thus defined. This means that you won't mark the onset and offset of phrases or pauses, or a change of speaker if one is found within the allotted segment, and that you won't mark any of these in the rest of the segments either. Or perhaps you are interested solely in onset/offset accuracy, in which case you'll want to look at the waveform (and listen to the signal) and easily mark onsets and offsets, without caring about the content of what is spoken.
As a result, there is no single best recipe for annotations; it will depend on your goals. The following notes may help you decide which system, or which hybrid, is best for you.
NOTE: At present, this wiki contains the thoughts of only one person, who hopes to annotate in order to capture:
- the precise content of what was said in a CHAT-friendly format
- the speaker ID (idem)
- the presence and location of both pauses and voiced hesitations (e.g., "ehem", "hum", ...)
The following is the list of systems that members of our network have used. Further information on each can be found in the subpages.
- TRS: Has mostly been used by LENA. Our information comes from the LENA instructions. Melanie Soderstrom is also using this.
- Praat: Has been used by the team at Radboud University (Paula Fikkert).
- CLAN: Has been used by Elika Bergelson at U of Rochester. Here is the process her group follows to import the LENA files and then annotate them in CLAN.
- Datavyu: Has been used by Elika Bergelson at U of Rochester for *video* annotation.
- MATLAB: Has been used by the team at Washington State University (Mark VanDam).
- ELAN: Has been used by the team at UC Merced (Anne Warlaumont). Here are some of her annotation instructions.
The goal is to produce annotations that are comparable even if they were created with different software (see above). We intend to follow these ground rules.
Files included
Basic files
Only one file is always required: the metadata file, which follows [these conventions]
For LENA recordings, the .its is always available. It can be modified (e.g., the DOB and date), but in that case this should be noted in the metadata. Additionally, some researchers record multiple children with the same DLP. In this case, the .its should be noted as unreliable in the metadata.
For non-LENA recordings, a pseudo-.its can be generated.
Sound files can take a variety of forms.
Sections that need to be excised (to anonymize, etc.) are silenced rather than cut out, which preserves the integrity of the timestamps [revisit after discussion]
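Silencing a span instead of deleting it keeps every later sample at its original time offset, so existing annotations stay aligned with the audio. The following is a minimal sketch of that idea, assuming an uncompressed PCM WAV file and Python's standard `wave` module. The function name `silence_span` and its parameters are hypothetical, and the whole file is read into memory for simplicity, which a real daylong recording would probably need to avoid by processing in chunks.

```python
import wave


def silence_span(wav_in, wav_out, start_s, end_s):
    """Copy a PCM WAV, replacing [start_s, end_s) with silence.

    wav_in / wav_out may be file paths or binary file-like objects.
    Returns the number of frames written, which equals the input length.
    """
    with wave.open(wav_in, "rb") as src:
        params = src.getparams()
        frames = bytearray(src.readframes(src.getnframes()))

    bytes_per_frame = params.sampwidth * params.nchannels
    n_frames = len(frames) // bytes_per_frame

    # Convert the times to frame indices and clamp them to the file length.
    first = max(0, min(n_frames, int(round(start_s * params.framerate))))
    last = max(first, min(n_frames, int(round(end_s * params.framerate))))

    # 8-bit WAV samples are unsigned (silence is 0x80); wider samples are
    # signed (silence is 0x00).
    fill = 0x80 if params.sampwidth == 1 else 0x00
    n_bytes = (last - first) * bytes_per_frame
    start_byte = first * bytes_per_frame
    frames[start_byte:start_byte + n_bytes] = bytes([fill]) * n_bytes

    with wave.open(wav_out, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(bytes(frames))
    return n_frames
```

Because the output has exactly as many frames as the input, onsets and offsets recorded before silencing can still be used unchanged afterwards.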
Annotation files
Additionally, there are optional files to be made available.
- The beginning and end of turns are determined (1) using the LENA automatic segmentation; (2) using another human or automatic system that uses the same criteria; (3) using the LENA segmentation followed by human correction (by moving boundaries and/or deleting/adding boundaries); (4) following CHAT conventions; or (5) following other conventions. In all of these cases, the method used is noted in the metadata.
- More work is needed at this point.
- If the boundary is not correct, then the transcriber can tag the utterance as being incorrectly segmented.
- The automatic LENA labels for speakers are contained in the utterance. The transcriber can add a code for their own classification of the speaker, but not remove the LENA one.
- The contents of a turn follow the latest CHILDES conventions, with the only exception that free use can be made of (.) to indicate utterance-internal pauses.
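The rules in this list can be pictured as a small record per utterance. The sketch below is only an illustration under stated assumptions: the class `Utterance`, its field names, and the helper functions are hypothetical and are not part of any of the tools listed above, and the speaker codes in the usage lines are illustrative values. It shows the automatic label being kept while a transcriber's label is added alongside it, a flag for incorrect segmentation, and a count of `(.)` pause markers.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Utterance:
    onset_s: float
    offset_s: float
    lena_speaker: str                          # automatic label, never removed
    transcriber_speaker: Optional[str] = None  # human label, added alongside
    missegmented: bool = False                 # set when the boundary is wrong
    text: str = ""                             # transcription of the turn


def add_transcriber_label(utt, code):
    """Return a copy with the transcriber's speaker code added.

    The automatic label is left untouched.
    """
    return replace(utt, transcriber_speaker=code)


def flag_missegmented(utt):
    """Return a copy tagged as incorrectly segmented."""
    return replace(utt, missegmented=True)


def count_internal_pauses(utt):
    """Count utterance-internal pauses written as (.) in the text."""
    return utt.text.count("(.)")


# Illustrative values only.
u = Utterance(12.3, 14.1, lena_speaker="FAN", text="look at (.) the doggie")
u = add_transcriber_label(u, "MOT")
u = flag_missegmented(u)
```

Making the record immutable and returning copies is one way to make it hard to overwrite the automatic label by accident; other designs would work equally well.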