IRB and Research Ethics Board: How To



Getting started


When embarking on a project to collect real-world recordings of child language or child language input, it is important to think about privacy and legal issues regarding data collection, storage, distribution and use before the actual data collection starts. Some of these concerns, such as ensuring that the recordings, transcripts and derivative data remain usable beyond the immediate project goals, may seem like downstream problems, and getting through ethics review may seem like simply a hoop to jump through. However, it is important to take some time at the outset to think about your goals and priorities for later data use and access, so that your consent forms cover all the future uses that you, your collaborators, and possibly the research community at large may want to make of your data, while respecting the rights of those recorded. This will help you avoid pitfalls that could prevent you from being as open as possible with your data once the study is done. There is no single right or wrong answer to many of these issues, as much depends on the policies of institutions and funding agencies, the laws of individual countries, the nature of your data, your lab practices, and your priorities as a researcher. This document will help you weigh these competing interests and reach an intentional decision about how your data are treated once you have finished collecting them, rather than reacting to decisions you never realized you had made. Please keep in mind that since every situation is different, it is up to the individual researcher to ensure that they are complying with the requirements of their granting agency and their institution as well as any applicable laws.

What are the ACTUAL requirements of your ethics board?

Getting a study through ethics review is often a very involved process. It can be tempting as a researcher to use existing templates for consent forms and/or to quickly accede to every request from the ethics board. While it is rarely a good idea to take an adversarial stance with your ethics board, the needs of a language recording project are somewhat unusual compared with the typical studies that psychology ethics boards encounter. It is important to make these needs clear; often that is all it takes to get a board to reconsider an especially restrictive request.
It is a great deal of work to review the many researcher requests sent to the board, and the board may fall back on cookie-cutter practices to streamline this task, just as researchers rely on template consent forms rather than creating them from scratch. But these shortcuts may create problems for studies that are collecting child language recordings. For example, ethics boards may expect all data to be made completely anonymous prior to sharing with other researchers.
Although this is an easy process for many kinds of studies, and a simple way to ensure that participants' privacy is respected, it is usually an impractical request for child language recordings under current definitions of what counts as anonymous, and it is typically not the standard of participant privacy actually required by the ethics regulations the board is working with. Sharing confidential, non-anonymous data (or in some cases even non-confidential recordings) is possible, but only if the appropriate consents are obtained from the participants. It is therefore important to discuss the details of your data-sharing plan with the ethics board at the time of the original submission.

Regarding data collection


Who and where they are being recorded


If your recordings will involve more than a single family (for example, recording in a daycare or school), you will need to consider what kinds of consent you might need from the other families in the daycare or school. Another important consideration may be ensuring that teachers do not feel compelled to participate by their employers or by the parents. Even if the child does not regularly spend time in such settings, keep in mind that your country may have legislation governing recording in public places. If you live in the United States, consult the Reporter's Recording Guide to determine whether such recordings are legal in your state. In Canada, the rules are generally less strict: you are generally allowed to record any conversation that you are part of, but privacy laws and laws against intercepting private communications may affect the legality of recording third parties. It may be wise to consult your institution's legal affairs office to be sure you understand the laws in your area.

Regarding analyses


Who will have access to these recordings and how?


If you are simply going to rely on LENA counts, and you are never going to share your recordings, you can state that no human will listen to the recordings. This, however, would severely limit your options for data-sharing. If humans do listen to at least part of the recordings, you may want to state that they will be accessible (in whole or in part) by you and your collaborators (listed or unlisted), your students, and depending on your plans for long-term archiving, researchers at large. It is important to consider not just your current needs, but what you might want to do with the recordings going forward.

Quality control and the training of students


A system is only as strong as its weakest link. This is an especially important consideration for audio recordings or transcripts that are being anonymized to share with other researchers. All it takes is one missed instance of a participant's real name to cause a breach of confidentiality. Once this information is in the public domain, there is no way to truly take it back. It is therefore very important to develop lab policies and practices to ensure that students are appropriately trained and that there are checks in place for quality control of the anonymization process.
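As one illustration of such a check, the sketch below scans an anonymized transcript for any real name from the lab's confidential name list and reports the line where it appears, so a human can review it. It is a minimal Python sketch under stated assumptions: transcripts are plain text, one utterance per line, and the function name and inputs are hypothetical. An automated scan like this only supplements trained human listeners; it cannot catch nicknames, misspellings, or names that remain in the audio itself.

```python
import re

def find_name_leaks(lines, real_names):
    """Return (line_number, matched_text) pairs where a real name survives.

    `lines` is an iterable of transcript lines; `real_names` is the lab's
    confidential list of participants' real names (hypothetical input).
    Matching is case-insensitive and restricted to whole words.
    """
    if not real_names:
        return []
    # Try longer names first so a compound name wins over a shorter one it contains.
    names = sorted(real_names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b",
                         re.IGNORECASE)
    leaks = []
    for number, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            leaks.append((number, match.group(0)))
    return leaks

transcript = [
    "where did Zoe put the ball?",
    "good job, Mia!",
]
print(find_name_leaks(transcript, ["Mia"]))  # [(2, 'Mia')]
```

Whole-word matching keeps a name like "Mia" from flagging "Miami", at the cost of missing a name fused to other text; any hit should go back to a trained student or supervisor rather than being fixed silently.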

What one listens for


What will you do if you hear something unusual? Will (and can) you follow up on signs that the child has a language delay? Some previous LENA researchers have openly stated that they will not report on matters with potential legal implications, such as maltreatment or illegal activities. However, be careful: in some cases, you may be required by law to report suspected child abuse.

Regarding local storage and archiving


Retaining contact information


It is a good idea to retain, in some form, the contact information for participants for as long as possible within the lab, as well as the connection between each participant's identity and the individual recordings and transcripts. Ethics boards may ask researchers to make data truly anonymous as quickly as possible by severing even within-lab connections between participant contact information or identity and the data. This is the easiest way to ensure that participants' rights to privacy are protected. However, there is a danger in severing this link too early. With recordings and transcripts, it is possible to end up in a situation where the data are only pseudo-anonymous, i.e., it is not easy or even possible for the researcher to identify or contact the participant, but neither is it clear that the data are truly anonymous. In this situation, the researcher may be unable to contact the participant to confirm that they are comfortable with the sharing of their data, yet the data may not be sufficiently anonymous to share without explicit consent. The best way to avoid this situation is to obtain all the consents you need when the recordings are made. However, it is not always possible to predict every use you may wish to make of the data, so keeping the contact information as long as possible is also advisable. Ethics boards will likely be receptive to retaining the link between contact information and data much longer than their standard practice if this need is clearly articulated.

Participants' right to withdraw


Some research ethics policies (e.g., the TCPS [1]) require researchers to remove a participant's data from a sample if the participant requests it AT ANY POINT, including years later, if at all practicable. A plan should therefore be in place for removing data if a participant revokes consent, and any situations in which removal would not be possible should be communicated to the participant in advance. Moreover, some countries have laws that specifically govern children's rights, such as the right to revoke consent when they come of age. In many countries, participants of any age can revoke their consent at any time, even years later, and researchers may be obliged to delete their data unless the data are already in the public domain or completely anonymous. Researchers working with vulnerable populations may want to take extra care that consent is freely given and that participants fully understand who will have access to their recordings and how.

Carefully consider how data will be stored and who will have access to the original dataset. The safest way to store data is on an encrypted, password-protected computer that is never connected to the internet and to which only current lab staff have access. However, even before considering data-sharing, there are a number of reasons that people other than lab staff may need access to your data. Among others, institutional staff may require access for quality control and to ensure compliance with ethical standards, and technicians may require access (potentially from a remote location) to troubleshoot software problems. Laws related to reporting child abuse may also limit your legal ability to promise complete confidentiality.

Given the enormous investment of time and money in collecting audio recordings and producing transcripts and annotated data files, it is important that all files are carefully backed up.
It is recommended that at least one backup copy be stored offsite, as building fires and other location-specific calamities can and do happen. Offsite backup storage may entail hard-disk storage outside the institution, storage on an institutional drive system over an internal network connection, or storage in "the cloud". Each of these options has its own confidentiality and security considerations.
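Whichever storage option you choose, one way to confirm that a backup copy is complete and intact is to compare a cryptographic checksum of each file against the original. The Python sketch below assumes the backup mirrors the original directory layout; the function names are hypothetical, and the check covers integrity only, not confidentiality or encryption.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so long audio files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def mismatched_files(original_dir: Path, backup_dir: Path) -> list:
    """List relative paths that are missing from, or differ in, the backup."""
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        # A missing copy or a differing hash both mean the backup is not trustworthy.
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(str(relative))
    return problems
```

Running a comparison like this after each backup, and periodically afterwards, can catch silent copy failures before the backup is actually needed.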

Regarding data-sharing


Carefully consider how you would like data to be shared beyond the lab. Will you be quoting portions of the transcripts in publications or playing sections of audio in presentations to illustrate your findings? Will you be collaborating with colleagues outside your lab who may need access to some or all of your data? Will you be sharing your files with a public or semi-public database like CHILDES, Open Science Framework, Databrary, HomeBank and/or SecureHomeBank? In each of these contexts, you will need to decide how much data you will share and how confidentially (what level of access you will allow), and ensure that you have obtained explicit permission to do so. Be sure you are familiar with the data-submission policies of any target database(s). In deciding whether to donate your data, and if so at what level of security, you may want to consider factors such as how vulnerable your population is and whether knowing that the recordings will be public might affect the naturalness of participants' behavior. These concerns must be weighed against the considerable benefit of making the recordings available to other researchers and against the increasing expectations of data access within the research community and from granting agencies and publishers. At present, DARCLE members are planning two archives, HomeBank and SecureHomeBank. A description can be found at HomeBank.

How and when to anonymize data for data-sharing


There are three primary components to the process of "anonymizing" audio recordings and transcripts.
First, decisions must be made about what metadata are made available to the public. Obviously, the more information you provide about the characteristics of your sample, the more useful your data will be to other researchers. However, the more metadata you provide, the greater the chance that anonymity will be breached. Sometimes there are trade-offs between different types of information. For example, if you want to obscure the child's birth date, it is usually more important to preserve the child's exact age than the date of the recording. However, a future researcher may be interested in differences between recordings made in summer and in winter, or on weekdays and on weekends, and for them the date may be more informative than the exact age in days. It is therefore important to preserve this information (at least in-lab) wherever possible so that such alternative analyses remain possible, keeping in mind that third parties can combine publicly available data across different sources.
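A minimal Python sketch of this trade-off, assuming an in-lab record that holds both the child's birth date and the recording date (the function name, field names, and northern-hemisphere season mapping are all hypothetical): the shared record keeps the exact age in days but releases neither date, optionally adding coarse calendar information such as weekday versus weekend and season. Even coarse calendar fields narrow the range of possible birth dates, so whether to include them is itself a judgment call.

```python
from datetime import date

# Meteorological seasons by month, assuming a northern-hemisphere sample.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def public_metadata(birth_date: date, recording_date: date,
                    include_calendar: bool = False) -> dict:
    """Build a shareable metadata record from in-lab dates (hypothetical schema).

    The exact age in days is kept, but neither date is released, so the
    birth date cannot be recovered by subtracting age from recording date.
    """
    record = {"age_days": (recording_date - birth_date).days}
    if include_calendar:
        # Coarse calendar information for weekday/weekend or seasonal analyses;
        # each added field narrows the set of possible birth dates.
        record["weekend"] = recording_date.weekday() >= 5
        record["season"] = SEASONS[recording_date.month]
    return record

print(public_metadata(date(2020, 1, 15), date(2020, 7, 4)))  # {'age_days': 171}
```

The full dates stay in the lab's own records, so an alternative cut of the data (for instance, exact recording dates for a restricted-access archive) can still be produced later.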

Second, identifying, embarrassing, or sensitive segments are typically removed from transcripts or replaced with alternate text (e.g., pseudonyms), and deleted from shared audio files. The extent of this process will depend on the specifics of your agreement with participants (see below). It is sometimes easiest to anonymize the audio and transcripts at the time of transcription. However, as with metadata, it is advisable wherever possible to preserve an unedited copy of both the audio file and the transcript, as the content of these hidden segments can be of significant research interest. For example, if a researcher is interested in the relationship between a child's exposure to particular phonemes and the timing of their development in the child's own speech, the phonemes in the child's own first name constitute a significant part of that input. As with metadata, allowing access to the true distribution of phonemes the child hears in the sample must be carefully weighed against the possibility of a breach of confidentiality.
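The text-replacement half of this step can be sketched as follows, assuming plain-text transcripts and a lab-held key mapping each real name to a pseudonym (the function name and key are hypothetical). The original transcript is left untouched, consistent with keeping an unedited copy in-lab; only the returned string would be shared. Simple word matching misses nicknames, diminutives, and misspellings, and does nothing for the audio, so a human pass is still needed.

```python
import re

def pseudonymize(text: str, key: dict) -> str:
    """Return a copy of `text` with each real name in `key` replaced by its pseudonym.

    `key` maps real names to pseudonyms and should itself stay confidential
    within the lab. Matching is case-insensitive and limited to whole words.
    """
    if not key:
        return text
    # Longest names first, so "Anna-Lise" is replaced before "Anna" can match inside it.
    names = sorted(key, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b",
                         re.IGNORECASE)
    lookup = {name.lower(): pseudonym for name, pseudonym in key.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)

original = "Anna-Lise, give Anna the cup. Good job, ANNA-LISE!"
shared = pseudonymize(original, {"Anna": "Eva", "Anna-Lise": "Maria"})
print(shared)  # Maria, give Eva the cup. Good job, Maria!
```

Because the key links pseudonyms back to real names, keeping it makes the data confidential rather than anonymous within the lab.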

Third, given the length of daylong recordings, you may consider applying different criteria to different portions. For instance, you could ensure that a short segment (15-120 minutes) is fully anonymized and checked by a human, whereas the rest is not. Such a solution may be a compromise among your participants' privacy, your desire to share data, and the resources you have to listen through the recordings and remove all sensitive information. You could then contribute the smaller dataset to HomeBank and the full dataset to SecureHomeBank.

Note: Confidential versus anonymous data


The exact definitions of confidential and anonymous data may vary from country to country, but the basic difference is that anonymous data do not contain any identifying information about the participant, whereas with confidential data the participant's identity is not publicly shared, but it is still possible to connect the participant's identity with their data. Data may be confidential within a lab but shared only as anonymous data. For example, a researcher may publicly share transcripts in which participant identities have been replaced by pseudonyms. These transcripts may be considered anonymous; however, if the researcher retains a spreadsheet key that connects the pseudonyms with the original names, the data are not anonymous within the lab. The distinction between confidential and anonymous data is important because anonymous data can typically be shared without the participant's permission. In the abstract, it is very easy to differentiate confidential from anonymous data, but in practice the distinction can be less clear. In determining whether data are truly anonymous, it is important to consider not only what is being shared at a single point in time, but also what might be obtained from other sources and how data may be combined or analyzed to determine a participant's identity. For example, an infant participant's exact age and gender are typical pieces of information that accompany such transcripts. On their own, these pieces of information may not breach anonymity. However, when they are combined with the date of recording, it becomes possible to determine the exact date of birth, which may be considered identifying information. Information about the location of recording, ethnicity, health status, etc., further narrows the field of possibilities.
Even the general content of the recording (aside from the use of names and other explicit identifiers which can be easily removed) may serve to provide information about identity. At a certain point, anonymous data are no longer anonymous.
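The arithmetic behind the age-plus-date example is trivial, which is exactly why it is easy to overlook. In this minimal Python sketch (the function name is hypothetical), a shared exact age in days and a shared recording date together yield the exact birth date:

```python
from datetime import date, timedelta

def infer_birth_date(recording_date: date, age_days: int) -> date:
    """Recover a birth date from two fields that look harmless on their own."""
    return recording_date - timedelta(days=age_days)

print(infer_birth_date(date(2020, 7, 4), 171))  # 2020-01-15
```

Releasing only one of the two fields, or coarsening one of them (for example, age in months rather than days), leaves a range of possible birth dates rather than a single one.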

Obtaining consent from participants


Once all the above considerations have been pondered, it is time to set up a process for obtaining consent from your participants. Some recommendations:

And one last tip


Avoid unnecessarily restrictive or specific wording in consent forms. Ethics boards, for good reason, usually like researchers to be very specific about how data will be stored and handled, and who will have access. This level of specificity is sometimes mirrored in the consent form, particularly if templates are being used. However, the wording that you put in your ethics submission form is usually relatively easy to change with an amendment of some kind, while you are probably stuck with the wording in your consent form. While it is important to be clear to participants about how you will treat your data, you want to carefully consider not only what you are doing with their data now, but what you may want to do down the road as new opportunities emerge or as you learn about better practices. Be conservative in your statements about how data are treated to avoid committing to excessively rigid practices.

Useful sites:



Email the webmaster: mrk.vandam |at| gmail |dot| com
Visit us at DARCLE.ORG
