LANGUAGE CLASSIFICATION



Identifying languages

Jan Edward's question on behalf of Rita Kaushanskaya: "Do you think there is any interest in using ASR to distinguish between Spanish and English input in Lena recordings? Would this be possible, given that Spanish and English have different syllable structures and prosody? Now, researchers who are interested in children's code-switching environments rely on parent questionnaires, but Rita would love to have a more objective measure." We can extend this to a more general question covering other language pairs (French-English, etc.?) - Melanie

There are methods for language classification that are based on speaker diarization. With these methods, you ideally need to "register" the talkers and languages in advance so that the algorithm can find them; it cannot deduce them on its own. That would mean coding a portion of your dataset. One may wonder whether a 'fake' dataset could be created to bootstrap the process. This could be a possibility, but we are not sure whether anyone has checked how well it works once the algorithm is asked to generalize to realistic recordings.
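
To make the "registration" idea concrete, here is a minimal sketch in Python. It is not the method discussed at the meeting: it swaps in a plain nearest-centroid classifier, and the two-number feature vectors are made up for illustration (real systems would use acoustic features extracted from the audio). The function names enroll and classify are hypothetical. The point it shows is only that the classifier can return a language it was given hand-coded examples of, and nothing else.

```python
import math


def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def enroll(labeled_segments):
    """Build one centroid per language from hand-coded segments.

    labeled_segments maps a language label to a list of feature vectors.
    """
    return {lang: centroid(vecs) for lang, vecs in labeled_segments.items()}


def classify(model, vector):
    """Return the enrolled language whose centroid is nearest to vector."""
    return min(model, key=lambda lang: math.dist(model[lang], vector))


# Toy, made-up 2-D features standing in for real acoustic features
# (assumption: these values are illustrative only).
coded = {
    "english": [[1.0, 0.2], [0.8, 0.4]],
    "spanish": [[0.1, 0.9], [0.3, 1.1]],
}
model = enroll(coded)
print(classify(model, [0.9, 0.3]))
print(classify(model, [0.2, 1.0]))
```

A segment in a language that was never enrolled (French, say) would still be assigned to one of the enrolled labels, which is why a portion of the dataset has to be coded first.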

Emmanuel's group is indeed interested in this, so people who have datasets are invited to contact them.

This is a good opportunity for a wiki page collecting existing articles, algorithms, and data snippets; we'll put it on the list of pages to create. A small subgroup has been formed to discuss this in more detail, including Chris F, Anne W, Emmanuel D, Gina P, ... and we'll send out an email in case anyone else is interested.
