The Corpus: EuroCoAT (the European Corpus of Academic Talk)
EuroCoAT NOW AVAILABLE ONLINE!
The research team is happy to announce that interested researchers may now access the 58,834-word corpus of office hours’ consultations conducted in English at five European universities. The transcripts may be downloaded in PDF, XML or TXT format. Contact us to register!
HOW TO CITE THE CORPUS (EuroCoAT: the European Corpus of Academic Talk)
Recommended citation for EuroCoAT:
MacArthur, F.; Alejo, R.; Piquer-Piriz, A.; Amador-Moreno, C.; Littlemore, J.; Ädel, A.; Krennmayr, T.; Vaughn, E. (2014) EuroCoAT. The European Corpus of Academic Talk. http://www.eurocoat.es.
GENERAL INFORMATION ABOUT THE CORPUS
Each of the 27 transcripts is accompanied by a document (“Contextual Information”). In each, the following information is provided:
1. Participants' background information (Lecturers & Students)
2. Topics covered
3. On-stage effect
4. Positioning of participants
5. General observations
The transcription system used in the current project is based on the following conventions developed for the Vienna-Oxford International Corpus of English (VOICE):
- VOICE Project. 2007. "Mark-up conventions". VOICE Transcription Conventions [2.1]. http://www.univie.ac.at/voice/documents/VOICE_mark-up_conventions_v2-1.pdf (date of last access 04/06/2013).
- VOICE Project. 2007. "Spelling conventions". VOICE Transcription Conventions [2.1]. http://www.univie.ac.at/voice/documents/VOICE_spelling_conventions_v2-1.pdf (date of last access 04/06/2013).
However, there are some differences between the transcription and annotation systems used in VOICE and in EuroCoAT. Further details are explained in the following document:
SPELLING AND MARK-UP CONVENTIONS
The specific details on the spelling and mark-up conventions used in the transcripts can be found by clicking on the following links (or at the bottom of the section):
- The transcripts of the office hours’ consultations are organized in four columns, which record (from left to right): (1) the speech entry number (which does not necessarily correspond to a turn at talk, as in VOICE); (2) speaker identification (either the initials of the participants’ pseudonyms or R for ‘researcher’); (3) the transcribed speech; and (4) a time stamp (every 30 seconds) together with the word or fragment of speech at which it occurs.
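
For researchers who want to process the transcripts programmatically, the four-column layout described above could be modelled roughly as follows. This is only a sketch under stated assumptions: it supposes that a row has been exported as tab-separated text, which the corpus documentation does not confirm, and the names `SpeechEntry`, `parse_row` and `is_researcher`, as well as the sample row, are invented for illustration and are not taken from EuroCoAT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechEntry:
    """One transcript row, following the four columns described above."""
    entry_number: int          # column 1: speech entry number (not necessarily a turn)
    speaker: str               # column 2: pseudonym initials, or "R" for researcher
    speech: str                # column 3: transcribed speech
    time_stamp: Optional[str]  # column 4: time stamp and the word it occurs at, if any


def parse_row(line: str, sep: str = "\t") -> SpeechEntry:
    """Split one row into its four columns.

    Assumption: columns are separated by tabs. The real TXT or XML
    downloads may be laid out differently.
    """
    cells = [cell.strip() for cell in line.rstrip("\n").split(sep)]
    # Pad short rows so that a missing time stamp column becomes empty.
    cells += [""] * (4 - len(cells))
    number, speaker, speech, stamp = cells[:4]
    return SpeechEntry(int(number), speaker, speech, stamp or None)


def is_researcher(entry: SpeechEntry) -> bool:
    """True when the row was spoken by the researcher (speaker code "R")."""
    return entry.speaker == "R"


# Hypothetical sample row, not taken from the corpus.
sample = parse_row("12\tR\tso what did you want to ask\t00:00:30 ask")
```

Keeping the time stamp as an optional string reflects the fact that it is recorded only every 30 seconds, so most rows would be expected to leave that column empty.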