Brigitte Bigi

Corpus Grenelle II and its annotations

Created in 2010-2011 by Brigitte Bigi, Cristel Portes, Agnès Steuckardt and Marion Tellier.


Grenelle II annotations

  1. Enriched orthographic transcription (manual), time-aligned at the utterance level (automatic)
  2. Time-aligned phonemes, tokens and events (automatic)
  3. Time-aligned syllables (automatic)
  4. Prosodic contours and intonation (manual)
  5. Morpho-syntax time-aligned at the token level (automatic)
  6. Hand gestures
  7. Self-repetitions (semi-automatic)
  8. Interruptions (manual)


B. Bigi, C. Portes, A. Steuckardt, M. Tellier
Multimodal Annotations and Categorization for Political Debates,
ICMI Workshop on Multimodal Corpora for Machine Learning (ICMI-MMC), Alicante (Spain), 2011

B. Bigi, C. Portes, A. Steuckardt, M. Tellier
A Multimodal Study of Answers to Disruptions,
Journal on Multimodal User Interfaces, Volume 7, Issue 1, Pages 55-66, Springer (Publisher). ISSN 1783-7677. DOI 10.1007/s12193-012-0110-zi, 2012

Corpus MARC-Fr

Created in 2011 by Brigitte Bigi and Pauline Péri

Description:

French corpus, manually phonetized and time-aligned at the phoneme level. Its duration is 7 minutes (5400 phones), and it is made up of 3 extracts from the following corpora: CID, AixOx and Grenelle.


Freely available for download: SLDR 000786


B. Bigi, P. Péri, R. Bertrand
Orthographic Transcription: Which Enrichment is required for Phonetization?,
Language Resources and Evaluation Conference, Istanbul (Turkey), pages 1756-1763, ISBN 978-2-9517408-7-7, 2012

Corpus AixOx

Read-speech corpus created between 2010 and 2012 by Sophie Herment, Anastassia Loukina, Anne Tortel, Daniel Hirst and Brigitte Bigi.


40 paragraphs of about 1 minute each, in French and English, from the EUROM 1 corpus. French texts are read by native French speakers (mainly from Aix-en-Provence) and by native English speakers (from Oxford). English texts are read by native English speakers and by native French speakers.


Freely available for download: SLDR 000784


S. Herment, A. Loukina, A. Tortel, D. Hirst, B. Bigi
AixOx, a multi-layered learners corpus: automatic annotation,
Proceedings of the International Conference on Corpus Linguistics, Jaén (Spain), March 2012.

CID - Corpus of Conversational Data


Extracts of CID, for demonstration purposes only

CID - Some of the annotations

Enriched orthographic transcription (manual), time-aligned at the IPU level (automatic)

Time-aligned phonemes, tokens and events such as noises and laughter (automatic), and time-aligned syllables (automatic)

Prosodic contours (manual), Momel - Modelling of melody (automatic) and INTSINT - INternational Transcription System for INTonation (automatic)

Morpho-syntax and syntax time-aligned at the token level (automatic) and time-aligned lemmas (automatic)

Dysfluencies (manual), Discourse and interaction (manual), Other- and Self- Repetitions (semi-automatic)


P. Blache, R. Bertrand, B. Bigi, E. Bruno, E. Cela, R. Espesser, G. Ferré, M. Guardiola, D. Hirst, E.-P. Magro, J.-C. Martin, C. Meunier, M.-A. Morel, E. Murisasco, I. Nesterenko, P. Nocera, B. Pallaud, L. Prévot, B. Priego-Valverde, J. Seinturier, N. Tan, M. Tellier, S. Rauzy
Multimodal Annotation of Conversational Data,
The Fourth Linguistic Annotation Workshop, ACL 2010, pages 186-191, Uppsala, Sweden, 2010.