Kaja Dobrovoljc


Linguist interested in digital language resources, corpus linguistics and language technology.

Contact:
kaja.dobrovoljc@ijs.si
kaja.dobrovoljc@ff.uni-lj.si

How to pronounce:
/kaːja dɔbrɔˈvoːlts/

About me

I am a linguist and researcher specializing in the design and evaluation of language resources and technologies, with a focus on their application in corpus linguistic research.

I currently split my time between the Department of Slovene Studies at the Faculty of Arts, University of Ljubljana, and the Department for Artificial Intelligence at the Jožef Stefan Institute.

For more info, see my full CV here, or view my profiles on Google Scholar and ResearchGate.


Research topics

Current projects

Selected past projects

Publications

For a full list, please see the SICRIS database.


Recent news

News archive

  • October 2023: Honoured to give an invited talk on 'Cross-lingually Harmonized Approaches to Spoken Data Annotation' at SPELLL 2023.
  • July 2023: Join us at ESSLLI 2023, the European Summer School in Logic, Language, and Information, hosted by the University of Ljubljana, where I'll be serving as the Local PC Chair.
  • October 2022: Very excited to learn that my postdoctoral project proposal 'A Treebank-Driven Approach to the Study of Spoken Slovenian' has been selected for funding.
  • September 2022: Kick-off meeting of the UniDive COST Action on universality, diversity, and idiosyncrasy in language technology. I am honoured to have been elected co-leader of WG1 on Corpus Annotation.
  • May 2022: Looking forward to LREC 2022 in Marseille, where I will be presenting a paper on spoken language treebanks (main conference) and a paper on the SSJ treebank extension (LAW workshop).
  • March 2022: I was invited as a speaker at the ESFRI 20th anniversary conference to present the CLARIN infrastructure and its impact on my research work. The presentation was also featured as a CLARIN Impact Story.
  • October 2021: Kick-off meeting for the SLED project: Monitor Corpus for Slovene and Related Language Resources.
  • July 2021: Launch of the DSDE Universal Dependencies annotation campaign, which aims to add 5,000 new manually parsed sentences for Slovenian.
  • April 2021: I co-organized the EACL 2021 Language Diversity Games as part of the Language Diversity Panel and Games event at EACL 2021.
  • March 2021: I joined the Development of Slovene in a Digital Environment project to work on the SSJ UD treebank extension, the CLASSLA-Stanza pipeline evaluation and the GOS spoken corpus concordancer.