About me
I am a linguist and researcher specializing in the design and evaluation of language resources and technologies, with a focus on their application in corpus linguistic research.
I currently split my time between the Department of Slovene Studies at the Faculty of Arts, University of Ljubljana, and the Department for Artificial Intelligence at the Jožef Stefan Institute.
For more info, see my full CV here, or view my profiles on Google Scholar and ResearchGate.
Research topics
- Corpus annotation
- Language variation
- Formulaic language
- Dependency treebanks
- Discourse connectives
- Language technology
Current projects
Selected past projects
Publications
For a full list, please see the SICRIS database.
Recent news
- November 2024: Err, well … We’ve just released a bigger, better, and more polished version of the SST UD treebank, to be used in linguistic and NLP research on Slovenian speech. Embedded in ROG, it also features prosody, disfluency and dialogue act annotations.
- October 2024: Excited to announce that SyntaxFest 2025 will take place in Ljubljana, Slovenia, in August 2025! This biennial conference series brings together five workshops—TLT, UDW, DepLing, IWPT, and Quasy—and will be co-located with the 1st UniDive Shared Task on Morphosyntactic Parsing.
- July 2024: Release of STARK v3 – a significantly enhanced version of this versatile tool for bottom-up linguistic analysis and comparison of UD treebanks.
News archive
October 2023: Honoured to give an invited talk on 'Cross-lingually Harmonized Approaches to Spoken Data Annotation' at SPELLL 2023.
July 2023: Join us at ESSLLI 2023, the European Summer School in Logic, Language, and Information, hosted by the University of Ljubljana, where I'll be serving as the Local PC Chair.
October 2022: Very excited to learn that my postdoctoral project proposal 'A Treebank-Driven Approach to the Study of Spoken Slovenian' has been selected for funding.
September 2022: Kick-off meeting of the UniDive COST Action on universality, diversity, and idiosyncrasy in language technology. I am honoured to have been elected co-leader of WG1 on Corpus Annotation.
May 2022: Looking forward to LREC 2022 in Marseille, where I will be presenting a paper on spoken language treebanks (main conference) and a paper on the SSJ treebank extension (LAW workshop).
March 2022: I was invited as a speaker at the ESFRI 20th anniversary conference to present the CLARIN infrastructure and its impact on my research work. The presentation was also featured as a CLARIN Impact Story.
October 2021: Kick-off meeting for the SLED project: Monitor Corpus for Slovene and Related Language Resources.
July 2021: Launch of the DSDE Universal Dependencies annotation campaign aiming at 5,000 new manually parsed sentences for Slovenian.
April 2021: I co-organized the Language Diversity Games as part of the Language Diversity Panel and Games event at EACL 2021.
March 2021: I joined the Development of Slovene in a Digital Environment project to work on the SSJ UD treebank extension, the CLASSLA-Stanza pipeline evaluation, and the GOS spoken corpus concordancer.