WORKSHOP PROGRAM AND ACCEPTED PAPERS
Workshop Program
9:00–10:50 Morning Session: New Datasets
Chair: Mariana Romanyshyn
9:00–9:10 | Opening Remarks |
9:10–9:55 | Keynote Speech: Mona Diab |
9:55–10:15 | Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection
Pavlo Kuchmiichuk |
10:15–10:35 | The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s
Olha Kanishcheva, Tetiana Kovalova, Maria Shvedova and Ruprecht von Waldenfels |
10:35–10:50 | Creating a POS Gold Standard Corpus of Modern Ukrainian
Vasyl Starko and Andriy Rysin |
10:50–11:20 Morning Break
11:20–12:55 Morning Session: New Directions
Chair: Oleksii Ignatenko
11:20–12:05 | Keynote Speech: Gulnara Muratova |
12:05–12:20 | The Evolution of Pro-Kremlin Propaganda From a Machine Learning and Linguistics Perspective
Veronika Solopova, Christoph Benzmüller and Tim Landgraf |
12:20–12:40 | Extension Multi30K: Multimodal Dataset for Integrated Vision and Language Research in Ukrainian
Nataliia Saichyshyna, Daniil Maksymenko, Oleksii Turuta, Andriy Yerokhin, Andrii Babii and Olena Turuta |
12:40–12:55 | Exploring Word Sense Distribution in Ukrainian with a Semantic Vector Space Model
Nataliia Cheilytko and Ruprecht von Waldenfels |
12:55–14:25 Lunch
14:25–16:00 Afternoon Session: Shared Task
Chair: Mariana Romanyshyn
14:25–14:40 | UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Oleksiy Syvokon, Olena Nahorna, Pavlo Kuchmiichuk and Nastasiia Osidach |
14:40–14:55 | The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian
Oleksiy Syvokon and Mariana Romanyshyn |
14:55–15:15 | Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction
Maksym Bondarenko, Artem Yushko, Andrii Shportko and Andrii Fedorych |
15:15–15:30 | A Low-Resource Approach to the Grammatical Error Correction of Ukrainian
Frank Palma Gomez, Alla Rozovskaya and Dan Roth |
15:30–15:50 | RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans
Bohdan Didenko and Andrii Sameliuk |
15:50–16:00 | Best Paper and Thank You |
16:00–16:30 Afternoon Break
16:30–18:10 Afternoon Session: UberText
Chair: Oleksii Ignatenko
16:30–16:50 | Introducing UberText 2.0: A Corpus of Modern Ukrainian at Scale
Dmytro Chaplynskyi |
16:50–17:05 | GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian
Volodymyr Kyrylov and Dmytro Chaplynskyi |
17:05–17:25 | Learning Word Embeddings for Ukrainian: A Comparative Study of FastText Hyperparameters
Nataliia Romanyshyn, Dmytro Chaplynskyi and Kyrylo Zakharov |
17:25–17:45 | Contextual Embeddings for Ukrainian: A Large Language Model Approach to Word Sense Disambiguation
Yurii Laba, Volodymyr Mudryi, Dmytro Chaplynskyi, Mariana Romanyshyn and Oles Dobosevych |
17:45–18:00 | Abstractive Summarization for the Ukrainian Language: Multi-Task Learning with Hromadske.ua News Dataset
Svitlana Galeshchuk |
18:00–18:10 | Closing Words |
Keynote Speakers
Mona Diab, Lead Responsible AI Research Scientist with Meta, Professor of Computer Science at the George Washington University (on leave)
Topic: The Criticality of Low-Resource Language Research for NLP Future Sustainability
In this keynote, I will highlight the necessity of low-resource language research. I will argue that research on, and productization of, low-resource language technology is critically needed, both to ensure diversity and inclusion and to secure the future sustainability of NLP at large. I will showcase some of our work on Arabic dialects, as well as work by various groups on creating resources and technologies for “digitally underprivileged” languages.
Gulnara Muratova, NGO “QIRI’M Young”, co-coordinator of the National Corpus of the Crimean Tatar Language
Topic: National Corpus of the Crimean Tatar Language
Crimean Tatar is the language of the indigenous people of Ukraine and is currently listed by UNESCO as severely endangered. The language is further endangered by the Russian Federation’s temporary occupation of the Autonomous Republic of Crimea, where most speakers of Crimean Tatar live. To preserve Crimean Tatar, create a fundamental online platform for linguistic research, and integrate it with various digital platforms, a team of enthusiasts began developing the first National Corpus of the Crimean Tatar Language. Limited or no access to crucial printed sources, four different graphic systems, and no tools for processing Crimean Tatar are only a few of the challenges the team faced. Learn more at UNLP!
The project was initiated by the Ministry of Reintegration of the Temporarily Occupied Territories of Ukraine within the Strategy for the Development of the Crimean Tatar Language for 2022-2032 and is implemented with the support of the Swiss-Ukrainian EGAP Program by the Eastern Europe Foundation.