Program

Workshop Program and Accepted Papers

UNLP 2023 on YouTube

Workshop Program

9:00–10:50 Morning Session: New Datasets

Chair: Mariana Romanyshyn

9:00–9:10	Opening Remarks
9:10–9:55	Keynote Speech: Mona Diab
9:55–10:15	Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection	Pavlo Kuchmiichuk
10:15–10:35	The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s	Olha Kanishcheva, Tetiana Kovalova, Maria Shvedova and Ruprecht von Waldenfels
10:35–10:50	Creating a POS Gold Standard Corpus of Modern Ukrainian	Vasyl Starko and Andriy Rysin

10:50–11:20 Morning Break

11:20–12:55 Morning Session: New Directions

Chair: Oleksii Ignatenko

11:20–12:05	Keynote Speech: Gulnara Muratova
12:05–12:20	The Evolution of Pro-Kremlin Propaganda From a Machine Learning and Linguistics Perspective	Veronika Solopova, Christoph Benzmüller and Tim Landgraf
12:20–12:40	Extension Multi30K: Multimodal Dataset for Integrated Vision and Language Research in Ukrainian	Nataliia Saichyshyna, Daniil Maksymenko, Oleksii Turuta, Andriy Yerokhin, Andrii Babii and Olena Turuta
12:40–12:55	Exploring Word Sense Distribution in Ukrainian with a Semantic Vector Space Model	Nataliia Cheilytko and Ruprecht von Waldenfels

12:55–14:25 Lunch

14:25–16:00 Afternoon Session: Shared Task

Chair: Mariana Romanyshyn

14:25–14:40	UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language	Oleksiy Syvokon, Olena Nahorna, Pavlo Kuchmiichuk and Nastasiia Osidach
14:40–14:55	The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian	Oleksiy Syvokon and Mariana Romanyshyn
14:55–15:15	Comparative Study of Models Trained on Synthetic Data for Ukrainian Grammatical Error Correction	Maksym Bondarenko, Artem Yushko, Andrii Shportko and Andrii Fedorych
15:15–15:30	A Low-Resource Approach to the Grammatical Error Correction of Ukrainian	Frank Palma Gomez, Alla Rozovskaya and Dan Roth
15:30–15:50	RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans	Bohdan Didenko and Andrii Sameliuk
15:50–16:00	Best Paper and Thank You

16:00–16:30 Afternoon Break

16:30–18:10 Afternoon Session: UberText

Chair: Oleksii Ignatenko

16:30–16:50	Introducing UberText 2.0: A Corpus of Modern Ukrainian at Scale	Dmytro Chaplynskyi
16:50–17:05	GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian	Volodymyr Kyrylov and Dmytro Chaplynskyi
17:05–17:25	Learning Word Embeddings for Ukrainian: A Comparative Study of FastText Hyperparameters	Nataliia Romanyshyn, Dmytro Chaplynskyi and Kyrylo Zakharov
17:25–17:45	Contextual Embeddings for Ukrainian: A Large Language Model Approach to Word Sense Disambiguation	Yurii Laba, Volodymyr Mudryi, Dmytro Chaplynskyi, Mariana Romanyshyn and Oles Dobosevych
17:45–18:00	Abstractive Summarization for the Ukrainian Language: Multi-Task Learning with Hromadske.ua News Dataset	Svitlana Galeshchuk
18:00–18:10	Closing Words

Keynote Speakers

Mona Diab, Lead Responsible AI Research Scientist with Meta, Professor of Computer Science at the George Washington University (on leave)

Topic: The Criticality of Low-Resource Language Research for NLP Future Sustainability

In this keynote, I will highlight the necessity of low-resource language research. I will argue for the critical need for research on, and productization of, low-resource language technology as a way of ensuring not only diversity and inclusion but also the future sustainability of NLP at large. I will showcase some of our work on Arabic dialects, as well as work by various groups on creating resources and technologies for “digitally underprivileged” languages.

Gulnara Muratova, NGO “QIRI’M Young”, co-coordinator of the National Corpus of the Crimean Tatar Language

Topic: National Corpus of the Crimean Tatar Language

Crimean Tatar is the language of an indigenous people of Ukraine and is currently listed by UNESCO as severely endangered. An additional danger to the language is posed by the Russian Federation’s temporary occupation of the Autonomous Republic of Crimea, where most speakers of Crimean Tatar live. To preserve Crimean Tatar, create a fundamental online platform for linguistic research, and integrate it with various digital platforms, a team of enthusiasts began developing the first National Corpus of the Crimean Tatar Language. Limited or no access to crucial printed sources, four different graphic systems, and no tools for processing Crimean Tatar are only a few of the challenges that the team faced. Learn more at UNLP!

The project was initiated by the Ministry of Reintegration of the Temporarily Occupied Territories of Ukraine within the Strategy for the Development of the Crimean Tatar Language for 2022–2032 and is implemented with the support of the Swiss-Ukrainian EGAP Program by the Eastern Europe Foundation.