Senior Researcher, Department of Applied Linguistics
employed at the Institute since 2022
Tatiana Shavrina is a specialist in applied linguistics, computational linguistics, multimodality, language modeling, and natural language processing. Since 2022, she has worked in the Department of Applied Linguistics at the Institute of Linguistics of the Russian Academy of Sciences. She has authored over 50 academic publications and edited more than ten collections of research papers and conference proceedings in computational linguistics and intelligent technologies. She has also organized 15 competitions and hackathons in natural language processing.
From 2019 to 2022, Tatiana was a PhD student at the School of Linguistics of the Higher School of Economics. In 2022, she defended her PhD thesis, "Linguistic Interpretation and Evaluation of Russian Word Vector Models", supervised by Professor O.N. Lyashevskaya, in Applied and Mathematical Linguistics (specialty 10.02.21).
Tatiana develops evaluation systems for Russian language models, including Russian SuperGLUE, and has evaluated over 2,000 language models for Russian. She is also a developer of Taiga, an open-source corpus for machine learning.
Over the past 10 years, Tatiana has worked to popularize computational linguistics and the modeling of language and intelligence with language models and artificial intelligence.
Scientific Profiles
Science Index
Selected publications
(List of publications from 2010 onwards)
Scientific publications in journals indexed in Russian and international citation systems
- Fenogenova, Alena; Tikhonova, Maria; Mikhailov, Vladislav; Shavrina, Tatiana; Emelyanov, Anton; Shevelev, Denis; Kukushkin, Alexandr; Malykh, Valentin; Artemova, Ekaterina; Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models, Proceedings of the Annual International Conference “Dialogue”. 2022
- Tikhonova, Maria; Mikhailov, Vladislav; Pisarevskaya, Dina; Malykh, Valentin; Shavrina, Tatiana; Ad astra or astray: Exploring linguistic knowledge of multilingual BERT through NLI task, Natural Language Engineering, 1-30, 2022, Cambridge University Press
- Voloshina, Ekaterina; Serikov, Oleg; Shavrina, Tatiana; Is neural language acquisition similar to natural? A chronological probing study, Proceedings of the Annual International Conference “Dialogue”. 2022
- Glushkova, Taisia; Machnev, Alexey; Fenogenova, Alena; Shavrina, Tatiana; Artemova, Ekaterina; Ignatov, Dmitry I; DaNetQA: a yes/no question answering dataset for the Russian language, International Conference on Analysis of Images, Social Networks and Texts, 57-68, 2020, Springer
- Lyashevskaya, O. N.; Shavrina, T. O.; Trofimov, I. V.; Vlasova, N. A.; GRAMEVAL 2020 Shared Task: Russian Full Morphology and Universal Dependencies Parsing, Proceedings of the Annual International Conference “Dialogue”. 553-569, 2020, RSUH
- Shavrina, Tatiana; Genre classification problem: in pursuit of systematics on a big webcorpus, Proceedings of the Third Workshop "Computational linguistics and language science", 4, 70-83, 2019
- Shavrina, Tatiana Olegovna; Benko, Vladimír; Omnia Russica: even larger Russian corpus, CORPORA-2019, 94-102, 2019
- Smurov, IM; Ponomareva, Maria; Shavrina, TO; Droganova, Kira; AGRR-2019: Automatic gapping resolution for Russian, Proceedings of the Annual International Conference “Dialogue”. 561-575, 2019
- Shavrina, TO; Word vector models as an object of linguistic research, Proceedings of the Annual International Conference “Dialogue”. 576-588, 2019
- Shavrina, Tatiana; Differential approach to webcorpus construction, Proceedings of the Annual International Conference “Dialogue”. 2018
- Shavrina, Tatiana; Genre Classification on Text-Internal Features: a Corpus Study, ARANEA 2018, 134, 2018
- Sorokin, Aleksei; Shavrina, Tatiana; Lyashevskaya, Olga; Bocharov, B; Alexeeva, Svetlana; Droganova, Kira; Fenogenova, Alena; Granovsky, Dmitry; MorphoRuEval-2017: an evaluation track for the automatic morphological analysis methods for Russian, Proceedings of the Annual International Conference “Dialogue”. 2017
- Sorokin, Alexey; Baytin, Alexey; Galinskaya, Irina; Shavrina, Tatiana; SpellRuEval: the first competition on automatic spelling correction for Russian, Proceedings of the Annual International Conference “Dialogue”. 2016
- Selegey, D; Shavrina, T; Selegey, V; Sharoff, S; Automatic morphological tagging of Russian social media corpora: training and testing, Proceedings of the Annual International Conference “Dialogue”. 589-604, 2016
- Sorokin, AA; Shavrina, TO; Automatic spelling correction for Russian social media texts, Proceedings of the Annual International Conference “Dialogue”. 688-701, 2016
- Shavrina, T; Sorokin, A; Modeling advanced lemmatization for Russian language using TnT-Russian morphological parser, Proceedings of the Annual International Conference “Dialogue”. 2015
- Shavrina, Tatiana; Pisarevskaya, Dina; Malykh, Valentin; Building a Bilingual QA-system with ruGPT-3, International Conference on Analysis of Images, Social Networks and Texts, 124-136, 2022, Springer
- Lyashevskaya, Olga; Bocharov, Victor; Sorokin, Alexey; Shavrina, Tatiana; Granovsky, Dmitry; Alexeeva, Svetlana; Text collections for evaluation of Russian morphological taggers, Jazykovedný časopis, 68, 2, 258-267, 2017, De Gruyter Poland
- Shavrina, Tatiana; Shapovalova, Olga; To the methodology of corpus construction for machine learning: «Taiga» syntax tree corpus and parser, CORPORA-2017, 78-84, 2017
Published conference papers
- Pisarevskaya, Dina; Shavrina, Tatiana; WikiOmnia: generative QA corpus on the whole Russian Wikipedia, Proceedings of the GEM Workshop, EMNLP 2022, 2022
- Shliazhko, Oleh; Fenogenova, Alena; Tikhonova, Maria; Mikhailov, Vladislav; Kozlova, Anastasia; Shavrina, Tatiana; mGPT: Few-Shot Learners Go Multilingual, arXiv preprint arXiv:2204.07580, 2022
- Logacheva, Varvara; Dementieva, Daryna; Krotova, Irina; Fenogenova, Alena; Nikishina, Irina; Shavrina, Tatiana; Panchenko, Alexander; A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification, Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval), 90-101, 2022
- Shavrina, Tatiana; Mikhailov, Vladislav; Malykh, Valentin; Artemova, Ekaterina; Serikov, Oleg; Protasov, Vitaly (eds.); Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, 2022
- Shamardina, Tatiana; Mikhailov, Vladislav; Chernianskii, Daniil; Fenogenova, Alena; Saidov, Marat; Valeeva, Anastasiya; Shavrina, Tatiana; Smurov, Ivan; Tutubalina, Elena; Artemova, Ekaterina; Findings of the RuATD Shared Task 2022 on Artificial Text Detection in Russian, Proceedings of the Annual International Conference “Dialogue”. 2022
- Rofin, Mark; Mikhailov, Vladislav; Florinskiy, Mikhail; Kravchenko, Andrey; Tutubalina, Elena; Shavrina, Tatiana; Karabekyan, Daniel; Artemova, Ekaterina; Vote'n'Rank: Revision of Benchmarking with Social Choice Theory, arXiv preprint arXiv:2210.05769, 2022
- Chizhikova, Anastasia; Murzakhmetov, Sanzhar; Serikov, Oleg; Shavrina, Tatiana; Burtsev, Mikhail; Attention Understands Semantic Relations, Proceedings of the Thirteenth Language Resources and Evaluation Conference, 4040-4050, 2022
- Serikov, Oleg; Protasov, Vitaly; Voloshina, Ekaterina; Knyazkova, Viktoria; Shavrina, Tatiana; Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation, Proceedings of the BlackboxNLP Workshop, EMNLP 2022, 2022
- Taktasheva, Ekaterina; Shavrina, Tatiana; Fenogenova, Alena; Shevelev, Denis; Katricheva, Nadezhda; Tikhonova, Maria; Akhmetgareeva, Albina; Zinkevich, Oleg; Bashmakova, Anastasiia; Iordanskaia, Svetlana; TAPE: Assessing Few-shot Russian Language Understanding, Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2022) conference, findings, 2022
- Scao, Teven Le; Fan, Angela; Akiki, Christopher; Pavlick, Ellie; Ilić, Suzana; Hesslow, Daniel; Castagné, Roman; Luccioni, Alexandra Sasha; Yvon, François; Gallé, Matthias; BLOOM: A 176B-Parameter Open-Access Multilingual Language Model, arXiv preprint arXiv:2211.05100, 2022
- Malykh, Valentin; Kukushkin, Alexander; Artemova, Ekaterina; Mikhailov, Vladislav; Tikhonova, Maria; Shavrina, Tatiana; MOROCCO: Model Resource Comparison Framework, arXiv preprint arXiv:2104.14314, 2021
- Shavrina, Tatiana; Malykh, Valentin; How not to Lie with a Benchmark: Rearranging NLP Leaderboards, Proceedings of the ICBINB workshop, NeurIPS conference, 2021
- Shavrina, Tatiana; Emelyanov, Anton; Fenogenova, Alena; Fomin, Vadim; Mikhailov, Vladislav; Evlampiev, Andrey; Malykh, Valentin; Larin, Vladimir; Natekin, Alex; Vatulin, Aleksandr; Humans Keep It One Hundred: an Overview of AI Journey, Proceedings of the 12th Language Resources and Evaluation Conference, 2276-2284, 2020
- Shavrina, Tatiana; Fenogenova, Alena; Emelyanov, Anton; Shevelev, Denis; Artemova, Ekaterina; Malykh, Valentin; Mikhailov, Vladislav; Tikhonova, Maria; Chertok, Andrey; Evlampiev, Andrey; RussianSuperGLUE: A Russian language understanding evaluation benchmark, Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2020) conference, 2020
- Mikhailov, Vladislav; Shavrina, Tatiana; Domain-Transferable Method for Named Entity Recognition Task, arXiv preprint arXiv:2011.12170, 2020
- Ponomareva, Maria; Droganova, Kira; Smurov, Ivan; Shavrina, Tatiana; AGRR-2019: A Corpus for Gapping Resolution in Russian, Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, ACL conference, 2019
Popular Science
- Tatiana Shavrina: “ScienceVideoLab” – can AI draw a painting? Online lecture on AI
- Tatiana Shavrina: “Convince a Skeptic” debates, Anthropogenesis portal: “Is it impossible to create an artificial intelligence system equal to human intelligence?”
Participation in grants (2010–present)
- Grant "Computer-linguistic platform of a new generation for digital documentation of the Russian language: infrastructure, resources, scientific research", No. 075-15-2020-793, 2020–2022, research team member
Teaching
Tatiana taught two Master's courses at HSE University:
- “Machine Learning” for the Computational Linguistics MA programme
- “Text Analysis. Generative Models” for the “Financial Technologies and Data Analysis” MA programme
Supervision of doctoral students, graduate students and applicants
- Tikhonova M.I., PhD student, 2018–2022; specialty 05.13.18 “Mathematical modeling, numerical methods and software packages”
Cooperation with scientific foundations
- Conference Organizing Committee, “Dialogue” 2016-2023
- Conference Organizing Committee, AGI 2022: section Interpretable NLP (INLP)
- Workshop organizer, ACL 2022: NLP Power! The First Workshop on Efficient Benchmarking in NLP
- Workshop organizer, COLING 2022: Field Matters: The First Workshop on NLP Applications to Field Linguistics
- Tutorial lead speaker, INLG 2022: tutorial on the Artificial Text Detection task
- Reviewer:
  - Dialogue conference, 2016–2023
  - ACL 2023 conference
  - BlackBoxNLP workshop, 2022
  - COLING 2022 conference
  - NeurIPS 2022 conference, Datasets and Benchmarks track
  - AINL conference, 2021
  - AIST conference, 2020