I'm a first-year PhD student at the Center for Language and Speech Processing at Johns Hopkins University, advised by Professor David Yarowsky. Before this, I worked for a year at ALMAnaCH, INRIA, in Paris, investigating the behaviour of Transformer-based models on closely related dialects and languages. Even before that, I graduated from the EMLCT Master's program as an Erasmus scholar, with a dual MSc in Computational Linguistics at Charles University, Prague (first year), and Language Science and Technology at Saarland University, Germany (second year). I'm interested in understanding how we can use cool properties of linguistic structures and neural networks to build NLP tools that are available for a diverse set of languages in their dialectal, colloquial, and code-switched variants :)
Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, and Rachel Bawden. 2023. Cross-Lingual Strategies for Low-Resource Language Modeling: A Study on Five Indic Dialects. In Proceedings of the 18th Conference on Traitement Automatique des Langues Naturelles, Paris, France. TALN.
Niyati Bafna, Josef van Genabith, Cristina España-Bonet, and Zdeněk Žabokrtský. 2022. Combining Noisy Semantic Signals with Orthographic Cues: Cognate Induction for the Indic Dialect Continuum. In Proceedings of the 26th Conference on Computational Natural Language Learning, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Niyati Bafna and Zdeněk Žabokrtský. 2022. Subword-based Cross-lingual Transfer of Embeddings from Hindi to Marathi and Nepali. In Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 61–71, Seattle, Washington. Association for Computational Linguistics.
Kartik Sharma, Niyati Bafna, and Samar Husain. 2021. Clause Final Verb Prediction in Hindi: Evidence for Noisy Channel Model of Communication. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 160–170, Online. Association for Computational Linguistics.
Zdeněk Žabokrtský, Niyati Bafna, Jan Bodnár, Lukáš Kyjánek, Emil Svoboda, Magda Ševčíková, and Jonáš Vidra. 2022. Towards Universal Segmentations: UniSegments 1.0. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1137–1149, Marseille, France. European Language Resources Association.
Niyati Bafna, Martin Vastlik, and Ondřej Bojar. 2021. Constrained Decoding for Technical Term Retention in English-Hindi MT. In Proceedings of the 18th International Conference on Natural Language Processing, pages 1–6, National Institute of Technology Silchar, India. NLP Association of India.
Niyati Bafna and Dipti Sharma. 2019. Towards Handling Verb Phrase Ellipsis in English-Hindi Machine Translation. In Proceedings of the 16th International Conference on Natural Language Processing, pages 150–159, International Institute of Information Technology, Hyderabad, India. NLP Association of India.
June 2023 Presented my paper, Cross-Lingual Strategies for Low-Resource Language Modeling: A Study on Five Indic Dialects, at TALN 2023 in Paris, France. It compares basic multilingual strategies for language modelling for truly low-resource languages that belong to the same dialect continuum or language family.
April 2023 Accepted my PhD offer at JHU CLSP, and will be starting in Fall 2023! I'll be advised by Professor David Yarowsky.
Oct 2022 Invited talk at Linguistic Mondays, Institute of Formal and Applied Linguistics, Charles University, on two of my recent works on Indic languages: Empirical Models for an Indic Dialect Continuum.
Oct 2022 New paper accepted at CoNLL 2022! This paper was adapted from my MSc thesis; it is about data collection for 26 dialects and languages of the Indic language continuum, along with strategies for cognate induction for these languages as a step towards building bilingual resources for (extremely) low-resource languages.
Oct 2022 Starting as a research engineer at ALMAnaCH, INRIA in Paris, with Benoît Sagot and Rachel Bawden; super excited :)
Aug 2022 Defended my thesis (twice) and graduated from Charles University and Saarland University! I did my thesis jointly with the MLT group at DFKI and UFAL, supervised by Prof. Josef van Genabith and Cristina España-Bonet from the former and Zdeněk Žabokrtský from the latter. The thesis is about cognate induction and data collection for 26 (extremely) low-resource languages of the Indic dialect continuum; check it out here: Empirical Models for an Indic Language Continuum!
Jul 2022 New paper at SIGMORPHON@NAACL '22, about subword-level embedding transfer from Hindi to Marathi and Nepali.
Apr 2022 New paper at LREC '22 (the UniSegments project) with UFAL, harmonizing different morphological resources for 17 languages. I worked on Hindi, Marathi, Malayalam, Tamil, and Bengali.
Dec 2021 New paper at ICON '21, NIT Silchar, about constrained decoding for technical terms in English-Hindi MT, with UFAL.
Apr 2021 New paper at CMCL@NAACL '21 with Prof. Samar Husain at IIT Delhi, India, about computational modelling of cognitive hypotheses, specifically the adaptability and noisy-channel hypotheses.
Oct 2020 Started the EMLCT Master's program with an Erasmus scholarship.
May 2020 Graduated with a bachelor's degree from Ashoka University :-)
Dec 2019 New paper at ICON '19, IIIT Hyderabad, with Prof. Dipti Misra Sharma at LTRC, about verb phrase ellipsis handling in English-Hindi MT.
Here's a PDF version of all of this stuff.
I'm interested in multilingual NLP, especially dialectal NLP: I want to understand how dialects interact with each other and whether we can use that information to make NLP more equitable and accessible. I'm particularly motivated to work with the Indic dialect continuum, because it has 40+ dialectal variants and hundreds of millions of speakers, yet near-non-existent NLP outside a handful of state languages. I also follow Arabic NLP, which often deals with problems similar to those in Indic NLP. But language family aside, I'm mostly excited about working with a bunch of languages that have mysterious and wonderful connections with each other, and figuring out how we can make progress in NLP for all of them.
Code-switching is my favourite linguistic phenomenon to talk about (and I do talk about it a lot). I'm interested in almost anything CS: understanding when humans code-switch (syntactically and semantically), MT for CS, controllable CS generation, rule-based artificial CS synthesis, and more! I closely follow the work of MSR India on Hindi-English code-switching.
I'm also interested in the linguistic interpretability of neural networks, especially architectures that are inherently conducive to explanation and interpretability, from the perspective of making them more data-efficient and therefore more amenable to transfer. But also just because it would be cool to know what's going on!
When I'm not working, I enjoy salsa/bachata dancing, playing tennis, solving cryptic crosswords, writing, and watching terrible films in the language that I'm learning (currently, French). I'm also a (very) amateur guitar player and an (even more) amateur juggler.
Finally, I have a lifelong desire to sing a cappella but have never actually tried it.