PyData Global 2022

Urdu poems to Shakespearean English - Machine Translation
12-01, 20:30–21:00 (UTC), Talk Track I

All languages are rich in prose and poetry, yet much of that literature is inaccessible to readers who do not understand the language it was written in. Even in translation, a poem can be difficult to appreciate because of gaps in cultural knowledge. A poem translated in the style of an author familiar to the reader might both add cultural context and capture the essence of the poem itself.


English is the dominant language of the web, but with around 7000 languages worldwide and many more dialects, there is a need to explore literature in other languages. A simple machine translation, or a pivot machine translation, does not capture the nuance of the cultural context a language belongs to. Is there a creative way to bridge the cultures instead? If you are multilingual and want a space carved out for your spoken or written language, this talk will be worth your time.

I will focus on the data and methods I used, but I will not go into fundamentals such as what transformers are. I will, however, explain terms like 'zero-shot translation' and 'parallel corpus' so the audience can follow along. My main aim is to give an overview of what it would take for an idea like this to take off, so that you can learn from the obstacles I am facing when seeking representation for your own language of interest. I hope my talk will inspire creative expression in others.

Some technical details:
The Urdu poetry to Shakespearean English translation is zero-shot: the system is never trained on direct Urdu-to-Shakespearean pairs, so Modern English is used as a pivot language. For the first step, I fine-tuned a MarianMT model developed by Helsinki-NLP on Urdu-English parallel corpora from the Quran and the Bible (BLEU score: 13.3). For the second step, converting Modern English into Shakespeare-styled English, I fine-tuned a GPT-2-based model that was trained on Shakespeare’s plays to “generate Shakespeare-like text”.
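
The two-step pivot setup described above can be sketched in a few lines of Python. This is only an illustration of how the stages chain together, not the code used in the talk: `make_pivot_pipeline`, `fake_urdu_to_english`, and `fake_shakespeareanize` are hypothetical names, and the two stand-in functions merely tag their input. In a real system they would wrap the fine-tuned MarianMT model and the GPT-2-based style model. Translating line by line is also an assumption, chosen here so that the line structure of a poem survives both steps.

```python
from typing import Callable, Dict, List

# A stage is any function that maps one line of text to another line of text.
Stage = Callable[[str], str]


def make_pivot_pipeline(to_pivot: Stage, restyle: Stage) -> Callable[[str], Dict[str, List[str]]]:
    """Chain source -> pivot language -> target style, one poem line at a time."""

    def run(poem: str) -> Dict[str, List[str]]:
        # Drop blank lines, keep the remaining lines in their original order.
        lines = [ln.strip() for ln in poem.splitlines() if ln.strip()]
        pivot = [to_pivot(ln) for ln in lines]    # e.g. Urdu -> Modern English
        styled = [restyle(ln) for ln in pivot]    # e.g. Modern -> Shakespearean English
        # Keep the intermediate pivot text so each stage can be inspected separately.
        return {"source": lines, "pivot": pivot, "styled": styled}

    return run


# Hypothetical stand-ins: they only tag the text so the data flow is visible.
def fake_urdu_to_english(line: str) -> str:
    return f"[en] {line}"


def fake_shakespeareanize(line: str) -> str:
    return f"[shakespeare] {line}"


translate = make_pivot_pipeline(fake_urdu_to_english, fake_shakespeareanize)
result = translate("first line of the poem\n\nsecond line of the poem")
for src, out in zip(result["source"], result["styled"]):
    print(src, "->", out)
```

Keeping the pivot output in the result makes it easier to tell whether an odd final line comes from the translation step or from the style step, since errors in the first stage carry over into the second.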


Prior Knowledge Expected

No previous knowledge expected

Sidra Effendi is a Master's student at the University of Michigan, Ann Arbor, in the School of Information. Her areas of interest are Information Retrieval, NLP, and data visualization. She has prior experience working as a Software Engineer and ran her own startup, which gave her exposure to UX and market research. At the University of Michigan, she developed a search engine funded by Microsoft and an NLP-based digital curation assistant for UN ReliefWeb.