Date of Award

2022

Document Type

Thesis

Degree Name

Bachelors

Department

Natural Sciences

First Advisor

Gillman, David

Area of Concentration

Computer Science

Abstract

This thesis explores the field of Text Style Transfer using Natural Language Processing techniques. This task falls into the larger topic of Text Attribute Transfer, a field also encompassing closely-related tasks such as Machine Translation and Paraphrase Generation. Where style transfer differs from these existing tasks is that it focuses on trying to capture the more fine-grained aspects of writing such as diction and semantic syntax. The goal of this thesis is to explore some of the fundamental topics in Natural Language Processing necessary for the understanding of Attribute Transfer tasks. These topics include text normalization techniques, tokenization and word embedding techniques such as Word2Vec and FastText, and a beginning discussion of Machine Translation using Sequence-to-Sequence modelling with an attention mechanism. Following this, this paper will briefly discuss modern techniques for monolingual-corpus Style Transfer such as Style-Content Disentanglement and Pseudo-parallel Corpus Construction. We will attempt to apply techniques for parallel-corpus Style Transfer to a dataset of Shakespearean texts rewritten in Modern English. Metrics for evaluating the success of these modeling techniques will also be discussed, with a demonstration being performed on the outcome of our parallel-corpus experiment and a discussion on some of the shortcomings with existing evaluation metrics.
