Date of Award
2019
Document Type
Thesis
Degree Name
Bachelors
Department
Natural Sciences
First Advisor
Doucette, John
Area of Concentration
Computer Science
Abstract
I extract the lexical representation of target words related to the concepts of gender and sensuality in Tolstoy using the word2vec algorithm. The word2vec algorithm takes a corpus as input and outputs a vector space in which each word is represented as a vector and words that lie closest together are the most similar. Similarity is measured as the cosine similarity between word vectors. The corpus, retrieved from Project Gutenberg, consists of Anna Karenina and The Kreutzer Sonata, two works that address the woman question in nineteenth-century Russia. I provide visualizations of the words most similar to each target word within the two works, as well as a list of preliminary literary analyses of each target word’s results.
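As a rough illustration of the similarity measure named in the abstract, the sketch below computes cosine similarity between word vectors and ranks the nearest neighbours of a target word. It is not the thesis's code: the function names `cosine_similarity` and `most_similar`, the words chosen, and the three-dimensional `toy_vectors` are all hypothetical, made up only to show the calculation. Real word2vec vectors are learned from the corpus and typically have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, top_n=3):
    """Rank every other word by cosine similarity to the target word."""
    scores = [
        (word, cosine_similarity(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy vectors, invented for illustration only (not thesis results).
toy_vectors = {
    "wife": [0.9, 0.1, 0.3],
    "husband": [0.8, 0.2, 0.35],
    "railway": [0.1, 0.9, 0.2],
    "music": [0.2, 0.3, 0.9],
}

if __name__ == "__main__":
    for word, score in most_similar("wife", toy_vectors):
        print(f"{word}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors and not on their length, two words can score as similar even when their vectors differ in magnitude.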
Recommended Citation
Schwarz, Emma, "AUTOMATED TEXT EXTRACTION OF ANNA KARENINA AND THE KREUTZER SONATA: QUANTIFYING THE LEXICAL REPRESENTATION OF WOMEN IN TOLSTOY" (2019). Theses & ETDs. 5799.
https://digitalcommons.ncf.edu/theses_etds/5799