My Data Science thesis for my Master of Science in Engineering degree is on persona-based language generation. More specifically, I am using a dataset collected from Reddit in which users are associated with a wide range of attributes.
The subreddits "IAmA" and "#Iamnot" were scraped for positive and negative samples, respectively, so that users could be identified and classified as having certain attributes.
Once a large number of users had been collected for each category, sample texts were scraped and labeled with the corresponding attribute.
For my thesis, I plan to perform style transfer: the dataset described above will be used to train a lightweight classification model that steers GPT-3 toward generating text with a predefined set of characteristics.
Furthermore, approaches from research papers such as "Prefix-Tuning: Optimizing Continuous Prompts for Generation" and "Plug and Play Language Models" by Uber AI are also being considered as part of the final model.
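To make the idea of steering a generator with a lightweight classifier more concrete, here is a minimal sketch in Python. It is not the thesis model, and it is neither prefix-tuning nor PPLM: it uses a much simpler generate-and-rerank stand-in, where candidate texts are rescored by a small Naive Bayes attribute classifier. The toy sentences, the example attribute (working in a hospital), the language-model log-probabilities, and all function names are invented for illustration, and no call to GPT-3 is made.

```python
import math
from collections import Counter

# Toy labeled texts standing in for the Reddit data (invented for illustration).
POSITIVE = ["i treat patients at the hospital every day",
            "my patients ask me about medicine"]
NEGATIVE = ["i watch movies every day",
            "my friends ask me about games"]

# Hypothetical candidate generations with made-up language-model log-probabilities.
CANDIDATES = [
    ("i watch games with my friends", -4.0),
    ("i see patients at the hospital", -4.5),
]


def train_naive_bayes(positive, negative):
    """Collect per-class word counts for a two-class Naive Bayes model."""
    pos_counts = Counter(w for text in positive for w in text.split())
    neg_counts = Counter(w for text in negative for w in text.split())
    vocab = set(pos_counts) | set(neg_counts)
    return pos_counts, neg_counts, vocab


def attribute_log_prob(text, model):
    """Log P(attribute | text), with add-one smoothing and equal class priors."""
    pos_counts, neg_counts, vocab = model
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    log_pos = log_neg = 0.0
    for word in text.split():
        if word not in vocab:
            continue  # ignore words never seen during training
        log_pos += math.log((pos_counts[word] + 1) / pos_total)
        log_neg += math.log((neg_counts[word] + 1) / neg_total)
    # Normalize over the two classes in a numerically stable way.
    top = max(log_pos, log_neg)
    log_norm = top + math.log(math.exp(log_pos - top) + math.exp(log_neg - top))
    return log_pos - log_norm


def rerank(candidates, model, weight=1.0):
    """Sort candidates by LM log-prob plus weighted attribute log-prob, best first."""
    scored = [(lm_log_prob + weight * attribute_log_prob(text, model), text)
              for text, lm_log_prob in candidates]
    return [text for _, text in sorted(scored, reverse=True)]


if __name__ == "__main__":
    model = train_naive_bayes(POSITIVE, NEGATIVE)
    print(rerank(CANDIDATES, model, weight=0.0)[0])  # language model alone
    print(rerank(CANDIDATES, model, weight=1.0)[0])  # classifier-steered
```

With `weight=0.0` the ranking follows the made-up language-model scores alone, while a positive `weight` lets the classifier pull the choice toward text that matches the target attribute. Methods such as PPLM apply this kind of pressure during decoding rather than after it, which this sketch does not attempt.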
The research is being conducted under the mentorship of PhD candidate Daphne Ippolito and Postdoc Lara Martin, with the advice of Professor Christopher Callison-Burch and my thesis supervisor, Professor Clayton Greenberg.