\section*{Introduction}
The goal of this assignment was to create a machine learning model able to assign a user to a GitHub issue.
The first step towards this goal was to scrape past issues from the VSCode GitHub repository.
These issues were then used to train the machine learning model (a deep neural network called BERT).
The next step was to clean the raw scraped data.
We noticed that some parts of the issue body or title introduced noise that could negatively affect the training process.
For this reason, the data was cleaned before being fed to BERT\@.
Finally, a base BERT model pre-trained on English documents was fine-tuned on our cleaned data; given a queried issue, it returns a ranking of the 5 users most likely to be assigned to it.