\section*{Introduction}
The goal of this assignment was to create a machine learning model able to assign a user to a GitHub issue.
The first step towards this goal was to scrape past issues from the VSCode GitHub repository.
These issues were used to train the machine learning model (a deep neural network called BERT).
The next step was to clean the raw scraped data.
We noticed that some parts of the issue bodies and titles introduced noise that could negatively affect the training process.
For this reason, the data was cleaned before being fed to BERT\@.
Finally, a base BERT model pre-trained on English documents was fine-tuned on our cleaned data; given a queried issue, it returns a ranking of the five users most likely to be assigned to it.
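
Read as a formula, and assuming the classifier ends in a softmax layer over a set of candidate assignees $U$ (an assumption; the symbols $U$, $x$, and $z_u$ are introduced here for illustration only), the score of user $u$ for an issue $x$ would be
\[
  p(u \mid x) = \frac{\exp\bigl(z_u(x)\bigr)}{\sum_{v \in U} \exp\bigl(z_v(x)\bigr)},
\]
where $z_u(x)$ is the model output (logit) for user $u$. The returned ranking would then consist of the five users with the largest $p(u \mid x)$, listed in decreasing order.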