soft-analytics-01/docs/sections/introduction.tex
Claudio Maggioni 07232eddcc Final version of the bug-triaging project
Commit history has been discarded to remove large files from the repo.
2024-01-03 15:22:56 +01:00


\section*{Introduction}
The goal of this assignment was to create a machine learning model able to assign a user to a GitHub issue.
The first step towards this goal was to scrape past issues from the VSCode GitHub repository.
These issues were used to train the machine learning model, a deep neural network called BERT.
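
As an illustration of the scraping step, the following is a minimal sketch that assumes the issues are fetched page by page through the GitHub REST API (\texttt{GET /repos/\{owner\}/\{repo\}/issues}); the report does not state which mechanism was actually used.
The helper names \texttt{issues\_page\_url} and \texttt{issue\_to\_record}, the choice of \texttt{state="closed"}, and the set of retained fields are hypothetical.
The sketch only builds the request URL and converts one returned item into a training record; it performs no network access.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def issues_page_url(owner, repo, page, per_page=100, state="closed"):
    """Build the URL of one page of the repository's issue listing.

    Assumption: closed issues are requested, 100 per page.
    """
    query = urlencode({"state": state, "per_page": per_page, "page": page})
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def issue_to_record(item):
    """Convert one item of the JSON response into a flat record.

    The issues endpoint also lists pull requests, which carry a
    "pull_request" key; those are skipped by returning None.
    """
    if "pull_request" in item:
        return None
    assignee = item.get("assignee") or {}
    return {
        "number": item["number"],
        "title": item.get("title") or "",
        "body": item.get("body") or "",
        "assignee": assignee.get("login"),  # None when nobody is assigned
    }
```

For the VSCode repository, the first page would be requested with \texttt{issues\_page\_url("microsoft", "vscode", 1)}.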
The next logical step was to perform cleaning on the raw scraped data.
We noticed that some parts of the issue title and body introduced noise that could negatively affect the training process.
For this reason, the data was cleaned before being fed to BERT\@.
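
The kind of cleaning meant here can be sketched as follows.
The rules shown (dropping fenced code blocks, HTML comments and URLs, then collapsing whitespace) are assumptions chosen for illustration and are not necessarily the rules applied in this project; the function name \texttt{clean\_issue\_text} is likewise hypothetical.

```python
import re

# Hypothetical cleaning rules; the exact rules used in the project may differ.
_CODE_BLOCK = re.compile(r"`{3}.*?`{3}", re.DOTALL)    # fenced code blocks
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # issue-template comments
_URL = re.compile(r"https?://\S+")                    # bare links
_WHITESPACE = re.compile(r"\s+")


def clean_issue_text(text):
    """Strip likely noise from an issue title or body."""
    text = _CODE_BLOCK.sub(" ", text)
    text = _HTML_COMMENT.sub(" ", text)
    text = _URL.sub(" ", text)
    # Collapse the gaps left behind by the removals above.
    return _WHITESPACE.sub(" ", text).strip()
```

Each rule is a separate substitution so that individual rules can be enabled or disabled while checking their effect on the training data.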
Finally, a base BERT model pre-trained on English documents was fine-tuned on our cleaned data; given an issue, it returns a ranking of the five users most likely to be assigned to it.
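
The ranking step can be illustrated with a short sketch.
It assumes the classifier outputs one raw score (logit) per candidate user and that these scores are turned into probabilities with a softmax; the function name \texttt{top\_k\_assignees} and its arguments are hypothetical and do not refer to the project's code.

```python
import math


def top_k_assignees(logits, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability.

    Assumption: `logits` holds one raw classifier score per entry of `labels`.
    """
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Sort candidates by descending probability and keep the first k.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Because the softmax preserves the order of the scores, the ranking would be the same if the logits were sorted directly; the probabilities are only needed when a confidence value is reported alongside each suggested user.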