# Assignment 2: If statements

**Group 2: Baris Aksakal, Edoardo Riggio, Claudio Maggioni**

## Repository Structure

- `/dataset`: code and data related to scraping repositories from GitHub;
- `/models`
  - `/baris`: code and persisted model of the original architecture built by
    Baris. `model_0.1.ipynb` and `test_model.ipynb` are respectively an
    earlier and a later iteration of the code used to train this model;
  - `/final`: persisted model for the final architecture with training and
    test evaluation statistics;
    - `/test_outputs.csv`: CSV deliverable for the evaluation on the test set
      we extracted;
    - `/test_usi_outputs.csv`: CSV deliverable for the evaluation on the
      provided test set;
- `/test`: unit tests for the model training scripts;
- `/train`: dependencies of the main model training script;
- `/train_model.py`: main model training script;
- `/plot_acc.py`: accuracy statistics plotting script.

## Environment Setup

Python 3.10 or greater is required to run both the scraping and the training
scripts. Dependencies can be installed in a virtual environment by running:

```shell
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
```

## Dataset Extraction

Please refer to [the README.md file in `/dataset`](dataset/README.md) for
documentation on the dataset extraction process.

## Model Training

Model training can be performed by running the script:

```shell
python3 train_model.py
```

The script resumes fine-tuning if the pretraining phase was completed by a
previous execution, and it skips directly to model evaluation on the two test
sets if fine-tuning has already been completed.
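
The intended control flow can be pictured with the sketch below. This is only
an illustration of the behaviour described above, not the actual code in
`train_model.py`: the phase functions are placeholders, and only the persisted
paths follow the layout documented in this README.

```python
# Sketch of the resume/skip behaviour described above. The phase functions are
# placeholders standing in for the real phases implemented in train_model.py;
# only the persisted paths follow the layout documented in this README.
from pathlib import Path

MODELS_DIR = Path("models/final")


def pretrain() -> None: ...   # placeholder: pretraining phase
def fine_tune() -> None: ...  # placeholder: fine-tuning with early stopping
def evaluate() -> None: ...   # placeholder: evaluation on the two test sets


def main() -> None:
    if not (MODELS_DIR / "pretrain").exists():
        pretrain()            # no persisted pretrained model yet: run pretraining
    if not (MODELS_DIR / "best.txt").exists():
        fine_tune()           # pretraining done, fine-tuning unfinished: resume it
    evaluate()                # evaluation on both test sets always runs last


if __name__ == "__main__":
    main()
```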

The persisted pretrained model is located in `/models/final/pretrain`. Each
epoch of the fine-tuning process is persisted at `/models/final/<N>`, where
`<N>` is the epoch number starting from 0. The epoch number selected by the
early stopping process is stored in `/models/final/best.txt`.
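
Given this layout, the checkpoint chosen by early stopping can be located by
reading `best.txt`, as in the short sketch below; how the checkpoint is then
loaded depends on the model format that `train_model.py` persists.

```python
# Resolve the fine-tuning checkpoint selected by early stopping, following the
# persistence layout described above. Loading the checkpoint itself depends on
# the model format persisted by train_model.py, so it is not shown here.
from pathlib import Path

MODELS_DIR = Path("models/final")

best_epoch = (MODELS_DIR / "best.txt").read_text().strip()
best_checkpoint = MODELS_DIR / best_epoch  # e.g. models/final/3 if best.txt contains "3"

print(f"Best epoch: {best_epoch}; checkpoint directory: {best_checkpoint}")
```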

`/models/final/stats.csv` stores the training and validation loss and accuracy
statistics collected during training. `/models/final/test_outputs.csv` is the
CSV deliverable for the evaluation on the test set we extracted, while
`/models/final/test_usi_outputs.csv` is the CSV deliverable for the evaluation
on the provided test set.
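
For a quick look at the training statistics without regenerating the plots,
`stats.csv` can be inspected with pandas, for instance; the exact column names
are whatever `train_model.py` wrote, so the snippet prints them rather than
assuming them.

```python
# Inspect the persisted training/validation statistics. The exact column names
# are whatever train_model.py wrote, so they are printed rather than assumed.
import pandas as pd

stats = pd.read_csv("models/final/stats.csv")
print(stats.columns.tolist())  # loss and accuracy columns for train/validation
print(stats.tail())            # statistics for the last few epochs
```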

The stdout of the training script can be found in
`/models/final/train_log.txt`.

### Plots

The train and validation loss and accuracy plots can be generated from
`/models/final/stats.csv` with the following command:

```shell
python3 plot_acc.py
```

The output is stored in `/models/final/training_metrics.png`.

## Report

To compile the report, run:

```shell
cd report
pdflatex -interaction=nonstopmode -output-directory=. main.tex
pdflatex -interaction=nonstopmode -output-directory=. main.tex
```

(`pdflatex` is invoked twice so that cross-references are resolved.)

The report is then located in `report/main.pdf`.