Assignment 2: If statements
Group 2: Baris Aksakal, Edoardo Riggio, Claudio Maggioni
Repository Structure
: code and data related to scraping repositories from GitHub;
/models: code and persisted model of the original architecture built by Baris. model_0.1.ipynb
are respectively an earlier and later iteration of the code used to train this model;
/final: persisted model for the final architecture with training and test evaluation statistics;
/test_outputs.csv: CSV deliverable for the test set evaluation on the test set we extracted;
/test_usi_outputs.csv: CSV deliverable for the test set evaluation on the provided test set.
: unit tests for the model training scripts;
/train: dependencies of the main model training script;
/train_model.py: main model training script;
/plot_acc.py: accuracy statistics plotting script.
Environment Setup
Running the scraping and training scripts requires Python 3.10 or later. Dependencies can be installed in a virtual environment by running:
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
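As a quick sanity check before installing dependencies, the Python version requirement stated above can be verified programmatically. The following is a minimal sketch; the helper name `meets_minimum` is hypothetical and is not part of the repository.

```python
import sys


def meets_minimum(version=sys.version_info, minimum=(3, 10)) -> bool:
    """Return True if `version` is at least `minimum` (major, minor).

    Hypothetical helper for illustration; not part of the repository.
    """
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit("Python 3.10 or later is required")
```

Running the file with the interpreter inside the virtual environment exits with an error message when that interpreter is too old.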
Dataset Extraction
Please refer to the README.md file in /dataset for documentation on the dataset extraction process.
Model Training
Model training can be performed by running the script:
python3 train_model.py
The script is able to resume fine-tuning if the pretraining phase was completed by a previous execution, and it is able to directly skip to model evaluation on the two test sets if fine-tuning was already completed.
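The resume behaviour described above can be pictured as a check on which stages have already left results on disk. The sketch below is only an illustration of that idea, under the assumption that each stage leaves a marker file when it completes. The marker names `pretrain_done` and `finetune_done` and the function `next_stage` are hypothetical and do not reflect how train_model.py actually detects completed stages.

```python
from pathlib import Path


def next_stage(model_dir: Path) -> str:
    """Return the first pipeline stage that still has to run.

    Assumption: each stage writes a marker file into `model_dir` when it
    finishes. The marker names are hypothetical, chosen for illustration.
    """
    if not (model_dir / "pretrain_done").exists():
        return "pretrain"
    if not (model_dir / "finetune_done").exists():
        return "finetune"
    # Both training phases are complete: go straight to test set evaluation.
    return "evaluate"
```

With this shape, re-running the script after an interruption repeats only the stages whose results are not yet persisted.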
The persisted pretrained model is located in /models/final/pretrain. Each epoch of the fine-tuning training process is persisted at path
, where <N>
is the epoch number starting from 0. The epoch
number for the epoch selected by the early stopping process is stored in
stores the training and validation loss and accuracy statistics during the training process. /models/final/test_outputs.csv is the CSV deliverable for the test set evaluation on the test set we extracted, while
is the CSV deliverable for the test set evaluation on the provided test set.
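To illustrate how an early stopping process can select one of the persisted epochs, here is a minimal sketch. It assumes the criterion is the lowest validation loss with a fixed patience. The README does not state the actual criterion, so the function `select_epoch`, the `patience` parameter, and its default are assumptions made for illustration.

```python
def select_epoch(val_losses: list[float], patience: int = 2) -> int:
    """Return the 0-based epoch with the best validation loss.

    Assumption: training stops once `patience` epochs have passed without
    improving on the best validation loss seen so far.
    """
    if not val_losses:
        raise ValueError("at least one epoch is required")
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop looking further.
            break
    return best_epoch
```

Epoch numbers returned by this sketch start from 0, matching the numbering of the persisted epochs described above.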
The stdout for the training process script can be found in the file
The train and validation loss and accuracy plots can be generated from
with the following command:
python3 plot_acc.py
The output is stored in /models/final/training_metrics.png
To compile the report run:
cd report
pdflatex -interaction=nonstopmode -output-directory=. main.tex
pdflatex -interaction=nonstopmode -output-directory=. main.tex
The report is then located in report/main.pdf