This repository has been archived on 2024-10-22. You can view files and clone it, but cannot push or open issues or pull requests.

Claudio Maggioni a4ceee8716 Final version of the project

History has been rewritten to delete large files in repo

2024-01-03 15:28:43 +01:00

Assignment 2: If statements

Group 2: Baris Aksakal, Edoardo Riggio, Claudio Maggioni

Repository Structure

/dataset: code and data for scraping repositories from GitHub;
/models
- /baris: code and persisted model of the original architecture built by Baris. model_0.1.ipynb and test_model.ipynb are, respectively, an earlier and a later iteration of the code used to train this model;
- /final: persisted model for the final architecture with training and test evaluation statistics;
  - /test_outputs.csv: CSV deliverable with the evaluation results on the test set we extracted;
  - /test_usi_outputs.csv: CSV deliverable with the evaluation results on the provided test set.
/test: unit tests for the model training scripts;
/train: dependencies of the main model training script;
/train_model.py: main model training script;
/plot_acc.py: accuracy statistics plotting script.

Environment Setup

Both the scraping and the training scripts require Python 3.10 or greater. Dependencies can be installed in a virtual environment by running:

python3 -m venv .env 
source .env/bin/activate 
pip install -r requirements.txt
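
Before installing dependencies, it can help to confirm that the interpreter meets the version requirement stated above. The following is a minimal sketch, not part of the repository; the function name python_is_supported is hypothetical.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version is at least MIN_VERSION.

    Hypothetical helper: only the (major, minor) pair is compared.
    """
    return tuple(version_info[:2]) >= MIN_VERSION


# Report whether the interpreter running this snippet is recent enough.
if python_is_supported():
    print("Python version OK")
else:
    print("Python 3.10 or greater is required")
```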

Dataset Extraction

Please refer to the README.md file in /dataset for documentation on the dataset extraction process.

Model Training

Model training can be performed by running the script:

python3 train_model.py

The script resumes from fine-tuning if a previous execution completed the pretraining phase, and it skips directly to model evaluation on the two test sets if fine-tuning was already completed.

The persisted pretrained model is located in /models/final/pretrain. Each fine-tuning epoch is persisted at /models/final/<N>, where <N> is the epoch number, starting from 0. The number of the epoch selected by early stopping is stored in /models/final/best.txt.
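
The directory layout above can be inspected with a few lines of Python to see which phase a new run would start from and which checkpoint early stopping selected. This is only a sketch under stated assumptions, not the logic used by train_model.py: the function names are hypothetical, the presence of best.txt is assumed to mean that fine-tuning has finished, and best.txt is assumed to contain only the epoch number.

```python
from pathlib import Path


def next_phase(model_dir):
    """Guess the next phase from the files already on disk.

    Assumption: best.txt exists only once fine-tuning is complete, and
    the pretrain entry exists only once pretraining is complete.
    """
    model_dir = Path(model_dir)
    if (model_dir / "best.txt").is_file():
        return "evaluate"
    if (model_dir / "pretrain").exists():
        return "finetune"
    return "pretrain"


def best_checkpoint(model_dir):
    """Return the path of the epoch checkpoint named in best.txt.

    Assumption: best.txt holds a single integer, the selected epoch.
    """
    model_dir = Path(model_dir)
    epoch = int((model_dir / "best.txt").read_text().strip())
    return model_dir / str(epoch)
```

For example, next_phase("models/final") would return "evaluate" on a checkout where best.txt is present, and best_checkpoint("models/final") would then point at the selected epoch's directory.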

/models/final/stats.csv stores the training and validation loss and accuracy statistics recorded during training. /models/final/test_outputs.csv is the CSV deliverable with the evaluation results on the test set we extracted, while /models/final/test_usi_outputs.csv is the CSV deliverable with the evaluation results on the provided test set.
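
The statistics file can also be inspected without plotting. The sketch below finds the row with the lowest validation loss using only the standard library. The column names epoch and val_loss are assumptions, since the header of stats.csv is not documented here, so they are exposed as parameters. The result is not necessarily the epoch chosen by early stopping.

```python
import csv


def lowest_validation_loss(lines, epoch_col="epoch", loss_col="val_loss"):
    """Return (epoch, loss) for the row with the smallest validation loss.

    `lines` is any iterable of CSV lines, such as an open file.
    The default column names are hypothetical; adjust them to match
    the actual header of stats.csv.
    """
    rows = list(csv.DictReader(lines))
    if not rows:
        raise ValueError("no data rows found")
    best = min(rows, key=lambda row: float(row[loss_col]))
    return int(best[epoch_col]), float(best[loss_col])


# Usage with the real file (column names may need to be changed):
# with open("models/final/stats.csv", newline="") as f:
#     print(lowest_validation_loss(f))
```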

The standard output of the training script can be found in /models/final/train_log.txt.

Plots

The training and validation loss and accuracy plots can be generated from /models/final/stats.csv with the following command:

python3 plot_acc.py

The output is stored in /models/final/training_metrics.png.

Report

To compile the report, run:

cd report
pdflatex -interaction=nonstopmode -output-directory=. main.tex
pdflatex -interaction=nonstopmode -output-directory=. main.tex

The report is then located in report/main.pdf.
