This repository has been archived on 2024-10-22. You can view files and clone it, but cannot push or open issues or pull requests.

Claudio Maggioni a4ceee8716 Final version of the project. History has been rewritten to delete large files in repo.	2024-01-03 15:28:43 +01:00
dataset	Final version of the project	2024-01-03 15:28:43 +01:00
models	Final version of the project	2024-01-03 15:28:43 +01:00
report	Final version of the project	2024-01-03 15:28:43 +01:00
test	Final version of the project	2024-01-03 15:28:43 +01:00
train	Final version of the project	2024-01-03 15:28:43 +01:00
.gitignore	Final version of the project	2024-01-03 15:28:43 +01:00
environment.yml	Final version of the project	2024-01-03 15:28:43 +01:00
plot_acc.py	Final version of the project	2024-01-03 15:28:43 +01:00
README.md	Final version of the project	2024-01-03 15:28:43 +01:00
requirements.txt	Final version of the project	2024-01-03 15:28:43 +01:00
train_model.py	Final version of the project	2024-01-03 15:28:43 +01:00

README.md

Assignment 2: If statements

Group 2: Baris Aksakal, Edoardo Riggio, Claudio Maggioni

Repository Structure

/dataset: code and data related to scraping repositories from GitHub;
/models
- /baris: code and persisted model of the original architecture built by Baris. model_0.1.ipynb and test_model.ipynb are respectively an earlier and later iteration of the code used to train this model;
- /final: persisted model for the final architecture with training and test evaluation statistics;
  - /test_outputs.csv: CSV deliverable with the evaluation results on the test set we extracted;
  - /test_usi_outputs.csv: CSV deliverable with the evaluation results on the provided test set.
/test: unit tests for the model training scripts;
/train: dependencies of the main model training script;
/train_model.py: main model training script;
/plot_acc.py: accuracy statistics plotting script.

Environment Setup

Both the scraping and the training scripts require Python 3.10 or greater. Dependencies can be installed in a virtual environment by running:

python3 -m venv .env 
source .env/bin/activate 
pip install -r requirements.txt
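
As a quick sanity check before installing dependencies, the version requirement above can be verified from Python itself. This is only a sketch: the helper name python_is_supported is hypothetical and is not part of this repository.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 10)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` meets `minimum` (major, minor).

    `version` defaults to the running interpreter. This helper is a
    hypothetical illustration, not a function shipped with the project.
    """
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)
```

Calling python_is_supported() with no arguments checks the interpreter that is currently running, for example the one inside the activated .env virtual environment.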

Dataset Extraction

Please refer to the README.md file in /dataset for documentation on the dataset extraction process.

Model Training

Model training can be performed by running the script:

python3 train_model.py

The script resumes from fine-tuning if a previous execution already completed the pretraining phase, and it skips directly to model evaluation on the two test sets if fine-tuning was already completed.

The persisted pretrained model is located in /models/final/pretrain. Each epoch of the fine-tuning process is persisted at /models/final/<N>, where <N> is the epoch number, starting from 0. The number of the epoch selected by the early stopping process is stored in /models/final/best.txt.
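
The directory layout described above suggests how a resumable run could decide where to start. The following is a minimal sketch under stated assumptions, not the logic actually used by train_model.py: it assumes that the presence of the pretrain directory means pretraining finished, that the presence of best.txt means fine-tuning finished, and that best.txt contains a single integer. The function names detect_stage and best_checkpoint are hypothetical.

```python
from pathlib import Path


def detect_stage(model_dir):
    """Guess which phase should run next, based only on what is on disk.

    Assumption (not confirmed by the repository): best.txt is written
    once fine-tuning ends, and pretrain/ exists once pretraining ends.
    """
    model_dir = Path(model_dir)
    if (model_dir / "best.txt").is_file():
        return "evaluate"   # fine-tuning done: go straight to the test sets
    if (model_dir / "pretrain").exists():
        return "finetune"   # pretraining done: resume with fine-tuning
    return "pretrain"       # nothing persisted yet: start from scratch


def best_checkpoint(model_dir):
    """Return the path of the epoch selected by early stopping.

    Assumes best.txt holds just the epoch number (e.g. "3").
    """
    model_dir = Path(model_dir)
    epoch = int((model_dir / "best.txt").read_text().strip())
    return model_dir / str(epoch)
```

For example, detect_stage("models/final") would return "evaluate" on a checkout where best.txt is already present, and best_checkpoint("models/final") would then point at the matching epoch directory.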

/models/final/stats.csv stores the training and validation loss and accuracy recorded during the training process. /models/final/test_outputs.csv is the CSV deliverable with the evaluation results on the test set we extracted, while /models/final/test_usi_outputs.csv is the CSV deliverable with the evaluation results on the provided test set.
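
To inspect the statistics file without plotting it, a short standard-library script is enough. The sketch below assumes that stats.csv has a header row and one row per epoch in order; the column names are not documented here, so the column is passed in by the caller, and any name such as "val_loss" is a placeholder rather than the file's real header. The function name best_epoch is hypothetical.

```python
import csv


def best_epoch(stats_path, column, lower_is_better=True):
    """Return (row_index, value) of the best row for `column`.

    Assumptions: the CSV has a header row, `column` is one of its
    headers, and rows are in epoch order, so the 0-based row index
    matches the epoch number.
    """
    with open(stats_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"{stats_path} contains no data rows")
    values = [float(row[column]) for row in rows]
    # Pick the smallest value for losses, the largest for accuracies.
    pick = min if lower_is_better else max
    index = pick(range(len(values)), key=values.__getitem__)
    return index, values[index]
```

With a loss column one would keep lower_is_better=True, while an accuracy column needs lower_is_better=False.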

The standard output of the training script is saved in the file /models/final/train_log.txt.

Plots

The training and validation loss and accuracy plots can be generated from /models/final/stats.csv with the following command:

python3 plot_acc.py

The output is stored in /models/final/training_metrics.png.

Report

To compile the report, run:

cd report
pdflatex -interaction=nonstopmode -output-directory=. main.tex
pdflatex -interaction=nonstopmode -output-directory=. main.tex

The report is then located in report/main.pdf.