Assignment 2: If statements
Group 2: Baris Aksakal, Edoardo Riggio, Claudio Maggioni
Repository Structure
/dataset: code and data for scraping repositories from GitHub;
/models:
  /baris: code and persisted model of the original architecture built by Baris. model_0.1.ipynb and test_model.ipynb are, respectively, an earlier and a later iteration of the code used to train this model;
  /final: persisted model for the final architecture, with training and test evaluation statistics;
    /test_outputs.csv: CSV deliverable for the evaluation on the test set we extracted;
    /test_usi_outputs.csv: CSV deliverable for the evaluation on the provided test set.
/test: unit tests for the model training scripts;
/train: dependencies of the main model training script;
/train_model.py: main model training script;
/plot_acc.py: accuracy statistics plotting script.
Environment Setup
Running the scraping and training scripts requires Python 3.10 or later. Dependencies can be installed in a virtual environment by running:
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
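Before installing dependencies, it can help to confirm that the interpreter meets the version requirement stated above. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and is not part of this repository.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the README requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON
```

Calling `python_is_supported()` with no arguments checks the interpreter that is currently running.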
Dataset Extraction
Please refer to the README.md file in /dataset for documentation on the dataset extraction process.
Model Training
Model training can be performed by running the script:
python3 train_model.py
The script resumes from fine-tuning if a previous execution completed the pretraining phase, and it skips directly to model evaluation on the two test sets if fine-tuning has already completed.
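The resume behaviour can be pictured as a check on which artifacts already exist on disk. The sketch below is an illustration only: the function `next_phase` is hypothetical, and it assumes that a `pretrain` directory marks completed pretraining and that `best.txt` marks completed fine-tuning. The actual checks in train_model.py may differ.

```python
from pathlib import Path


def next_phase(model_dir: Path) -> str:
    """Guess which phase still has to run, based on files on disk.

    Assumption (not confirmed by train_model.py): a 'pretrain' directory
    marks completed pretraining, and 'best.txt' marks completed fine-tuning.
    """
    if (model_dir / "best.txt").is_file():
        return "evaluate"
    if (model_dir / "pretrain").is_dir():
        return "finetune"
    return "pretrain"
```

For example, `next_phase(Path("models/final"))` would return `"pretrain"` on a fresh checkout with no persisted artifacts.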
The persisted pretrained model is located in /models/final/pretrain. Each epoch of the fine-tuning process is persisted at path /models/final/<N>, where <N> is the epoch number starting from 0. The number of the epoch selected by the early stopping process is stored in /models/final/best.txt.
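Given this layout, the checkpoint chosen by early stopping can be located by reading best.txt. This is a minimal sketch that assumes best.txt holds a single integer, possibly followed by a newline; the helper name `best_checkpoint_dir` is hypothetical.

```python
from pathlib import Path


def best_checkpoint_dir(final_dir: Path) -> Path:
    """Return the directory of the epoch recorded in best.txt.

    Assumes best.txt contains a single integer (the 0-based epoch number),
    possibly surrounded by whitespace.
    """
    epoch = int((final_dir / "best.txt").read_text().strip())
    return final_dir / str(epoch)
```

For example, `best_checkpoint_dir(Path("models/final"))` resolves to `models/final/<N>` for the recorded epoch.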
/models/final/stats.csv stores the training and validation loss and accuracy statistics recorded during training. /models/final/test_outputs.csv is the CSV deliverable for the evaluation on the test set we extracted, while /models/final/test_usi_outputs.csv is the CSV deliverable for the evaluation on the provided test set.
The stdout of the training script can be found in the file /models/final/train_log.txt.
Plots
The training and validation loss and accuracy plots can be generated from /models/final/stats.csv with the following command:
python3 plot_acc.py
The output is stored in /models/final/training_metrics.png.
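To inspect the same statistics without plotting, the CSV can be loaded with the standard library. The sketch below is not taken from plot_acc.py: `load_stats` is a hypothetical helper, and it assumes that stats.csv has a header row and that every cell is numeric. The column names are not documented here, so none are hard-coded.

```python
import csv
from pathlib import Path


def load_stats(path: Path) -> dict[str, list[float]]:
    """Read a CSV with a header row into {column name: list of floats}.

    Assumes every cell is numeric; the real column names of stats.csv are
    not documented in this README, so none are hard-coded.
    """
    columns: dict[str, list[float]] = {}
    with path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            for name, value in row.items():
                columns.setdefault(name, []).append(float(value))
    return columns
```

Each returned list has one entry per row of the file, in file order.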
Report
To compile the report, run:
cd report
pdflatex -interaction=nonstopmode -output-directory=. main.tex
pdflatex -interaction=nonstopmode -output-directory=. main.tex
The report is then located in report/main.pdf.