# Assignment 2: If statements

**Group 2: Baris Aksakal, Edoardo Riggio, Claudio Maggioni**

## Repository Structure

- `/dataset`: code and data related to scraping repositories from GitHub;
- `/models`
  - `/baris`: code and persisted model of the original architecture built by Baris. `model_0.1.ipynb` and `test_model.ipynb` are, respectively, an earlier and a later iteration of the code used to train this model;
  - `/final`: persisted model for the final architecture, with training and test evaluation statistics;
    - `/test_outputs.csv`: CSV deliverable with the evaluation results on the test set we extracted;
    - `/test_usi_outputs.csv`: CSV deliverable with the evaluation results on the provided test set.
- `/test`: unit tests for the model training scripts;
- `/train`: dependencies of the main model training script;
- `/train_model.py`: main model training script;
- `/plot_acc.py`: accuracy statistics plotting script.

## Environment Setup

Executing both the scraping and the training scripts requires Python 3.10 or greater. Dependencies can be installed in a virtual environment by running:

```shell
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
```

## Dataset Extraction

Please refer to [the README.md file in `/dataset`](dataset/README.md) for documentation on the dataset extraction process.

## Model Training

Model training can be performed by running the script:

```shell
python3 train_model.py
```

The script can resume fine-tuning if a previous execution already completed the pretraining phase, and it skips directly to model evaluation on the two test sets if fine-tuning was already completed.

The persisted pretrained model is located in `/models/final/pretrain`. Each epoch of the fine-tuning process is persisted at path `/models/final/`, where `` is the epoch number starting from 0. The number of the epoch selected by the early stopping process is stored in `/models/final/best.txt`.
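
Since the resume and skip behaviour depends on which artifacts already exist under `/models/final`, it can be handy to check where a previous run stopped. The following is a minimal sketch of a hypothetical `training_status` shell function that is not part of this repository. It assumes it is run from the repository root, that `best.txt` is only written once fine-tuning has finished, and that an existing `pretrain` entry means pretraining was completed. `train_model.py` makes its own decision and may use different criteria.

```shell
# Hypothetical helper, not shipped with the repository: reports how far a
# previous train_model.py run got, based only on the persisted artifacts.
# Assumption: best.txt is written at the end of fine-tuning (early stopping).
training_status() {
  # Output directory, relative to the repository root unless given explicitly.
  local dir="${1:-models/final}"
  if [ -f "$dir/best.txt" ]; then
    # Strip whitespace and newlines around the stored epoch number.
    echo "fine-tuning done, best epoch: $(tr -d '[:space:]' < "$dir/best.txt")"
  elif [ -e "$dir/pretrain" ]; then
    echo "pretraining done, fine-tuning pending or in progress"
  else
    echo "no persisted model found"
  fi
}
```

Running `training_status` without arguments inspects `models/final`; passing a directory inspects that directory instead.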

`/models/final/stats.csv` stores the training and validation loss and accuracy statistics collected during training. `/models/final/test_outputs.csv` is the CSV deliverable with the evaluation results on the test set we extracted, while `/models/final/test_usi_outputs.csv` is the CSV deliverable with the evaluation results on the provided test set. The stdout of the training script can be found in `/models/final/train_log.txt`.

### Plots

The training and validation loss and accuracy plots can be generated from `/models/final/stats.csv` with the following command:

```shell
python3 plot_acc.py
```

The output is stored in `/models/final/training_metrics.png`.

# Report

To compile the report, run:

```shell
cd report
pdflatex -interaction=nonstopmode -output-directory=. main.tex
pdflatex -interaction=nonstopmode -output-directory=. main.tex
```

The report is then located in `report/main.pdf`.
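
`pdflatex` is run twice because LaTeX typically resolves cross-references from the auxiliary files written by the previous pass. Below is a minimal sketch of a hypothetical `build_report` function that is not part of this repository. It wraps the same two commands in a subshell, so the caller's working directory is left unchanged, and it fails if `report/main.pdf` was not produced. The `LATEX` environment variable used to override the engine is an assumption of this sketch and is not read by anything in the repository.

```shell
# Hypothetical helper, not shipped with the repository: runs the two pdflatex
# passes above inside a subshell and checks that the PDF exists.
# Must be invoked from the repository root.
build_report() {
  # Assumed override hook for the LaTeX engine; defaults to pdflatex.
  local latex="${LATEX:-pdflatex}"
  (
    cd report || exit 1
    # The first pass writes main.aux; the second pass reads it back so that
    # cross-references are resolved.
    "$latex" -interaction=nonstopmode -output-directory=. main.tex &&
      "$latex" -interaction=nonstopmode -output-directory=. main.tex
  ) && [ -f report/main.pdf ]
}
```

For example, `build_report && echo "report/main.pdf is ready"` prints the message only if both passes succeeded and the PDF exists.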