This repository has been archived on 2024-10-22. You can view files and clone it, but cannot push or open issues or pull requests.

# Assignment 2: If statements
**Group 2: Baris Aksakal, Edoardo Riggio, Claudio Maggioni**
## Repository Structure
- `/dataset`: code and data related to scraping repositories from GitHub;
- `/models`
  - `/baris`: code and persisted model of the original architecture built by
    Baris. `model_0.1.ipynb` and `test_model.ipynb` are respectively an
    earlier and a later iteration of the code used to train this model;
  - `/final`: persisted model for the final architecture with training and
    test evaluation statistics;
    - `/test_outputs.csv`: CSV deliverable for the evaluation on the test
      set we extracted;
    - `/test_usi_outputs.csv`: CSV deliverable for the evaluation on the
      provided test set.
- `/test`: unit tests for the model training scripts;
- `/train`: dependencies of the main model training script;
- `/train_model.py`: main model training script;
- `/plot_acc.py`: accuracy statistics plotting script.
## Environment Setup
Running the scraping and training scripts requires Python 3.10 or greater.
Dependencies can be installed in a virtual environment by running:
```shell
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
```
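
Since the scripts require Python 3.10 or greater, a quick check along the following lines can confirm that the interpreter inside the virtual environment is recent enough. This is only a minimal sketch and not part of the repository; the name `python_is_supported` is hypothetical.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


# Report whether the interpreter running this snippet is recent enough.
print("Python version supported:", python_is_supported())
```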
## Dataset Extraction
Please refer to [the README.md file in `/dataset`](dataset/README.md) for
documentation on the dataset extraction process.
## Model Training
Model training can be performed by running the script:
```shell
python3 train_model.py
```
The script resumes fine-tuning if a previous execution completed the
pretraining phase, and it skips directly to model evaluation on the two test
sets if fine-tuning was already completed.

The persisted pretrained model is located in `/models/final/pretrain`. Each
epoch of the fine-tuning process is persisted at path `/models/final/<N>`,
where `<N>` is the epoch number starting from 0. The number of the epoch
selected by the early stopping process is stored in `/models/final/best.txt`.

`/models/final/stats.csv` stores the training and validation loss and accuracy
statistics recorded during training. `/models/final/test_outputs.csv` is the
CSV deliverable for the evaluation on the test set we extracted, while
`/models/final/test_usi_outputs.csv` is the CSV deliverable for the evaluation
on the provided test set.

The stdout of the training script can be found in the file
`/models/final/train_log.txt`.
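
The resume behaviour and the layout of `/models/final` described above can be illustrated with the following sketch. It is not the logic used by `train_model.py`. It assumes that the presence of `pretrain` means pretraining finished, that the presence of `best.txt` means fine-tuning finished, and that `best.txt` contains only the epoch number. The function names `next_phase` and `best_checkpoint` are hypothetical.

```python
from pathlib import Path


def next_phase(model_dir: Path) -> str:
    """Guess which phase would run next from the files in model_dir.

    Assumption: best.txt marks completed fine-tuning and the pretrain
    entry marks completed pretraining.
    """
    if (model_dir / "best.txt").is_file():
        return "evaluate"
    if (model_dir / "pretrain").exists():
        return "fine-tune"
    return "pretrain"


def best_checkpoint(model_dir: Path) -> Path:
    """Return the path of the epoch recorded in best.txt.

    Assumption: best.txt holds a single integer, the selected epoch.
    """
    epoch = int((model_dir / "best.txt").read_text().strip())
    return model_dir / str(epoch)


# Inspect the model directory relative to the repository root.
print("next phase:", next_phase(Path("models/final")))
```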
### Plots
The training and validation loss and accuracy plots can be generated from
`/models/final/stats.csv` with the following command:
```shell
python3 plot_acc.py
```
The output is stored in `/models/final/training_metrics.png`.
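
To inspect the statistics without plotting them, the CSV file can also be loaded directly. The sketch below is not taken from `plot_acc.py`; it assumes only that `stats.csv` has a header row and numeric values in every column, and the function name `load_stats` is hypothetical.

```python
import csv
from pathlib import Path


def load_stats(path: Path) -> dict[str, list[float]]:
    """Read a CSV with a header row into {column name: list of floats}.

    Assumption: every cell below the header parses as a number.
    """
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        columns: dict[str, list[float]] = {
            name: [] for name in (reader.fieldnames or [])
        }
        for row in reader:
            for name in columns:
                columns[name].append(float(row[name]))
    return columns


# Summarise each column if the file exists in the current checkout.
stats_path = Path("models/final/stats.csv")
if stats_path.is_file():
    for name, values in load_stats(stats_path).items():
        print(f"{name}: {len(values)} rows")
```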
## Report
To compile the report, run:
```shell
cd report
pdflatex -interaction=nonstopmode -output-directory=. main.tex
pdflatex -interaction=nonstopmode -output-directory=. main.tex
```
The report is then located in `report/main.pdf`.