# Information Modelling & Analysis: Project 2

Student: Claudio Maggioni

Please follow the instructions provided in the project slides
and consider the submission instructions available on iCorsi.

For your convenience, the following resources are available in the `resources` folder:

- **defects4j-checkout-closure-1f**: The output of the command `defects4j checkout -p Closure -v 1f -w ...`
- **modified_classes**: The list of buggy classes in `framework/projects/Closure/modified_classes/`

## Setup

To install the required libraries run:

```shell
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
```

## Data pre-processing

To extract and label the feature vectors run:

```shell
python3 ./extract_feature_vectors.py
python3 ./label_feature_vectors.py
```

The labeled feature vectors are stored in
`./metrics/feature_vectors_labeled.csv`, relative to the repository root.

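As a quick sanity check on the labeled file, the class balance can be inspected with a few lines of Python. This is only a sketch: the helper `count_labels` is hypothetical and not part of the repository, and the name of the label column is an assumption that should be checked against the CSV header.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column):
    """Count how many rows carry each value of `label_column`.

    `label_column` is supplied by the caller because the actual column
    name used in the labeled CSV is not stated in this README.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or label_column not in reader.fieldnames:
            raise KeyError(f"column {label_column!r} not found in {csv_path}")
        # Values are kept as strings, exactly as they appear in the file.
        return Counter(row[label_column] for row in reader)
```

For example, `count_labels("./metrics/feature_vectors_labeled.csv", "buggy")` would return the number of rows per label value, assuming the label column is called `buggy`.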
## Training

To train the classifiers with the grid search procedure defined in the report,
and then extract the optimal combination of hyperparameters, run:

```shell
python3 ./train_classifiers.py
```

When prompted, answer `y` to run the training again. Answering `n` skips
training and only recomputes the best hyperparameter configuration from the
metrics produced by a previous training run.

Raw cross-validation training metrics are stored in `./models/models.csv`. The
optimal hyperparameter configurations found are stored in `./models/best.csv`.

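For readers unfamiliar with grid search, the idea can be sketched in plain Python as follows. This is not the code in `train_classifiers.py`: the function `grid_search`, the toy grid and the toy scoring function are all hypothetical, and the actual grids and metrics are the ones defined in the report.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in `param_grid` and return the best one.

    `score_fn` maps a dict of hyperparameters to a score (higher is
    better), for instance a mean cross-validation metric.
    """
    names = sorted(param_grid)
    results = []
    # Cartesian product of all candidate values, one combination at a time.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


# Hypothetical grid and toy scoring function, for illustration only.
toy_grid = {"max_depth": [2, 4, 8], "n_estimators": [10, 50]}


def toy_score(params):
    # Peaks at max_depth=4, n_estimators=50; stands in for a real CV metric.
    return -abs(params["max_depth"] - 4) - abs(params["n_estimators"] - 50) / 100
```

Calling `grid_search(toy_grid, toy_score)` scores all six combinations and returns the best one together with the full list of results, which loosely mirrors the split between the raw metrics file and the best-configuration file described above.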
## Evaluation

To run the 20-times 5-fold cross validation procedure delete the file
`./models/evaluation.csv` and run:

```shell
python3 ./evaluate_classifiers.py
```

Raw data from the repeated cross validation procedure is stored in
`./models/evaluation.csv`. P-values for each metric of each classifier pair are
stored in `./models/model_stats.csv`.

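To make the "20-times 5-fold" wording concrete, the sketch below generates the index splits for a repeated k-fold procedure using only the standard library. It is an illustration under assumptions, not the code in `evaluate_classifiers.py`: the function `repeated_kfold` is hypothetical, and the splits here are not stratified, which may differ from the procedure described in the report.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    With the defaults this produces 20 * 5 = 100 train/test splits.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a new random partition for every repetition
        for fold in range(n_splits):
            # Each fold takes every n_splits-th index of the shuffled list.
            test_idx = indices[fold::n_splits]
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            yield repeat, fold, train_idx, test_idx
```

Each of the 100 splits gives one value per metric for every classifier, and samples of this kind are what a pairwise statistical comparison between two classifiers would take as input.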