Information Modelling & Analysis: Project 2
Student: Claudio Maggioni
Please follow the instructions provided in the project slides and consider the submission instructions available on iCorsi.
For your convenience, the following resources are available in the resources
folder:
- defects4j-checkout-closure-1f: The output of the command
defects4j checkout -p Closure -v 1f -w ...
- modified_classes: The list of buggy classes in:
framework/projects/Closure/modified_classes/
Setup
To install the required libraries run:
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
Data pre-processing
To extract and label the feature vectors run:
python3 ./extract_feature_vectors.py
python3 ./label_feature_vectors.py
The labeled feature vectors are stored in ./metrics/feature_vectors_labeled.csv, relative to the repository root.
Training
To train the classifiers with the grid search procedure defined in the report, and then extract the optimal combination of hyperparameters, run:
python3 ./train_classifiers.py
and answer y when prompted to run the training again. Answering n simply recomputes the data about the best hyperparameter configuration from the metrics produced by a previous training run.
Raw cross validation training metrics are stored in ./models/models.csv. The optimal hyperparameter configurations found are stored in ./models/best.csv.
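As an illustration of the "best configuration" step described above, here is a minimal sketch that picks the highest-scoring row per classifier from grid-search metrics. It is not the logic of train_classifiers.py: the column names (classifier, params, mean_f1), the classifier names, and the scores are all made up for this example, since the actual layout of ./models/models.csv is not documented in this README.

```python
import csv
import io

# Hypothetical stand-in for a grid-search metrics file. The column names,
# classifier names, and scores are invented for illustration only.
SAMPLE = """classifier,params,mean_f1
DecisionTree,max_depth=3,0.61
DecisionTree,max_depth=5,0.66
NaiveBayes,default,0.58
"""


def best_configurations(rows, score_column="mean_f1"):
    """Return, for each classifier, the row with the highest score."""
    best = {}
    for row in rows:
        name = row["classifier"]
        score = float(row[score_column])
        # Keep the row only if it beats the best one seen so far.
        if name not in best or score > float(best[name][score_column]):
            best[name] = row
    return best


best = best_configurations(csv.DictReader(io.StringIO(SAMPLE)))
for name, row in best.items():
    print(name, row["params"], row["mean_f1"])
```

To try the same idea on a real file, the io.StringIO(SAMPLE) argument could be replaced with an open file handle, after adjusting the column names to whatever the file actually contains.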
Evaluation
To run the 20-times repeated 5-fold cross validation procedure, delete the file ./models/evaluation.csv and run:
python3 ./evaluate_classifiers.py
Raw data from the repeated cross validation procedure is stored in ./models/evaluation.csv. P-values for each metric of each classifier pair are stored in ./models/model_stats.csv.
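For readers unfamiliar with the procedure named above, the following sketch shows how 20 repetitions of 5-fold cross validation produce 100 train/test splits. It uses only the standard library and is an assumption-laden illustration, not the code in evaluate_classifiers.py; the function name repeated_kfold_indices, the seed, and the sample count are hypothetical.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        for fold in range(n_splits):
            # Striding assigns every n_splits-th shuffled index to this fold,
            # so fold sizes differ by at most one.
            test_idx = indices[fold::n_splits]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# 23 is an arbitrary sample count chosen for the example.
splits = list(repeated_kfold_indices(n_samples=23))
print(len(splits))  # 20 repeats x 5 folds = 100 splits
```

Each split would yield one set of metrics per classifier, which is what makes a per-metric comparison between classifier pairs possible afterwards.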