This repository has been archived on 2023-06-18. You can view files and clone it, but cannot push or open issues or pull requests.

Claudio Maggioni 066c0d0701 updated report		2023-05-31 18:20:34 +02:00
metrics	Balanced classes for classifier training	2023-05-22 17:39:51 +02:00
models	documentation	2023-05-31 18:19:44 +02:00
report	updated report	2023-05-31 18:20:34 +02:00
resources	Initial commit	2023-04-25 11:33:41 +00:00
.gitignore	done part 5, part 6, part 7, and 20-times CV for part 8	2023-04-25 14:23:41 +02:00
evaluate_classifiers.py	documentation	2023-05-31 18:19:44 +02:00
extract_feature_vectors.py	Balanced classes for classifier training	2023-05-22 17:39:51 +02:00
grid_search_table.py	report work	2023-05-24 18:05:44 +02:00
label_feature_vectors.py	done part 5, part 6, part 7, and 20-times CV for part 8	2023-04-25 14:23:41 +02:00
metric_stats.py	beginning of report	2023-05-24 14:06:24 +02:00
README.md	documentation	2023-05-31 18:19:44 +02:00
requirements.txt	all done but practical usefulness	2023-05-27 22:39:27 +02:00
train_classifiers.py	report done up to training	2023-05-24 18:15:43 +02:00

README.md

Information Modelling & Analysis: Project 2

Student: Claudio Maggioni

Please follow the instructions provided in the project slides and consider the submission instructions available on iCorsi.

For your convenience, the following resources are available in the resources folder:

defects4j-checkout-closure-1f: The output of the command defects4j checkout -p Closure -v 1f -w ...
modified_classes: The list of buggy classes in framework/projects/Closure/modified_classes/

Setup

To install the required libraries run:

python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Data pre-processing

To extract and label the feature vectors run:

python3 ./extract_feature_vectors.py
python3 ./label_feature_vectors.py

The labeled feature vectors are stored in ./metrics/feature_vectors_labeled.csv, relative to the repository root.
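As a quick sanity check on the labeled file, the sketch below counts how many rows carry each label value. It is only an illustration: the column name `buggy` is a hypothetical placeholder, since the actual header of feature_vectors_labeled.csv is not documented here, and `count_labels` is not part of the repository's scripts.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="buggy"):
    """Count how many rows fall under each value of the label column.

    `label_column` is an assumed name; replace it with the real header
    of the label column in the CSV file.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Values are kept as the strings found in the file (e.g. "0", "1").
        return Counter(row[label_column] for row in reader)


# Example call, assuming the pre-processing scripts have already run:
# print(count_labels("./metrics/feature_vectors_labeled.csv"))
```

A strongly skewed count would suggest that class balancing is needed before training.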

Training

To train the classifiers with the grid search procedure defined in the report, and later extract the optimal combination of hyperparameters, run:

python3 ./train_classifiers.py

and answer y when prompted to run the training again. Answering n skips training and only recomputes the best hyperparameter configuration from the metrics produced by a previous training run.

Raw cross validation training metrics are stored in ./models/models.csv. The optimal hyperparameter configurations found are stored in ./models/best.csv.
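For readers unfamiliar with grid search, the following library-independent sketch shows the general idea: every combination of hyperparameter values is scored, and the best-scoring combination is kept. It is not the code in train_classifiers.py. The function `grid_search`, the parameter names, and the toy scoring function are all hypothetical; in practice the score would be a cross-validated metric such as the one defined in the report.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in `param_grid` and return the best one.

    `param_grid` maps a hyperparameter name to a list of candidate values.
    `score_fn` takes a dict of hyperparameters and returns a number,
    where higher is better.
    """
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    # Keep the combination with the highest score.
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_score(params):
    """Toy stand-in for a cross-validated metric; it peaks at c=1, depth=3."""
    return -((params["c"] - 1) ** 2) - (params["depth"] - 3) ** 2


best_params, best_score, all_results = grid_search(
    {"c": [0.1, 1, 10], "depth": [1, 3, 5]}, toy_score
)
```

Keeping the full list of results alongside the best entry mirrors the split between raw metrics and best configurations described above.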

Evaluation

To run the 20-times 5-fold cross validation procedure, delete the file ./models/evaluation.csv and run:

python3 ./evaluate_classifiers.py

Raw data from the repeated cross validation procedure is stored in ./models/evaluation.csv. P-values for each metric of each classifier pair are stored in ./models/model_stats.csv.
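To make the size of the procedure concrete: 20 repetitions of 5-fold cross validation yield 100 train/test splits per classifier. The sketch below generates such splits using only the standard library. It assumes stratified folds (each fold keeps roughly the same class ratio), which this README does not state; `repeated_stratified_kfold` is a hypothetical helper, not a function from evaluate_classifiers.py.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for every split.

    Produces n_repeats * n_splits splits in total. Stratification is an
    assumption of this sketch, not a documented property of the project.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so folds keep similar class ratios.
            for position, index in enumerate(shuffled):
                folds[position % n_splits].append(index)
        for fold, test_idx in enumerate(folds):
            test_set = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in test_set]
            yield repeat, fold, train_idx, sorted(test_idx)


# Example: 10 samples per class give 100 splits with 4 test samples each.
example_labels = [0] * 10 + [1] * 10
example_splits = list(repeated_stratified_kfold(example_labels))
```

Each split would then be used to fit a classifier on `train_idx` and compute its metrics on `test_idx`, giving one row of raw evaluation data per split.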