# Information Modelling & Analysis: Project 2

Student: Claudio Maggioni

Please follow the instructions provided in the project slides and consider the submission instructions available on iCorsi.

For your convenience, the following resources are available in the `resources` folder:

- **defects4j-checkout-closure-1f**: The output of the command `defects4j checkout -p Closure -v 1f -w ...`
- **modified_classes**: The list of buggy classes in `framework/projects/Closure/modified_classes/`

## Setup

To install the required libraries, run:

```shell
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
```

## Data pre-processing

To extract and label the feature vectors, run:

```shell
python3 ./extract_feature_vectors.py
python3 ./label_feature_vectors.py
```

The labeled feature vectors are stored in `./metrics/feature_vectors_labeled.csv`, relative to the repository root.

## Training

To train the classifiers with the grid search procedure defined in the report and then extract the optimal combination of hyperparameters, run:

```shell
python3 ./train_classifiers.py
```

and answer `y` when prompted to run the training again. Answering `n` skips training and only recomputes the best hyperparameter configuration from the metrics produced by a previous training run.

Raw cross validation training metrics are stored in `./models/models.csv`. The optimal hyperparameter configurations found are stored in `./models/best.csv`.

## Evaluation

To run the 20-times 5-fold cross validation procedure, delete the file `./models/evaluation.csv` and run:

```shell
python3 ./evaluate_classifiers.py
```

Raw data from the repeated cross validation procedure is stored in `./models/evaluation.csv`. P-values for each metric of each classifier pair are stored in `./models/model_stats.csv`.
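To illustrate what "20-times 5-fold cross validation" means, the following standard-library-only sketch generates the train/test index splits of a repeated k-fold procedure: each repetition reshuffles the samples and partitions them into 5 folds, and each fold is used once as the test set, giving 20 × 5 = 100 train/test splits. This is not the code in `evaluate_classifiers.py`; the function name `repeated_kfold_indices` and the per-repetition shuffling with a fixed seed are assumptions made for illustration only.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold_indices(
    n_samples: int, n_splits: int = 5, n_repeats: int = 20, seed: int = 0
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for repeated k-fold CV.

    Hypothetical helper: each repetition reshuffles the sample indices and
    partitions them into n_splits folds of (almost) equal size; every fold
    is used exactly once as the test set within a repetition.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)  # fixed seed -> reproducible splits
    indices = list(range(n_samples))
    for repeat in range(n_repeats):
        rng.shuffle(indices)  # new random partition for every repetition
        # The first (n_samples % n_splits) folds receive one extra sample.
        fold_sizes = [
            n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
            for i in range(n_splits)
        ]
        start = 0
        for size in fold_sizes:
            test = sorted(indices[start:start + size])
            train = sorted(indices[:start] + indices[start + size:])
            yield repeat, train, test
            start += size


if __name__ == "__main__":
    splits = list(repeated_kfold_indices(n_samples=103))
    print(len(splits))  # 100 train/test splits (20 repetitions x 5 folds)
```

Within a repetition the test folds are disjoint and together cover every sample, so each feature vector is evaluated exactly once per repetition; repeating the procedure with different shuffles yields 100 metric values per classifier, which is what makes pairwise statistical comparisons between classifiers possible.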