diff --git a/report/main.md b/report/main.md
index e47a3d9..16b84ff 100644
--- a/report/main.md
+++ b/report/main.md
@@ -150,7 +150,15 @@ The script `./train_classifiers.py`, according to the random seed $3735924759$,
 The metrics for each classifier and each hyperparameter configuration in decreasing order of accuracy are reported in the following sections.
 
-For each classifier, I then choose the hyperparameter configuration with highest accuracy.
+For each classifier, I then choose the hyperparameter configuration with highest accuracy. Namely, these configurations are:
+
+| **Classifier** | **Hyper-parameter configuration** | **Precision** | **Accuracy** | **Recall** | **F1 Score** |
+|:----|:--------|-:|-:|-:|--:|
+| DecisionTreeClassifier | `criterion`: gini, `splitter`: best | 0.7885 | 0.8506 | 0.9535 | 0.8632 |
+| GaussianNB | -- | 0.8 | 0.6782 | 0.4651 | 0.5882 |
+| MLPClassifier | `activation`: logistic, `hidden_layer_sizes`: (60, 80, 100), `learning_rate`: constant, `max_iter`: 500000, `solver`: lbfgs | 0.8958 | 0.9425 | 1 | 0.9451 |
+| RandomForestClassifier | `class_weight`: balanced, `criterion`: gini, `max_features`: sqrt | 0.8367 | 0.8851 | 0.9535 | 0.8913 |
+| SVC | `gamma`: scale, `kernel`: rbf | 0.7174 | 0.7356 | 0.7674 | 0.7416 |
 
 ## Decision Tree (DT)
@@ -300,6 +308,8 @@ For sake of brevity, only the top 100 results by accuracy are shown.
 | gini | balanced_subsample | log2 | 0.803922 | 0.862069 | 0.953488 | 0.87234 |
 | entropy | balanced_subsample | log2 | 0.803922 | 0.862069 | 0.953488 | 0.87234 |
 
+
+
 # Evaluation
 
 ## Output Distributions
diff --git a/report/main.pdf b/report/main.pdf
index 553eced..523022b 100644
Binary files a/report/main.pdf and b/report/main.pdf differ
diff --git a/train_classifiers.py b/train_classifiers.py
index fadcc28..e3e06ff 100755
--- a/train_classifiers.py
+++ b/train_classifiers.py
@@ -150,6 +150,11 @@ def find_best_and_save(df: pd.DataFrame):
     metrics = ['precision', 'accuracy', 'recall', 'f1']
     df_best.loc[:, metrics] = df_best.loc[:, metrics].round(decimals=4)
 
+    df_best = df_best.reindex(
+        ['classifier', 'params'] + \
+        [x for x in df_best.columns if x in metrics], \
+        axis=1)
+
     print(df_best.to_markdown(index=False))