report done up to training

2023-05-24 18:15:43 +02:00
commit 2797dc7a9d
parent 185bee2933
3 changed files with 16 additions and 1 deletions
--- a/report/main.md
+++ b/report/main.md
@@ -150,7 +150,15 @@ The script `./train_classifiers.py`, according to the random seed $3735924759$,
 The metrics for each classifier and each hyperparameter configuration in decreasing order of
 accuracy are reported in the following sections.

-For each classifier, I then choose the hyperparameter configuration with highest accuracy.
+For each classifier, I then choose the hyperparameter configuration with the highest accuracy. These configurations are:
+
+| **Classifier** | **Hyperparameter configuration** | **Precision** | **Accuracy** | **Recall** | **F1 Score** |
+|:----|:--------|-:|-:|-:|--:|
+| DecisionTreeClassifier | `criterion`: gini, `splitter`: best | 0.7885 | 0.8506 | 0.9535 | 0.8632 |
+| GaussianNB | -- | 0.8 | 0.6782 | 0.4651 | 0.5882 |
+| MLPClassifier | `activation`: logistic, `hidden_layer_sizes`: (60, 80, 100), `learning_rate`: constant, `max_iter`: 500000, `solver`: lbfgs | 0.8958 | 0.9425 | 1 | 0.9451 |
+| RandomForestClassifier | `class_weight`: balanced, `criterion`: gini, `max_features`: sqrt | 0.8367 | 0.8851 | 0.9535 | 0.8913 |
+| SVC | `gamma`: scale, `kernel`: rbf | 0.7174 | 0.7356 | 0.7674 | 0.7416 |

 ## Decision Tree (DT)

@@ -300,6 +308,8 @@ For sake of brevity, only the top 100 results by accuracy are shown.
 | gini        | balanced_subsample | log2           |    0.803922 |   0.862069 | 0.953488 | 0.87234  |
 | entropy     | balanced_subsample | log2           |    0.803922 |   0.862069 | 0.953488 | 0.87234  |

+
+
 # Evaluation

 ## Output Distributions
--- a/report/main.pdf
+++ b/report/main.pdf
--- a/train_classifiers.py
+++ b/train_classifiers.py
@@ -150,6 +150,11 @@ def find_best_and_save(df: pd.DataFrame):
      
    metrics = ['precision', 'accuracy', 'recall', 'f1']
    df_best.loc[:, metrics] = df_best.loc[:, metrics].round(decimals=4)
+    # Put the identifying columns first, followed by the metric columns.
+    df_best = df_best.reindex(
+        columns=['classifier', 'params'] + [x for x in df_best.columns if x in metrics]
+    )
+    
    print(df_best.to_markdown(index=False))
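
The selection step described in the report (keep, for each classifier, the configuration with the highest accuracy, then reorder the columns for printing) could be sketched roughly as follows. This is a minimal, self-contained illustration and not the body of `find_best_and_save`: the input frame, its values, and the use of `groupby(...).idxmax()` are assumptions made for the example. Only the column names `classifier` and `params` and the `metrics` list are taken from the diff above.

```python
import pandas as pd

# Hypothetical results table: one row per (classifier, hyperparameter configuration).
# The values are illustrative placeholders, not results from the report.
df = pd.DataFrame({
    'accuracy':   [0.70, 0.90, 0.60, 0.80],
    'classifier': ['A', 'A', 'B', 'B'],
    'f1':         [0.71, 0.91, 0.61, 0.81],
    'params':     ['p1', 'p2', 'p3', 'p4'],
    'precision':  [0.72, 0.92, 0.62, 0.82],
    'recall':     [0.73, 0.93, 0.63, 0.83],
})

metrics = ['precision', 'accuracy', 'recall', 'f1']

# Keep the row with the highest accuracy for each classifier
# (an assumed selection strategy; ties resolve to the first occurrence).
best_idx = df.groupby('classifier')['accuracy'].idxmax()
df_best = df.loc[best_idx].reset_index(drop=True)

# Round the metric columns, then put the identifying columns first.
df_best[metrics] = df_best[metrics].round(4)
df_best = df_best.reindex(columns=['classifier', 'params'] + metrics)

print(df_best.to_string(index=False))
```

Listing `metrics` directly in `reindex` fixes the metric columns to the order of that list, whereas filtering `df_best.columns` (as in the commit) keeps whatever order the frame already has. The two agree only when the frame's metric columns are already in the desired order.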