report done up to training

Claudio Maggioni 2023-05-24 18:15:43 +02:00
parent 185bee2933
commit 2797dc7a9d
3 changed files with 16 additions and 1 deletions


@@ -150,7 +150,15 @@ The script `./train_classifiers.py`, according to the random seed $3735924759$,
The metrics for each classifier and each hyperparameter configuration in decreasing order of
accuracy are reported in the following sections.
For each classifier, I then choose the hyperparameter configuration with highest accuracy. Namely, these configurations are:
| **Classifier** | **Hyper-parameter configuration** | **Precision** | **Accuracy** | **Recall** | **F1 Score** |
|:----|:--------|-:|-:|-:|--:|
| DecisionTreeClassifier | `criterion`: gini, `splitter`: best | 0.7885 | 0.8506 | 0.9535 | 0.8632 |
| GaussianNB | -- | 0.8 | 0.6782 | 0.4651 | 0.5882 |
| MLPClassifier | `activation`: logistic, `hidden_layer_sizes`: (60, 80, 100), `learning_rate`: constant, `max_iter`: 500000, `solver`: lbfgs | 0.8958 | 0.9425 | 1 | 0.9451 |
| RandomForestClassifier | `class_weight`: balanced, `criterion`: gini, `max_features`: sqrt | 0.8367 | 0.8851 | 0.9535 | 0.8913 |
| SVC | `gamma`: scale, `kernel`: rbf | 0.7174 | 0.7356 | 0.7674 | 0.7416 |
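
The selection rule described above (keep, for each classifier, the configuration with the highest accuracy) can be sketched in a few lines of plain Python. This is only an illustration, not the code in `./train_classifiers.py`: the function name `best_by_accuracy` and the row layout are assumptions, and the sample rows reuse accuracy values reported in the tables of this report.

```python
from typing import Any


def best_by_accuracy(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Keep, for each classifier, the row with the highest accuracy.

    Hypothetical helper: each row is assumed to carry at least the keys
    'classifier' and 'accuracy'. Ties keep the first row encountered.
    """
    best: dict[str, dict[str, Any]] = {}
    for row in rows:
        name = row["classifier"]
        if name not in best or row["accuracy"] > best[name]["accuracy"]:
            best[name] = row
    return best


# Sample rows; the accuracy values are copied from the tables in this report.
rows = [
    {"classifier": "RandomForestClassifier",
     "params": "class_weight: balanced, criterion: gini, max_features: sqrt",
     "accuracy": 0.8851},
    {"classifier": "RandomForestClassifier",
     "params": "class_weight: balanced_subsample, criterion: gini, max_features: log2",
     "accuracy": 0.862069},
    {"classifier": "GaussianNB", "params": "--", "accuracy": 0.6782},
]

best = best_by_accuracy(rows)
for name, row in sorted(best.items()):
    print(f"{name}: {row['params']} (accuracy {row['accuracy']})")
```

A strict `>` comparison means that, among configurations with equal accuracy, the one listed first wins; a different tie-breaking rule (for example by F1 score) would need an explicit secondary key.
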
## Decision Tree (DT)
@@ -300,6 +308,8 @@ For sake of brevity, only the top 100 results by accuracy are shown.
| gini | balanced_subsample | log2 | 0.803922 | 0.862069 | 0.953488 | 0.87234 |
| entropy | balanced_subsample | log2 | 0.803922 | 0.862069 | 0.953488 | 0.87234 |
# Evaluation
## Output Distributions

Binary file not shown.


@@ -150,6 +150,11 @@ def find_best_and_save(df: pd.DataFrame):
    metrics = ['precision', 'accuracy', 'recall', 'f1']
    df_best.loc[:, metrics] = df_best.loc[:, metrics].round(decimals=4)

    # Put the identifying columns first, then the metric columns in their existing order
    df_best = df_best.reindex(
        ['classifier', 'params'] +
        [x for x in df_best.columns if x in metrics],
        axis=1)

    print(df_best.to_markdown(index=False))