hw1: done T1-3 (no report) and Q1

1. **Explain the curves' behavior in each of the three highlighted sections of the figures, namely (a), (b), and (c).**

In the highlighted section (a), the expected test error, the observed validation error, and the observed training error are all high and close together. All three errors decrease as the model complexity increases. In (c), instead, we see a low training error but high validation and expected test errors; the latter two increase as the model complexity increases, while the training error plateaus. Finally, in (b), the test and validation error curves reach their respective lowest points, while the training error curve keeps decreasing as the model complexity increases, albeit less steeply than in (a).

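To make these three regimes concrete, below is a minimal sketch (my own illustration, not part of the assignment) that reproduces the same qualitative behaviour on synthetic data; it assumes NumPy and uses polynomial degree as a stand-in for model complexity:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical data-generating process: a smooth signal plus noise.
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 30)
y_train = true_f(x_train) + rng.normal(scale=0.3, size=30)
x_val = rng.uniform(0, 1, 30)
y_val = true_f(x_val) + rng.normal(scale=0.3, size=30)

# Sweep "model complexity" (here: polynomial degree) and track both errors.
# Very high degrees may trigger a harmless RankWarning from the solver.
for deg in (1, 3, 5, 9, 15):
    coeffs = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"deg {deg:2d}: train MSE {train_mse:.3f}  val MSE {val_mse:.3f}")
```

Low degrees behave like region (a) (both errors high and close together), intermediate degrees like region (b) (validation error at its minimum), and high degrees like region (c) (training error still shrinking while validation error climbs).
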
1. **Is any of the three sections associated with the concepts of overfitting and underfitting? If yes, explain it.**

Section (a) is associated with underfitting and section (c) with overfitting.

The behaviour in (a) is fairly easy to explain: since the model complexity is insufficient to capture the behaviour of the training data, the model is unable to provide accurate predictions, and thus all the MSEs we observe are rather high. It is worth pointing out that the training error curve is quite close to the validation and test error curves: this happens because the model is both unable to accurately learn the training data and unable to formulate accurate predictions on the validation and test data.

In (c), instead, the model complexity is higher than the intrinsic complexity of the data to model, and this extra capacity ends up learning the intrinsic noise of the data. This is of course not desirable, and the dire consequences of this phenomenon can be seen in the significant difference between the observed MSE on the training data and the MSEs on the validation and test data. Since the model learns the noise of the training data, it will accurately predict noise fluctuations on the training data; but since this noise is completely meaningless information for fitting new datapoints, the model is unable to predict accurately on the validation and test datapoints, and thus the MSEs for those sets are high.

Finally, in (b) we observe fairly appropriate fitting. Since the model complexity is at least on the same order of magnitude as the intrinsic complexity of the data, the model is able to learn to accurately predict new data without learning noise. Thus, both the validation and the test MSE curves reach their lowest points in this region of the graph.

1. **Is there any evidence of high approximation risk? Why? If yes, in which of the below subfigures?**

Depending on the scale and magnitude of the x-axis, there could be significant approximation risk. This can be observed in subfigure (b), namely in the difference in complexity between the model with the lowest validation error and the optimal model (the model with the lowest expected test error). The distance between the two lines indicates that the currently chosen family of models (i.e. the currently chosen gray-box model function, and not the value of its hyperparameters) is not completely adequate to model the process that generated the data to fit. High approximation risk would cause even a correctly fitted model to have a high test error, since the inherent structure of the chosen family of models would be unable to capture the true behaviour of the data.

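To isolate what approximation risk means, the following sketch (again my own example, assuming only NumPy) fits the best possible member of a deliberately wrong family, a linear model, to data generated by a quadratic process; even with a large sample the best-in-family MSE stays well above the noise floor:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data generated by a quadratic process, fitted with a linear family.
x = rng.uniform(-1, 1, 10_000)  # large sample: estimation error is negligible
y = x ** 2 + rng.normal(scale=0.05, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)  # best linear fit available
mse = np.mean((slope * x + intercept - y) ** 2)
print(f"best-in-family MSE: {mse:.4f}  noise floor: {0.05**2:.4f}")
# The gap above the noise floor is the approximation risk: no linear model
# can represent the quadratic trend, no matter how well it is fitted.
```

No amount of extra data or hyperparameter tuning closes that gap; only changing the family of models does.
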
1. **Do you think that by further increasing the model complexity you will be able to bring the training error to zero?**

Yes, I think so. The model complexity could be increased up to the point where the model is so complex that it can effectively memorize all the x-y pairs of the training data, turning the model function into a direct one-to-one mapping between the inputs and outputs of the training set. The loss on the training dataset would then be exactly 0. Of course, this would mean that an absurdly high amount of noise is learned as well, making the model completely useless for predicting new datapoints.

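A quick sketch of this memorization argument (my own example, assuming NumPy): a polynomial with as many coefficients as there are training points can interpolate all of them exactly, driving the training MSE to zero up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)  # noisy targets

# A degree n-1 polynomial has n coefficients: enough to pass through all
# n training points exactly, since the x values are distinct.
coeffs = np.polyfit(x, y, deg=n - 1)
train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
print(f"training MSE: {train_mse:.2e}")  # ~0, limited only by rounding
```
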
1. **Do you think that by further increasing the model complexity you will be able to bring the structural risk to zero?**

No, I don't think so. In order to achieve zero structural risk we would need an infinite training dataset covering the entire input domain. Increasing the model's complexity would actually make the structural risk increase, due to overfitting.

## Q2. Linear Regression
Comment and compare how the (a.) training error, (b.) test error and