hw1: done T1-3 (no report) and Q1
1. **Explain the curves' behavior in each of the three highlighted
sections of the figures, namely (a), (b), and (c).**
In the highlighted section (a), the expected test error, the observed
validation error and the observed training error are all significantly high
and close together. All the errors decrease as the model complexity
increases. In (c), instead, we see a low training error but high validation
and expected test errors; the latter two increase as the model complexity
increases, while the training error plateaus. Finally, in (b), we see the
test and validation error curves reach their respective lowest points, while
the training error curve keeps decreasing as the model complexity increases,
albeit less steeply than in (a).
1. **Is any of the three sections associated with the concepts of
overfitting and underfitting? If yes, explain it.**
Section (a) is associated with underfitting and section (c) is associated
with overfitting.
The behaviour in (a) is fairly easy to explain: since the model complexity
is insufficient to capture the behaviour of the training data, the model is
unable to provide accurate predictions, and thus all the MSEs we observe are
rather high. It is worth pointing out that the training error curve is quite
close to the validation and test error curves: this happens because the
model is both unable to learn the training data accurately and unable to
formulate accurate predictions on the validation and test data.
In (c), instead, the model complexity is higher than the intrinsic
complexity of the data being modelled, and this extra capacity is spent
learning the intrinsic noise of the data. This is of course not desirable,
and the dire consequences of this phenomenon can be seen in the significant
difference between the observed MSE on the training data and the MSEs on the
validation and test data. Since the model learns the noise of the training
data, it accurately predicts noise fluctuations on the training data; but
since this noise carries no meaningful information for fitting new
datapoints, the model is unable to predict validation and test datapoints
accurately, and thus the MSEs for those sets are high.
Finally, in (b), we observe fairly appropriate fitting. Since the model
complexity roughly matches the intrinsic complexity of the data, the model
is able to learn to accurately predict new data without learning the noise.
Thus, both the validation and the test MSE curves reach their lowest points
in this region of the graph.
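The three regimes described above can be sketched numerically. As a
stand-in for the models in the figure (which are not specified here), the
snippet below assumes polynomial least-squares fits, with the degree playing
the role of model complexity; the data, the chosen degrees and the seed are
illustrative choices, not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a cubic: the data's intrinsic complexity is degree 3.
def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, x**3 - x + rng.normal(0, 0.1, n)

x_train, y_train = sample(30)
x_val, y_val = sample(30)

def mses(degree):
    """Fit a polynomial of the given degree on the training set and
    return (training MSE, validation MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    tr = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    va = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    return tr, va

tr_a, va_a = mses(1)   # (a) underfitting: both errors high, close together
tr_b, va_b = mses(3)   # (b) matched complexity: validation error near its lowest
tr_c, va_c = mses(15)  # (c) overfitting: tiny training error, larger val error
print(tr_a, va_a, tr_b, va_b, tr_c, va_c)
```

Sweeping the degree over a range and plotting the two MSEs against it
reproduces the qualitative shape discussed above: both curves high in (a),
the validation curve bottoming out in (b), and the two diverging in (c).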
1. **Is there any evidence of high approximation risk? Why? If yes, in
which of the below subfigures?**
Depending on the scale and magnitude of the x axis, there could be
significant approximation risk. This can be observed in subfigure (b),
namely in the difference in complexity between the model with the lowest
validation error and the optimal model (the model with the lowest expected
test error). The distance between the two lines indicates that the currently
chosen family of models (i.e. the chosen gray-box model function, not the
values of its hyperparameters) is not completely adequate to model the
process that generated the data. High approximation risk would cause even a
correctly fitted model to have a high test error, since the inherent
structure of the chosen family of models would be unable to capture the true
behaviour of the data.
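Approximation risk can be made concrete with a small experiment (an
illustrative construction, not the assignment's setup): if the chosen family
of models is linear but the true process is a sine, then even the best
member of the family, fitted on plentiful noise-free data, keeps a large
error floor that no hyperparameter tuning within the family can remove.

```python
import numpy as np

x = np.linspace(-2, 2, 1000)
y = np.sin(3 * x)  # true, noise-free process

# Best member of a too-restricted family (degree-1 least squares)
# versus a richer family (degree 9) fitted on the same data.
lin = np.polyfit(x, y, 1)
rich = np.polyfit(x, y, 9)

err_lin = np.mean((np.polyval(lin, x) - y) ** 2)
err_rich = np.mean((np.polyval(rich, x) - y) ** 2)
print(err_lin, err_rich)
```

The residual error of the linear family stays large no matter how well it
is fitted; switching to a richer family, not tuning within the old one, is
what removes it, which is exactly the family-level inadequacy described
above.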
1. **Do you think that by further increasing the model complexity you
will be able to bring the training error to zero?**
Yes, I think so. The model complexity could be increased up to the point
where the model is so complex that it can effectively remember all x-y pairs
of the training data, turning the model function into a direct one-to-one
mapping between the inputs and outputs of the training set. The loss on the
training dataset would then be exactly 0. Of course, this would also mean
that an absurdly high amount of noise is learned, making the model
completely useless for predicting new datapoints.
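The memorisation limit described above can be sketched directly (a toy
construction, not a specific model from the course): a "model" that stores
every training pair in a lookup table achieves exactly zero training error,
while its answers for unseen inputs, here an arbitrary nearest-neighbour
fallback, carry all the memorised noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, 50)
y_train = np.sin(x_train) + rng.normal(0, 0.1, 50)  # noisy targets

# "Infinite complexity" model: memorise every (x, y) training pair.
table = dict(zip(x_train.tolist(), y_train.tolist()))

def memorising_model(x):
    if x in table:  # input seen during training: recall its target exactly
        return table[x]
    nearest = min(table, key=lambda k: abs(k - x))  # arbitrary fallback
    return table[nearest]

train_mse = np.mean([(memorising_model(x) - y) ** 2
                     for x, y in zip(x_train, y_train)])
print(train_mse)
```

The zero is exact because every training input hits the lookup table; on
fresh draws the same model would reproduce the memorised noise and perform
poorly, which is the point made above.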
1. **Do you think that by further increasing the model complexity you
will be able to bring the structural risk to zero?**
No, I don't think so. In order to achieve zero structural risk we would need
an infinite training dataset covering the entire input domain. Increasing
the model's complexity would actually make the structural risk increase, due
to overfitting.
## Q2. Linear Regression
Comment and compare how the (a.) training error, (b.) test error and