diff --git a/assignment_1/report_Maggioni_Claudio.md b/assignment_1/report_Maggioni_Claudio.md
index 0bb7cd2..6362809 100644
--- a/assignment_1/report_Maggioni_Claudio.md
+++ b/assignment_1/report_Maggioni_Claudio.md
@@ -102,20 +102,82 @@ better.
 1. **Explain the curves' behavior in each of the three highlighted
    sections of the figures, namely (a), (b), and (c).**
 
-   I dont know
+   In the highlighted section (a), the expected test error, the observed
+   validation error and the observed training error are all high and
+   close together, and all three decrease as the model complexity
+   increases. In (c), instead, we see a low training error but high
+   validation and expected test errors: the latter two increase with
+   model complexity while the training error sits on a plateau. Finally,
+   in (b), the test and validation error curves reach their respective
+   lowest points, while the training error keeps decreasing with model
+   complexity, albeit less steeply than in (a).
 
 1. **Is any of the three sections associated with the concepts of
    overfitting and underfitting? If yes, explain it.**
 
+   Section (a) is associated with underfitting and section (c) with
+   overfitting.
+
+   The behaviour in (a) is fairly easy to explain: since the model
+   complexity is insufficient to capture the behaviour of the training
+   data, the model cannot produce accurate predictions and all the MSEs
+   we observe are rather high. It is worth pointing out that the
+   training error curve is quite close to the validation and test error
+   curves: this happens because the model is unable both to learn the
+   training data accurately and to formulate accurate predictions on the
+   validation and test data.
+
+   In (c), instead, the model complexity is higher than the intrinsic
+   complexity of the data, so the extra capacity ends up learning the
+   noise in the training data. This is of course not desirable, and its
+   consequences can be seen in the significant gap between the observed
+   MSE on the training data and the MSEs on the validation and test
+   data. Since the model learns the noise of the training set, it
+   predicts the noise fluctuations of that set accurately; but this
+   noise carries no information about new datapoints, so the model
+   predicts poorly on the validation and test sets and their MSEs are
+   high.
+
+   Finally, in (b) we observe fairly appropriate fitting. Since the
+   model complexity is of the same order of magnitude as the intrinsic
+   complexity of the data, the model learns to predict new data
+   accurately without learning noise, and both the validation and the
+   test MSE curves reach their lowest points in this region of the
+   graph. The short sketch below reproduces these three regimes on
+   synthetic data.
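+
+   This sketch is my own illustration, assuming Python with NumPy; the
+   sine signal, the noise level and the polynomial degrees are invented
+   for the example and are not part of the assignment. Low degrees
+   underfit as in (a), intermediate degrees behave as in (b), and high
+   degrees overfit as in (c):
+
+   ```python
+   import numpy as np
+
+   rng = np.random.default_rng(0)
+
+   def f(x):
+       # True underlying signal; we only ever observe it plus noise.
+       return np.sin(2 * np.pi * x)
+
+   n_train = 20
+   x_train = rng.uniform(0.0, 1.0, n_train)
+   y_train = f(x_train) + rng.normal(0.0, 0.3, n_train)
+   x_val = rng.uniform(0.0, 1.0, 200)
+   y_val = f(x_val) + rng.normal(0.0, 0.3, 200)
+
+   def mse(y, y_hat):
+       return float(np.mean((y - y_hat) ** 2))
+
+   # Model complexity = polynomial degree. A degree of n_train - 1 is
+   # enough to interpolate every training point (see the question on
+   # zero training error below).
+   for degree in (1, 3, 9, n_train - 1):
+       coeffs = np.polyfit(x_train, y_train, degree)
+       print(f"degree={degree:2d}"
+             f"  train MSE={mse(y_train, np.polyval(coeffs, x_train)):.4f}"
+             f"  val MSE={mse(y_val, np.polyval(coeffs, x_val)):.4f}")
+   ```
+
+   (Very high degrees make the least-squares fit ill-conditioned, so
+   `np.polyfit` may emit a `RankWarning`; only the qualitative trend of
+   the two errors matters here.)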
+
 1. **Is there any evidence of high approximation risk? Why? If yes, in
    which of the below subfigures?**
 
+   Depending on the scale of the complexity axis, there could be
+   significant approximation risk. This can be observed in subfigure
+   (b), namely in the difference in complexity between the model with
+   the lowest validation error and the optimal model (the model with the
+   lowest expected test error). The distance between the two lines
+   indicates that the currently chosen family of models (i.e. the chosen
+   gray box model function, not the values of its hyperparameters) is
+   not fully adequate to model the process that generated the data. High
+   approximation risk would cause even a correctly fitted model to have
+   a high test error, since the inherent structure of the chosen family
+   of models would be unable to capture the true behaviour of the data.
+
 1. **Do you think that by further increasing the model complexity you
    will be able to bring the training error to zero?**
 
+   Yes, I think so. The model complexity could be increased up to the
+   point where the model is complex enough to memorize all the x-y pairs
+   of the training data, effectively turning the model function into a
+   direct lookup table from the inputs of the training set to its
+   outputs; this is what the highest-degree fit in the sketch above
+   approaches. The loss on the training dataset would then be exactly 0.
+   Of course, an absurd amount of noise would be learned as well, making
+   the model useless for predicting new datapoints.
+
 1. **Do you think that by further increasing the model complexity you
    will be able to bring the structural risk to zero?**
 
+   No, I don't think so. To achieve zero structural risk we would need
+   an infinite training dataset covering the entire input domain, and
+   increasing the model's complexity would actually make the structural
+   risk grow due to overfitting.
+
 ## Q2. Linear Regression
 
 Comment and compare how the (a.) training error, (b.) test error and
diff --git a/assignment_1/report_Maggioni_Claudio.pdf b/assignment_1/report_Maggioni_Claudio.pdf
index 40c7f2f..b03c1c0 100644
Binary files a/assignment_1/report_Maggioni_Claudio.pdf and b/assignment_1/report_Maggioni_Claudio.pdf differ