hw1: done T1-3 (no report) and Q1
1. **Explain the curves' behavior in each of the three highlighted
sections of the figures, namely (a), (b), and (c).**
In the highlighted section (a), the expected test error, the observed
validation error and the observed training error are all significantly high
and close together. All the errors decrease as the model complexity
increases. In (c), instead, we see a low training error but high validation
and expected test errors; the latter two increase as the model complexity
increases, while the training error plateaus. Finally, in (b), we see the
test and validation error curves reach their respective lowest points, while
the training error curve keeps decreasing as the model complexity increases,
albeit less steeply than in (a).
1. **Is any of the three sections associated with the concepts of
overfitting and underfitting? If yes, explain it.**
Section (a) is associated with underfitting and section (c) is associated
with overfitting.
The behaviour in (a) is fairly easy to explain: since the model complexity
is insufficient to capture the behaviour of the training data, the model is
unable to provide accurate predictions, and thus all the MSEs we observe are
rather high. It is worth pointing out that the training error curve is quite
close to the validation and test error curves: this happens because the
model is both unable to learn the training data accurately and unable to
formulate accurate predictions on the validation and test data.
In (c), instead, the model complexity is higher than the intrinsic
complexity of the data being modelled, and this extra capacity is spent
learning the intrinsic noise of the data. This is of course not desirable,
and the dire consequences of this phenomenon can be seen in the significant
difference between the observed MSE on the training data and the MSEs on the
validation and test data. Since the model learns the noise of the training
data, it accurately predicts noise fluctuations on the training data; but
since this noise carries no meaningful information for fitting new
datapoints, the model is unable to predict validation and test datapoints
accurately, and thus the MSEs for those sets are high.
Finally, in (b), we observe fairly appropriate fitting. Since the model
complexity roughly matches the intrinsic complexity of the data, the model
is able to learn to accurately predict new data without learning the noise.
Thus, both the validation and the test MSE curves reach their lowest points
in this region of the graph.
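The three regimes described above can be sketched numerically. As a
stand-in for the models in the figure (which are not specified here), the
snippet below assumes polynomial least-squares fits, with the degree playing
the role of model complexity; the data, the chosen degrees and the seed are
illustrative choices, not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a cubic: the data's intrinsic complexity is degree 3.
def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, x**3 - x + rng.normal(0, 0.1, n)

x_train, y_train = sample(30)
x_val, y_val = sample(30)

def mses(degree):
    """Fit a polynomial of the given degree on the training set and
    return (training MSE, validation MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    tr = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    va = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    return tr, va

tr_a, va_a = mses(1)   # (a) underfitting: both errors high, close together
tr_b, va_b = mses(3)   # (b) matched complexity: validation error near its lowest
tr_c, va_c = mses(15)  # (c) overfitting: tiny training error, larger val error
print(tr_a, va_a, tr_b, va_b, tr_c, va_c)
```

Sweeping the degree over a range and plotting the two MSEs against it
reproduces the qualitative shape discussed above: both curves high in (a),
the validation curve bottoming out in (b), and the two diverging in (c).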
1. **Is there any evidence of high approximation risk? Why? If yes, in
which of the below subfigures?**
Depending on the scale and magnitude of the x axis, there could be
significant approximation risk. This can be observed in subfigure (b),
namely in the difference in complexity between the model with the lowest
validation error and the optimal model (the model with the lowest expected
test error). The distance between the two lines indicates that the currently
chosen family of models (i.e. the chosen gray-box model function, not the
values of its hyperparameters) is not completely adequate to model the
process that generated the data. High approximation risk would cause even a
correctly fitted model to have a high test error, since the inherent
structure of the chosen family of models would be unable to capture the true
behaviour of the data.
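Approximation risk can be made concrete with a small experiment (an
illustrative construction, not the assignment's setup): if the chosen family
of models is linear but the true process is a sine, then even the best
member of the family, fitted on plentiful noise-free data, keeps a large
error floor that no hyperparameter tuning within the family can remove.

```python
import numpy as np

x = np.linspace(-2, 2, 1000)
y = np.sin(3 * x)  # true, noise-free process

# Best member of a too-restricted family (degree-1 least squares)
# versus a richer family (degree 9) fitted on the same data.
lin = np.polyfit(x, y, 1)
rich = np.polyfit(x, y, 9)

err_lin = np.mean((np.polyval(lin, x) - y) ** 2)
err_rich = np.mean((np.polyval(rich, x) - y) ** 2)
print(err_lin, err_rich)
```

The residual error of the linear family stays large no matter how well it
is fitted; switching to a richer family, not tuning within the old one, is
what removes it, which is exactly the family-level inadequacy described
above.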
1. **Do you think that by further increasing the model complexity you
will be able to bring the training error to zero?**
Yes, I think so. The model complexity could be increased up to the point
where the model is so complex that it can effectively remember all x-y pairs
of the training data, turning the model function into a direct one-to-one
mapping between the inputs and outputs of the training set. The loss on the
training dataset would then be exactly 0. Of course, this would also mean
that an absurdly high amount of noise is learned, making the model
completely useless for predicting new datapoints.
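The memorisation limit described above can be sketched directly (a toy
construction, not a specific model from the course): a "model" that stores
every training pair in a lookup table achieves exactly zero training error,
while its answers for unseen inputs, here an arbitrary nearest-neighbour
fallback, carry all the memorised noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, 50)
y_train = np.sin(x_train) + rng.normal(0, 0.1, 50)  # noisy targets

# "Infinite complexity" model: memorise every (x, y) training pair.
table = dict(zip(x_train.tolist(), y_train.tolist()))

def memorising_model(x):
    if x in table:  # input seen during training: recall its target exactly
        return table[x]
    nearest = min(table, key=lambda k: abs(k - x))  # arbitrary fallback
    return table[nearest]

train_mse = np.mean([(memorising_model(x) - y) ** 2
                     for x, y in zip(x_train, y_train)])
print(train_mse)
```

The zero is exact because every training input hits the lookup table; on
fresh draws the same model would reproduce the memorised noise and perform
poorly, which is the point made above.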
1. **Do you think that by further increasing the model complexity you
will be able to bring the structural risk to zero?**
No, I don't think so. In order to achieve zero structural risk we would need
an infinite training dataset covering the entire input domain. Increasing
the model's complexity would actually make the structural risk increase, due
to overfitting.
## Q2. Linear Regression
Comment and compare how the (a.) training error, (b.) test error and