hw1: done T1-3 (no report) and Q1

1. **Explain the curves' behavior in each of the three highlighted sections of the figures, namely (a), (b), and (c).**

In the highlighted section (a), the expected test error, the observed validation error, and the observed training error are all high and close together. All three errors decrease as the model complexity increases. In (c), instead, we see a low training error but high validation and expected test errors; the latter two increase as the model complexity increases, while the training error plateaus. Finally, in (b), the test and validation error curves reach their respective lowest points, while the training error curve keeps decreasing as the model complexity increases, albeit less steeply than in (a).

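To make these three regimes concrete, below is a minimal sketch (my own illustration, not part of the assignment) that reproduces the same qualitative behaviour on synthetic data; it assumes NumPy and uses polynomial degree as a stand-in for model complexity:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical data-generating process: a smooth signal plus noise.
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 30)
y_train = true_f(x_train) + rng.normal(scale=0.3, size=30)
x_val = rng.uniform(0, 1, 30)
y_val = true_f(x_val) + rng.normal(scale=0.3, size=30)

# Sweep "model complexity" (here: polynomial degree) and track both errors.
# Very high degrees may trigger a harmless RankWarning from the solver.
for deg in (1, 3, 5, 9, 15):
    coeffs = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"deg {deg:2d}: train MSE {train_mse:.3f}  val MSE {val_mse:.3f}")
```

Low degrees behave like region (a) (both errors high and close together), intermediate degrees like region (b) (validation error at its minimum), and high degrees like region (c) (training error still shrinking while validation error climbs).
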
1. **Is any of the three sections associated with the concepts of overfitting and underfitting? If yes, explain it.**

Section (a) is associated with underfitting and section (c) with overfitting.

The behaviour in (a) is fairly easy to explain: since the model complexity is insufficient to capture the behaviour of the training data, the model is unable to provide accurate predictions, and thus all the MSEs we observe are rather high. It is worth pointing out that the training error curve is quite close to the validation and test error curves: this happens because the model is both unable to accurately learn the training data and unable to formulate accurate predictions on the validation and test data.

In (c), instead, the model complexity is higher than the intrinsic complexity of the data to model, and this extra capacity ends up learning the intrinsic noise of the data. This is of course not desirable, and the dire consequences of this phenomenon can be seen in the significant difference between the observed MSE on the training data and the MSEs on the validation and test data. Since the model learns the noise of the training data, it will accurately predict noise fluctuations on the training data; but since this noise is completely meaningless information for fitting new datapoints, the model is unable to predict accurately on the validation and test datapoints, and thus the MSEs for those sets are high.

Finally, in (b) we observe fairly appropriate fitting. Since the model complexity is at least on the same order of magnitude as the intrinsic complexity of the data, the model is able to learn to accurately predict new data without learning noise. Thus, both the validation and the test MSE curves reach their lowest points in this region of the graph.

1. **Is there any evidence of high approximation risk? Why? If yes, in which of the below subfigures?**

Depending on the scale and magnitude of the x-axis, there could be significant approximation risk. This can be observed in subfigure (b), namely in the difference in complexity between the model with the lowest validation error and the optimal model (the model with the lowest expected test error). The distance between the two lines indicates that the currently chosen family of models (i.e. the currently chosen gray-box model function, and not the value of its hyperparameters) is not completely adequate to model the process that generated the data to fit. High approximation risk would cause even a correctly fitted model to have a high test error, since the inherent structure of the chosen family of models would be unable to capture the true behaviour of the data.

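To isolate what approximation risk means, the following sketch (again my own example, assuming only NumPy) fits the best possible member of a deliberately wrong family, a linear model, to data generated by a quadratic process; even with a large sample the best-in-family MSE stays well above the noise floor:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data generated by a quadratic process, fitted with a linear family.
x = rng.uniform(-1, 1, 10_000)  # large sample: estimation error is negligible
y = x ** 2 + rng.normal(scale=0.05, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)  # best linear fit available
mse = np.mean((slope * x + intercept - y) ** 2)
print(f"best-in-family MSE: {mse:.4f}  noise floor: {0.05**2:.4f}")
# The gap above the noise floor is the approximation risk: no linear model
# can represent the quadratic trend, no matter how well it is fitted.
```

No amount of extra data or hyperparameter tuning closes that gap; only changing the family of models does.
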
1. **Do you think that by further increasing the model complexity you will be able to bring the training error to zero?**

Yes, I think so. The model complexity could be increased up to the point where the model is so complex that it can effectively memorize all the x-y pairs of the training data, turning the model function into a direct one-to-one mapping between the inputs and outputs of the training set. The loss on the training dataset would then be exactly 0. Of course, this would mean that an absurdly high amount of noise is learned as well, making the model completely useless for predicting new datapoints.

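A quick sketch of this memorization argument (my own example, assuming NumPy): a polynomial with as many coefficients as there are training points can interpolate all of them exactly, driving the training MSE to zero up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)  # noisy targets

# A degree n-1 polynomial has n coefficients: enough to pass through all
# n training points exactly, since the x values are distinct.
coeffs = np.polyfit(x, y, deg=n - 1)
train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
print(f"training MSE: {train_mse:.2e}")  # ~0, limited only by rounding
```
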
1. **Do you think that by further increasing the model complexity you will be able to bring the structural risk to zero?**

No, I don't think so. In order to achieve zero structural risk we would need an infinite training dataset covering the entire input domain. Increasing the model's complexity would actually make the structural risk increase, due to overfitting.

## Q2. Linear Regression
Comment and compare how the (a.) training error, (b.) test error and