hw1: done Q1, Q2, Q3.1 (3.2 to rewrite)

Claudio Maggioni 2021-05-05 22:25:09 +02:00
parent 60087c225d
commit dd9ad6d776
3 changed files with 43 additions and 8 deletions

@ -186,10 +186,46 @@ Comment and compare how the (a.) training error, (b.) test error and
1. **$x_3$ is a normally distributed independent random variable
$x_3 \sim \mathcal{N}(1, 2)$**
With this new variable, the coefficients $\theta_1$ and $\theta_2$ will not
change significantly in the new optimal model. Training and test error
behave similarly, although the training error may be higher in the first
iterations of the learning procedure. All of this is due to the fact
that the new variable $x_3$ is completely independent of $x_1$ and $x_2$,
and consequently of $y$. Therefore, the model will "learn" that $x_3$
carries no information and set $\theta_3$ close to 0. This effect
would be achieved even more readily by using Lasso instead of plain linear
regression, since Lasso tends to set parameters to exactly zero when their
least-squares optimal value is already close to 0.
1. **$x_3 = 2.5 \cdot x_1 + x_2$**
With this new variable, the coefficients may change, but the test and
training errors would stay the same. Since $x_3$ is a linear combination of
$x_1$ and $x_2$, we can rewrite the model function in the following
way:
$$f(x, \theta) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 (2.5 x_1 + x_2) =
(\theta_1 + 2.5 \theta_3) x_1 + (\theta_2 + \theta_3) x_2$$
This shows that even if the values of $\theta_1$ and $\theta_2$ change
when this term is introduced, the solution found through linear
regression would still be equivalent, in terms of predictions and
MSE, to the optimal model for the original family of models.
1. **$x_3 = x_1 \cdot x_2$**
If the underlying process generating the data also depends on an $x_1
\cdot x_2$ term, then this additional input variable would change the
parameters, improve the training error and, depending on whether the impact
of this quadratic term on the data-generating process is small or large,
slightly or considerably improve the test error.
Essentially, this variable would add useful complexity to the model, which
may be beneficial if the model is underfitted w.r.t. the number of variables
in the linear regression function, or otherwise detrimental if the model is
already correctly fitted or overfitted.
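The three cases above can be checked numerically with a small sketch. It is not part of the homework code: the ground-truth coefficients, the noise level, the sample size and the helper `fit` are all hypothetical, and $\mathcal{N}(1, 2)$ is read as mean 1 and variance 2, which is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
# Hypothetical ground truth, chosen only for illustration.
y = 1.5 * x1 - 0.7 * x2 + rng.normal(0, 0.1, n)


def fit(X, y):
    """Least-squares fit without intercept; returns (theta, training MSE)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta, float(np.mean((X @ theta - y) ** 2))


theta_base, mse_base = fit(np.column_stack([x1, x2]), y)

# Case 1: independent x3 ~ N(1, 2), reading 2 as the variance (assumption).
x3_indep = rng.normal(1.0, np.sqrt(2.0), n)
theta_indep, mse_indep = fit(np.column_stack([x1, x2, x3_indep]), y)

# Case 2: x3 = 2.5 * x1 + x2. The design matrix is rank deficient, so the
# individual coefficients are not unique, but the fitted function is.
x3_lin = 2.5 * x1 + x2
theta_lin, mse_lin = fit(np.column_stack([x1, x2, x3_lin]), y)
effective = np.array([theta_lin[0] + 2.5 * theta_lin[2],
                      theta_lin[1] + theta_lin[2]])

# Case 3: the data-generating process really contains an x1 * x2 term
# (the 0.8 weight is hypothetical).
y_inter = y + 0.8 * x1 * x2
_, mse_without = fit(np.column_stack([x1, x2]), y_inter)
_, mse_with = fit(np.column_stack([x1, x2, x1 * x2]), y_inter)

print("theta_3 for independent x3:", theta_indep[2])
print("effective coefficients with collinear x3:", effective, "vs", theta_base)
print("MSE without / with interaction feature:", mse_without, mse_with)
```

Under these assumptions, $\theta_3$ for the independent variable should come out close to 0, the effective coefficients $(\theta_1 + 2.5 \theta_3, \theta_2 + \theta_3)$ should match the original two-variable fit, and the interaction feature should lower the training MSE only because the simulated process actually contains that term.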
## Q3. Classification
1. **Your boss asked you to solve the problem using a perceptron and now
@ -212,9 +248,11 @@ Comment and compare how the (a.) training error, (b.) test error and
2. **Would you expect to have better luck with a neural network with
activation function $h(x) = - x \cdot e^{-2}$ for the hidden units?**
No idea
The activation function is still linear, so the network can only represent a
linear decision boundary, and the data is not linearly separable.
3. **What are the main differences and similarities between the
perceptron and the logistic regression neuron?**
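The answer to item 2 relies on $h(x) = - x \cdot e^{-2}$ being linear. A minimal sketch of why this matters: a network with one hidden layer using $h$ collapses to a single affine map. The layer sizes and random weights below are hypothetical and serve only to illustrate the algebra.

```python
import numpy as np


def h(x):
    """Activation from the question: a plain scaling by -e^{-2}."""
    return -x * np.exp(-2.0)


rng = np.random.default_rng(0)
# Hypothetical network: 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)


def network(X):
    hidden = h(X @ W1.T + b1)
    return hidden @ W2.T + b2


# Equivalent single affine map: output = X @ W_eq.T + b_eq.
c = -np.exp(-2.0)
W_eq = c * W2 @ W1
b_eq = c * W2 @ b1 + b2

X = rng.normal(size=(10, 2))
print(np.allclose(network(X), X @ W_eq.T + b_eq))
```

Since the whole network is equivalent to one affine map, its decision boundary is a hyperplane, exactly as for a single linear unit.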

@ -74,17 +74,14 @@ X_val -= mean
X_val /= std
network = Sequential()
network.add(Dense(30, activation='relu'))
network.add(Dense(20, activation='relu'))
network.add(Dense(20, activation='relu'))
network.add(Dense(20, activation='tanh'))
network.add(Dense(10, activation='relu'))
network.add(Dense(7, activation='sigmoid'))
network.add(Dense(5, activation='relu'))
network.add(Dense(3, activation='relu'))
network.add(Dense(2, activation='relu'))
network.add(Dense(1, activation='linear'))
network.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
epochs = 100000
epochs = 1000
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=40)
network.fit(X_train, y_train, epochs=epochs, verbose=1, batch_size=15,
validation_data=(X_val, y_val), callbacks=[callback])
@ -99,4 +96,4 @@ X_test = X_test[:, 1:3]
X_test -= mean
X_test /= std
msq = mean_squared_error(network.predict(X_test), y_test)
print(msq)
#print(msq)
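The script above reuses the same `mean` and `std` for `X_val` and `X_test`. A minimal sketch of that standardization pattern follows, assuming the statistics are computed on the training set only (the diff does not show where they come from). The synthetic arrays and the `_s` suffixed names are hypothetical stand-ins for the homework data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data standing in for the homework's X_train / X_val / X_test.
X_train = rng.normal(5.0, 3.0, size=(200, 2))
X_val = rng.normal(5.0, 3.0, size=(50, 2))
X_test = rng.normal(5.0, 3.0, size=(50, 2))

# Statistics are estimated on the training set only (assumption), then
# reused unchanged for validation and test data.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train_s = (X_train - mean) / std
X_val_s = (X_val - mean) / std
X_test_s = (X_test - mean) / std

print(X_train_s.mean(axis=0), X_train_s.std(axis=0))
```

Reusing the training statistics keeps the validation and test sets on the same scale the network was trained on, without leaking information from them into preprocessing.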