hw1: done Q1, Q2, Q3.1 (3.2 to rewrite)
This commit is contained in:
parent 30680d799f
commit ccc8da0405
3 changed files with 43 additions and 8 deletions
@ -186,10 +186,46 @@ Comment and compare how the (a.) training error, (b.) test error and

1. **$x_3$ is a normally distributed independent random variable,
   $x_3 \sim \mathcal{N}(1, 2)$**

   With this new variable, the coefficients $\theta_1$ and $\theta_2$ of the
   new optimal model will not change significantly. Training and test error
   also behave much as before, although the training error may be somewhat
   higher in the first iterations of the learning procedure. All these
   variations are due to the fact that the new variable $x_3$ is completely
   independent of $x_1$ and $x_2$, and consequently of $y$. The model will
   therefore "understand" that $x_3$ carries no information at all and set
   $\theta_3$ to 0 (see the sketch after this list). This effect would be
   achieved even more quickly by using Lasso instead of plain linear
   regression, since Lasso tends to set a parameter exactly to zero when its
   ordinary linear-regression value would already be close to 0.

1. **$x_3 = 2.5 \cdot x_1 + x_2$**

   With this new variable the coefficients would indeed change, but training
   and test error would stay the same. Since $x_3$ is a linear combination of
   $x_1$ and $x_2$, we can rewrite the model function as:

   $$f(x, \theta) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 (2.5 x_1 + x_2) =
   (\theta_1 + 2.5 \theta_3) x_1 + (\theta_2 + \theta_3) x_2$$

   This shows that, even though the values of $\theta_1$ and $\theta_2$ change
   when this term is introduced (and are in fact no longer uniquely
   determined, since the features are perfectly collinear), the solution found
   by linear regression is still effectively equivalent, in terms of MSE, to
   the optimal model from the original family of models.

1. **$x_3 = x_1 \cdot x_2$**

   If the underlying process generating the data also depends on an $x_1
   \cdot x_2$ interaction, then this additional input variable would change
   the parameters and improve the training error; depending on whether the
   impact of this quadratic term on the original data-generating process is
   small or large, it would improve the test error slightly or considerably.

   Essentially, this variable would add useful complexity to the model, which
   may be beneficial if the model is underfitted with respect to the number of
   variables in the linear regression function, or detrimental if the model is
   already correctly fitted or overfitted.
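
To make the three cases above concrete, here is a small illustrative sketch.
The synthetic data-generating process (coefficients 3, -2 and a small 0.5
interaction term), the train/test split, and the use of scikit-learn are all
assumptions made only for this illustration, not the assignment's actual data
or required tooling.

```python
# Illustrative sketch only: the synthetic process, its coefficients, and the
# use of scikit-learn are assumptions for this example, not the assignment's
# actual dataset or required tooling.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, 1.0, n)
# Hypothetical true process: mostly linear, plus a small interaction term.
y = 3.0 * x1 - 2.0 * x2 + 0.5 * x1 * x2 + rng.normal(0.0, 0.5, n)

train, test = slice(0, 350), slice(350, None)

def fit_and_report(name, X):
    """Fit ordinary least squares on the training split and report errors."""
    model = LinearRegression().fit(X[train], y[train])
    print(f"{name:20s} coef={np.round(model.coef_, 2)} "
          f"train={mean_squared_error(y[train], model.predict(X[train])):.3f} "
          f"test={mean_squared_error(y[test], model.predict(X[test])):.3f}")

fit_and_report("baseline (x1, x2)", np.column_stack([x1, x2]))

# 1. Independent noise feature: its coefficient stays close to 0.
x3_noise = rng.normal(1.0, np.sqrt(2.0), n)   # N(1, 2), taking 2 as variance
fit_and_report("x3 ~ N(1, 2)", np.column_stack([x1, x2, x3_noise]))

# 2. Collinear feature: coefficients are redistributed, errors do not change.
fit_and_report("x3 = 2.5*x1 + x2", np.column_stack([x1, x2, 2.5 * x1 + x2]))

# 3. Interaction feature: matches the true process, so the test error drops.
fit_and_report("x3 = x1*x2", np.column_stack([x1, x2, x1 * x2]))

# Lasso typically pushes the coefficient of the useless noise feature
# exactly to zero, as argued in case 1.
lasso = Lasso(alpha=0.1).fit(np.column_stack([x1, x2, x3_noise])[train], y[train])
print("Lasso coefficients with noise feature:", np.round(lasso.coef_, 2))
```
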
## Q3. Classification
1. **Your boss asked you to solve the problem using a perceptron and now
@ -212,9 +248,11 @@ Comment and compare how the (a.) training error, (b.) test error and
2. **Would you expect to have better luck with a neural network with
   activation function $h(x) = - x \cdot e^{-2}$ for the hidden units?**

   No: this activation function is still linear (it simply scales its input
   by the constant $-e^{-2}$), so the network as a whole still computes a
   linear function of its inputs, and the data is not linearly separable
   (see the short check after this list).

3. **What are the main differences and similarities between the
   perceptron and the logistic regression neuron?**
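
As a quick check for point 2, composing two hidden layers with this
activation (biases omitted here purely for brevity) gives

$$h\bigl(W_2\, h(W_1 x)\bigr) = -e^{-2} W_2 \bigl(-e^{-2} W_1 x\bigr) =
e^{-4}\, W_2 W_1 x,$$

which is again a linear function of $x$, so the decision boundary remains a
hyperplane no matter how many such layers are stacked.
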
Binary file not shown.
@ -74,17 +74,14 @@ X_val -= mean
X_val /= std

# Fully connected regression network: a funnel of Dense layers down to a
# single linear output unit.
network = Sequential()
network.add(Dense(30, activation='relu'))
network.add(Dense(20, activation='relu'))
network.add(Dense(20, activation='relu'))
network.add(Dense(20, activation='tanh'))
network.add(Dense(10, activation='relu'))
network.add(Dense(7, activation='sigmoid'))
network.add(Dense(5, activation='relu'))
network.add(Dense(3, activation='relu'))
network.add(Dense(2, activation='relu'))
network.add(Dense(1, activation='linear'))
network.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])

# epochs = 100000  # earlier value, superseded by the line below
epochs = 1000
# Stop training once the training loss has not improved for 40 epochs.
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=40)
network.fit(X_train, y_train, epochs=epochs, verbose=1, batch_size=15,
            validation_data=(X_val, y_val), callbacks=[callback])
@ -99,4 +96,4 @@ X_test = X_test[:, 1:3]
X_test -= mean
X_test /= std
msq = mean_squared_error(network.predict(X_test), y_test)
print(msq)