hw1: done Q1, Q2, Q3.1 (3.2 to rewrite)
This commit is contained in:
parent 30680d799f
commit ccc8da0405
3 changed files with 43 additions and 8 deletions

@@ -186,10 +186,46 @@ Comment and compare how the (a.) training error, (b.) test error and

1. **$x_3$ is a normally distributed independent random variable
   $x_3 \sim \mathcal{N}(1, 2)$**

   With this new variable, the coefficients $\theta_1$ and $\theta_2$ will not
   change significantly in the new optimal model. Training and test error
   behave similarly, although the training error may be higher in the first
   iterations of the learning procedure. All these variations are due to the
   fact that the new variable $x_3$ is completely independent of $x_1$ and
   $x_2$, and consequently of $y$. Therefore, the model will "understand" that
   $x_3$ carries no information and will drive $\theta_3$ towards 0. This
   effect would be achieved even more quickly with Lasso in place of plain
   linear regression, since Lasso tends to set a parameter exactly to zero
   when its least-squares value is already close to 0, as sketched below.
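
   A minimal sketch of this claim, assuming synthetic data with illustrative
   coefficients and scikit-learn (none of these names come from the
   assignment itself):

   ```python
   import numpy as np
   from sklearn.linear_model import LinearRegression, Lasso

   rng = np.random.default_rng(0)
   n = 500
   x1 = rng.normal(size=n)
   x2 = rng.normal(size=n)
   x3 = rng.normal(loc=1.0, scale=np.sqrt(2), size=n)  # independent of x1, x2 and y
   y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)  # assumed true process

   X = np.column_stack([x1, x2, x3])
   ols = LinearRegression().fit(X, y)
   lasso = Lasso(alpha=0.05).fit(X, y)

   print("OLS coefficients:  ", ols.coef_)    # theta_3 ends up close to 0
   print("Lasso coefficients:", lasso.coef_)  # theta_3 is typically exactly 0
   ```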

1. **$x_3 = 2.5 \cdot x_1 + x_2$**

|
With this new variable, the coefficients would indeed change but test and
|
||||||
|
training error would stay the same. Since $x_3$ is a linear combination of
|
||||||
|
$x_1$ and $x_2$, then we can rewrite the model function in the following
|
||||||
|
way:
|
||||||
|
|
||||||
|
$$f(x, \theta) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 (2.5 x_1 + x_2) =
|
||||||
|
(\theta_1 + 2.5 \theta_3) x_1 + (\theta_2 + \theta_3) x_2$$
|
||||||
|
|
||||||
|
This shows that even if the value of $\theta_1$ and $\theta_2$ would change
|
||||||
|
if this term is introduced, the solution that would be found through linear
|
||||||
|
regression would still be effectively equivalent w.r.t. effectiveness and
|
||||||
|
MSE to the optimal model for the original family of models.
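
   As a quick numerical check (a sketch with made-up data; numpy's
   least-squares solver returns a minimum-norm solution even though the
   collinear column makes the design matrix rank-deficient):

   ```python
   import numpy as np

   rng = np.random.default_rng(0)
   n = 500
   x1, x2 = rng.normal(size=n), rng.normal(size=n)
   y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)  # assumed true process

   X2 = np.column_stack([x1, x2])                   # original features
   X3 = np.column_stack([x1, x2, 2.5 * x1 + x2])    # with the redundant x3

   theta2, *_ = np.linalg.lstsq(X2, y, rcond=None)
   theta3, *_ = np.linalg.lstsq(X3, y, rcond=None)

   print(theta2, np.mean((X2 @ theta2 - y) ** 2))
   print(theta3, np.mean((X3 @ theta3 - y) ** 2))   # different thetas, same MSE
   ```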

1. **$x_3 = x_1 \cdot x_2$**

|
If the underlying process generating the data would also depend on an $x_1
|
||||||
|
\cdot x_2$ operation, then this additional input variable would change the
|
||||||
|
parameters, improve the training error, and depending on if the impact of
|
||||||
|
this quadratic term on the original data-generating process is small or big,
|
||||||
|
it would slighty or considerably improve the test error.
|
||||||
|
|
||||||
|
Essentially, this parameter would had useful complexity to the model, which
|
||||||
|
may be beneficial if the model is underfitted w.r.t. number of variables in
|
||||||
|
the linear regression function, or otherwise detrimental if the model is
|
||||||
|
correctly
|
||||||
|
fitted or overfitted already.
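
   A sketch of the two situations, assuming one data-generating process with a
   strong $x_1 x_2$ term and one without (the coefficients are made up):

   ```python
   import numpy as np
   from sklearn.linear_model import LinearRegression
   from sklearn.metrics import mean_squared_error
   from sklearn.model_selection import train_test_split

   rng = np.random.default_rng(0)
   n = 1000
   x1, x2 = rng.normal(size=n), rng.normal(size=n)
   noise = rng.normal(scale=0.1, size=n)

   processes = {
       "no interaction in true process": 2 * x1 - x2 + noise,
       "strong interaction in true process": 2 * x1 - x2 + 3 * x1 * x2 + noise,
   }
   for label, y in processes.items():
       for name, X in [("without x3", np.column_stack([x1, x2])),
                       ("with x3 = x1*x2", np.column_stack([x1, x2, x1 * x2]))]:
           Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
           model = LinearRegression().fit(Xtr, ytr)
           print(label, "|", name, "| test MSE:",
                 round(mean_squared_error(yte, model.predict(Xte)), 4))
   ```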

## Q3. Classification

1. **Your boss asked you to solve the problem using a perceptron and now

@@ -212,9 +248,11 @@ Comment and compare how the (a.) training error, (b.) test error and

2. **Would you expect to have better luck with a neural network with
   activation function $h(x) = - x \cdot e^{-2}$ for the hidden units?**

   No: the activation function is still linear, and the data is not linearly
   separable, so such a network is no more expressive than the perceptron.
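
   To make the argument explicit, here is a short derivation (added for
   clarity, with $W_1, W_2, b_1, b_2$ denoting generic layer weights and
   biases): with $h(x) = -e^{-2} x$, a one-hidden-layer network computes

   $$\hat{y} = W_2\, h(W_1 x + b_1) + b_2
             = -e^{-2}\, W_2 (W_1 x + b_1) + b_2
             = \tilde{W} x + \tilde{b},$$

   which is again an affine function of $x$. Stacking more such layers keeps
   the map affine, so the decision boundary remains a hyperplane, exactly as
   for the perceptron.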

3. **What are the main differences and similarities between the
   perceptron and the logistic regression neuron?**

Binary file not shown.

@@ -74,17 +74,14 @@ X_val -= mean
 X_val /= std

 network = Sequential()
-network.add(Dense(30, activation='relu'))
-network.add(Dense(20, activation='relu'))
-network.add(Dense(20, activation='relu'))
+network.add(Dense(20, activation='tanh'))
 network.add(Dense(10, activation='relu'))
+network.add(Dense(7, activation='sigmoid'))
 network.add(Dense(5, activation='relu'))
-network.add(Dense(3, activation='relu'))
-network.add(Dense(2, activation='relu'))
 network.add(Dense(1, activation='linear'))
 network.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])

-epochs = 100000
+epochs = 1000
 callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=40)
 network.fit(X_train, y_train, epochs=epochs, verbose=1, batch_size=15,
             validation_data=(X_val, y_val), callbacks=[callback])

@@ -99,4 +96,4 @@ X_test = X_test[:, 1:3]
 X_test -= mean
 X_test /= std
 msq = mean_squared_error(network.predict(X_test), y_test)
-print(msq)
+#print(msq)