hw1: done Q1, Q2, Q3.1 (3.2 to rewrite)

2021-05-05 22:25:09 +02:00 · 2021-05-05 22:25:09 +02:00 · dd9ad6d776
parent 60087c225d
commit dd9ad6d776
3 changed files with 43 additions and 8 deletions
--- a/assignment_1/report_Maggioni_Claudio.md
+++ b/assignment_1/report_Maggioni_Claudio.md
@@ -186,10 +186,46 @@ Comment and compare how the (a.) training error, (b.) test error and
 1.  **$x_3$ is a normally distributed independent random variable
    $x_3 \sim \mathcal{N}(1, 2)$**

+    With this new variable, the coefficients $\theta_1$ and $\theta_2$ will not
+    change significantly in the new optimal model, and training and test error
+    will stay roughly the same: the training error cannot increase and may
+    decrease slightly, since $\theta_3$ can fit some of the noise, while the test
+    error may slightly increase for the same reason. This happens because $x_3$
+    is independent of $x_1$ and $x_2$, and consequently of $y$: it carries no
+    information about the target, so the optimal $\theta_3$ will be close to 0
+    and shrink towards 0 as the training set grows. Lasso would achieve this
+    even more effectively than plain linear regression, since its L1 penalty
+    sets exactly to zero the parameters whose unpenalized value is close to 0.
+
 1.  **$x_3 = 2.5 \cdot x_1 + x_2$**

+    With this new variable, the individual coefficients would change, but the
+    training and test error would stay the same. Since $x_3$ is a linear
+    combination of $x_1$ and $x_2$, the model function can be rewritten in the
+    following way:
+
+    $$f(x, \theta) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 (2.5 x_1 + x_2) =
+    (\theta_1 + 2.5 \theta_3) x_1 + (\theta_2 + \theta_3) x_2$$
+
+    Hence any $\theta$ with the same values of $\theta_1 + 2.5 \theta_3$ and
+    $\theta_2 + \theta_3$ gives exactly the same predictions: the optimum is no
+    longer unique ($X^T X$ is singular), but every optimal solution has the same
+    training and test MSE as the optimal model of the original family.
+
 1.  **$x_3 = x_1 \cdot x_2$**

+    If the underlying process generating the data also depends on the product
+    $x_1 \cdot x_2$, this additional input variable would change the parameters,
+    lower the training error and, depending on whether the impact of this
+    quadratic term on the data-generating process is small or large, lower the
+    test error slightly or considerably.
+
+    Essentially, this variable adds complexity to the model: the training error
+    cannot increase, but the extra flexibility helps the test error only if the
+    model is underfitting (i.e. it lacks an input the target depends on), and
+    it may be detrimental if the model already fits the data correctly or is
+    overfitting.
+
 ## Q3. Classification

 1.  **Your boss asked you to solve the problem using a perceptron and now
@@ -212,9 +248,11 @@ Comment and compare how the (a.) training error, (b.) test error and
 2.  **Would you expect to have better luck with a neural network with
    activation function $h(x) = - x \cdot e^{-2}$ for the hidden units?**

-    Boh
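+    No. Since $e^{-2}$ is a constant, the activation function is still linear, so the whole network reduces to a single linear model with a linear decision boundary, and the data is not linearly separable.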

 3.  **What are the main differences and similarities between the
    perceptron and the logistic regression neuron?**


+
+
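The first case of Q2 above ($x_3 \sim \mathcal{N}(1, 2)$) can be checked numerically. The following is a minimal pure-Python sketch, not the author's code: it assumes a hypothetical target $y = 3 x_1 - 2 x_2 + \varepsilon$ that ignores $x_3$, reads $\mathcal{N}(1, 2)$ as mean 1 and variance 2, and uses arbitrary choices for the sample size, the seed and the Lasso penalty `lam`. Least squares is solved through the normal equations, and Lasso through cyclic coordinate descent with soft-thresholding, a standard solver chosen here only for brevity.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least squares via the normal equations (X^T X) theta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def mse(rows, y, theta):
    """Mean squared error of the linear model theta on the given rows."""
    return sum((sum(w * v for w, v in zip(theta, r)) - t) ** 2
               for r, t in zip(rows, y)) / len(y)


def center(v):
    mu = sum(v) / len(v)
    return [t - mu for t in v]


def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0); returns exactly 0.0 when |z| <= lam."""
    return max(z - lam, 0.0) - max(-z - lam, 0.0)


def lasso(cols, y, lam, sweeps=50):
    """Coordinate descent for (1/2n)||y - X theta||^2 + lam * ||theta||_1.

    Assumes the columns of X and y are centered, so no intercept is fitted."""
    n = len(y)
    theta = [0.0] * len(cols)
    resid = y[:]  # residual y - X theta for theta = 0
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            norm = sum(v * v for v in col) / n
            # correlation of feature j with the residual that excludes feature j
            rho = sum(v * (r + v * theta[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            resid = [r - v * (new - theta[j]) for v, r in zip(col, resid)]
            theta[j] = new
    return theta


rng = random.Random(0)
n = 1000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(1, 2 ** 0.5) for _ in range(n)]  # N(1, 2), variance 2 assumed
y = [3 * a - 2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # ignores x3

rows_base = [[1.0, a, b] for a, b in zip(x1, x2)]
rows_ext = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]
theta_base = ols(rows_base, y)
theta_ext = ols(rows_ext, y)
theta_lasso = lasso([center(x1), center(x2), center(x3)], center(y), lam=0.2)

print("OLS without x3:", [round(t, 3) for t in theta_base], mse(rows_base, y, theta_base))
print("OLS with x3:   ", [round(t, 3) for t in theta_ext], mse(rows_ext, y, theta_ext))
print("Lasso (x1, x2, x3):", [round(t, 3) for t in theta_lasso])
```

Under these assumptions the least-squares $\theta_3$ is expected to come out small but not exactly zero, the training MSE with $x_3$ is never larger than without it (the models are nested), and Lasso returns $\theta_3 = 0$ exactly, because the soft-threshold maps every correlation with absolute value at most `lam` to 0.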
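For the collinear case of Q2 ($x_3 = 2.5 x_1 + x_2$), the rewriting of $f(x, \theta)$ in the report suggests a direct check: fit the original two-variable model, then evaluate a whole family of three-variable parameter vectors that keep $\theta_1 + 2.5 \theta_3$ and $\theta_2 + \theta_3$ fixed. This is a minimal sketch under illustrative assumptions (a hypothetical target $y = 3 x_1 - 2 x_2 + \varepsilon$, no intercept, an arbitrary seed and arbitrary offsets `s`), not code from the assignment.

```python
import random

rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [2.5 * a + b for a, b in zip(x1, x2)]  # x3 is an exact linear combination
y = [3 * a - 2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # hypothetical target

# Least squares for the original model theta_1 x_1 + theta_2 x_2 (no intercept):
# solve the 2x2 normal equations with Cramer's rule.
s11 = sum(a * a for a in x1)
s12 = sum(a * b for a, b in zip(x1, x2))
s22 = sum(b * b for b in x2)
s1y = sum(a * t for a, t in zip(x1, y))
s2y = sum(b * t for b, t in zip(x2, y))
det = s11 * s22 - s12 * s12
t1 = (s1y * s22 - s2y * s12) / det
t2 = (s2y * s11 - s1y * s12) / det


def mse(theta):
    """Training MSE of f(x) = theta_1 x_1 + theta_2 x_2 + theta_3 x_3."""
    th1, th2, th3 = theta
    return sum((th1 * a + th2 * b + th3 * c - t) ** 2
               for a, b, c, t in zip(x1, x2, x3, y)) / n


# Each vector keeps theta_1 + 2.5 theta_3 = t1 and theta_2 + theta_3 = t2,
# so all of them describe the same function and are equally optimal.
family = [(t1 - 2.5 * s, t2 - s, s) for s in (-10.0, -1.0, 0.0, 0.5, 7.0)]
errors = [mse(theta) for theta in family]
for theta, err in zip(family, errors):
    print([round(v, 3) for v in theta], round(err, 6))
```

The printed coefficients differ wildly across the family while the MSE is identical up to floating-point rounding, which is the non-uniqueness caused by the singular $X^T X$.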
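For the interaction case of Q2 ($x_3 = x_1 \cdot x_2$), the sketch below assumes a hypothetical data-generating process $y = 1 + 2 x_1 - x_2 + 1.5\, x_1 x_2 + \varepsilon$ that does contain the product term, and compares least squares with and without the extra input on a training set and on a larger held-out test set. The coefficients, noise level, set sizes and seed are arbitrary illustrative choices.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least squares via the normal equations (X^T X) theta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def mse(rows, y, theta):
    """Mean squared error of the linear model theta on the given rows."""
    return sum((sum(w * v for w, v in zip(theta, r)) - t) ** 2
               for r, t in zip(rows, y)) / len(y)


def make_data(rng, n):
    """Hypothetical process that does depend on the product x1 * x2."""
    x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    y = [1 + 2 * a - b + 1.5 * a * b + rng.gauss(0, 0.5) for a, b in x]
    return x, y


def features(x, interaction):
    """Design matrix rows [1, x1, x2] or [1, x1, x2, x1 * x2]."""
    return [[1.0, a, b] + ([a * b] if interaction else []) for a, b in x]


rng = random.Random(2)
x_train, y_train = make_data(rng, 200)
x_test, y_test = make_data(rng, 1000)

results = {}
for interaction in (False, True):
    theta = ols(features(x_train, interaction), y_train)
    results[interaction] = (mse(features(x_train, interaction), y_train, theta),
                            mse(features(x_test, interaction), y_test, theta))
    print("with x1*x2" if interaction else "without x1*x2",
          "train MSE %.3f, test MSE %.3f" % results[interaction])
```

Setting the `1.5` coefficient in `make_data` to 0 would make the process independent of the product: the training MSE with the extra input would still be no larger, while the test MSE would typically no longer improve.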
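For Q3.2, the claim that $h(x) = -x \cdot e^{-2}$ keeps the network linear can be verified by collapsing a network with one hidden layer into a single affine map: with $h(z) = c\,z$ and $c = -e^{-2}$, the output pre-activation is $w_2^\top (c (W_1 x + b_1)) + b_2 = (c\, W_1^\top w_2)^\top x + c\, w_2^\top b_1 + b_2$. The sketch below (random weights, hypothetical layer sizes `d` and `hdim`) checks this identity numerically; any threshold or sigmoid applied to this output still yields a hyperplane as decision boundary.

```python
import math
import random

C = -math.exp(-2)  # h(x) = -x * e^{-2} = C * x, a linear function


def h(z):
    return C * z


def network(x, w1, b1, w2, b2):
    """One hidden layer with activation h; returns the output pre-activation."""
    hidden = [h(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sum(w * z for w, z in zip(w2, hidden)) + b2


def collapse(w1, b1, w2, b2):
    """Equivalent single affine map f(x) = v . x + c0."""
    v = [C * sum(w2[k] * w1[k][i] for k in range(len(w2))) for i in range(len(w1[0]))]
    c0 = C * sum(w * b for w, b in zip(w2, b1)) + b2
    return v, c0


rng = random.Random(3)
d, hdim = 2, 5
w1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hdim)]
b1 = [rng.uniform(-1, 1) for _ in range(hdim)]
w2 = [rng.uniform(-1, 1) for _ in range(hdim)]
b2 = rng.uniform(-1, 1)
v, c0 = collapse(w1, b1, w2, b2)

# Compare the network with its collapsed affine form on random inputs.
points = [[rng.uniform(-3, 3) for _ in range(d)] for _ in range(10)]
max_gap = max(abs(network(p, w1, b1, w2, b2) - (sum(a * b for a, b in zip(v, p)) + c0))
              for p in points)
print(max_gap)
```

Replacing `h` with a non-linear function such as `math.tanh` would in general make `max_gap` clearly non-zero, which is what lets a hidden layer bend the decision boundary.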
--- a/assignment_1/report_Maggioni_Claudio.pdf
+++ b/assignment_1/report_Maggioni_Claudio.pdf
--- a/assignment_1/src/build_models.py
+++ b/assignment_1/src/build_models.py
@@ -74,17 +74,14 @@ X_val -= mean
 X_val /= std

 network = Sequential()
-network.add(Dense(30, activation='relu'))
-network.add(Dense(20, activation='relu'))
-network.add(Dense(20, activation='relu'))
+network.add(Dense(20, activation='tanh'))
 network.add(Dense(10, activation='relu'))
+network.add(Dense(7, activation='sigmoid'))
 network.add(Dense(5, activation='relu'))
-network.add(Dense(3, activation='relu'))
-network.add(Dense(2, activation='relu'))
 network.add(Dense(1, activation='linear'))
 network.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])

-epochs = 100000
+epochs = 1000
 callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=40)
 network.fit(X_train, y_train, epochs=epochs, verbose=1, batch_size=15,
        validation_data=(X_val, y_val), callbacks=[callback])
@@ -99,4 +96,4 @@ X_test = X_test[:, 1:3]
 X_test -= mean
 X_test /= std
 msq = mean_squared_error(network.predict(X_test), y_test)
-print(msq)
+#print(msq)