hw1: done Q1, Q2, Q3.1 (3.2 to rewrite)
This commit is contained in:
parent 30680d799f
commit ccc8da0405
3 changed files with 43 additions and 8 deletions

@@ -186,10 +186,46 @@ Comment and compare how the (a.) training error, (b.) test error and

1. **$x_3$ is a normally distributed independent random variable
   $x_3 \sim \mathcal{N}(1, 2)$**

   With this new variable, the coefficients $\theta_1$ and $\theta_2$ will not
   change significantly in the new optimal model. Training and test error
   behave similarly, although the training error may be higher in the first
   iterations of the learning procedure. All these variations are due to the
   fact that the new variable $x_3$ is completely independent of $x_1$ and
   $x_2$, and consequently of $y$. Therefore, the model will "understand" that
   $x_3$ carries no information and will drive $\theta_3$ towards 0. This
   effect would be achieved even more quickly with Lasso in place of plain
   linear regression, since Lasso tends to set a parameter exactly to zero
   when its least-squares value is already close to 0, as sketched below.
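
   A minimal sketch of this claim, assuming synthetic data with illustrative
   coefficients and scikit-learn (none of these names come from the
   assignment itself):

   ```python
   import numpy as np
   from sklearn.linear_model import LinearRegression, Lasso

   rng = np.random.default_rng(0)
   n = 500
   x1 = rng.normal(size=n)
   x2 = rng.normal(size=n)
   x3 = rng.normal(loc=1.0, scale=np.sqrt(2), size=n)  # independent of x1, x2 and y
   y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)  # assumed true process

   X = np.column_stack([x1, x2, x3])
   ols = LinearRegression().fit(X, y)
   lasso = Lasso(alpha=0.05).fit(X, y)

   print("OLS coefficients:  ", ols.coef_)    # theta_3 ends up close to 0
   print("Lasso coefficients:", lasso.coef_)  # theta_3 is typically exactly 0
   ```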

1. **$x_3 = 2.5 \cdot x_1 + x_2$**

|
With this new variable, the coefficients would indeed change but test and
|
||||||
|
training error would stay the same. Since $x_3$ is a linear combination of
|
||||||
|
$x_1$ and $x_2$, then we can rewrite the model function in the following
|
||||||
|
way:
|
||||||
|
|
||||||
|
$$f(x, \theta) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 (2.5 x_1 + x_2) =
|
||||||
|
(\theta_1 + 2.5 \theta_3) x_1 + (\theta_2 + \theta_3) x_2$$
|
||||||
|
|
||||||
|
This shows that even if the value of $\theta_1$ and $\theta_2$ would change
|
||||||
|
if this term is introduced, the solution that would be found through linear
|
||||||
|
regression would still be effectively equivalent w.r.t. effectiveness and
|
||||||
|
MSE to the optimal model for the original family of models.
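
   As a quick numerical check (a sketch with made-up data; numpy's
   least-squares solver returns a minimum-norm solution even though the
   collinear column makes the design matrix rank-deficient):

   ```python
   import numpy as np

   rng = np.random.default_rng(0)
   n = 500
   x1, x2 = rng.normal(size=n), rng.normal(size=n)
   y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)  # assumed true process

   X2 = np.column_stack([x1, x2])                   # original features
   X3 = np.column_stack([x1, x2, 2.5 * x1 + x2])    # with the redundant x3

   theta2, *_ = np.linalg.lstsq(X2, y, rcond=None)
   theta3, *_ = np.linalg.lstsq(X3, y, rcond=None)

   print(theta2, np.mean((X2 @ theta2 - y) ** 2))
   print(theta3, np.mean((X3 @ theta3 - y) ** 2))   # different thetas, same MSE
   ```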

1. **$x_3 = x_1 \cdot x_2$**

|
If the underlying process generating the data would also depend on an $x_1
|
||||||
|
\cdot x_2$ operation, then this additional input variable would change the
|
||||||
|
parameters, improve the training error, and depending on if the impact of
|
||||||
|
this quadratic term on the original data-generating process is small or big,
|
||||||
|
it would slighty or considerably improve the test error.
|
||||||
|
|
||||||
|
Essentially, this parameter would had useful complexity to the model, which
|
||||||
|
may be beneficial if the model is underfitted w.r.t. number of variables in
|
||||||
|
the linear regression function, or otherwise detrimental if the model is
|
||||||
|
correctly
|
||||||
|
fitted or overfitted already.
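
   A sketch of the two situations, assuming one data-generating process with a
   strong $x_1 x_2$ term and one without (the coefficients are made up):

   ```python
   import numpy as np
   from sklearn.linear_model import LinearRegression
   from sklearn.metrics import mean_squared_error
   from sklearn.model_selection import train_test_split

   rng = np.random.default_rng(0)
   n = 1000
   x1, x2 = rng.normal(size=n), rng.normal(size=n)
   noise = rng.normal(scale=0.1, size=n)

   processes = {
       "no interaction in true process": 2 * x1 - x2 + noise,
       "strong interaction in true process": 2 * x1 - x2 + 3 * x1 * x2 + noise,
   }
   for label, y in processes.items():
       for name, X in [("without x3", np.column_stack([x1, x2])),
                       ("with x3 = x1*x2", np.column_stack([x1, x2, x1 * x2]))]:
           Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
           model = LinearRegression().fit(Xtr, ytr)
           print(label, "|", name, "| test MSE:",
                 round(mean_squared_error(yte, model.predict(Xte)), 4))
   ```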

## Q3. Classification

1. **Your boss asked you to solve the problem using a perceptron and now

@@ -212,9 +248,11 @@ Comment and compare how the (a.) training error, (b.) test error and

2. **Would you expect to have better luck with a neural network with
   activation function $h(x) = - x \cdot e^{-2}$ for the hidden units?**

   No: the activation function is still linear, and the data is not linearly
   separable, so such a network is no more expressive than the perceptron.
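
   To make the argument explicit, here is a short derivation (added for
   clarity, with $W_1, W_2, b_1, b_2$ denoting generic layer weights and
   biases): with $h(x) = -e^{-2} x$, a one-hidden-layer network computes

   $$\hat{y} = W_2\, h(W_1 x + b_1) + b_2
             = -e^{-2}\, W_2 (W_1 x + b_1) + b_2
             = \tilde{W} x + \tilde{b},$$

   which is again an affine function of $x$. Stacking more such layers keeps
   the map affine, so the decision boundary remains a hyperplane, exactly as
   for the perceptron.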

3. **What are the main differences and similarities between the
   perceptron and the logistic regression neuron?**

Binary file not shown.

@@ -74,17 +74,14 @@ X_val -= mean
 X_val /= std

 network = Sequential()
-network.add(Dense(30, activation='relu'))
-network.add(Dense(20, activation='relu'))
-network.add(Dense(20, activation='relu'))
+network.add(Dense(20, activation='tanh'))
 network.add(Dense(10, activation='relu'))
+network.add(Dense(7, activation='sigmoid'))
 network.add(Dense(5, activation='relu'))
-network.add(Dense(3, activation='relu'))
-network.add(Dense(2, activation='relu'))
 network.add(Dense(1, activation='linear'))
 network.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])

-epochs = 100000
+epochs = 1000
 callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=40)
 network.fit(X_train, y_train, epochs=epochs, verbose=1, batch_size=15,
             validation_data=(X_val, y_val), callbacks=[callback])

@@ -99,4 +96,4 @@ X_test = X_test[:, 1:3]
 X_test -= mean
 X_test /= std
 msq = mean_squared_error(network.predict(X_test), y_test)
-print(msq)
+#print(msq)