diff --git a/assignment_1/report_Maggioni_Claudio.md b/assignment_1/report_Maggioni_Claudio.md
index 34226a2..2bd754b 100644
--- a/assignment_1/report_Maggioni_Claudio.md
+++ b/assignment_1/report_Maggioni_Claudio.md
@@ -186,10 +186,46 @@ Comment and compare how the (a.) training error, (b.) test error and
 
 1. **$x_3$ is a normally distributed independent random variable $x_3 \sim \mathcal{N}(1, 2)$**
 
+   With this new variable, the coefficients $\theta_1$ and $\theta_2$ of the
+   new optimal model will not change significantly. Training and test error
+   also remain roughly the same, although the training error may be higher in
+   the first iterations of the learning procedure. All of this follows from
+   the fact that the new variable $x_3$ is completely independent of $x_1$
+   and $x_2$, and consequently of $y$. Therefore, the model will "understand"
+   that $x_3$ carries no information about $y$ and will drive $\theta_3$
+   towards 0. This effect would be achieved even more quickly by using Lasso
+   instead of plain linear regression, since Lasso tends to set a parameter
+   exactly to zero when its least-squares optimal value is already close
+   to 0.
+
 1. **$x_3 = 2.5 \cdot x_1 + x_2$**
 
+   With this new variable, the coefficients would indeed change, but the test
+   and training error would stay the same. Since $x_3$ is a linear
+   combination of $x_1$ and $x_2$, we can rewrite the model function as:
+
+   $$f(x, \theta) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 (2.5 x_1 + x_2) =
+   (\theta_1 + 2.5 \theta_3) x_1 + (\theta_2 + \theta_3) x_2$$
+
+   This shows that, even though the values of $\theta_1$ and $\theta_2$
+   change when this term is introduced (and the optimum is no longer unique,
+   since $x_3$ is perfectly collinear with $x_1$ and $x_2$), the solution
+   found by linear regression is still equivalent, in terms of predictions
+   and MSE, to the optimal model of the original family of models.
+
 1. **$x_3 = x_1 \cdot x_2$**
 
+   If the underlying process generating the data also depends on the product
+   $x_1 \cdot x_2$, then this additional input variable would change the
+   parameters and improve the training error; depending on how strongly this
+   quadratic term contributes to the data-generating process, it would
+   improve the test error slightly or considerably.
+
+   Essentially, this term adds useful capacity to the model, which is
+   beneficial if the model underfits with the current set of input variables,
+   and detrimental (or at best neutral) if the model already fits correctly
+   or overfits.
+
 ## Q3. Classification
 
 1. **Your boss asked you to solve the problem using a perceptron and now
@@ -212,9 +248,11 @@ Comment and compare how the (a.) training error, (b.) test error and
 2. **Would you expect to have better luck with a neural network with
    activation function $h(x) = -x \cdot e^{-2}$ for the hidden units?**
 
-   Boh
+   No: since $e^{-2}$ is just a constant factor, $h(x) = -x \cdot e^{-2}$ is
+   still a linear activation, so the network can only produce linear decision
+   boundaries and the data is still not linearly separable.
 
 3.
    **What are the main differences and similarities between the perceptron
    and the logistic regression neuron?**
+
+
diff --git a/assignment_1/report_Maggioni_Claudio.pdf b/assignment_1/report_Maggioni_Claudio.pdf
index b03c1c0..42c7f58 100644
Binary files a/assignment_1/report_Maggioni_Claudio.pdf and b/assignment_1/report_Maggioni_Claudio.pdf differ
diff --git a/assignment_1/src/build_models.py b/assignment_1/src/build_models.py
index 53c94d1..8625def 100644
--- a/assignment_1/src/build_models.py
+++ b/assignment_1/src/build_models.py
@@ -74,17 +74,14 @@ X_val -= mean
 X_val /= std
 
 network = Sequential()
-network.add(Dense(30, activation='relu'))
-network.add(Dense(20, activation='relu'))
-network.add(Dense(20, activation='relu'))
+network.add(Dense(20, activation='tanh'))
 network.add(Dense(10, activation='relu'))
+network.add(Dense(7, activation='sigmoid'))
 network.add(Dense(5, activation='relu'))
-network.add(Dense(3, activation='relu'))
-network.add(Dense(2, activation='relu'))
 network.add(Dense(1, activation='linear'))
 
 network.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
 
-epochs = 100000
+epochs = 1000
 callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=40)
 network.fit(X_train, y_train, epochs=epochs, verbose=1, batch_size=15,
             validation_data=(X_val, y_val), callbacks=[callback])
@@ -99,4 +96,4 @@ X_test = X_test[:, 1:3]
 X_test -= mean
 X_test /= std
 msq = mean_squared_error(network.predict(X_test), y_test)
-print(msq)
+#print(msq)
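
The Q2 answers added above make two checkable claims: an independent noise feature ends up with a coefficient near zero (which Lasso pushes to exactly zero), and a perfectly collinear feature changes the coefficients without changing the MSE. The sketch below verifies both with scikit-learn on synthetic data; the data-generating process, coefficients and noise level are invented for illustration and are not part of the assignment code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, 1.0, n)
# Assumed linear ground truth, invented for this check.
y = 3.0 * x1 - 2.0 * x2 + rng.normal(0.0, 0.5, n)
X_base = np.column_stack([x1, x2])

# (a) x3 ~ N(1, 2): independent of x1, x2 and y, so its fitted coefficient
#     is close to 0; Lasso typically sets it to exactly 0.
x3_noise = rng.normal(1.0, 2.0, n)
X_noise = np.column_stack([x1, x2, x3_noise])
print(LinearRegression().fit(X_noise, y).coef_)
print(Lasso(alpha=0.1).fit(X_noise, y).coef_)

# (b) x3 = 2.5 * x1 + x2: the coefficients change (the optimum is no longer
#     unique), but the fitted predictions, and hence the MSE, match the
#     model without x3.
X_coll = np.column_stack([x1, x2, 2.5 * x1 + x2])
base = LinearRegression().fit(X_base, y)
coll = LinearRegression().fit(X_coll, y)
print(mean_squared_error(y, base.predict(X_base)),
      mean_squared_error(y, coll.predict(X_coll)))
```

The same setup, with an extra $x_1 \cdot x_2$ term added to the data-generating process and a train/test split, can be used to check the remaining claim about the interaction feature.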