
<!-- vim: set ts=2 sw=2 et tw=80: -->

---
header-includes:
- \usepackage[utf8]{inputenc}
- \usepackage[T1]{fontenc}
- \usepackage[sc]{mathpazo}
- \usepackage{caption, subcaption}
- \usepackage{hyperref}
- \usepackage[english]{babel}
- \usepackage{amsmath, amsfonts}
- \usepackage{listings}
- \usepackage{graphicx}
- \graphicspath{{Figures/}{./}}
- \usepackage{float}
- \usepackage{geometry}
- \geometry{paper=a4paper,top=2.5cm,bottom=3cm,left=3cm,right=3cm}
- \usepackage{sectsty}
- \sectionfont{\vspace{6pt}\centering\normalfont\scshape}
- \subsectionfont{\normalfont\bfseries}
- \subsubsectionfont{\normalfont\itshape}
- \paragraphfont{\normalfont\scshape}
- \usepackage{scrlayer-scrpage}
- \ofoot*{\pagemark}
- \ifoot*{Maggioni Claudio}
- \cfoot*{}
---
\title{
	\normalfont\normalsize
	\textsc{Machine Learning\\
	Universit\`a della Svizzera italiana}\\
	\vspace{25pt}
	\rule{\linewidth}{0.5pt}\\
	\vspace{20pt}
	{\huge Assignment 1}\\
	\vspace{12pt}
	\rule{\linewidth}{1pt}\\
	\vspace{12pt}
}
\author{\LARGE Maggioni Claudio}
\date{\normalsize\today}
\maketitle

The assignment is split into two parts: you are asked to solve a
regression problem and to answer some questions. You can use all the
books, material, and help you need. Bear in mind that the questions you
are asked are similar to those you may find in the final exam, and are
related to very important and fundamental machine learning concepts. As
such, sooner or later you will need to learn them to pass the course. We
will give you some feedback afterwards.\
!! Note that this file is just meant as a template for the report, in
which we have reproduced **part of** the assignment text for convenience.
You must always refer to the text in the README.md file for the
assignment requirements.

# Regression problem

This section should contain a detailed description of how you solved the
assignment, including all required statistical analyses of the models'
performance and a comparison between the linear regression and the model
of your choice. Limit the report to 2500 words (formulas, tables,
figures, etc., do not count as words) and do not include any code in the
report.

## Task 1

Use the family of models
$f(\mathbf{x}, \boldsymbol{\theta}) = \theta_0 + \theta_1 \cdot x_1 +
\theta_2 \cdot x_2 + \theta_3 \cdot x_1 \cdot x_2 + \theta_4 \cdot
\sin(x_1)$
to fit the data. In the report, write the formula of the model,
substituting the parameters $\theta_0, \ldots, \theta_4$ with the
estimates you have found:
$$f(\mathbf{x}, \boldsymbol{\theta}) = \_ + \_ \cdot x_1 + \_
\cdot x_2 + \_ \cdot x_1 \cdot x_2 + \_ \cdot \sin(x_1)$$
Evaluate the test performance of your model using the mean squared error
as performance measure.
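
Since $f$ is linear in $\boldsymbol{\theta}$, one possible way to obtain
the estimates is ordinary least squares. This is an assumption: the task
does not prescribe an estimator. Write the model as
$f(\mathbf{x}, \boldsymbol{\theta}) = \boldsymbol{\theta}^\top
\phi(\mathbf{x})$. Stack the feature vectors of the $n$ training pairs
$(\mathbf{x}^{(i)}, y^{(i)})$ into a design matrix $\Phi$ and the targets
into $\mathbf{y}$. Then, assuming $\Phi^\top \Phi$ is invertible, the
estimate is:

$$\phi(\mathbf{x}) = \begin{bmatrix} 1 & x_1 & x_2 & x_1 x_2 & \sin(x_1)
\end{bmatrix}^\top, \qquad
\Phi = \begin{bmatrix} \phi(\mathbf{x}^{(1)}) & \cdots &
\phi(\mathbf{x}^{(n)}) \end{bmatrix}^\top, \qquad
\hat{\boldsymbol{\theta}} = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{y}$$

The test performance is then the mean squared error over the $m$ test
pairs $(\mathbf{x}^{(j)}, y^{(j)})$:

$$\mathrm{MSE} = \frac{1}{m} \sum_{j=1}^{m} \left( y^{(j)} -
f(\mathbf{x}^{(j)}, \hat{\boldsymbol{\theta}}) \right)^2$$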

## Task 2

Consider any family of non-linear models of your choice to address the
above regression problem. Evaluate the test performance of your model
using the mean squared error as performance measure. Compare your model
with the linear regression of Task 1. Which one is **statistically**
better?
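
One common way to make the comparison *statistical* is a paired $t$-test.
This sketch assumes that both models are evaluated on the same $m$ test
points. Let $e_j^{\text{lin}}$ and $e_j^{\text{nl}}$ be the per-sample
squared errors $(y^{(j)} - \hat{y}^{(j)})^2$ of the linear regression and
of the non-linear model, respectively. The test statistic is:

$$d_j = e_j^{\text{lin}} - e_j^{\text{nl}}, \qquad
\bar{d} = \frac{1}{m} \sum_{j=1}^{m} d_j, \qquad
s_d^2 = \frac{1}{m-1} \sum_{j=1}^{m} \left( d_j - \bar{d} \right)^2, \qquad
T = \frac{\bar{d}}{s_d / \sqrt{m}}$$

Here $\bar{d}$ is simply the difference between the two test MSEs. Take
as null hypothesis that both models have the same expected squared error,
and assume the differences $d_j$ are roughly normal (or $m$ is large).
Then $T$ follows a Student's $t$ distribution with $m - 1$ degrees of
freedom. For a chosen significance level $\alpha$ (e.g. $\alpha = 0.05$),
$T > t_{1-\alpha,\, m-1}$ suggests that the non-linear model is
statistically better. Other tests, such as Welch's two-sample $t$-test,
are also possible choices.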

## Task 3 (Bonus)

In the [**Github repository of the
course**](https://github.com/marshka/ml-20-21), you will find a trained
Scikit-learn model that we built using the same dataset you are given.
This baseline model achieves an MSE of **0.0194** when evaluated on
the test set. You will get extra points if the test
performance of your model is better (i.e., the MSE is lower) than ours.
Of course, you also have to tell us why you think that your model is
better.

# Questions

## Q1. Training versus Validation

1.  **Explain the curves' behavior in each of the three highlighted
    sections of the figures, namely (a), (b), and (c).**

    I don't know.

1.  **Are any of the three sections associated with the concepts of
    overfitting and underfitting? If yes, explain how.**

1.  **Is there any evidence of high approximation risk? Why? If yes, in
    which of the subfigures below?**

1.  **Do you think that by further increasing the model complexity you
    will be able to bring the training error to zero?**

1.  **Do you think that by further increasing the model complexity you
    will be able to bring the structural risk to zero?**

## Q2. Linear Regression

Comment and compare how the (a.) training error, (b.) test error and
(c.) coefficients would change in the following cases:

1.  **$x_3$ is a normally distributed independent random variable
    $x_3 \sim \mathcal{N}(1, 2)$**

1.  **$x_3 = 2.5 \cdot x_1 + x_2$**

1.  **$x_3 = x_1 \cdot x_2$**

## Q3. Classification

1.  **Your boss asked you to solve the problem using a perceptron and now
    he's upset because you are getting poor results. How would you
    justify the poor performance of your perceptron classifier to your
    boss?**

1.  **Would you expect to have better luck with a neural network with
    activation function $h(x) = - x \cdot e^{-2}$ for the hidden units?**

1.  **What are the main differences and similarities between the
    perceptron and the logistic regression neuron?**