Week 6: Exercises#

Exercises – Long Day#

1: Quadratic \(\mathrm{ReLU}\)#

The quadratic \(\mathrm{ReLU}\) is an activation function \(\sigma: \mathbb{R} \to \mathbb{R}\) given by

\[\begin{equation*} \sigma(x)= (\operatorname{ReLU}(x))^2. \end{equation*}\]

Question a#

Argue that \(\sigma\) is continuously differentiable, and find an expression for \(\sigma'(x)\).

Question b#

Find all stationary points for \(\sigma\). Determine the function value at all stationary points. Does the function attain its global minimum or maximum values at any of the stationary points?

2: Extremum or not#

Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be given by

\[\begin{equation*} f(x,y)=x^2 y + y. \end{equation*}\]

Determine all local extrema of \(f\).

3: Stationary Points in a Neural Network with Quadratic ReLU#

In this exercise we consider a simple “shallow” neural network \(\pmb{\Phi}: \mathbb{R}^2 \to \mathbb{R}\) with one hidden layer. The network uses the quadratic \(\mathrm{ReLU}\) function as activation function in the hidden layer:

\[\begin{equation*} \sigma_1(z) = (\operatorname{ReLU}(z))^2. \end{equation*}\]

The network is defined by the following weight matrices and bias vectors:

\[\begin{equation*} A_1 = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}, \quad \pmb{b}_1 = \begin{bmatrix} -2 \\ 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad b_2 = 3. \end{equation*}\]

The activation function in the hidden layer is applied element by element, \(\pmb{\sigma}_1(\pmb{z}) = \begin{bmatrix} \sigma_1(z_1) \\ \sigma_1(z_2) \end{bmatrix}\), while the output layer is linear (meaning, \(\sigma_2(z)=z\) is the identity function). The network function is given by:

\[\begin{equation*} \pmb{\Phi}(\pmb{x}) = A_2 \pmb{\sigma}_1(A_1 \pmb{x} + \pmb{b}_1) + b_2. \end{equation*}\]
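For experimenting with the questions below, the network can be set up symbolically, for instance as in this sketch (the `Piecewise` form of \(\sigma_1\) is one possible choice):

```python
from sympy import symbols, Matrix, Piecewise

x1, x2 = symbols('x1 x2', real=True)

def sigma1(z):
    # quadratic ReLU applied to a scalar expression
    return Piecewise((z**2, z >= 0), (0, True))

# Weights and biases as given above
A1 = Matrix([[1, 1], [-1, -1]])
b1 = Matrix([-2, 0])
A2 = Matrix([[1, 1]])
b2 = 3

z = A1 * Matrix([x1, x2]) + b1   # hidden-layer pre-activations
h = z.applyfunc(sigma1)          # element-wise activation
Phi = (A2 * h)[0] + b2           # scalar network output
print(Phi)
```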

Question a#

State an explicit functional expression for \(\pmb{\Phi}(x_1, x_2)\) that depends on \(x_1\), \(x_2\), and \(\sigma_1\).

Question b#

Find the gradient \(\nabla \pmb{\Phi}(x_1, x_2)\) in the regions of \(\mathbb{R}^2\) where, respectively, only the first neuron, only the second neuron, or both neurons are “active” (meaning, where the argument of \(\sigma_1\) is positive).

Question c#

Show that the set of stationary points of \(\pmb{\Phi}\) constitutes a “strip” (a set of infinitely many points) in the \((x_1, x_2)\) plane. Find the inequalities that describe this strip.

Question d#

Determine whether the stationary points are local minima, local maxima, or saddle points. You may use a plot of the function to support your conclusion.

4: Global Maximum and Global Minimum#

Let \(f: A \to \mathbb{R}\) be given by:

\[\begin{equation*} f(x,y)=xy(2-x-y)+1, \end{equation*}\]

where \(A \subset \mathbb{R}^2\) denotes the region in the \((x,y)\) plane where \(x\in\left[ 0,1\right]\) and \(y\in\left[ 0,1\right]\).

Question a#

Find all stationary points of \(f\) in the interior of \(A\). You may use SymPy to find the stationary points of \(f\) on \(\mathbb{R}^2\) (there are four), since the two equations \(\frac{\partial f}{\partial x}(x,y)=0\) and \(\frac{\partial f}{\partial y}(x,y)=0\) are not easy to solve by hand. You are welcome to try by hand anyway, but don't hesitate to ask a TA for a hint.
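If you take the SymPy route, a minimal starting point could look like this (a sketch; `solve` with `dict=True` is one of several ways to pose the system):

```python
from sympy import symbols, diff, solve

x, y = symbols('x y', real=True)
f = x*y*(2 - x - y) + 1

# Stationary points: both partial derivatives vanish simultaneously
stationary = solve([diff(f, x), diff(f, y)], [x, y], dict=True)
print(stationary)   # four points on R^2; keep only those in the interior of A
```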

Question b#

Determine the global maximum and minimum values of \(f\) as well as the points at which these values are attained.

Question c#

This exercise concerns a differentiable function of two variables defined on \([0,1]^2\). How would you attack the exercise, had it been about a differentiable function of five variables defined on \([0,1]^5\)? Discuss a possible approach with a fellow student. You may bring an AI chatbot into your discussion.

Question d#

Determine the range of \(f\).

Question e#

Plot the graph of \(f\) along with points to show where on the graph the maximum and minimum values are attained. Check that your results look reasonable.

5: A Return to Theme 1#

In Theme 1: The Gradient Method we considered three functions of the form \(f_i: \mathbb{R}^2 \to \mathbb{R}\). All of the functions had precisely one minimum but no maximum since they grew towards infinity. You may use this piece of information without proof.

We here use the functions (with their standard values) given in Python by:

from sympy import symbols, sqrt, Matrix, S

# Defining variables and parameters
x1, x2 = symbols('x1 x2', real=True)
a, lambda1 = symbols('a lambda1', positive=True)

def f1(x1, x2, a=S(1)/2):
    # Note: S(1)/2 gives the exact rational 1/2 (S(1/2) would give a float)
    return a * x1**2 + 1 * x2**2

def f2(x1, x2, lambda1=S(1)/2):
    Q = 1/sqrt(2) * Matrix([[1, 1], [1, -1]])
    A = Q.T * Matrix([[lambda1, 0], [0, 1]]) * Q
    b = Matrix([-2, 4])
    x = Matrix([x1, x2])
    q = x.T * A * x + x.T * b
    return q[0]

def f3(x1, x2):
    # The Rosenbrock function
    return (1 - x1)**2 + 100 * (x2 - x1**2)**2

In the Theme Exercise we used the gradient method to search for the minimum point and the corresponding minimum value. This is a fine method, in particular when the function has many (maybe infinitely many) points at which it is not differentiable. For “nicer” functions, though, such as smooth functions (meaning functions that are differentiable infinitely many times), it is much easier to simply find the points where the gradient is zero. The three functions considered here are smooth.

Question a#

Find all stationary points and the minimum values for each of the three functions. Although the functions are given in Python code above, you should do it by hand - it won’t take much longer.

Question b#

State the image set of each function.

6: Global Maximum and Global Minimum once again#

Consider the function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) given by

\[\begin{equation*} f(x,y)=x^2-3y^2-3xy \end{equation*}\]

as well as the set \(A=\lbrace(x,y) \in \mathbb{R}^2 \,| \, x^2+y^2\leq 1\rbrace\).

Justify that \(f\) has both a global maximum and a global minimum on \(A\), and determine these values as well as the points where they are attained.

Use SymPy for the boundary investigation of \(f\vert_{\partial A}\), but do the rest by hand.

7: Stationary Points for Quadratic Forms#

Let \(q : \mathbb{R}^n \to \mathbb{R}\) be a quadratic form. In other words, \(q\) has the functional expression

\[\begin{equation*} q(\pmb{x}) = \pmb{x}^T A \pmb{x} + \pmb{x}^T \pmb{b} + c, \end{equation*}\]

where \(A\) is a (non-zero) \(n \times n\) matrix, \(\pmb{b} \in \mathbb{R}^n\) is a column vector, and \(c \in \mathbb{R}\).

It holds that \(q\) is a differentiable function with \(\nabla q(\pmb{x}) = (A + A^T) \pmb{x} + \pmb{b}\). You may take this formula for granted here; deriving it is the content of the optional Question d below.
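A quick numerical sanity check of this gradient formula (not a proof; the 3×3 instance, the seed, and the finite-difference step are arbitrary choices):

```python
import numpy as np

# Random instance of the quadratic form q(x) = x^T A x + x^T b + c
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
c = 1.0

def q(x):
    return x @ A @ x + x @ b + c

# Compare the stated gradient with central finite differences at a random point
x0 = rng.standard_normal(n)
grad = (A + A.T) @ x0 + b   # the formula stated above

eps = 1e-6
I = np.eye(n)
fd = np.array([(q(x0 + eps*I[i]) - q(x0 - eps*I[i])) / (2*eps)
               for i in range(n)])
print(np.max(np.abs(grad - fd)))   # should be tiny
```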

Question a#

Write out a system of equations whose solution describes the stationary points. Argue that \(q\) can have either zero, one, or infinitely many stationary points.

Question b#

Assume that \((A + A^T)\) is invertible. Argue that \(q\) has precisely one stationary point. Find this stationary point (that is, find a formula or expression for the stationary point).

Question c#

Assume that \(A\) is symmetric. Argue that \(q\) has precisely one stationary point if and only if \(\lambda=0\) is not an eigenvalue of \(A\).

Question d (Optional)#

Derive the formula that we started out by taking for granted: \(\nabla q(\pmb{x}) = (A + A^T) \pmb{x} + \pmb{b}\).

8: A Challenge in Linear Algebra#

Let \(A\) be an \(n \times n\) matrix. Is it true that the symmetric matrix \((A + A^T)\) is invertible, if \(A\) is invertible? Prove this statement, or disprove it with a counter-example.


Exercises – Short Day#

1: Usage of Hessian Matrix#

Consider the function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) given by

\[\begin{equation*} f(x,y)=x^2+4y^2-2x-4y. \end{equation*}\]

Question a#

Argue that the function \(f\) has exactly one extremum. Determine this extremum point and the corresponding extremum value.

Question b#

What is the difference between an extremum and a strict extremum, also sometimes called a proper extremum? Is the extremum we found a strict extremum?

2: Local Extrema and Approximating Second-Degree Polynomial#

Consider the function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) given by the expression

\[\begin{equation*} f(x,y)=x^3+2y^3+3xy^2-3x^2. \end{equation*}\]

Question a#

Show that the points \(A = (2, 0)\), \(B = (1, -1)\), and \(C = (0, 0)\) are stationary points of \(f\), and determine for each of them whether there is a local maximum or a local minimum (or neither) there. If so, indicate the local maximum/minimum value attained at the point, and determine if it is strict.
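If you want SymPy's help with the second-order test, a possible sketch is shown below, using the point \(A=(2,0)\) as an example (`hessian` is SymPy's built-in constructor for the Hessian matrix):

```python
from sympy import symbols, diff, hessian

x, y = symbols('x y', real=True)
f = x**3 + 2*y**3 + 3*x*y**2 - 3*x**2

H = hessian(f, (x, y))   # matrix of second-order partial derivatives

# Example: gradient, Hessian, and eigenvalues at the point A = (2, 0)
print(diff(f, x).subs({x: 2, y: 0}), diff(f, y).subs({x: 2, y: 0}))
HA = H.subs({x: 2, y: 0})
print(HA)
print(HA.eigenvals())    # {eigenvalue: multiplicity}
```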

Question b#

Show that the approximating quadratic polynomial for \(f\) with the expansion point \(A\) can be written as an equation in the unknowns \(x\), \(y\), and \(z\) in the following form:

\[\begin{equation*} z-c_3=\frac 12\lambda_1(x-c_1)^2+\frac 12\lambda_2(y-c_2)^2. \end{equation*}\]

What surface does this equation describe, and what do the constants represent?

Question c#

Plot the graph of \(f\) along with the graph of the approximating second-degree polynomials for \(f\) with the expansion points \(A\), \(B\), and \(C\). Discuss whether the eigenvalues of the Hessian matrices at the three points can determine the type of quadric surface described by the second-degree polynomials.

3: A Return to Theme Exercise 1 once more#

We consider the quadratic form \(f_2: \mathbb{R}^2 \to \mathbb{R}\) from Theme 1: The Gradient Method, here denoted by \(q: \mathbb{R}^2 \to \mathbb{R}\) and given by

\[\begin{equation*} q(\pmb{x}) = \pmb{x}^T A \pmb{x} + \pmb{b}^T \pmb{x}, \end{equation*}\]

where \(A\) is a \(2 \times 2\) matrix that depends on \(\lambda_1 \in \mathbb{R}\),

\[\begin{equation*} A = Q^T \Lambda Q, \quad Q = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & 1 \end{bmatrix}, \end{equation*}\]

and where \(\pmb{b} = - 2 A [1,2]^T\). We will change the following from the Theme exercise: 1) \(\lambda_1\) may be zero or negative, and 2) we use a new definition of \(\pmb{b}\).
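A possible SymPy setup for experimenting with the three cases of \(\lambda_1\) (a sketch following the definitions above; `lam1` is just a Python name for the symbol \(\lambda_1\)):

```python
from sympy import symbols, Matrix, sqrt

lam1 = symbols('lambda1', real=True)   # lambda_1 may now be zero or negative

Q = Matrix([[1, 1], [1, -1]]) / sqrt(2)
Lam = Matrix([[lam1, 0], [0, 1]])
A = Q.T * Lam * Q
b = -2 * A * Matrix([1, 2])

print(A)              # A in terms of lambda_1
print(A.eigenvals())  # compare with your hand calculation
```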

Question a#

Find the eigenvalues of \(A\).

Question b#

Find all stationary points of \(q\) when \(\lambda_1 \neq 0\).

Question c#

How are \(A\) and the Hessian matrix \(\pmb{H}_q\) related? Find the result in the book if you cannot remember. Describe the stationary point in each of the three cases \(\lambda_1 > 0\), \(\lambda_1 = 0\), and \(\lambda_1 < 0\).

Question d#

How are \(q\) and the approximating second-degree polynomial (with an arbitrary expansion point) related? Plot \(q\) for each of the three cases \(\lambda_1 > 0\), \(\lambda_1 = 0\), and \(\lambda_1 < 0\). Which normal forms are we dealing with (see https://en.wikipedia.org/wiki/Quadric#Euclidean_space)?

4: Global Extrema of Function of Three Variables#

We consider the function \(f:\mathbb{R}^3\rightarrow \mathbb{R}\) given by

\[\begin{equation*} f(x,y,z)=\sin(x^2+y^2+z^2-1)-x^2+y^2-z^2 \end{equation*}\]

as well as the solid unit sphere

\[\begin{equation*} \mathcal{K}=\left\{(x,y,z)\in \mathbb{R}^3 \mid x^2+y^2+z^2\leq 1\right\}. \end{equation*}\]

Question a#

Show that \(f\) has only one stationary point in the interior of \(\mathcal{K}\), namely \(O=(0,0,0)\), and investigate whether \(f\) has an extremum at \(O\).

Question b#

Determine the global maximum value and the global minimum value of \(f\) on \(\mathcal{K}\) as well as the points where these values are attained.

Question c#

Determine the image set of \(f\) on \(\mathcal{K}\).

5: Where are the Global Extrema?#

Consider the function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) given by the expression

\[\begin{equation*} f(x,y)=\exp(x^2+y^2)-4xy. \end{equation*}\]

Keep in mind that \(\exp(x^2+y^2) = \operatorname{e}^{x^2+y^2}\).

Question a#

Find all stationary points of \(f\).

Question b#

Find all local extrema.

Question c#

Determine whether the function \(f\) has a global maximum or minimum. If so, state their values.

Question d#

State the range of the function.