Week 6: Exercises#
Exercises – Long Day#
1: Quadratic \(\mathrm{ReLU}\)#
The quadratic \(\mathrm{ReLU}\) is an activation function \(\sigma: \mathbb{R} \to \mathbb{R}\) given by
Question a#
Argue that \(\sigma\) is continuously differentiable, and find an expression for \(\sigma'(x)\).
Question b#
Find all stationary points for \(\sigma\). Determine the function value at all stationary points. Does the function attain its global minimum or maximum values at any of the stationary points?
2: Extremum or not#
Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be given by
Determine all local extrema of \(f\).
Hint
As the function is defined on all of \(\mathbb{R}^2\), which has no boundary points, and as the function is differentiable everywhere, extremum points can only be found at stationary points.
Answer
The function has no stationary points, and hence no extrema.
3: Stationary Points in a Neural Network with Quadratic ReLU#
In this exercise we consider a simple “shallow” neural network \(\pmb{\Phi}: \mathbb{R}^2 \to \mathbb{R}\) with one hidden layer. The network uses the quadratic \(\mathrm{ReLU}\) function as activation function in the hidden layer:
The network is defined by the following weight matrices and bias vectors:
The activation function in the hidden layer is applied element by element, \(\pmb{\sigma}_1(\pmb{z}) = \begin{bmatrix} \sigma_1(z_1) \\ \sigma_1(z_2) \end{bmatrix}\), while the output layer is linear (meaning, \(\sigma_2(z)=z\) is the identity function). The network function is given by:
Question a#
State an explicit functional expression for \(\pmb{\Phi}(x_1, x_2)\) that depends on \(x_1\), \(x_2\), and \(\sigma_1\).
Hint
Begin by calculating the affine part \(A_1 \pmb{x} + \pmb{b}_1 = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -2 \\ 0 \end{bmatrix}\). Then, apply the activation function on each row and multiply the result by the row vector \(A_2\).
Question b#
Find the gradient \(\nabla \pmb{\Phi}(x_1, x_2)\) in the regions of \(\mathbb{R}^2\) where, respectively, the first neuron is, the second neuron is, or both neurons are “active” (meaning, where the argument of \(\sigma_1\) is positive).
Hint
Remember from 1: Quadratic \mathrm{ReLU} that \(\sigma_1'(z) = 2 \operatorname{ReLU}(z)\). Use the chain rule on the individual terms in your expression from Question a.
Question c#
Show that the set of stationary points of \(\pmb{\Phi}\) constitutes a “strip” (a set of infinitely many points) in the \((x_1, x_2)\) plane. Find the inequalities that describe this strip.
Hint
The “strip” contains the entire line \(x_2 = -x_1\), \(x_1 \in \mathbb{R}\), which is easily confirmed by insertion of \(x_2 = -x_1\) in the equations for the stationary points.
Question d#
Determine whether the stationary points are local minima, local maxima, or saddle points. You may use a plot of the function to support your conclusion.
Answer
The network can be written as \(\pmb{\Phi}(x_1, x_2) = \sigma_1(x_1 + x_2 - 2) + \sigma_1(-x_1 - x_2)\). The stationary points are found in the region \(0 \le x_1 + x_2 \le 2\), where both terms are zero. As \(\sigma_1(z) \ge 0\), all of these points are global minima with the value \(3\). The function has no maximum value, since \(\pmb{\Phi}(t, t) \to \infty\) for \(t \to \infty\).
4: Global Maximum and Global Minimum#
Let \(f: A \to \mathbb{R}\) be given by:
where \(A \subset \mathbb{R}^2\) denotes the region in the \((x,y)\) plane where \(x\in\left[ 0,1\right]\), and \(y\in\left[ 0,1\right]\).
Question a#
Find all stationary points of \(f\) in the interior of \(A\). You may use SymPy to find the stationary points of \(f\) on \(\mathbb{R}^ 2\) (there are four), as the two equations \(\frac{\partial f}{\partial x}(x,y)=0\) and \(\frac{\partial f}{\partial y}(x,y)=0\) are not easy to solve - you may also try this out by hand, but don’t hold back from asking a TA for a hint.
Answer
In the interior of \(A\) we find one stationary point, that being \((\frac{2}{3},\frac{2}{3})\).
Question b#
Determine the global maximum and minimum values of \(f\) as well as the points at which these values are attained.
Hint
To start with, is it possible to know whether \(f\) has a global maximum and minimum?
Hint
Yes, since the function is continuous and \(A\) is bounded and closed we can be sure that \(f\) has both a global maximum and minimum. Find this theorem in the textbook.
Hint
The global extrema may be found:
on the boundary of \(A\) (but within \(A\))
at exceptional points where the function is not differentiable
at stationary points in the interior of the function
Hint
The boundary investigation is most easily done by considering the restriction of \(f\) to the relevant parts of the line segments \((x,0)\), \((0,y)\), \((x,1)\), and \((1,y)\). By “relevant” we mean that only \(x\) and \(y\) values between \(0\) and \(1\) are to be considered, as we otherwise would no longer be on the boundary of \(A\)).
Hint
When you have found stationary points and local extrema along the boundary restrictions, you are to calculate the function values at all of these points as well as at the line segment end points \((0,0)\), \((1,0)\), \((0,1)\), and \((1,1)\). The largest among these function values is the global maximum of \(f\) and the smallest is its global minimum.
Answer
The global maximum value is \(\frac{35}{27}\), attained at \((\frac{2}{3},\frac{2}{3})\). The global minimum value is \(1\), attained on the entire line segment \((x,0)\) for \(0\leq x \leq 1\), the entire line segment \((0,y)\) for \(0\leq y \leq 1\), as well as at the point \((x,y)=(1,1)\).
Question c#
This exercise concerns a differentiable function of two variables defined on \([0,1]^2\). How would you attack the exercise, had it been about a differentiable function of five variables defined on \([0,1]^5\)? Discuss a possible approach with a fellow student. You may bring an AI chatbot into your discussion.
Hint
The first step would be to find stationary points within \(]0,1[^5\), the interior of \([0,1]^5\). As the function is differentiable, there are no exceptional points.
Hint
The boundary consists of the points \((x_1,x_2,\dots,x_5)\) with one of the coordinates being \(0\) or \(1\). So, the boundary consists of \(2 \cdot 5=10\) different four-dimensional surfaces.
Hint
For each of the ten different boundary surfaces, we are to consider the restriction of \(f\) to the surface, e.g. \((x_1,x_2,x_3,x_4,0)\) for \(x_i \in [0,1]\). Here, \(f\) becomes a function of four variables, which we then again must find extrema of.
Hint
With this new function we again have \(2 \cdot 3 = 6\) boundary surfaces, this time of dimension three. And we have ten of these functions. The approach quickly becomes overwhelming - not only to do by hand but also for a computer.
Note
A “dice” in \(\mathbb{R}^n\), also called a hyper-cube, has \(2 \, n\) boundary surfaces and \(2^n\) corner points. Note that this is not a universal rule for “square domains” in higher dimensions. For instance, a “diamond” in \(\mathbb{R}^n\), called a hyper-octahedron, given by \(|x_1|+ |x_2| + \cdots + |x_n| \le 1\), has \(2^ n\) boundary surfaces and \(2 \, n\) corner points.
Question d#
Determine the range of \(f\).
Answer
As \(A\) is connected, we conclude that the range (or image set) is \(\operatorname{im}(f) = [f_\text{min},f_\text{max}] = [1,35/27]\).
Question e#
Plot the graph of \(f\) along with points to show where on the graph the maximum and minimum values are attained. Check that your results look reasonable.
5: A Return to Theme 1#
In Theme 1: The Gradient Method we considered three functions of the form \(f_i: \mathbb{R}^2 \to \mathbb{R}\). All of the functions had precisely one minimum but no maximum since they grew towards infinity. You may use this piece of information without proof.
We here use the functions (with their standard values) given in Python by:
# Defining variables and parameters
x1, x2 = symbols('x1 x2', real=True)
a, lambda1 = symbols('a lambda1', positive=True)
def f1(x1, x2, a = S(1/2)):
return a * x1**2 + 1 * x2**2
def f2(x1, x2, lambda1 = 0.5):
Q = 1/sqrt(2) * Matrix([[1,1],[1,-1]])
A = Q.T * Matrix([[lambda1,0],[0,1]]) * Q
b = Matrix([-2,4])
x = Matrix([x1,x2])
q = x.T * A * x + x.T * b
return q[0]
def f3(x1, x2):
return (1 - x1)**2 + 100*(x2 - x1**2)**2
In the Theme Exercise we used the gradient method to search for the minimum point with the minimum value. This is a fine method, in particular when the function has many (maybe infinitely many) points at which it is not differentiable. For “nicer” functions, though, such as smooth functions (meaning functions that are infinitely-many-times differentiable), it is much easier to simply find the points where their gradient is zero. The three functions we considered here are smooth.
Question a#
Find all stationary points and the minimum values for each of the three functions. Although the functions are given in Python code above, you should do it by hand - it won’t take much longer.
Hint
The function \(f_3\) is known as Rosenbrock’s banana function. The stationary point can be a little hard to find. From \(\frac{\partial f_3}{\partial x_2}(x_1,x_2) = 200(y-x^2)\) we do though see that \(\frac{\partial f_3}{\partial x_2}(x_1,x_2) = 0\) implies \(x_2=x_1^2\). Use this fact in the equation \(\frac{\partial f_3(x)}{\partial x_1} = 0\).
Question b#
State the image set of each function.
6: Global Maximum and Global Minimum once again#
Consider the function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) given by
as well as the set \(A=\lbrace(x,y) \in \mathbb{R}^2 \,| \, x^2+y^2\leq 1\rbrace\).
Justify that \(f\) has both a global maximum and a global minimum on \(A\), and determine these values as well as the points where they are attained.
Use Sympy for the boundary investigation of \(f\vert_{\partial A}\), but do the rest by hand.
Hint
The function is continuous, and \(A\) is bounded and closed.
Hint
The candidates for largest and smallest values are constituted by stationary points of the function on \(A\) as well as local extrema on the boundary of \(A\).
Hint
The only stationary point of \(f\) within the domain is \((0,0)\). Next step is to investigate the restriction of \(f\) to the boundary of \(A\).
Hint
The boundary of \(A\) can be parametrized by \((x,y)=(\cos (t),\sin(t))\) where \(t\in\left[ 0,2\pi\right] \).
Hint
The restriction of \(f\) to the boundary of \(A\) is then \(g(t)=f(\cos (t), \sin( t))\) where \(t\in\left[ 0,2\pi\right]\). Plot the graph of \(g'(t)\), and determine its zeroes (you may use Sympy for this).
Hint
The candidates for global maximum and minimum points are the point \((0,0)\), points that correspond to the solution to \(g'(t)=0\), and the end points of the boundary curve (in fact, there is only one end point; why?). Compute the function values of \(f\) at each of these points. The largest value among them is the global maximum of the function on \(A\), and the smallest value the global minimum.
Answer
The global minimum is \(-\frac{7}{2}\), attained at \(\left(\frac{\sqrt{10}}{10},\frac{3\sqrt{10}}{10}\right)\) and \(\left(-\frac{\sqrt{10}}{10},-\frac{3\sqrt{10}}{10}\right)\).
The global maximum is \(\frac{3}{2}\) attained at \(\left(\frac{3\sqrt{10}}{10},\frac{-\sqrt{10}}{10}\right)\) and \(\left(\frac{-3\sqrt{10}}{10},\frac{\sqrt{10}}{10}\right)\).
7: Stationary Points for Quadratic Forms#
Let \(q : \mathbb{R}^n \to \mathbb{R}\) be a quadratic form. In other words, \(q\) has the functional expression
where \(A\) is a (non-zero) \(n \times n\) matrix, \(\pmb{b} \in \mathbb{R}^n\) is a column vector, and \(c \in \mathbb{R}\).
It holds that \(q\) is a differentiable function with \(\nabla q(\pmb{x}) = (A + A^T) \pmb{x} + \pmb{b}\), according to this example. You will not be showing that.
Question a#
Write out a system of equations whose solution describes the stationary points. Argue that \(q\) can have either zero, one, or infinitely many stationary points.
Answer
The stationary points are given by: \(\nabla q(\pmb{x}) = (A + A^T) \pmb{x} + \pmb{b} = \pmb{0}\), which correspond to \((A + A^T) \pmb{x} = - \pmb{b}\).
Hint
The linear equation system has either zero, one, or infinitely many solutions. Remember from the topic of linear algebra from Mathematics 1a when the three scenarios will appear.
Question b#
Assume that \((A + A^T)\) is invertible. Argue that \(q\) has precisely one stationary point. Find this stationary point (that is, find a formula or expression for the stationary point).
Answer
\(\pmb{x} = - (A + A^T)^{-1} \pmb{b}\).
Question c#
Assume that \(A\) is symmetric. Argue that \(q\) has precisely one stationary point if and only if \(\lambda=0\) is not an eigenvalue of \(A\).
Hint
If \(A\) is symmetric, then \(2A = (A + A^T)\). So, the equation system is \(2 A \pmb{x} = -\pmb{b}\), which has precisely one solution if and only if \(A\) is invertible, which happens if and only if \(\lambda=0\) is not an eigenvalue. Remember that \(A = (1/2) \pmb{H}_q\), which we know due to to what the Hessian matrix means for stationary points.
Question d (Optional)#
Derive the formula that we started out by taking for granted: \(\nabla q(\pmb{x}) = (A + A^T) \pmb{x} + \pmb{b}\)
Hint
Begin by showing the proposition for \(n=1\). Then see Example 3.6 in the Additional Notes
8: A Challenge in Linear Algebra#
Let \(A\) be an \(n \times n\) matrix. Is it true that the symmetric matrix \((A + A^T)\) is invertible, if \(A\) is invertible? Prove this statement, or disprove it with a counter-example.
Hint
We provide no hints for this exercise but be welcome to discuss it on Ed.
Execises – Short Day#
1: Usage of Hessian Matrix#
Consider the function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) given by
Question a#
Argue that the function \(f\) has exactly one extremum. Determine this extremum point and the corresponding extremum value.
Hint
Any potential extremum points can only be found at the function’s stationary points.
Hint
Determine the Hessian matrix and its eigenvalues. What significance do the eigenvalues have? Check Theorem 5.2.4, Second partial derivative test.
Answer
The function has one stationary point, namely \((x, y) = (1, \frac{1}{2})\). The Hessian matrix is constant and has positive eigenvalues everywhere (and therefore also at this point), so there is a local minimum at this point. The minimum value is \(f(1, \frac{1}{2}) = -2\).
Question b#
What is the difference between an extremum and a strict extremum, also sometimes called a proper extremum? Is the extremum we found a strict extremum?
Answer
The answer to the last part of the question is yes. See Theorem 5.2.4, Second partial derivative test.
2: Local Extrema and Approximating Second-Degree Polynomial#
Given function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) with the expression
Question a#
Show that the points \(A = (2, 0)\), \(B = (1, -1)\), and \(C = (0, 0)\) are stationary points of \(f\), and determine for each of them whether there is a local maximum or a local minimum (or neither) there. If so, indicate the local maximum/minimum value attained at the point, and determine if it is strict.
Hint
For \(A\) and \(B\), the question can be answered using the eigenvalues of the Hessian matrix at the points. \(C\) requires further investigation; for example, a sign analysis of \(f\) along the line \(x = 0\).
Hint
More specifically: What happens with \(f(0, y) = 2y^3\) when passing \(y = 0\)? What does this indicate about the possibility of an extremum?
Answer
There is a strict minimum at point \(A\) with the minimum value \(f(2, 0) = -4\). There is no extremum at \(B\) and \(C\).
Question b#
Show that the approximating quadratic polynomial for \(f\) with the expansion point \(A\) can be written as an equation in the unknowns \(x\), \(y\), and \(z\) in the following form:
What surface does this equation describe, and what do the constants represent?
Hint
Now, bring the equation to the form
The surface is called a quadric surface (it is a higher-dimensional conic section – a section through higher-dimensional “cones” – in this case producing a surface rather than a curve). Find the standard equation (also called a normal form) here: https://en.wikipedia.org/wiki/Quadric#Euclidean_space.
Hint
Consider the equations and figures in the table titled Non-degenerate real quadric surfaces.
Answer
\(\lambda_1\) and \(\lambda_2\) are eigenvalues of the Hessian matrix \(\pmb{H}_f\). The equation describes an upward-facing elliptical paraboloid with the vertex at \(T = (c_1, c_2, c_3) = (2, 0, -4)\). Note: The first two coordinates of \(T\) constistute \(A\), while the last one is the minimum value of \(f\) at point \(A\).
Question c#
Plot the graph of \(f\) along with the graph of the approximating second-degree polynomials for \(f\) with the expansion points \(A\), \(B\), and \(C\). Discuss whether the eigenvalues of the Hessian matrices at the three points can determine the type of quadric surface described by the second-degree polynomials.
3: A Return to Theme Exercise 1 once more#
We consider the quadratic form \(f_2: \mathbb{R}^2 \to \mathbb{R}\) from Theme 1: The Gradient Method. It is given by \(q: \mathbb{R}^2 \to \mathbb{R}\)
where \(A\) is a \(2 \times 2\) matrix that depends on \(\lambda_1 \in \mathbb{R}\),
and where \(\pmb{b} = - 2 A [1,2]^T\). We will change the following from the Theme exercise: 1) \(\lambda_1\) may be zero or negative, and 2) we use a new definition of \(\pmb{b}\).
Question a#
Find the eigenvalues of \(A\).
Hint
The matrix \(Q\) is real orthogonal.
Answer
Due to the spectral theorem we easily read the eigenvalues from the diagonal of \(\Lambda\) to be \(\lambda_1\) and 1.
Question b#
Find all stationary points of \(q\) when \(\lambda_1 \neq 0\).
Hint
The matrix \(A\) is symmetric. Also, it is invertible when \(\lambda_1 \neq 0\). Remember the formula for the gradient of a quadratic form. You do not need to do any calculations.
Answer
The gradient at \(\pmb{x}\) is \(2 A \pmb{x} + \pmb{b}\). So, the stationary point is \(-(1/2) A^{-1} \pmb{b}\), which with the definition \(\pmb{b} = - 2 A [1,2]^T\) becomes
Question c#
How is \(A\) and the Hessian matrix \(\pmb{H}_f\) related? Find the result in the book if you cannot remember. Describe the stationary point for each of the three cases \(\lambda_1 > 0\), \(\lambda_1 = 0\) and \(\lambda_1 < 0\).
Question d#
How is \(q\) and the approximating second-degree polynomial (with an arbitrary expansion point) related? Plot \(q\) for each of the three cases \(\lambda_1 > 0\), \(\lambda_1 = 0\) and \(\lambda_1 < 0\). Which normal forms are we dealing with (see https://en.wikipedia.org/wiki/Quadric#Euclidean_space)?
4: Global Extrema of Function of Three Variables#
We consider the function \(f:\mathbb{R}^3\rightarrow \mathbb{R}\) given by
as well as the solid unit sphere
Question a#
Show that \(f\) in the interior of \(\mathcal{K}\) only has one stationary point, that being \(O=(0,0,0)\), and investigate whether \(f\) has an extremum at \(O\).
Answer
The Hessian matrix shows that there is not extremum at \((0,0,0)\).
Question b#
Determine the global maximum value and the global minimum value of \(f\) on \(\mathcal{K}\) as well as the points where these values are attained.
Answer
Global maximum \(= 1\) and is attained at \((0,1,0)\) and \((0,-1,0)\). Global minimum \(= -1\) and is attained on the circle \(\lbrace(x,y,z) \mid y=0 \text{ and } x^2+z^2=1\rbrace\).
Question c#
Determine the image set of \(f\) on \(\mathcal{K}\).
5: Where are the Global Extrema?#
Given function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) with the expression
Keep in mind that \(\exp(x^2+y^2) = \operatorname{e}^{x^2+y^2}\).
Question a#
Find all stationary points of \(f\).
Hint
Set up the two equations for the stationary points. SymPy has trouble solving the equations, so we will need to help it along.
Hint
We first notice that \(x = y = 0\) is a solution. We need to check if there are any others. Let’s assume, for example, that \(x \neq 0\). From the two equations, we can conclude that \(y^2 = x^2\), i.e., \(y = \pm x\). How? (Try isolating an expression in one equation and substituting it into the other. Remember that you can divide by \(x\) since it’s never zero.)
Hint
Since \(\exp(x^2 + y^2) \ge 0\), then we can actually conclude based on \(y^2 = x^2\), i.e., \(y = \pm x\) that \(y = x\).
Hint
Substitute \(y = x\) into the two equations and find the solution. This can be done using SymPy or by hand.
Answer
The three stationary points are
Question b#
Find all local extrema.
Answer
There are strict minima at the points \(\left(\sqrt{\frac{1}{2} \ln{2}},\sqrt{\frac{1}{2} \ln{2}}\right) \) and \( \left(-\sqrt{\frac{1}{2} \ln{2}},-\sqrt{\frac{1}{2} \ln{2}}\right)\) with the minimum value \(2-2 \ln 2\).
Question c#
Determine whether the function \(f\) has a global maximum or minimum. If so, state the values of them.
Answer
There is no global maximum. Global minimum is attained at the points \(\left(\sqrt{\frac{1}{2} \ln{2}},\sqrt{\frac{1}{2} \ln{2}}\right)\) and \(\left(-\sqrt{\frac{1}{2} \ln{2}},-\sqrt{\frac{1}{2} \ln{2}}\right)\) with the value \(2-2 \ln 2\).
Question d#
State the range of the function.
Answer
\(\operatorname{im}(f)=[2-2 \ln 2,\infty[\).