Week 1: Exercises#

The exercises are intended to be done by hand unless otherwise stated (such as when you are asked to plot a graph or run a script).

Exercises – Long Day#

1: Function or Not?#

Consider the following correspondence between \(a\) and \(b\) values:

| \(a\) | \(b\) |
|-------|-------|
| 1     | 0     |
| 2     | 1     |
| 0     | 3     |
| 1     | 2     |

We consider functions whose domain is a subset of \(\{0,1,2,3\}\) and whose co-domain is \(\{0,1,2,3\}\).

We must determine whether \(f\) and \(g\) define functions. Let \(f\) follow the rule that the first column (the \(a\) values) contains the input and the second column (the \(b\) values) contains the output, with domain \(\{0,1,2\}\). Let \(g\) follow the rule that the second column is the input and the first column is the output, with domain \(\{0,1,2,3\}\).

Does \(f\) define a function? Does \(g\)? If so, determine the range/image of the function, and decide whether the function is injective and/or surjective.

\(f\) is not well-defined because \(a=1\) is associated with two different outputs (0 and 2).

The function \(g\) has \(\operatorname{im}(g) = \{0,1,2\} \subset \{0,1,2,3\}\) and is therefore not surjective (no output is 3). Since \(g(0)=1\) and \(g(2)=1\), it is also not injective.
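Such tables can also be checked mechanically. A small sketch in plain Python (the helper names are ours, not part of the exercise) that tests whether a list of (input, output) pairs defines a function, and checks injectivity/surjectivity of \(g\):

```python
# Pairs (a, b) from the table.
pairs = [(1, 0), (2, 1), (0, 3), (1, 2)]

def is_function(pairs):
    """A relation is a function iff no input is paired with two different outputs."""
    mapping = {}
    for x, y in pairs:
        if x in mapping and mapping[x] != y:
            return False
        mapping[x] = y
    return True

f_pairs = pairs                            # f: first column -> second column
g_pairs = [(b, a) for a, b in pairs]       # g: second column -> first column

print(is_function(f_pairs))  # False: a = 1 maps to both 0 and 2
print(is_function(g_pairs))  # True

g = dict(g_pairs)
codomain = {0, 1, 2, 3}
print(set(g.values()))                 # image of g: {0, 1, 2}
print(len(set(g.values())) == len(g))  # injective?  False (g(0) = g(2) = 1)
print(set(g.values()) == codomain)     # surjective? False (3 is never hit)
```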

2: Same Functional Expressions?#

We consider functions \(f_i : \mathbb{R} \to \mathbb{R}\) given by:

\[\begin{gather*} f_1(x) = |x| \\ f_2(x) = \begin{cases} x & ,x > 0 \\ -x & ,x \le 0 \end{cases} \\ f_3(x) = \max(x,0) \\ f_4(x) = \operatorname{ReLU}(x) + \operatorname{ReLU}(-x) \end{gather*}\]

where \(x \in \mathbb{R}\).

Some of the functions are the same function. Which?

  • \(f_1(x) = |x|\)

  • \(f_2(x)\) also equals \(|x|\): for \(x > 0\) it returns \(x = |x|\), and for \(x \le 0\) it returns \(-x = |x|\)

  • \(f_3(x) = \max(x,0)\) (i.e., \(0\) for \(x \le 0\) and \(x\) for \(x>0\))

  • \(f_4(x) = \operatorname{ReLU}(x) + \operatorname{ReLU}(-x) = \max(x,0) + \max(-x,0) = |x|\), since for \(x \ge 0\) the sum is \(x + 0 = |x|\) and for \(x < 0\) it is \(0 + (-x) = |x|\).

Thus: \(f_1, f_2, f_4\) are the same, while \(f_3\) is a different function.

Plot with dtuplot:

# Imports used throughout these exercise solutions
from sympy import *
from dtumathtools import *

x = symbols('x')
f1 = abs(x)
f2 = Piecewise((x, x>0),(-x,x<=0))
f3 = Max(x,0)
f4 = Max(x,0) + Max(-x,0)
dtuplot.plot(f1);
dtuplot.plot(f2);
dtuplot.plot(f3);
dtuplot.plot(f4);

3: Unknown Functional Expression#

Consider a function \(f: \mathbb{R} \to \mathbb{R}\) for which \(\lim_{x \to 2} f(x) = 5\) and \(f(2) = 3\). Which of the following propositions is true?

  1. The function is continuous at \(x=2\).

  2. The function is differentiable at \(x=2\).

  3. The function is discontinuous at \(x=2\).

  4. The function is not well-defined at \(x=2\).

  5. The truth value of the above propositions cannot be determined without knowing the functional expression.

The true proposition is number 3 (discontinuous): continuity at \(x=2\) would require \(\lim_{x \to 2} f(x) = f(2)\), but \(5 \neq 3\). Since \(f\) is not continuous at \(x=2\), it is not differentiable there either, and it is perfectly well-defined at \(x=2\) (indeed \(f(2)=3\) is given), so propositions 1, 2, 4, and 5 are false.
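A concrete function (our own example, not from the exercise) with exactly the stated properties illustrates the point:

```python
# One function with lim_{x->2} f(x) = 5 and f(2) = 3:
def f(x):
    return 3 if x == 2 else 5

# Values of f near 2 approach 5, but the value at 2 is 3,
# so f cannot be continuous at x = 2:
print([f(2 + h) for h in (0.1, 0.01, 0.001)])  # [5, 5, 5]
print(f(2))                                    # 3
```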

4: Non-Linearity of ReLU#

Consider the ReLU function \(\operatorname{ReLU}: \mathbb{R}^n \to \mathbb{R}^n\). Show that this function is not linear.

The students should simply find explicit vectors and scalars for which at least one of the linearity requirements (written in one line as \(L(c \pmb{x} + d\pmb{y}) = c L(\pmb{x}) + d L(\pmb{y})\)) fails. A possible choice is \(c=-1\), \(d=0\), \(\pmb{x}=[1,1,\dots, 1]^T\), since \(\operatorname{ReLU}(-\pmb{x}) = \pmb{0}\) while \(-\operatorname{ReLU}(\pmb{x}) = -\pmb{x} \neq \pmb{0}\).
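The suggested counterexample can be verified numerically; a minimal NumPy sketch (our own code):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

x = np.ones(3)      # x = [1, 1, 1]^T
c = -1.0            # the suggested scalar

lhs = relu(c * x)   # ReLU(-x) = [0, 0, 0]
rhs = c * relu(x)   # -ReLU(x) = [-1, -1, -1]
print(lhs, rhs)     # the two sides differ, so ReLU is not linear
```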

5: Possible Visualizations#

Discuss whether it is possible to visualize the following functions – if so, plot them with Python:

  1. A scalar function of two variables \(f: \mathbb{R}^2 \to \mathbb{R}, \, f(x_1,x_2) = \sqrt{\vert x_1 x_2 \vert}\)

  2. A scalar function of four variables \(f: \mathbb{R}^4 \to \mathbb{R}, \, f(x_1,x_2,x_3,x_4) = \sqrt{\vert x_1 x_2 x_3 x_4 \vert}\)

  3. A complex(-valued) scalar function of two variables \(f: \mathbb{R}^2 \to \mathbb{C}, \, f(x_1,x_2) = \sqrt{\vert x_1 x_2 \vert} + i \cos(x_1 + x_2)\)

  4. A vector field in 2D \(\pmb{f}: \mathbb{R}^2 \to \mathbb{R}^2, \, \pmb{f}(x_1,x_2) = (-x_2/3, x_1/3)\)

  5. A vector field in 3D \(\pmb{f}: \mathbb{R}^3 \to \mathbb{R}^3, \, \pmb{f}(x,y,z)= (x^3+yz^2, y^3-xz^2, z^3)\)

  6. A function of the form \(\pmb{r}: [0,10] \to \mathbb{R}^3, \, \pmb{r}(t) = (\cos(t),\sin(t),t)\)

Note

The following Python commands from the dtumathtools package are useful: dtuplot.plot3d, dtuplot.plot_vector, dtuplot.plot3d_parametric_line.

  1. A scalar function of two variables \(f: \mathbb{R}^2 \to \mathbb{R}, \, f(x_1,x_2) = \sqrt{\vert x_1 x_2 \vert}\). Can be visualized (the graph is a surface in 3D).

x1, x2 = symbols('x_1 x_2')
f = sqrt(abs(x1*x2))
dtuplot.plot3d(f,(x1,-5,5),(x2,-5,5));
  2. A scalar function of four variables \(f: \mathbb{R}^4 \to \mathbb{R}, \, f(x_1,x_2,x_3,x_4) = \sqrt{\vert x_1 x_2 x_3 x_4 \vert}\). The graph exists in 5D, and level sets exist in 4D, so the function cannot be “plotted.” However, projections can be plotted, where, for example, two variables at a time are fixed, such as \(x_3 = x_4 = 1\) (fixing both to \(0\) would give the zero function), or one variable is fixed and level sets are plotted over the remaining variables.

  3. A complex scalar function of two real variables \(f: \mathbb{R}^2 \to \mathbb{C}, \, f(x_1,x_2) = \sqrt{\vert x_1 x_2 \vert} + i \cos(x_1 + x_2)\). The graph is a subset of \(\mathbb{R}^2 \times \mathbb{C} \cong \mathbb{R}^4\), so the function cannot be “plotted.” However, the real and imaginary parts can be plotted separately (or possibly the modulus/argument).

x1, x2 = symbols('x_1 x_2', real=True)
f = sqrt(abs(x1*x2)) + I*cos(x1+x2)
dtuplot.plot3d(re(f),(x1,-5,5),(x2,-5,5));
re(f)
dtuplot.plot3d(im(f),(x1,-5,5),(x2,-5,5));
im(f)
  4. A vector field in 2D \(\pmb{f}: \mathbb{R}^2 \to \mathbb{R}^2, \, \pmb{f}(x_1,x_2) = (-x_2/3, x_1/3)\). It can be illustrated with arrows in 2D. The graph exists in 4D, so it cannot be directly plotted.

x1, x2 = symbols("x_1 x_2",real=True)
V = Matrix([-x2/3,x1/3])
dtuplot.plot_vector(V,(x1,-1,1),(x2,-1,1),show=True,quiver_kw={"alpha":0.5,"color":"black"},n=15);
  5. A vector field in 3D \(\pmb{f}: \mathbb{R}^3 \to \mathbb{R}^3, \, \pmb{f}(x,y,z)= (x^3+yz^2, y^3-xz^2, z^3)\). It can be illustrated with arrows in 3D. The graph exists in 6D, so it cannot be plotted.

x, y, z = symbols("x y z",real=True)
V = Matrix([x**3+y*z**2,y**3-x*z**2,z**3])
dtuplot.plot_vector(0.05*V,(x,-5,5),(y,-5,5),(z,-5,5),show=True,quiver_kw={"alpha":0.5,"length":0.1,"color":"black"},n=8);
  6. A function of the form \(\pmb{r}: [0,10] \to \mathbb{R}^3, \, \pmb{r}(t) = (\cos(t),\sin(t),t)\). The image set is a 3D curve (helix). The graph exists in 4D, so it cannot be plotted in 3D.

t = symbols('t')
r = Matrix([cos(t), sin(t), t])
dtuplot.plot3d_parametric_line(r[0],r[1],r[2], (t,0,10), use_cm=False, label="r(t)");

6: Evaluating a Neural Network#

Consider a simple “shallow” neural network \(\pmb{\Phi}: \mathbb{R}^2 \to \mathbb{R}\) with one hidden layer (\(L=2\)). The network is defined by the parameters:

\[\begin{split} A_1 = \begin{bmatrix} 2 & 0 \\ -1 & 1 \end{bmatrix}, \quad \pmb{b}_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -1 & 2 \end{bmatrix}, \quad b_2 = 0, \end{split}\]

where the activation function in the hidden layer is the ReLU function, \(\pmb{\sigma}(\pmb{z}) = \operatorname{ReLU}(\pmb{z})\), while the activation function in the last layer is the identity map \(\pmb{\sigma}(\pmb{z}) = \pmb{z}\). The network function is thus given by:

\[ \pmb{\Phi}(\pmb{x}) = A_2 \, \operatorname{ReLU}(A_1 \pmb{x} + \pmb{b}_1) + b_2. \]

Question a#

Calculate the value of the network at the point \(\pmb{x} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}\).

Question b#

Find a point \(\pmb{x}\) where the output \(\pmb{\Phi}(\pmb{x})\) of the network is negative. Justify your answer.

The first neuron in the hidden layer must be positive (active) as it is multiplied by -1 in the output. The first neuron is calculated as \(z_1 = 2x_1 - 1\). If we choose, e.g., \(x_1 = 2\), then we get \(z_1 = 3\).

At the same time, the second neuron should be \(0\) (inactive) so that it makes no positive contribution. The second neuron is \(z_2 = -x_1 + x_2\). With \(x_1 = 2\) we simply have to choose \(x_2\) such that \(z_2 \le 0\), e.g., \(x_2 = 0\).

Let’s test the point \(\pmb{x} = (2, 0)\):

  1. \(A_1 \pmb{x} + \pmb{b}_1 = [2(2)+0, -1(2)+0]^T + [-1, 0]^T = [4, -2]^T + [-1, 0]^T = [3, -2]^T\).

  2. ReLU converts negative numbers to zero: \([3, 0]^T\).

  3. Output: \([-1, 2] \cdot [3, 0]^T = -3 + 0 = -3\).

Hence, \(\pmb{x} = (2, 0)\) gives the result \(-3\).
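The hand computation can be double-checked numerically; a minimal NumPy sketch (our own) that evaluates the network at both points:

```python
import numpy as np

# Network parameters from the exercise
A1 = np.array([[2, 0], [-1, 1]])
b1 = np.array([-1, 0])
A2 = np.array([-1, 2])
b2 = 0

def Phi(x):
    """Shallow network: affine layer, ReLU, affine output layer."""
    return A2 @ np.maximum(A1 @ x + b1, 0) + b2

print(Phi(np.array([0.5, 1.0])))  # Question a: 1.0
print(Phi(np.array([2.0, 0.0])))  # Question b: -3.0
```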

Question c#

How many adjustable parameters (so-called weights and bias-values) does this network have in total?

Question d#

We now replace the activation function (ReLU) with a hard-limiter function (a variant of the signum function), which we will call \(\sigma_{\text{step}}\).

As a scalar function, \(\sigma_{\text{step}}: \mathbb{R} \to \mathbb{R}\), it is defined by:

\[\begin{split} \sigma_{\text{step}}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 . \end{cases} \end{split}\]

As a vector function, \(\pmb{\sigma}_{\text{step}}: \mathbb{R}^n \to \mathbb{R}^n\), it is defined by the scalar function being applied on each coordinate:

\[\begin{split} \pmb{\sigma}_{\text{step}}(\pmb{z}) = \begin{bmatrix} \sigma_{\text{step}}(z_1) \\ \vdots \\ \sigma_{\text{step}}(z_n) \end{bmatrix}. \end{split}\]

The network with this new activation function looks like this: \(\pmb{\Phi}(\pmb{x}) = A_2 \pmb{\sigma}_{\text{step}}(A_1 \pmb{x} + \pmb{b}_1) + b_2\).

State the subset of the domain over which the network function \(\pmb{\Phi}(\pmb{x})\) is discontinuous.

7: Visualizing the Network#

In this exercise you should use Python to visualize the function \(\pmb{\Phi}(\pmb{x})\) from the previous Exercise 6: Evaluating a Neural Network (with ReLU as the activation function).

You are to plot the graph of the network over the region \(x_1, x_2 \in [-2, 2]\). We will here use a numerical approach (with the matplotlib package) instead of SymPy in order to achieve a better 3D visualization. Run the following code (you do not have to understand all of the Python-technical details in these code lines).

import numpy as np
import matplotlib.pyplot as plt

# Parameters
A1 = np.array([[2, 0], 
               [-1, 1]])
b1 = np.array([[-1], 
               [0]])
A2 = np.array([[-1, 2]]) 
b2 = 0

def ReLU(x):
    return np.maximum(x, 0)

def shallow_nn(x1, x2):
    # Gathering the input in a vector x
    vals = np.zeros(x1.shape)
    
    for i in range(x1.shape[0]):
        for j in range(x1.shape[1]):
            x_vec = np.array([[x1[i,j]], [x2[i,j]]])
            
            # Layer 1: Affine transformation + ReLU
            z1 = A1 @ x_vec + b1
            a1 = ReLU(z1)
            
            # Layer 2: Affine transformation
            output = A2 @ a1 + b2
            
            vals[i,j] = output[0,0]
            
    return vals

x_range = np.linspace(-2, 2, 80)
y_range = np.linspace(-2, 2, 80)
X, Y = np.meshgrid(x_range, y_range)
Z = shallow_nn(X, Y)

# Plot
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')

surf = ax.plot_surface(X, Y, Z, cmap='viridis', 
                       edgecolor='k', linewidth=0.3, alpha=0.5,
                       rcount=20, ccount=20, antialiased=True)

# Lines where the ReLU has a "kink" (where it changes suddenly from 0 to positive values)
y_vals = np.linspace(-2, 2, 100)
x_vals_1 = np.full_like(y_vals, 0.5)
z_vals_1 = shallow_nn(np.array([x_vals_1]), np.array([y_vals]))[0]
x_vals_2 = np.linspace(-2, 2, 100)
y_vals_2 = x_vals_2
z_vals_2 = shallow_nn(np.array([x_vals_2]), np.array([y_vals_2]))[0]

ax.plot(x_vals_1, y_vals, z_vals_1, color='red', linewidth=3, label='$2x_1 - 1 = 0$ (Kink line 1)')
ax.plot(x_vals_2, y_vals_2, z_vals_2, color='orange', linewidth=3, label='$-x_1 + x_2 = 0$ (Kink line 2)')

ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_zlabel(r'$\Phi(x)$')
ax.legend()
ax.view_init(elev=15, azim=-100) 

plt.tight_layout()
plt.show()

Note

Note how the graph consists of plane surfaces that are “cut off” and stitched together. This is because the ReLU function is piecewise linear. In large networks used in real-life applications, millions of such linear pieces are stitched together.

Can you plot the graph of the same neural network where \(\pmb{\sigma}_{\text{step}}\) is used in place of ReLU? The planes in this new graph should not touch (why not?).

8: Linear Vector Function#

Let \(A \in \mathsf{M}_{3 \times 5}(\mathbb{R})\) be given by

\[\begin{equation*} A = \begin{bmatrix} 1 & 0 & 2 & 3 & 4 \\ 0 & -1 & 5 & 6 & 7 \\ 0 & 0 & -3 & 8 & 9 \end{bmatrix}. \end{equation*}\]

Consider the vector function \(\pmb{f}: \mathbb{R}^5 \to \mathbb{R}^3\) given by \(\pmb{f}: \pmb{x} \mapsto A\pmb{x}\), where \(\pmb{x}\) is a column vector in \(\mathbb{R}^5\).

Question a#

State the three coordinate functions of \(\pmb{f}\).

  1. \(f_1(x_1,x_2,x_3,x_4,x_5) = x_1 + 2x_3 + 3x_4 + 4x_5\)

  2. \(f_2(x_1,x_2,x_3,x_4,x_5) = -x_2 + 5x_3 + 6x_4 + 7x_5\)

  3. \(f_3(x_1,x_2,x_3,x_4,x_5) = -3x_3 + 8x_4 + 9x_5\)

In Python:

x1, x2, x3, x4, x5 = symbols('x_1 x_2 x_3 x_4 x_5')
x = Matrix([x1,x2,x3,x4,x5])
A = Matrix([[1, 0, 2, 3, 4],[0, -1, 5, 6, 7],[0, 0, -3, 8, 9]])
x, A
\[\begin{split}\displaystyle \left( \left[\begin{matrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\\x_{5}\end{matrix}\right], \ \left[\begin{matrix}1 & 0 & 2 & 3 & 4\\0 & -1 & 5 & 6 & 7\\0 & 0 & -3 & 8 & 9\end{matrix}\right]\right)\end{split}\]
A * x
\[\begin{split}\displaystyle \left[\begin{matrix}x_{1} + 2 x_{3} + 3 x_{4} + 4 x_{5}\\- x_{2} + 5 x_{3} + 6 x_{4} + 7 x_{5}\\- 3 x_{3} + 8 x_{4} + 9 x_{5}\end{matrix}\right]\end{split}\]

Question b#

Provide the image set \(\operatorname{im}(\pmb{f})\) of \(\pmb{f}\).

A.columnspace()
\[\begin{split}\displaystyle \left[ \left[\begin{matrix}1\\0\\0\end{matrix}\right], \ \left[\begin{matrix}0\\-1\\0\end{matrix}\right], \ \left[\begin{matrix}2\\5\\-3\end{matrix}\right]\right]\end{split}\]

Since \(\operatorname{rank}(A)=3\), it follows that \(\operatorname{im}(\pmb{f})=\operatorname{col} A=\mathbb{R}^3\).
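The rank claim can also be confirmed numerically; a quick sketch with NumPy:

```python
import numpy as np

A = np.array([[1, 0, 2, 3, 4],
              [0, -1, 5, 6, 7],
              [0, 0, -3, 8, 9]])

# Full row rank means the column space is all of R^3.
print(np.linalg.matrix_rank(A))  # 3
```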

Question c#

Is the vector function \(\pmb{f}\) surjective and/or injective?

9: Next-Prime Function (Optional)#

This is an optional extra exercise for those who seek extra training.

Let \(f: \mathbb{N} \to \mathbb{N}\) be a function that returns the next prime number (strictly) greater than a given natural number \(n\). In this question, you should first determine the value of the function for two specific inputs, and then show that the function is well-defined before implementing it in Python.

Question a#

Find, with simple reasoning, \(f(10)\) and \(f(13)\).

Question b#

Argue that the function \(f(n)\) is well-defined.

Question c#

Can a functional expression for \(f(n)\) be formulated? Discuss why it is or isn’t possible.

Question d#

Implement the function \(f(n)\) in Python, such that it takes an integer \(n\) as input and returns the next prime number greater than \(n\). Define an auxiliary function (a “helper function”) is_prime(x) to determine if a number is a prime.

from math import sqrt  

def is_prime(x: int) -> bool:  
    """Returns True if x is a prime, otherwise False."""  
    if x < 2:  
        return False  
    # Three lines of code missing 
    return True  

def f(n: int) -> int:  
    """Returns the next prime number greater than n."""  
    candidate = n + 1  
    # Two lines of code missing
    return candidate  

Question e#

Can you update your Python function from the previous question so that the domain of \(f\) is extended from \(\mathbb{N}\) to \(\mathbb{R}\)? E.g., \(f(\pi)\) should return 5.

from math import sqrt, floor

def is_prime(x: int) -> bool:  
    """Returns True if x is a prime number, otherwise False."""  
    if x < 2:  
        return False  
    for i in range(2, int(sqrt(x)) + 1):  # We only need to check up to the square root of x
        if x % i == 0:  
            return False  
    return True  

def f(n: float) -> int:  
    """Returns the next prime number greater than n."""  
    candidate = floor(n) + 1  
    while not is_prime(candidate):  
        candidate += 1  
    return candidate  

f(10), f(13), f(3.14)
The output is \((11, 17, 5)\).

Exercises – Short Day#

1: Magnitude of Vectors#

Consider the following three vectors in \(\mathbb{R}^3\):

\[\begin{equation*} \pmb{v}_1 = \left[\begin{matrix}-10\\ -10\\ -10\end{matrix}\right], \, \pmb{v}_2 = \left[\begin{matrix}-10\\ -4\\ 14\end{matrix}\right], \, \pmb{v}_3 = \left[\begin{matrix}-10\\ -8\\-12\end{matrix}\right] . \end{equation*}\]

Which vector is the longest? Which vectors are orthogonal to each other? Which two vectors are closest to each other?

Note

We can imagine the vectors as (geometrical) position vectors with their origin at \(\pmb{0}=[0,0,0]^T\) and the endpoint at \(\pmb{v}_i\) for \(i = 1, 2, 3\). Sometimes this is written as \(\overrightarrow{\pmb{0}\pmb{v}_i}\).

v1 = Matrix([-10,-10,-10])
v2 = Matrix([-10,-4,14])
v3 = Matrix([-10,-8,-12])
v1.norm(), v2.norm(), v3.norm()
N(v1.norm()), N(v2.norm()), N(v3.norm())

  • The norms are \(10\sqrt{3} \approx 17.32\), \(2\sqrt{78} \approx 17.66\) and \(2\sqrt{77} \approx 17.55\), so \(\pmb{v}_2\) is the longest.
  • Orthogonality is checked using the dot product: \(\pmb{v}_1\cdot \pmb{v}_2 = 100+40-140=0\), so \(\pmb{v}_1\) and \(\pmb{v}_2\) are orthogonal.

Checking with Python:

v1.dot(v2), v2.dot(v3), v3.dot(v1)
The dot products are \(0\), \(-36\) and \(300\), respectively, so \(\pmb{v}_1\) and \(\pmb{v}_2\) form the only orthogonal pair.
  • Closest to each other. We have to calculate \(\Vert\pmb{v}_1-\pmb{v}_2\Vert\), \(\Vert\pmb{v}_1-\pmb{v}_3\Vert\), \(\Vert\pmb{v}_2-\pmb{v}_3\Vert\):

(v1-v2).norm(), (v2-v3).norm(), (v3-v1).norm()
N((v1-v2).norm()), N((v2-v3).norm()), N((v3-v1).norm())

The distances are \(6\sqrt{17} \approx 24.74\), \(2\sqrt{173} \approx 26.31\) and \(2\sqrt{2} \approx 2.83\).

Conclusion: \(\pmb{v}_1\) and \(\pmb{v}_3\) are closest to each other.

2: Partial Derivatives of a Simple Scalar Function#

Find the partial derivatives \(\frac{\partial f}{\partial x_1}\) and \(\frac{\partial f}{\partial x_2}\) of \(f(x_1, x_2) = x_1^3 + 3x_1 x_2 + x_2^3\). Determine the value of the partial derivatives at the point \((x_1, x_2) = (1, 2)\).

x1, x2 = symbols('x1 x2', real=True) 
f = x1**3 + 3 * x1 * x2 + x2**3 
diff_x1 = f.diff(x1) 
diff_x2 = f.diff(x2) 
(diff_x1.subs({x1:1, x2:2}), diff_x2.subs({x1:1, x2:2})) 
The partial derivatives are \(\frac{\partial f}{\partial x_1} = 3x_1^2 + 3x_2\) and \(\frac{\partial f}{\partial x_2} = 3x_1 + 3x_2^2\), which at \((x_1,x_2)=(1,2)\) evaluate to \((9, 15)\).

3: Different(?) Quadratic Forms#

Let \(\pmb{x} = [x_1,x_2]^T\) be a column vector in \(\mathbb{R}^2\). Define:

\[\begin{equation*} A_1 =\left[\begin{matrix}11 & -12 \\ -12 & 4\end{matrix}\right], \, A_2 =\left[\begin{matrix}11 & 0 \\ -24 & 4\end{matrix}\right], \, A_3 =\left[\begin{matrix} 73/5 & -36/5 \\ -36/5 & 52/5 \end{matrix}\right], \, \end{equation*}\]

and

\[\begin{equation*} \pmb{b}_1 = \left[\begin{matrix}-20\\ 40\end{matrix}\right], \, \pmb{b}_2 = \pmb{b}_1, \, \pmb{b}_3 = \left[\begin{matrix}-44\\ 8\end{matrix}\right], \, c = -60. \end{equation*}\]

Let \(q_i: \mathbb{R}^2 \to \mathbb{R}\) be given by:

\[\begin{equation*} q_i(\pmb{x}) = \pmb{x}^T A_i \pmb{x} + \pmb{b}_i^T \pmb{x} + c \end{equation*}\]

for \(i=1,2,3\). Such functions are called quadratic forms, see this definition.

x1, x2 = symbols('x_1 x_2')
x = Matrix([x1,x2])
A1 = Matrix([[11,-12],[-12,4]])
A2 = Matrix([[11,0],[-24,4]])
A3 = Matrix([[73,-36],[-36,52]])/S(5)
b1 = Matrix([-20,40])
b2 = Matrix([-20,40])
b3 = Matrix([-44,8])
c = Matrix([-60])
x, A1, A2, A3
\[\begin{split}\displaystyle \left( \left[\begin{matrix}x_{1}\\x_{2}\end{matrix}\right], \ \left[\begin{matrix}11 & -12\\-12 & 4\end{matrix}\right], \ \left[\begin{matrix}11 & 0\\-24 & 4\end{matrix}\right], \ \left[\begin{matrix}\frac{73}{5} & - \frac{36}{5}\\- \frac{36}{5} & \frac{52}{5}\end{matrix}\right]\right)\end{split}\]

Comment on the exercise (which goes beyond what is expected of the students today):

\(q_1\) and \(q_2\) are the same, but \(q_1\) and \(q_3\) are different. However, \(q_1\) and \(q_3\) have the same symmetry axes because \(A_1\) and \(A_3\) have the same eigenvectors (and the same stationary point). Only the sign of one eigenvalue differs:

A1.eigenvects(), A3.eigenvects()
\[\begin{split}\displaystyle \left( \left[ \left( -5, \ 1, \ \left[ \left[\begin{matrix}\frac{3}{4}\\1\end{matrix}\right]\right]\right), \ \left( 20, \ 1, \ \left[ \left[\begin{matrix}- \frac{4}{3}\\1\end{matrix}\right]\right]\right)\right], \ \left[ \left( 5, \ 1, \ \left[ \left[\begin{matrix}\frac{3}{4}\\1\end{matrix}\right]\right]\right), \ \left( 20, \ 1, \ \left[ \left[\begin{matrix}- \frac{4}{3}\\1\end{matrix}\right]\right]\right)\right]\right)\end{split}\]

We will later see that the gradient vector for quadratic forms is \((A + A^T)\pmb{x} + \pmb{b}\). Therefore, the stationary points are found by \(\pmb{x} = -(A + A^T)^{-1} \pmb{b}\).

- A1.inv() * b1 / 2
\[\begin{split}\displaystyle \left[\begin{matrix}2\\1\end{matrix}\right]\end{split}\]
- A3.inv() * b3 / 2
\[\begin{split}\displaystyle \left[\begin{matrix}2\\1\end{matrix}\right]\end{split}\]

It is not a coincidence that they have the same stationary points. We have indeed defined \(b_3\) as:

A3 * A1.inv() * b1 
\[\begin{split}\displaystyle \left[\begin{matrix}-44\\8\end{matrix}\right]\end{split}\]

Question a#

Expand the expression for \(q_1(x_1,x_2)\). First by hand, then using Python. Also expand the expressions for \(q_2(x_1,x_2)\) and \(q_3(x_1,x_2)\) (either by hand or using Python).

q1 = expand((x.T*A1*x+b1.T*x+c)[0])
q2 = expand((x.T*A2*x+b2.T*x+c)[0])
q3 = expand((x.T*A3*x+b3.T*x+c)[0])
q1, q2, q3
\[\begin{gather*} q_1 = q_2 = 11x_1^2 - 24x_1x_2 + 4x_2^2 - 20x_1 + 40x_2 - 60, \\ q_3 = \tfrac{73}{5}x_1^2 - \tfrac{72}{5}x_1x_2 + \tfrac{52}{5}x_2^2 - 44x_1 + 8x_2 - 60. \end{gather*}\]

We conclude that \(q_1=q_2\), but we also note that \(A_1\neq A_2\).

Question b#

Is the square matrix \(A\) in a quadratic form (such as \(\pmb{x}^T A \pmb{x}\)) uniquely determined?

Question c#

Plot the graph of the function \(q_1\). Then plot some level curves. What geometric shape do the level curves have? Do the same for \(q_3\).

dtuplot.plot3d(q1);
dtuplot.plot3d(q3);
dtuplot.plot_contour(q1,(x1,-5,8),(x2,-5,8),is_filled=False);
dtuplot.plot_contour(q3,(x1,-2,5),(x2,-2,5),is_filled=False);

Question d#

One of the functions has a minimum. Which one? Where is it approximately located? What is this point called for the functions that do not have a minimum?

4: The Softmax Function#

In this exercise we will investigate the softmax function (see the textbook section that covers this function).

Question a#

Calculate softmax of the following three vectors in \(\mathbb{R}^3\). You may do this by hand (use a calculator for values of the exponential function) or by use of Python. State the answers with at least three decimals.

  1. \(\pmb{x}_1=[1,2,-5]^T\)

  2. \(\pmb{x}_2=[10,2,-5]^T\)

  3. \(\pmb{x}_3=[100,2,-5]^T\)

Question b#

What do you observe when the difference between the largest value in the input and the other values increases (such as at the shift from \(\pmb{x}_1\) to \(\pmb{x}_2\) and \(\pmb{x}_3\))? Why might the function be called softmax?

Question c#

Is the softmax function continuous?

Question d#

Consider softmax as a map from \(\mathbb{R}^n\) to \(\mathbb{R}^n\). Is the function injective?

Is the function surjective if we consider the co-domain \(\mathbb{R}^n\)?

  1. Not injective: Softmax is invariant w.r.t. translation. If we add the same constant \(c\) to all entries in \(\pmb{x}\), the numerator will change by a factor of \(\mathrm e^c\) and the denominator by this same factor \(\mathrm e^c\), so they cancel out:

    \[ \frac{\mathrm e^{x_i + c}}{\sum \mathrm e^{x_j + c}} = \frac{\mathrm e^c \mathrm e^{x_i}}{\mathrm e^c \sum \mathrm e^{x_j}} = \frac{\mathrm e^{x_i}}{\sum \mathrm e^{x_j}}. \]

    Hence, \(\text{softmax}([1, 2, 3]) = \text{softmax}([11, 12, 13])\). As two different inputs give the same output, the function is not injective.

  2. Not surjective with co-domain \(\mathbb{R}^n\): The output from softmax is always within the set of probability vectors (all \(y_i > 0\) and \(\sum y_i = 1\)). Thus:

    • The output will never be a vector with negative entries.

    • The output will never be a vector with exact zeroes (as \(\mathrm e^x > 0\) for all real \(x\)), although it can be arbitrarily close.

    (The function is surjective on the open standard simplex, but not on all of \(\mathbb{R}^n\)).
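Both observations can be checked numerically. A short NumPy sketch (our own; the max-subtraction trick used for numerical stability is exactly the translation invariance argued above):

```python
import numpy as np

def softmax(x):
    # Subtracting max(x) does not change the result (translation invariance)
    # but prevents overflow in exp for large inputs.
    z = np.exp(x - np.max(x))
    return z / z.sum()

a = softmax(np.array([1.0, 2.0, 3.0]))
b = softmax(np.array([11.0, 12.0, 13.0]))

print(np.allclose(a, b))        # True: two different inputs, same output
print(a.sum(), (a > 0).all())   # entries are positive and sum to 1
```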

5: Quadratic Forms with Symmetric Matrices#

Let \(A\) be an arbitrary \(n \times n\) matrix, and let \(\pmb{x}\) be a column vector in \(\mathbb{R}^n\). Define \(B\) by \(B = (A + A^T)/2\).

Question a#

Show that the matrix \(B\) is symmetric.

Question b#

Show that \(\pmb{x}^T A \pmb{x} = \pmb{x}^T B \pmb{x}\).

Question c#

Conclude that it is always possible to define quadratic forms of the type \(q(\pmb{x}) = \pmb{x}^T A \pmb{x} + \pmb{b}^T \pmb{x} + c\) using a symmetric matrix \(A\).
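This is not the requested proof, but a quick numeric sanity check of Questions a and b (our own sketch with a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # arbitrary (non-symmetric) n x n matrix
B = (A + A.T) / 2                 # its symmetric part

print(np.allclose(B, B.T))        # Question a: B is symmetric

x = rng.standard_normal(4)
print(np.isclose(x @ A @ x, x @ B @ x))  # Question b: the quadratic forms agree
```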