# Homework Assignment 1

Read the rules of homework assignments here: Homework Assignments. This assignment is intended to be done by hand without electronic aids unless specifically stated.

Remember to justify all answers.

## Exercise 1: Jacobian Matrix of a Neural Network

We consider a simple neural network consisting of two layers. The network takes an input vector \(\pmb{x}\in \mathbb{R}^{2}\) and produces a probability vector \(\pmb{p}\in \mathbb{R}^{3}\).

The network is described as a composite function \(\pmb{h}=\pmb{g}\circ \pmb{f}\), where:

  1. \(\pmb{f}:\mathbb{R}^{2} \to \mathbb{R}^{3}\) is an affine function \(\pmb{f}(\pmb{x}) = A \pmb{x} + \pmb{b}\) (which in the textbook is written as \(T_{A,\pmb{b}}\); it is the “hidden” layer before activation).

  2. \(\pmb{g}:\mathbb{R}^{3} \to \mathbb{R}^{3}\) is the Softmax function.

We are informed that the functional expression of \(\pmb{f}\) is given by:

\[\begin{equation*} \pmb{f}(x_1, x_2) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 - x_2 \\ x_2 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}. \end{equation*}\]

The function \(\pmb{g}\) (Softmax) takes a vector \(\pmb{z}=\left[z_{1},z_{2},z_{3}\right]^{T}\) and returns:

\[\begin{equation*} \pmb{g}(\pmb{z}) = \frac{1}{\sum_{k=1}^3 \mathrm e^{z_k}} \begin{bmatrix} \mathrm e^{z_1} \\ \mathrm e^{z_2} \\ \mathrm e^{z_3} \end{bmatrix}. \end{equation*}\]

We are furthermore informed (we will prove this claim in the next assignment) that the Jacobian matrix for the Softmax function \(\pmb{g}\) at an arbitrary point \(\pmb{z}\) can be expressed in terms of the output vector \(\pmb{p}=\pmb{g}\left(\pmb{z}\right)\) as:

\[\begin{equation*} \pmb{J}_{\pmb{g}}(\pmb{z}) = \begin{bmatrix} p_1(1-p_1) & -p_1 p_2 & -p_1 p_3 \\ -p_2 p_1 & p_2(1-p_2) & -p_2 p_3 \\ -p_3 p_1 & -p_3 p_2 & p_3(1-p_3) \end{bmatrix} . \end{equation*}\]

We wish to determine the Jacobian matrix \(\pmb{J}_{\pmb{h}}\) for the entire network at the point: \(\pmb{x}_0 = (1, -1)\).

### Question a

Determine the in-between vector \(\pmb{z}_{0}=\pmb{f}\left(1,-1\right)\) as well as the final output of the network \(\pmb{p}_{0}=\pmb{g}\left(\pmb{z}_{0}\right)\). You may use Python/calculator to evaluate \(\mathrm e^{z}\) but should also state the exact expression.
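
Since Python is explicitly allowed here, the following NumPy sketch can be used to check a hand calculation. The matrix \(A\) and vector \(\pmb{b}\) are read off from the functional expression for \(\pmb{f}\) above; remember to also state the exact expression for \(\pmb{p}_0\).

```python
import numpy as np

# Affine layer f(x) = A x + b, read off from the functional expression above
A = np.array([[2.0, 1.0],
              [1.0, -1.0],
              [0.0, 1.0]])
b = np.array([-1.0, 0.0, 1.0])

def f(x):
    return A @ x + b

def softmax(z):
    # exact expression: p_i = e^{z_i} / (e^{z_1} + e^{z_2} + e^{z_3})
    e = np.exp(z)
    return e / e.sum()

x0 = np.array([1.0, -1.0])
z0 = f(x0)        # the in-between vector z_0
p0 = softmax(z0)  # the network output p_0; its entries sum to 1
print(z0, p0)
```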

### Question b

Determine the Jacobian matrix \(\pmb{J}_{\pmb{f}}\left(1,-1\right)\). Calculate the numerical values of the Jacobian matrix \(\pmb{J}_{\pmb{g}}\left(\pmb{z}_{0}\right)\) by substituting in the values for \(\pmb{p}_{0}\) as found in Question a.
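
Note that the stated Softmax Jacobian can be written compactly as \(\operatorname{diag}(\pmb{p}) - \pmb{p}\pmb{p}^{T}\). As a sanity check of the substitution (not a replacement for the hand calculation), a minimal sketch that recomputes \(\pmb{p}_0\) so it stands alone:

```python
import numpy as np

# J_f(x) = A for every x, since f is affine
J_f = np.array([[2.0, 1.0],
                [1.0, -1.0],
                [0.0, 1.0]])

def jac_softmax(p):
    # the stated formula, written compactly: diag(p) - p p^T
    return np.diag(p) - np.outer(p, p)

# recompute p_0 from Question a so the snippet is self-contained
z0 = J_f @ np.array([1.0, -1.0]) + np.array([-1.0, 0.0, 1.0])
p0 = np.exp(z0) / np.exp(z0).sum()
J_g = jac_softmax(p0)
print(J_g)
```

Each row of \(\pmb{J}_{\pmb{g}}\) sums to zero, which reflects that the Softmax outputs always sum to 1.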

### Question c

Argue that the composite function \(\pmb{h}=\pmb{g}\circ \pmb{f}\) is differentiable. Use the chain rule to determine the Jacobian matrix \(\pmb{J}_{\pmb{h}}\left(1,-1\right)\).

This corresponds to finding the network’s “gradients” with respect to the input, which can be used to understand the model’s sensitivity to changes in the input.
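
Once the two Jacobians are known, the chain rule is a single matrix product. A sketch that also compares the product against central finite differences of \(\pmb{h}\) (the step size `1e-6` is an arbitrary illustrative choice):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, -1.0], [0.0, 1.0]])
b = np.array([-1.0, 0.0, 1.0])

def h(x):
    # the full network h = g o f
    z = A @ x + b
    e = np.exp(z)
    return e / e.sum()

x0 = np.array([1.0, -1.0])
p0 = h(x0)
# chain rule: J_h(x0) = J_g(z0) J_f(x0)
J_h = (np.diag(p0) - np.outer(p0, p0)) @ A

# central finite differences as an independent check
eps = 1e-6
J_num = np.column_stack([
    (h(x0 + eps * np.eye(2)[j]) - h(x0 - eps * np.eye(2)[j])) / (2 * eps)
    for j in range(2)
])
print(np.max(np.abs(J_h - J_num)))
```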

## Exercise 2: Differentiability of Softmax

In this exercise you are to prove the formula for the Jacobian matrix of the Softmax function that was used in the previous exercise. We define the Softmax function as a vector function \(\pmb{g}: \mathbb{R}^{n} \to \mathbb{R}^{n}\). We will call the input vector \(\pmb{z}=\left[z_{1},\dots,z_{n}\right]^{T}\) and the output vector \(\pmb{p}=\pmb{g}\left(\pmb{z}\right) =\left[p_{1},\dots,p_{n}\right]^{T}\).

### Question a

Let \(S(\pmb{z}) = \sum_{k=1}^{n} \mathrm e^{z_{k}}=\mathrm e^{z_{1}}+\mathrm e^{z_{2}}+ \dots +\mathrm e^{z_{n}}\). Find the partial derivatives of \(S\) with respect to \(z_{j}\), meaning find \(\frac{\partial S}{\partial z_{j}}(\pmb{z})\) for \(j=1,\dots,n\).

### Question b

Use the quotient rule to differentiate \(p_{i}=\frac{\mathrm e^{z_{i}}}{S(\pmb{z}) }\) with respect to \(z_{i}\) (so, where the output index and input index are the same). Show that the result can be written as:

\[ \frac{\partial p_{i}}{\partial z_{i}}=p_{i}\left(1-p_{i}\right). \]

### Question c

Use the quotient rule to differentiate \(p_{i}=\frac{\mathrm e^{z_{i}}}{S(\pmb{z}) }\) with respect to \(z_{j}\) where \(j\ne i\) (so, the elements outside the diagonal of the Jacobian matrix). Show that the result can be written as:

\[ \frac{\partial p_{i}}{\partial z_{j}}=-p_{i}p_{j}. \]

### Question d

State the Jacobian matrix \(\pmb{J}_{\pmb{g}}\left(\pmb{z}\right)\) for the case \(n=4\) by use of the results from Questions b and c, where the Jacobian matrix should only be expressed in terms of the values \(p_1, p_2, p_3, p_4\).

It is a bit unusual to express \(\pmb{J}_{\pmb{g}}\left(\pmb{z}\right)\) in terms of the output variable \(\pmb{p}\) rather than the input variable \(\pmb{z}\), but \(\pmb{p}\) is preferred here because it makes the formula very simple.
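
The results of Questions b and c combine into \(\pmb{J}_{\pmb{g}}(\pmb{z}) = \operatorname{diag}(\pmb{p}) - \pmb{p}\pmb{p}^{T}\). After completing the proof, a numerical spot check for \(n=4\) can compare the formula against central finite differences at a random test point (both the point and the step size are illustrative choices):

```python
import numpy as np

def softmax(z):
    # shifting by max(z) improves numerical stability; the value is unchanged
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=4)
p = softmax(z)

# the proven formula: entries p_i(1 - p_i) on the diagonal, -p_i p_j off it
J_formula = np.diag(p) - np.outer(p, p)

eps = 1e-6
J_num = np.column_stack([
    (softmax(z + eps * np.eye(4)[j]) - softmax(z - eps * np.eye(4)[j])) / (2 * eps)
    for j in range(4)
])
print(np.max(np.abs(J_formula - J_num)))
```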

## Exercise 3: A New Inner Product

Consider \(\mathbb{C}^n\) with the usual inner product \(\langle \cdot, \cdot \rangle\). Let

- \(U \in \mathbb{C}^{n \times n}\) be a unitary matrix (so, \(U^*U=I\)),

- \(\Lambda \in \mathbb{R}^{n \times n}\) be a diagonal matrix with strictly positive diagonal elements, so \(\lambda_i>0\) for \(i=1,\ldots,n\).

We define

\[\begin{equation*} B = U\,\Lambda\,U^*. \end{equation*}\]

We also define

\[\begin{equation*} A = U\,D\,U^*, \end{equation*}\]

where \(D\) is a diagonal matrix with the elements

\[\begin{equation*} d_i = \sqrt{\lambda_i}, \quad i=1,\ldots,n. \end{equation*}\]

### Question a

Show that

\[\begin{equation*} A^*A = B, \end{equation*}\]

and that \(A\) and \(B\) are Hermitian.
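
A numerical sanity check of the claim is easy to set up, using a random unitary \(U\) obtained from a QR factorization of a complex Gaussian matrix (an illustrative construction, not part of the assignment):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# a random unitary U: the Q factor of a complex Gaussian matrix
U, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
lam = rng.uniform(0.5, 2.0, size=n)   # strictly positive diagonal of Lambda

B = U @ np.diag(lam) @ U.conj().T
A = U @ np.diag(np.sqrt(lam)) @ U.conj().T

print(np.allclose(A.conj().T @ A, B))                          # A*A = B
print(np.allclose(A, A.conj().T), np.allclose(B, B.conj().T))  # both Hermitian
```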

### Question b

Determine the inverse matrix \(A^{-1}\).

### Question c

We define for the column vectors \(\pmb{x}, \pmb{y} \in \mathbb{C}^n\) the inner product

\[\begin{equation*} \langle \pmb{x}, \pmb{y} \rangle_B = \langle B\,\pmb{x}, \pmb{y} \rangle. \end{equation*}\]

Show that

\[\begin{equation*} \langle \pmb{x}, \pmb{y} \rangle_B = \langle A\,\pmb{x}, A\,\pmb{y} \rangle. \end{equation*}\]
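
The identity can be spot-checked numerically. In the sketch below, \(\langle \pmb{u}, \pmb{v} \rangle\) is taken conjugate-linear in the second argument (adjust the `np.vdot` call if your textbook uses the opposite convention):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
U, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
lam = rng.uniform(0.5, 2.0, size=n)
B = U @ np.diag(lam) @ U.conj().T
A = U @ np.diag(np.sqrt(lam)) @ U.conj().T   # so A*A = B and A is Hermitian

def inner(u, v):
    # usual inner product on C^n, conjugate-linear in the second argument
    return np.vdot(v, u)

x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = rng.normal(size=n) + 1j * rng.normal(size=n)
print(np.isclose(inner(B @ x, y), inner(A @ x, A @ y)))
```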

### Question d

Show that \(\langle \pmb{x}, \pmb{y} \rangle_B\) is indeed an inner product on \(\mathbb{C}^n\). You may use, without proof, that the usual inner product \(\langle \pmb{x}, \pmb{y} \rangle\) is an inner product on \(\mathbb{C}^n\).

### Question e

Let \(B = \operatorname{diag}(2,5)\). State two vectors \(\pmb{x}, \pmb{y} \in \mathbb{C}^2\) that are orthogonal to each other with respect to \(\langle \cdot, \cdot \rangle_B\). Neither vector may be the zero vector in \(\mathbb{C}^2\).
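
A small checker for \(B\)-orthogonality can be used to verify a candidate pair; the vectors below are just one illustrative choice among infinitely many:

```python
import numpy as np

B = np.diag([2.0, 5.0])

def inner_B(x, y):
    # <x, y>_B = <B x, y> with the usual inner product on C^2
    return np.vdot(y, B @ x)

# one possible nonzero pair; check your own candidates the same way
x = np.array([1.0, 1.0])
y = np.array([5.0, -2.0])
print(inner_B(x, y))   # a value of 0 means x and y are B-orthogonal
```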