Homework Assignment 1#
Read the rules of homework assignments here: Homework Assignments. This assignment is intended to be done by hand without electronic aids unless specifically stated.
Remember to justify all answers.
Exercise 1: Jacobian Matrix of a Neural Network#
We consider a simple neural network consisting of two layers. The network takes an input vector \(\pmb{x}\in \mathbb{R}^{2}\) and produces a probability vector \(\pmb{p}\in \mathbb{R}^{3}\).
The network is described as a composite function \(\pmb{h}=\pmb{g}\circ \pmb{f}\), where:
\(\pmb{f}:\mathbb{R}^{2} \to \mathbb{R}^{3}\) is an affine function \(\pmb{f}(\pmb{x}) = A \pmb{x} + \pmb{b}\) (which in the textbook is written as \(T_{A,\pmb{b}}\); it is the “hidden” layer before activation).
\(\pmb{g}:\mathbb{R}^{3} \to \mathbb{R}^{3}\) is the Softmax function.
We are informed that the functional expression of \(\pmb{f}\) is given by:
The function \(\pmb{g}\) (Softmax) takes a vector \(\pmb{z}=\left[z_{1},z_{2},z_{3}\right]^{T}\) and returns:

\[
\pmb{g}(\pmb{z}) = \frac{1}{\mathrm e^{z_{1}}+\mathrm e^{z_{2}}+\mathrm e^{z_{3}}}
\begin{bmatrix} \mathrm e^{z_{1}} \\ \mathrm e^{z_{2}} \\ \mathrm e^{z_{3}} \end{bmatrix}.
\]
We are furthermore informed (we will prove this claim in the next assignment) that the Jacobian matrix for the Softmax function \(\pmb{g}\) at an arbitrary point \(\pmb{z}\) can be expressed in terms of the output vector \(\pmb{p}=\pmb{g}\left(\pmb{z}\right)\) as:

\[
\pmb{J}_{\pmb{g}}(\pmb{z}) = \operatorname{diag}(\pmb{p}) - \pmb{p}\,\pmb{p}^{T}.
\]
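Since Question a explicitly allows Python for evaluating exponentials, a minimal NumPy sketch of evaluating the Softmax and assembling its Jacobian from \(\pmb{p}\) (using the identity \(\operatorname{diag}(\pmb{p}) - \pmb{p}\pmb{p}^{T}\), which Exercise 2 derives) might look like this. The test point `z` is an arbitrary example, not the \(\pmb{z}_0\) of the exercise:

```python
import numpy as np

def softmax(z):
    """Softmax: p_i = exp(z_i) / sum_k exp(z_k).
    Shifting by max(z) avoids overflow without changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(p):
    """Jacobian of Softmax expressed via the output p: diag(p) - p p^T."""
    return np.diag(p) - np.outer(p, p)

# Arbitrary example point (the exercise's z0 depends on the affine layer f).
z = np.array([1.0, 0.0, -1.0])
p = softmax(z)
J = softmax_jacobian(p)

print(p)              # probabilities summing to 1
print(J.sum(axis=0))  # each column of J sums to 0 (up to rounding)
```

A useful sanity check visible in the output: the rows and columns of \(\pmb{J}_{\pmb{g}}\) sum to zero, because the components of \(\pmb{p}\) always sum to one.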
We wish to determine the Jacobian matrix \(\pmb{J}_{\pmb{h}}\) for the entire network at the point: \(\pmb{x}_0 = (1, -1)\).
Question a#
Determine the intermediate vector \(\pmb{z}_{0}=\pmb{f}\left(1,-1\right)\) as well as the final output of the network \(\pmb{p}_{0}=\pmb{g}\left(\pmb{z}_{0}\right)\). You may use Python/a calculator to evaluate \(\mathrm e^{z}\), but you should also state the exact expression.
Question b#
Determine the Jacobian matrix \(\pmb{J}_{\pmb{f}}\left(1,-1\right)\). Calculate the numerical values of the Jacobian matrix \(\pmb{J}_{\pmb{g}}\left(\pmb{z}_{0}\right)\) by substituting in the values for \(\pmb{p}_{0}\) as found in Question a.
Question c#
Argue that the composite function \(\pmb{h}=\pmb{g}\circ \pmb{f}\) is differentiable. Use the chain rule to determine the Jacobian matrix \(\pmb{J}_{\pmb{h}}\left(1,-1\right)\).
This corresponds to finding the network’s “gradients” with respect to the input, which are used to understand the model’s sensitivity to changes in the input.
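A hand-computed \(\pmb{J}_{\pmb{h}}\left(1,-1\right)\) can be sanity-checked against finite differences. The `A` and `b` below are placeholders only (the actual ones appear in the omitted functional expression for \(\pmb{f}\)); the point is the chain-rule product \(\pmb{J}_{\pmb{g}}(\pmb{z}_0)\,\pmb{J}_{\pmb{f}}\), where \(\pmb{J}_{\pmb{f}} = A\) for an affine map:

```python
import numpy as np

# Placeholder affine layer -- substitute the A and b from the assignment.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, -1.0]])
b = np.array([0.5, -0.5, 0.0])

def f(x):
    return A @ x + b

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def h(x):
    return softmax(f(x))

x0 = np.array([1.0, -1.0])
p0 = h(x0)

# Chain rule: J_h(x0) = J_g(f(x0)) @ J_f(x0), and J_f = A for an affine map.
J_g = np.diag(p0) - np.outer(p0, p0)
J_h = J_g @ A

# Central-difference approximation of J_h, column by column.
eps = 1e-6
J_num = np.column_stack([
    (h(x0 + eps * np.eye(2)[j]) - h(x0 - eps * np.eye(2)[j])) / (2 * eps)
    for j in range(2)
])
print(np.max(np.abs(J_h - J_num)))  # should be tiny (finite-difference error only)
```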
Exercise 2: Differentiability of Softmax#
In this exercise you are to prove the formula for the Jacobian matrix of the Softmax function that was used in the previous exercise. We define the Softmax function as a vector function \(g: \mathbb{R}^{n} \to \mathbb{R}^{n}\). We will call the input vector \(\pmb{z}=\left[z_{1},\dots,z_{n}\right]^{T}\) and the output vector \(\pmb{p}=g\left(\pmb{z}\right) =\left[p_{1},\dots,p_{n}\right]^{T}\).
Question a#
Let \(S(\pmb{z}) = \sum_{k=1}^{n} \mathrm e^{z_{k}}=\mathrm e^{z_{1}}+\mathrm e^{z_{2}}+ \dots +\mathrm e^{z_{n}}\). Find the partial derivatives of \(S\) with respect to \(z_{j}\), meaning find \(\frac{\partial S}{\partial z_{j}}(\pmb{z})\) for \(j=1,\dots,n\).
Question b#
Use the quotient rule to differentiate \(p_{i}=\frac{\mathrm e^{z_{i}}}{S(\pmb{z}) }\) with respect to \(z_{i}\) (i.e., where the output index and the input index coincide). Show that the result can be written as:

\[
\frac{\partial p_{i}}{\partial z_{i}}(\pmb{z}) = p_{i}\left(1-p_{i}\right).
\]
Question c#
Use the quotient rule to differentiate \(p_{i}=\frac{\mathrm e^{z_{i}}}{S(\pmb{z}) }\) with respect to \(z_{j}\) where \(j\ne i\) (i.e., the off-diagonal elements of the Jacobian matrix). Show that the result can be written as:

\[
\frac{\partial p_{i}}{\partial z_{j}}(\pmb{z}) = -p_{i}\,p_{j}.
\]
Question d#
State the Jacobian matrix \(\pmb{J}_{\pmb{g}}\left(\pmb{z}\right)\) for the case \(n=4\) by use of the results from Questions b and c, where the Jacobian matrix should only be expressed in terms of the values \(p_1, p_2, p_3, p_4\).
It is a bit unusual to express \(\pmb{J}_{\pmb{g}}\left(\pmb{z}\right)\) in terms of the output vector \(\pmb{p}\) rather than the input vector \(\pmb{z}\), but \(\pmb{p}\) is preferred here because it makes the formula very simple.
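Once Questions b and c are done, the resulting entries (diagonal \(p_i(1-p_i)\), off-diagonal \(-p_i p_j\)) can be checked numerically for \(n=4\) by comparing against a finite-difference Jacobian of the Softmax itself; a short sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Arbitrary test point for n = 4.
z = np.array([0.3, -1.2, 2.0, 0.0])
p = softmax(z)

# Entries from Questions b and c: diagonal p_i(1-p_i), off-diagonal -p_i p_j.
J_formula = np.diag(p) - np.outer(p, p)

# Numerical Jacobian via central differences, column by column.
eps = 1e-6
J_num = np.column_stack([
    (softmax(z + eps * np.eye(4)[j]) - softmax(z - eps * np.eye(4)[j])) / (2 * eps)
    for j in range(4)
])
print(np.max(np.abs(J_formula - J_num)))  # should be tiny
```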
Exercise 3: A New Inner Product#
Consider \(\mathbb{C}^n\) with the usual inner product \(\langle \cdot, \cdot \rangle\). Let
\(U \in \mathbb{C}^{n \times n}\) be a unitary matrix (so, \(U^*U=I\)),
\(\Lambda \in \mathbb{R}^{n \times n}\) be a diagonal matrix with strictly positive diagonal elements, so \(\lambda_i>0\) for \(i=1,\ldots,n\).
We define
We also define
where \(D\) is a diagonal matrix with the elements
Question a#
Show that
and that \(A\) and \(B\) are Hermitian.
Question b#
Determine the inverse matrix \(A^{-1}\).
Question c#
We define for the column vectors \(\pmb{x}, \pmb{y} \in \mathbb{C}^n\) the inner product
Show that
Question d#
Show that \(\langle \pmb{x}, \pmb{y} \rangle_B\) actually is an inner product on \(\mathbb{C}^n\). You may use, without proof, that the usual inner product \(\langle \pmb{x}, \pmb{y} \rangle\) is an inner product on \(\mathbb{C}^n\).
Question e#
Let \(B = \operatorname{diag}(2,5)\). State two vectors \(\pmb{x}, \pmb{y} \in \mathbb{C}^2\) that are orthogonal to each other with respect to \(\langle \cdot, \cdot \rangle_B\). Neither vector may be the zero vector in \(\mathbb{C}^2\).
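A candidate pair can be checked numerically. The sketch below assumes the weighted form \(\langle \pmb{x}, \pmb{y} \rangle_B = \langle B\pmb{x}, \pmb{y} \rangle = \pmb{y}^{*} B \pmb{x}\); adjust if the definition in Question c differs. The vectors `x` and `y` are example values, not necessarily the intended answer:

```python
import numpy as np

B = np.diag([2.0, 5.0])

def inner_B(x, y, B=B):
    """Weighted inner product <x, y>_B = <Bx, y> = y^* B x (an assumption;
    adjust if Question c defines it differently).
    np.vdot conjugates its first argument, giving y^* (B x)."""
    return np.vdot(y, B @ x)

# Example candidate vectors to test for B-orthogonality.
x = np.array([1.0, 2.0])
y = np.array([5.0, -1.0])
print(inner_B(x, y))  # 0 means x and y are B-orthogonal
```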