Section 7 Problem Set 7
- Last one! You can drop your lowest problem set, so if you are happy with 6 problem set scores you can drop this one. However, I recommend you work on and think about these problems, since these ideas will be on the final quiz.
- Due: Wednesday July 07 by 11:59am CDT (last day of class).
- Upload your solutions to Moodle in a PDF.
- Please feel free to use RStudio for all calculations, including row reduction, matrix multiplication, eigenvector calculation and inverse matrices.
- You can download the Rmd source file for this problem set.
This problem set covers Network Centralities and Sections 6.1, 6.2, 6.3, 6.5 on Orthogonal Projections.
7.1 Orthogonal Complements
(Work on in class on Tuesday June 29)
Here are two subspaces of \(\mathbb{R}^5\) that we have seen before. (See PS4.2 and PS5.2) \[ \begin{align} \mathsf{Z} & = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \ \bigg\vert \ x_1 + x_2 + x_3 + x_4 + x_5 = 0, x_4 = 2 x_2 \right\}. \\ \mathsf{F} & = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \ \bigg\vert \ x_3 = x_1 + x_2, x_4 = x_2 + x_3, x_5 = x_3 + x_4 \right\}. \end{align} \] Find the orthogonal complement of each subspace in \(\mathbb{R}^5\). For each example, compute \(\dim(W) + \dim(W^\perp)\).
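Here is a minimal sketch of how one might set up this computation in R for \(\mathsf{Z}\), assuming the pracma package for null space computations (finding \(\mathsf{F}^\perp\) works the same way once you describe \(\mathsf{F}\) by its defining equations or a spanning set):
# Sketch for Z only: Z is the null space of the coefficient matrix B of its
# defining equations, so Z-perp is the set of vectors orthogonal to a basis of Z.
library(pracma)
B = rbind(c(1, 1, 1, 1, 1),      # x1 + x2 + x3 + x4 + x5 = 0
          c(0,-2, 0, 1, 0))      # x4 = 2*x2, rewritten as -2*x2 + x4 = 0
Zbasis = nullspace(B)            # columns form a basis of Z
Zperp  = nullspace(t(Zbasis))    # columns form a basis of Z-perp
ncol(Zbasis) + ncol(Zperp)       # compare with dim(R^5) = 5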
7.2 Cosine Similarity
(Work on in class on Tuesday June 29)
In high dimensional space \(\mathbb{R}^n\) a common measure of similarity between two vectors is cosine similarity: the cosine of the angle \(\theta\) between the vectors. We calculate this value as shown below. This formula was derived in the video for 5.1. \[ \cos(\theta) = \frac{ \mathsf{u} \cdot \mathsf{v}} {\| \mathsf{u}\| \, \|\mathsf{v}\|} = \frac{ \mathsf{u} \cdot \mathsf{v}} {\sqrt{\mathsf{u} \cdot \mathsf{u}} \sqrt{\mathsf{v} \cdot \mathsf{v}}}. \] This measure has the following nice properties:
- \(-1 \le \cos(\theta) \le 1\),
- \(\cos(\theta)\) is close to 1 if \(\mathsf{u}\) and \(\mathsf{v}\) are closely aligned,
- \(\cos(\theta)\) is close to 0 if \(\mathsf{u}\) and \(\mathsf{v}\) are orthogonal,
- \(\cos(\theta)\) is close to \(-1\) if \(\mathsf{u}\) and \(\mathsf{v}\) are polar opposites.
- \(\cos(\theta)\) is positive if \(\theta\) is acute (less than \(90^\circ\)).
- \(\cos(\theta)\) is negative if \(\theta\) is obtuse (greater than \(90^\circ\)).
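As a quick check of these properties, take \(\mathsf{u} = (1,2,0,0,0)\) and \(\mathsf{v} = (2,4,0,0,0) = 2\mathsf{u}\). The formula gives \[ \cos(\theta) = \frac{(1)(2)+(2)(4)}{\sqrt{1^2+2^2}\,\sqrt{2^2+4^2}} = \frac{10}{\sqrt{5}\,\sqrt{20}} = \frac{10}{10} = 1, \] as expected for perfectly aligned vectors, while for \(\mathsf{u}\) and \(\mathsf{w} = (0,0,3,0,0)\) the dot product is \(0\), so \(\cos(\theta) = 0\) and the vectors are orthogonal.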
- Write a function `cosine_similarity` that takes as input two vectors \(\mathsf{u}\) and \(\mathsf{v}\) and returns the value of \(\cos(\theta)\) for the angle \(\theta\) between \(\mathsf{u}\) and \(\mathsf{v}\). Below is a shell of the code that you need. Right now it always just returns 1; you need to fix that up. Demonstrate that your code works on some vectors in \(\mathbb{R}^5\) (some illustrative test vectors are suggested after the code shell). Use vectors that are orthogonal, closely aligned, and close to polar opposites.
cosine_similarity <- function(u, v) {
  # your code goes here!
  # find the cosine of the angle between u and v
  cosine = 1
  return(cosine)
}
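For the demonstration step, here are some example vectors in \(\mathbb{R}^5\) you could test with (these particular choices are only illustrations; any vectors with the stated properties will do):
# Illustrative test vectors in R^5 (hypothetical choices)
u = c(1, 2, 0, -1, 3)
v_aligned  = 2*u + c(0.1, 0, 0, -0.1, 0)   # nearly parallel to u
v_opposite = -3*u + c(0, 0.2, 0, 0, 0)     # nearly opposite to u
v_orthog   = c(2, -1, 5, 0, 0)             # u . v_orthog = 2 - 2 + 0 + 0 + 0 = 0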
In the file US_Senate_s21.Rmd you will find vectors of the voting record of the Senate in the 109th US Congress (2005-2006). You will see how to take the dot product of the voting records of two different senators. The dot product is always an integer. Explain what it counts. (Hint: it is either a count of something or the difference of two counts.)
Use your `cosine_similarity` function to find the cosine similarity between every pair of the following senators:
- Hillary Clinton (D, NY), presidential candidate 2016
- John McCain (R, AZ), presidential candidate 2008
- Barack Obama (D, IL), president 2009-2017
- Susan Collins (R, ME), moderate Republican
Does the cosine similarity pick up on the fact that Senator Collins is a “moderate Republican”?
The senate majority leader of the 109th Congress was Bill Frist (R, TN), and the senate minority leader was Harry Reid (D, NV).
Create a function `classify_senator(senator)` that returns "R" or "D" depending on the cosine similarity of `senator` to `frist` and to `reid`. You will have to write an "if ... else" statement; a generic sketch of the syntax is given below. There is a chunk of code in the R file I've given you that gets you started.
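For reference, here is the general shape of an if ... else statement in R. This is only the syntax pattern, not the classifier itself; the function name and the comparison are made up for illustration:
# Generic if ... else pattern in R (illustration only, not the solution)
sign_label <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else {
    label <- "zero or negative"
  }
  return(label)
}
sign_label(3)    # returns "positive"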
Then run the `my_classification` code that we have given. Identify any senators that have been misclassified by this method, meaning that their votes are more similar to the leader of the opposing party.
Jim Jeffords (I, VT) was a Republican who became an Independent in 2001 and then caucused with the Democrats. How does your classifier handle Jeffords?
7.3 Orthogonal Diagonalization
(Work on in class on Wednesday June 30)
Recall that a square \(n \times n\) matrix is symmetric when \(A^{\top} = A\). It is an amazing property that the eigenvectors of a symmetric matrix form an orthogonal basis of \(\mathbb{R}^n\). In this problem, you will confirm that this holds for the following symmetric matrix, \[ A = \begin{bmatrix} 0 & 8 & 10 & -4 \\ 8 & 4 & 28 & 6 \\ 10 & 28 & 3 & -4 \\ -4 & 6 & -4 & -7 \end{bmatrix}. \]
A = cbind(c(0,8,10,-4),c(8,4,28,6),c(10,28,3,-4),c(-4,6,-4,-7))
Find the eigenvalues and eigenvectors of \(A\).
Let \(P\) be the matrix of the eigenvectors output by R: `P = eigen(A)$vectors`. Confirm that these eigenvectors are an orthonormal set using a single matrix calculation.
Let \(\mathsf{v} = \begin{bmatrix} 2 & -4 & -9 & -2 \end{bmatrix}^{\top}\). Find the coordinates of \(\mathsf{v}\) in the eigenbasis in 3 different ways:
- Use the transpose of \(P\). Recall that, in R, the transpose of a matrix is `t(A)`.
- Augment and row reduce.
- Use the dot product formula. Note that the command `v1 = P[,1]` will extract the first eigenvector from the matrix `P`.
Diagonalize \(A\) using \(P\). To demonstrate this, compute both \(A = P D P^{-1}\) and \(D = P^{-1} A P\). Congratulations: you have orthogonally diagonalized the symmetric matrix \(A\)!
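A minimal sketch of these checks in R, assuming `A` is defined as above (rounding suppresses floating-point noise):
# Sketch: orthogonal diagonalization checks (A defined above)
P = eigen(A)$vectors
D = diag(eigen(A)$values)        # diagonal matrix of eigenvalues
round(P %*% D %*% solve(P), 8)   # should reproduce A
round(solve(P) %*% A %*% P, 8)   # should reproduce D
# Because P is orthogonal, solve(P) and t(P) should agree here.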
Turn in: Your R code and the output for each part.
7.4 Fibonacci Orthogonality
(Work on in class on Thursday July 1)
In problem 4.2, you saw that the vector space of Fibonacci vectors \(\mathsf{F} \subseteq \mathbb{R}^5\) is a 2-dimensional subspace of \(\mathbb{R}^5\).
Below is an orthogonal basis of \(\mathsf{F}\). Check that the basis vectors \(\{\mathsf{f}_1, \mathsf{f}_2\}\) are in \(\mathsf{F}\) and are orthogonal. \[ \mathsf{F} = \mathsf{span} \left\{ \mathsf{f}_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \mathsf{f}_2 = \begin{bmatrix} -9 \\ 7 \\ -2 \\ 5 \\ 3\end{bmatrix} \right\}, \hskip.5in \mathsf{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 5 \\ 7\end{bmatrix}.\hskip.5in \]
The vector \(\mathsf{v}\) above is not in \(\mathsf{F}\) (check that!). Its projection \(\mathsf{p}\) onto \(\mathsf{F}\) is given by the formula below. Compute \(\mathsf{p}\). This is the closest approximation of \(\mathsf{v}\) by a Fibonacci vector. \[ \mathsf{p} = \frac{ (\mathsf{v} \cdot \mathsf{f}_1)}{ ( \mathsf{f}_1 \cdot \mathsf{f}_1)} \mathsf{f}_1 + \frac{ (\mathsf{v} \cdot \mathsf{f}_2)}{ ( \mathsf{f}_2 \cdot \mathsf{f}_2)} \mathsf{f}_2. \] This formula requires that the basis be orthogonal.
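A minimal sketch of this computation in R, transcribing the formula above (using `sum(u*v)` for the dot product):
# Sketch: project v onto F using the orthogonal basis {f1, f2}
f1 = c(1, 0, 1, 1, 2)
f2 = c(-9, 7, -2, 5, 3)
v  = c(1, 2, 3, 5, 7)
p  = sum(v*f1)/sum(f1*f1) * f1 + sum(v*f2)/sum(f2*f2) * f2
p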
The residual vector is the vector \(\mathsf{r} = \mathsf{v} - \mathsf{p}\). Compute \(\mathsf{r}\).
Show that \(\mathsf{r}\) is orthogonal to \(\mathsf{f}_1\) and \(\mathsf{f}_2\).
Compute \(||\mathsf{r}||\). This is the distance from \(\mathsf{v}\) to \(\mathsf{F}\) (i.e., how far \(\mathsf{v}\) is from being a Fibonacci vector).
7.5 Least Squares Solution to \(\mathsf{A} x = \mathsf{b}\)
(Work on in class on Tuesday July 6)
Find the least-squares solution to the following inconsistent matrix equation \(\mathsf{A} x = \mathsf{b}\). \[ \left[ \begin{array}{ccc} 1 & 1 & 2 \\ 1 & 0 & 2 \\ -1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \\ \end{array} \right] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7 \\ 11 \\ -3 \\ 10 \\ 7 \end{bmatrix}. \hskip5in \] Here are the steps you should follow:
- Show that this system of equations is inconsistent.
- Make the normal equations: \(\mathsf{A}^T \mathsf{A} x = \mathsf{A}^T \mathsf{b}\).
- Solve the normal equations for \(\hat{x}\) (by hand or software).
- Get the projection \(\hat{\mathsf{b}}\) of \(\mathsf{b}\) onto \(\mathrm{Col}(\mathsf{A})\) by \(\hat{\mathsf{b}} = \mathsf{A} \hat{x}\).
- Get the residual \(\mathsf{r} = \mathsf{b} - \hat{\mathsf{b}}\) and compute its length.
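Here is a sketch of one way to carry out these steps in R, assuming the pracma package for row reduction (`rref`):
# Sketch of the steps above, assuming pracma for rref()
library(pracma)
A = cbind(c(1,1,-1,1,1), c(1,0,1,0,1), c(2,2,1,1,1))
b = c(7, 11, -3, 10, 7)
rref(cbind(A, b))                       # a pivot in the last column shows inconsistency
xhat = solve(t(A) %*% A, t(A) %*% b)    # solve the normal equations
bhat = A %*% xhat                       # projection of b onto Col(A)
r = b - bhat                            # residual
sqrt(sum(r^2))                          # length of the residual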
7.6 Least-Squares Polynomials
(Work on in class on Tuesday July 6)
Here is a quadratic least-squares polynomial fit to some data. Make a cubic, quartic, and quintic fit to this same data. Turn in a plot of each. Compute the length of the residual in each case. Which do you think is the best model of the data?
x = c(1,2,3,4,5,6)
y = c(7,2,1,3,7,7)
(A = cbind(x^0,x,x^2))
## x
## [1,] 1 1 1
## [2,] 1 2 4
## [3,] 1 3 9
## [4,] 1 4 16
## [5,] 1 5 25
## [6,] 1 6 36
xhat = solve(t(A)%*%A, t(A)%*%y)
yhat = A %*% xhat
r = y - yhat
t(r) %*% r
## [,1]
## [1,] 11.26429
#plot the original set of points
plot(x,y,pch=19,xlim=c(0,7),ylim=c(0,10), main='the best-fit quadratic function')
# generate points for the fitted line and plot it
tt = seq(0,7,len=100)
lines(tt,xhat[1]+xhat[2]*tt+xhat[3]*tt^2,col='blue')
# add the residuals to the plot
for (i in 1:length(x)) {lines(c(x[i],x[i]),c(y[i],yhat[i]), col='red')}
#add yhat to the plot
points(x,yhat,pch=19,col='orange')
#put the original points back on the plot last so we can see them
points(x,y,pch=19,col="black")
grid()
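To build the cubic fit requested above, one can add an \(x^3\) column to the design matrix and repeat the steps; the quartic and quintic fits add \(x^4\) and \(x^5\) columns in the same way. A minimal sketch for the cubic case:
# Sketch: cubic fit (quartic/quintic fits add x^4 and x^5 columns similarly)
A3 = cbind(x^0, x, x^2, x^3)
xhat3 = solve(t(A3) %*% A3, t(A3) %*% y)
yhat3 = A3 %*% xhat3
r3 = y - yhat3
t(r3) %*% r3     # squared length of the residual for the cubic model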
7.7 Fuel Efficiency
(Work on in class on Wednesday July 7)
Below is a classic data set of fuel efficiency in 38 different automobiles.
## lbs HP Cyl MPG
## BuickEstateWagon 3967.60 155 8 16.9
## FordCountrySquireWagon 3689.14 142 8 15.5
## ChevyMalibuWagon 3280.55 125 8 19.2
## ChryslerLeBaronWagon 3585.40 150 8 18.5
## Chevette 1961.05 68 4 30.0
## ToyotaCorona 2329.60 95 4 27.5
## Datsun510 2093.00 97 4 27.2
## DodgeOmni 2029.30 75 4 30.9
## Audi5000 2575.30 103 5 20.3
## Volvo240GL 2857.40 125 6 17.0
## Saab99GLE 2543.45 115 4 21.6
## Peugeot694SL 3103.10 133 6 16.2
## BuickCenturySpecial 3075.80 105 6 20.6
## MercuryZephyr 2793.70 85 6 20.8
## DodgeAspen 3294.20 110 6 18.6
## AMCConcordD/L 3103.10 120 6 18.1
## ChevyCapriceClassic 3494.40 130 8 17.0
## FordLTD 3389.75 129 8 17.6
## MercuryGrandMarquis 3599.05 138 8 16.5
## DodgeStRegis 3485.30 135 8 18.2
## FordMustang4 2352.35 88 4 26.5
## FordMustangGhia 2648.10 109 6 21.9
## MazdaGLC 1797.25 65 4 34.1
## DodgeColt 1742.65 80 4 35.1
## AMCSpirit 2429.70 80 4 27.4
## VWScirocco 1810.90 71 4 31.5
## HondaAccordLX 1942.85 68 4 29.5
## BuickSkylark 2429.70 90 4 28.4
## ChevyCitation 2361.45 115 6 28.8
## OldsOmega 2457.00 115 6 26.8
## PontiacPhoenix 2325.96 90 4 33.5
## PlymouthHorizon 2002.00 70 4 34.2
## Datsun210 1838.20 65 4 31.8
## FiatStrada 1938.30 69 4 37.3
## VWDasher 1992.90 78 4 30.5
## Datsun810 2561.65 97 6 22.0
## BMW320i 2366.00 110 4 21.5
## VWRabbit 1925.00 71 4 31.9
- Fit a linear model of the form \[ mpg = a_0 + a_1 lbs + a_2 HP + a_3 Cyl. \] Find the coefficients \(a_0,a_1,a_2,a_3\) and the length of the residual. If you have taken STAT 155, you can see that we are doing exactly the same thing by comparing your results with the `lm` output below.
##
## Call:
## lm(formula = MPG ~ lbs + HP + Cyl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4669 -1.6011 -0.3246 1.0759 6.6231
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.644579 1.992433 24.917 < 2e-16 ***
## lbs -0.008288 0.002316 -3.579 0.00106 **
## HP -0.073961 0.043862 -1.686 0.10091
## Cyl 0.791590 0.730326 1.084 0.28604
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.791 on 34 degrees of freedom
## Multiple R-squared: 0.833, Adjusted R-squared: 0.8183
## F-statistic: 56.55 on 3 and 34 DF, p-value: 2.656e-13
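Here is a sketch of how part (a) could be set up with the normal equations, assuming the columns of the data set are available as vectors named `lbs`, `HP`, `Cyl`, and `MPG` (as the `lm` call above suggests):
# Sketch: least-squares fit by hand, assuming lbs, HP, Cyl, MPG are vectors
A = cbind(1, lbs, HP, Cyl)
ahat = solve(t(A) %*% A, t(A) %*% MPG)   # coefficients a0, a1, a2, a3
r = MPG - A %*% ahat                     # residual vector
sqrt(sum(r^2))                           # length of the residual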
Add the car's weight in tons to your model and solve \(mpg = a_0 + a_1 (lbs) + a_2 (HP) + a_3 (Cyl) + a_4 (tons).\) Compare the coefficients you get with those that you got in part (a). Give a short explanation of what you see using some of the linear algebra language that we have learned in the course.
The residual vector \(\mathsf{r}\) measures the quality of fit of our model. But how do we turn this into a meaningful quantity? One method is to look at the coefficient of determination, which is more commonly referred to as the "\(R^2\) value."
You can see the \(R^2\) value of your fit in part (a) under the “Multiple R-squared” output in the linear model summary above.
Suppose \(\mathsf{y} = [ y_1, y_2, \ldots, y_n ]^{\top}\) is our target vector with least-squares solution \(\hat{\mathsf{y}} = A \hat{\mathsf{x}}\) and residual vector \(\mathsf{r} = \mathsf{y} - \hat{\mathsf{y}}\). Let \[ a = \frac{1}{n} ( y_1 + y_2 + \cdots + y_n) \] be the average or mean of the entries of the target vector \(\mathsf{y}\) and let \(\mathsf{y}^* = [a, a, \ldots, a]\). (We call this vector "y star", so `ystar` would be a fine name in R.) The \(R^2\) value is \[ R^2 = 1 - \frac{\| \mathsf{y} - \hat{\mathsf{y}} \|^2 }{\| \mathsf{y} - \mathsf{y}^* \|^2} = 1 - \frac{\| \mathsf{r} \|^2}{\| \mathsf{y} - \mathsf{y}^* \|^2}. \] The \(R^2\) value is a number in \([0,1]\). The squared length \(\| \mathsf{y} - \mathsf{y}^*\|^2\) is the total variance: that is, how much the data varies from the mean. The ratio \(\frac{\| \mathsf{r} \|^2}{\| \mathsf{y} - \mathsf{y}^* \|^2}\) is the fraction of the total variance that is left unexplained by our model, so \(R^2\) is the fraction that is explained. Thus, if \(R^2\) is near 1, then our model does a good job at "explaining" the behavior of \(\mathsf{y}\) via a linear combination of the columns of \(A\).
To do: Find the \(R^2\) value for our least squares solution to the cars data in part (a). Here are some helpful functions (a sketch putting them together appears after the list):
- `mean(vec)` returns the mean (average) of the entries of the vector `vec`.
- `rep(a, n)` creates a constant vector of length \(n\) where every entry is \(a\).
- `Norm(vec)` from the `pracma` package returns the magnitude (Euclidean length) of the vector `vec`.

To learn more, you should take STAT 155: Introduction to Statistical Modeling.
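A minimal sketch combining these pieces, assuming the target vector `MPG` and the residual `r` from part (a) are already defined (adjust the names to whatever you used):
# Sketch: computing R^2, assuming MPG and r from part (a) are defined
library(pracma)                         # for Norm()
ystar = rep(mean(MPG), length(MPG))     # "y star": every entry is the mean of MPG
R2 = 1 - Norm(as.vector(r))^2 / Norm(MPG - ystar)^2
R2    # compare with the Multiple R-squared reported by lm() above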
7.8 Fourier Analysis
(Work on in class on Wednesday July 7)
In Fourier analysis one uses trigonometric functions to model oscillatory behavior in data. These methods have important applications in the study of sound or video signals, financial data, medicine, and engineering (to mention just a few). For example, consider the following set of 200 data points.
A first Fourier approximation would fit a model of the form \[ f_1(t) = c_0 + c_1 \sin(t) + c_2 \cos(t). \] Thus, we make the following matrix (we show here only the first 10 rows; there are 200 rows).
## [,1] [,2] [,3]
## [1,] 1 0.00000000 1.0000000
## [2,] 1 0.09983342 0.9950042
## [3,] 1 0.19866933 0.9800666
## [4,] 1 0.29552021 0.9553365
## [5,] 1 0.38941834 0.9210610
## [6,] 1 0.47942554 0.8775826
## [7,] 1 0.56464247 0.8253356
## [8,] 1 0.64421769 0.7648422
## [9,] 1 0.71735609 0.6967067
## [10,] 1 0.78332691 0.6216100
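For reference, here is a sketch of how such a matrix could be built. The time points are an assumption read off from the first rows shown above (they appear to be \(0, 0.1, 0.2, \ldots\)), and `y` is assumed to hold the 200 data values:
# Sketch: first-order Fourier design matrix; tt and y are assumptions
tt = seq(0, 19.9, by = 0.1)      # 200 time points: 0, 0.1, ..., 19.9
A = cbind(1, sin(tt), cos(tt))   # columns for c0, c1*sin(t), c2*cos(t)
head(A, 10)                      # compare with the rows displayed above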
Now we solve the normal equations
## [,1]
## [1,] 3.9971143
## [2,] 1.0207277
## [3,] -0.4486618
and plot the solution. Your task:
- Update this to add the second Fourier coefficient terms by fitting the following function to the data. Plot your result. \[ f_2(t) = c_0 + c_1 \sin(t) + c_2 \cos(t) + c_3 \sin(2t) + c_4 \cos(2t) \]
- Compute the length of the residual vector for both the \(f_1(t)\) and the \(f_2(t)\) models. Which approximation looks better visually? That is, does the second approximation capture more of the shape of the data, or do you think that the first is a better model?