Math 152 - Statistical Theory - Homework 7
Important Note:
You should work to turn in assignments that are clear, communicative, and concise. Part of what you need to do is not print pages and pages of output. Additionally, you should remove these exact sentences and the information about HW scoring below.
Click on the Knit to PDF icon at the top of RStudio to run the R code and create a PDF document simultaneously. [PDF will only work if either (1) you are using R on the network, or (2) you have LaTeX installed on your computer. A lightweight LaTeX installation is available here: https://yihui.name/tinytex/]
Either use the college’s RStudio server (https://rstudio.pomona.edu/) or install R and RStudio onto your personal computer. See https://m152-stat-theory.netlify.app/syllabus.html for resources.
Assignment
Goals:
In this assignment, the fun will include:
- making frequentist confidence intervals
- making Bayesian posterior (credible) intervals
Book problems
- Feel free to do the book problems with a pencil or in LaTeX (RMarkdown supports writing mathematics using LaTeX).
- If you use a pencil, you can take a picture of the problem(s), and include the image(s) using (remove the tick marks to make it work): `![](myimage.jpeg)`
Note that myimage.jpeg needs to live in the same folder as the relevant .Rmd file (maybe you called the folder “math 152 hw” and put it on your desktop?)
Saving as jpg, jpeg, png, or pdf should work, but make sure to specify the exact name of the file.
If you have the 3rd edition of the book, the problems will be the same unless they don’t exist – that is, the 4th edition added problems but didn’t change the order of them. Ask me if you want to see the 4th edition problems.
Assignment
1: Community Q
Describe one thing you learned (not from lecture, maybe from working in pairs during class) from a member of the class (student, mentor, professor) – it could be: content, logistical help, background material, R information, etc. 1-3 sentences.
2: 8.5.6
Suppose that
Hint: Determine constants
Also, see Theorem 5.7.7 and Definition 8.2.1. Exercise 5.7.1 could also be helpful.
3: 8.5.7
In the June 1986 issue of Consumer Reports, some data on the calorie content of beef hot dogs is given. Here are the numbers of calories (kcal) in 20 different hot dog brands:
hotdogs <- c(186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
             152, 111, 141, 153, 190, 157, 131, 149, 135, 132)
Assume that these numbers are the observed values from a random sample of twenty independent normal random variables. (In R, `mean()` and `sd()` give the sample mean and standard deviation.)
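As a starting point, the summary statistics that a confidence interval for the mean would be built from can be computed as below. This is only a sketch: the names `n`, `xbar`, `s`, and `se` are illustrative and not part of the assignment.

```r
# Calorie counts for the 20 hot dog brands, copied from the problem statement
hotdogs <- c(186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
             152, 111, 141, 153, 190, 157, 131, 149, 135, 132)

n    <- length(hotdogs)   # sample size
xbar <- mean(hotdogs)     # sample mean
s    <- sd(hotdogs)       # sample standard deviation (n - 1 denominator)
se   <- s / sqrt(n)       # estimated standard error of the sample mean

c(n = n, mean = xbar, sd = s, se = se)
```

The multiplier applied to `se` depends on the confidence level the problem asks for and on whether a t or normal quantile is appropriate.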
4: 8.6.5
Suppose that two random variables
5: 8.6.8
Suppose that two random variables
6: 8.6.9
Using the prior and data in the numerical example on nursing homes in New Mexico in this section, find
- the shortest possible interval such that the posterior probability that $\mu$ lies in the interval is 0.90, and
- the shortest possible confidence interval for $\mu$ for which the confidence coefficient is 0.90.
7: R - confidence interval coverage
Note: Because of this week’s exam, I have included all of the R code for this problem. There is no R code for you to write! However, you should practice interpreting the results in a few sentences for each question.
How well do frequentist confidence intervals actually capture the parameter of interest? What happens when we forget to use a t-multiplier and use a standard normal multiplier instead? First, let’s see what happens when we correctly use the t-multiplier. Remember, we’re talking about sampling distributions which means we’ll have to take LOTS OF SAMPLES and look at many different confidence intervals.
- Comment on the coverage rate of a standard t-interval for the population mean, $\mu$. [Bigger `n.samps` will probably give you more information.]
set.seed(47)
n.samps <- 10000 # you might get more info by taking more samples
n.obs <- 10 # what happens if you increase the sample size?
mymean <- c() # place holder
myvar <- c() # place holder
conf.level <- 0.95
mu <- 47
sigma <- 4
for (i in 1:n.samps) {
  mysample <- rnorm(n.obs, mu, sigma) # note: mean is mu, sd is sigma
  mymean <- c(mymean, mean(mysample))
  myvar <- c(myvar, var(mysample))
}
upper.CI <- mymean - qt((1 - conf.level)/2, n.obs - 1) * sqrt(myvar)/sqrt(n.obs)
lower.CI <- mymean + qt((1 - conf.level)/2, n.obs - 1) * sqrt(myvar)/sqrt(n.obs)
sum(upper.CI < mu) # inside is a TRUE/FALSE vector, `sum()` counts the number of TRUE statements
[1] 244
sum(lower.CI > mu)
[1] 249
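The two counts printed above are the intervals that miss `mu` on each side, so the empirical coverage rate is the remaining fraction. A minimal sketch using those printed counts follows. The names `miss.below`, `miss.above`, and `mc.se` are illustrative, and the Monte Carlo standard error is an added rough yardstick for judging how far the simulated rate sits from the nominal level, not something the assignment requires.

```r
n.samps    <- 10000
conf.level <- 0.95
miss.below <- 244   # intervals with upper.CI < mu (entirely below mu)
miss.above <- 249   # intervals with lower.CI > mu (entirely above mu)

# Fraction of intervals that contain mu
coverage <- (n.samps - miss.below - miss.above) / n.samps
coverage

# Rough Monte Carlo standard error if the true coverage were conf.level
mc.se <- sqrt(conf.level * (1 - conf.level) / n.samps)
mc.se
```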
- Repeat a. above, but replace `qt` with the quantile (multiplier) for a normal distribution instead of a t distribution. What is the new coverage rate? Why does that make sense?
set.seed(7474)
n.samps <- 10000 # you might get more info by taking more samples
n.obs <- 10 # what happens if you increase the sample size?
mymean <- c() # place holder
myvar <- c() # place holder
conf.level <- 0.95
mu <- 47
sigma <- 4
for (i in 1:n.samps) {
  mysample <- rnorm(n.obs, mu, sigma) # note: mean is mu, sd is sigma
  mymean <- c(mymean, mean(mysample))
  myvar <- c(myvar, var(mysample))
}
upper.CI <- mymean - qnorm((1 - conf.level)/2, 0, 1) * sqrt(myvar)/sqrt(n.obs)
lower.CI <- mymean + qnorm((1 - conf.level)/2, 0, 1) * sqrt(myvar)/sqrt(n.obs)
sum(upper.CI < mu) # inside is a TRUE/FALSE vector, `sum()` counts the number of TRUE statements
[1] 389
sum(lower.CI > mu)
[1] 460
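One way to reason about the larger number of misses: an interval built from the sample standard deviation covers `mu` exactly when the t-statistic falls within plus or minus the multiplier, and that statistic has `n.obs - 1` degrees of freedom. Under that reasoning, the coverage the simulation is estimating can be computed directly and compared with the simulated rate. This is a sketch; the names `simulated`, `z`, and `theory` are illustrative.

```r
n.obs      <- 10
conf.level <- 0.95
n.samps    <- 10000

# Simulated coverage from the counts printed above
simulated <- (n.samps - 389 - 460) / n.samps

# Normal multiplier used in place of the t multiplier
z <- qnorm(1 - (1 - conf.level) / 2)

# The interval covers mu when |T| < z, where T has a t distribution
# with n.obs - 1 degrees of freedom
theory <- 2 * pt(z, df = n.obs - 1) - 1

c(simulated = simulated, theory = theory)
```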
- Repeat a. and b. for a sample of size 100 (`n.obs <- 100`). Also, report the actual multipliers (the output of `qt()` and `qnorm()`). How does sample size play a role in coverage rate?
set.seed(4774)
n.samps <- 10000 # you might get more info by taking more samples
n.obs <- 100 # what happens if you increase the sample size?
mymean <- c() # place holder
myvar <- c() # place holder
conf.level <- 0.95
mu <- 47
sigma <- 4
for (i in 1:n.samps) {
  mysample <- rnorm(n.obs, mu, sigma) # note: mean is mu, sd is sigma
  mymean <- c(mymean, mean(mysample))
  myvar <- c(myvar, var(mysample))
}
upper.CI <- mymean - qt((1 - conf.level)/2, n.obs - 1) * sqrt(myvar)/sqrt(n.obs)
lower.CI <- mymean + qt((1 - conf.level)/2, n.obs - 1) * sqrt(myvar)/sqrt(n.obs)
sum(upper.CI < mu)
[1] 240
sum(lower.CI > mu)
[1] 236
(n.samps - sum(upper.CI < mu) - sum(lower.CI > mu)) / n.samps
[1] 0.9524
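For the multipliers requested in this part, a short sketch (the variable names are illustrative) evaluates both quantile functions at the same lower-tail probability used in the code above, with `conf.level` taken to be 0.95. Both calls return negative numbers because they are lower-tail quantiles, which is why the code subtracts them to get the upper endpoint.

```r
conf.level <- 0.95
alpha.half <- (1 - conf.level) / 2            # lower-tail probability

t.mult.10  <- qt(alpha.half, df = 10 - 1)     # t multiplier, n.obs = 10
t.mult.100 <- qt(alpha.half, df = 100 - 1)    # t multiplier, n.obs = 100
z.mult     <- qnorm(alpha.half, 0, 1)         # normal multiplier, any n.obs

round(c(t10 = t.mult.10, t100 = t.mult.100, z = z.mult), 3)
```

Comparing the three values shows how the gap between the t and normal multipliers changes as the degrees of freedom grow.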
n.samps <- 10000 # you might get more info by taking more samples
n.obs <- 100 # what happens if you increase the sample size?
mymean <- c() # place holder
myvar <- c() # place holder
conf.level <- 0.947
mu <- 47
sigma <- 4
for (i in 1:n.samps) {
  mysample <- rnorm(n.obs, mu, sigma) # note: mean is mu, sd is sigma
  mymean <- c(mymean, mean(mysample))
  myvar <- c(myvar, var(mysample))
}
upper.CI <- mymean - qnorm((1 - conf.level)/2, 0, 1) * sqrt(myvar)/sqrt(n.obs)
lower.CI <- mymean + qnorm((1 - conf.level)/2, 0, 1) * sqrt(myvar)/sqrt(n.obs)
sum(upper.CI < mu)
[1] 282
sum(lower.CI > mu)
[1] 264
(n.samps - sum(upper.CI < mu) - sum(lower.CI > mu)) / n.samps
[1] 0.9454
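The simulations above differ only in `n.obs` and in the multiplier, so they can be wrapped in one helper for experimenting with other sample sizes. The following is a sketch, not the assignment's code: `coverage.sim` and its arguments are hypothetical names, and it uses `replicate()` instead of growing vectors in a loop, so its random draws will not reproduce the exact counts printed above.

```r
# Hypothetical helper: estimate the coverage of a nominal conf.level interval
coverage.sim <- function(n.obs, multiplier = c("t", "normal"),
                         n.samps = 10000, conf.level = 0.95,
                         mu = 47, sigma = 4) {
  multiplier <- match.arg(multiplier)
  # Each column of `stats` holds the mean and variance of one sample
  stats <- replicate(n.samps, {
    x <- rnorm(n.obs, mu, sigma)
    c(mean(x), var(x))
  })
  # Upper-tail quantile, so the multiplier is positive
  crit <- if (multiplier == "t") {
    qt(1 - (1 - conf.level) / 2, n.obs - 1)
  } else {
    qnorm(1 - (1 - conf.level) / 2)
  }
  half.width <- crit * sqrt(stats[2, ]) / sqrt(n.obs)
  lower.CI <- stats[1, ] - half.width
  upper.CI <- stats[1, ] + half.width
  mean(lower.CI <= mu & mu <= upper.CI)   # proportion of intervals covering mu
}

set.seed(47)
sapply(c(10, 100), function(n) coverage.sim(n, "t"))
sapply(c(10, 100), function(n) coverage.sim(n, "normal"))
```

Returning a proportion rather than two tail counts keeps the output to one number per setting, which fits the request not to print pages of output.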