Math 152 - Statistical Theory - Homework 5
Important Note:
You should work to turn in assignments that are clear, communicative, and concise. Part of what you need to do is not print pages and pages of output. Additionally, you should remove these exact sentences and the information about HW scoring below.
Click on the Knit to PDF icon at the top of R Studio to run the R code and create a PDF document simultaneously. [PDF will only work if either (1) you are using R on the network, or (2) you have LaTeX installed on your computer. Lightweight LaTeX installation here: https://yihui.name/tinytex/]
Either use the college’s RStudio server (https://rstudio.pomona.edu/) or install R and R Studio on to your personal computer. See: https://research.pomona.edu/johardin/math152f20/ for resources.
Assignment
Goals:
In this assignment, the fun will include:
- practicing that functions of MLEs are MLEs
- finding statistics with chi-square distributions
- revisiting the tanks
Book problems
- Feel free to do the book problems with a pencil or in LaTeX (RMarkdown supports writing mathematics using LaTeX).
- If you use a pencil, you can take a picture of the problem(s), and include the image(s) using (remove the tick marks to make it work):

Note that myimage.jpeg needs to live in the same folder as the relevant .Rmd file (maybe you called the folder “math 152 hw” and put it on your desktop?)
Saving as jpg, jpeg, png, or pdf should work, but make sure to specify the exact name of the file.
If you have the 3rd edition of the book, the problems will be the same unless they don’t exist – that is, the 4th edition added problems but didn’t change the order of them. Ask me if you want to see the 4th edition problems.
Assignment
1: Community Q
Describe one thing you learned (not from lecture, maybe from working in pairs during class) from a member of the class (student, mentor, professor) – it could be: content, logistical help, background material, R information, etc. 1-3 sentences.
2: 7.6.5
Suppose that
3: 7.6.6
Suppose that
4. 7.6.21
Show that the method of moments estimators of the mean and variance of a normal distribution are also the MLEs. (You will need to calculate both the MLE and the MOM for both
5: 7.6.23
Suppose that
- Find the method of moments estimator for
. (The solution is provided because it’s mostly algebra, and the more important question here is part b.)
Solution to a.:
- The expectations of
and (i.e., the first two theoretical moments) are respectively and . We set these equal to the sample moments and and solve for and .
Setting the sample moments equal to the theoretical moments we get,
- Show that the method of moments estimator is not the MLE. (Note: you do not need to actually solve for the MLE, just convince your reader that they can’t be the same.)
6: 8.1.1
Suppose that a random sample
for all possible
7: 8.1.6
For the conditions of Exercise 5, use the central limit theorem in Sec. 6.3 to find approximately the size of a random sample that must be taken in order that
8: 8.2.10
Suppose that six random variables
Determine a value of
9: R - Tanks
Consider the tank problem encountered in class. Your task at hand is to provide the best possible estimator for the true number of tanks. Consider the R code below that analyzes two estimators (
- a comparison of at least 5 estimators (of your choice!, add to the two I gave, or replace them and use 5 of your own)
- consideration of sample bias, sample variance, sample median, sample mean, sample MSE
- run the entire analysis twice. Once with a sample size of n=5, once with a sample size of n=100
Hint on the R code: create new estimators one by one by adding them on to the two which I wrote.
- add more functions inside of the
calculate_Nfunction - make sure that you output (output of the
calculate_Nfunction) all of the new statistics - when you
pivot_longerbe sure to get all of the columns. The colon gives a range of columns (which will be fromxbar2tosamp.maxif you add the statistics between them). Ask if this doesn’t make sense.
# a function for calculating the estimates of N based on a sample of size nsamp
calculate_N <- function(nsamp,npop){
mysample <- sample(1:npop,nsamp,replace=F) # what does this line do?
xbar2 <- 2 * mean(mysample) - 1
# add more functions here between xbar2 and samp.max
samp.max <- max(mysample)
data.frame(xbar2, samp.max, nsamp, npop) # output: make sure to include all of your statistics
}
reps <- 2
nsamp_try <- 5
npop_try <- 447
map_df(1:reps, ~map2(nsamp_try, npop_try, calculate_N)) # running the function 2 times. xbar2 samp.max nsamp npop
1 517.8 409 5 447
2 425.8 297 5 447
# what do you see?reps <- 1000 # now let's do it for real and save the results
results <- map_df(1:reps, ~map2(nsamp_try, npop_try, calculate_N))
# what do the results look like? let's look at the top 10 rows:
results %>% head(10) xbar2 samp.max nsamp npop
1 615.0 410 5 447
2 647.4 384 5 447
3 439.4 304 5 447
4 526.2 393 5 447
5 428.6 411 5 447
6 375.0 290 5 447
7 509.4 421 5 447
8 542.6 379 5 447
9 430.6 366 5 447
10 335.4 323 5 447
# making the results long instead of wide:
results_long <- results %>%
pivot_longer(cols = xbar2:samp.max, # a range of columns (from xbar2 to samp.max)
names_to = "estimator", # the column names will go into a new column called estimator
values_to = "estimate") # the values will go into a new column called estimate
# how is results different from results_long? let's look at the top 10 rows:
results_long %>% head()# A tibble: 6 × 4
nsamp npop estimator estimate
<dbl> <dbl> <chr> <dbl>
1 5 447 xbar2 615
2 5 447 samp.max 410
3 5 447 xbar2 647.
4 5 447 samp.max 384
5 5 447 xbar2 439.
6 5 447 samp.max 304
results_long %>%
group_by(nsamp, npop, estimator) %>%
summarize(mean = mean(estimate), # calculating the evaluation metrics, do you
median = median(estimate), # understand these lines of code?
bias = mean(estimate - npop),
var = var(estimate),
mse = (mean(estimate - npop))^2 + var(estimate))# A tibble: 2 × 8
# Groups: nsamp, npop [1]
nsamp npop estimator mean median bias var mse
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5 447 samp.max 374. 392 -72.6 3897. 9173.
2 5 447 xbar2 443. 443 -3.60 12945. 12957.
results_long %>%
ggplot(aes(x = estimate)) +
geom_histogram() +
geom_vline(aes(xintercept = npop), color = "red") +
facet_grid(nsamp ~ estimator) +
ggtitle("sampling distributions of estimators of N, pop size = 447")