Math 152 - Statistical Theory - Homework 9

Author

write your name here

Published

Invalid Date

Important Note:

You should work to turn in assignments that are clear, communicative, and concise. In particular, do not print pages and pages of output. Additionally, remove these exact sentences and the information about HW scoring below.

Click on the Knit to PDF icon at the top of RStudio to run the R code and create a PDF document simultaneously. [PDF will only work if either (1) you are using R on the network, or (2) you have LaTeX installed on your computer. Lightweight LaTeX installation here: https://yihui.name/tinytex/]

Either use the college’s RStudio server (https://rstudio.pomona.edu/) or install R and RStudio on your personal computer. See https://m152-stat-theory.netlify.app/syllabus.html for resources.

Assignment

Goals:

In this assignment, the fun will include:

  • calculating Fisher information
  • setting up hypotheses
  • calculating power and error rates.

Book problems

  • Feel free to do the book problems with a pencil or in LaTeX (RMarkdown supports writing mathematics using LaTeX).
  • If you use a pencil, you can take a picture of the problem(s), and include the image(s) using (remove the tick marks to make it work):
![](myimage.jpeg)
  • Note that myimage.jpeg needs to live in the same folder as the relevant .Rmd file (maybe you called the folder “math 152 hw” and put it on your desktop?)

  • Saving as jpg, jpeg, png, or pdf should work, but make sure to specify the exact name of the file.

  • If you have the 3rd edition of the book, the problems will be the same unless they don’t exist in your edition; that is, the 4th edition added problems but didn’t change the order of the existing ones. Ask me if you want to see the 4th edition problems.

Assignment

1: Community Q

Describe one thing you learned (not from lecture, maybe from working in pairs during class) from a member of the class (student, mentor, professor) – it could be: content, logistical help, background material, R information, etc. 1-3 sentences.

2: 8.8.4

Suppose that a random variable $X$ has the normal distribution with mean 0 and unknown standard deviation $\sigma > 0$. Find the Fisher information $I(\sigma)$ in $X$.

3: 8.8.5

Suppose that a random variable $X$ has the normal distribution with mean 0 and unknown variance $\sigma^2 > 0$. Find the Fisher information $I(\sigma^2)$ in $X$. Note that in this exercise the variance $\sigma^2$ is regarded as the parameter, whereas in Exercise 8.8.4 the standard deviation $\sigma$ is regarded as the parameter.

Also, show that the unbiased estimator of $\sigma^2$, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, is not efficient.

Note to self: there is no simple way to get from Problem 8.8.4 to 8.8.5. That is, we know the MLE of a function of a parameter is that function of the MLE; the same is not true for Fisher information.
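The point above can be stated precisely. Fisher information is not invariant under reparameterization; if $\psi = g(\theta)$ for a one-to-one differentiable $g$, then the information about $\psi$ rescales by the squared derivative:

```latex
I_{\psi}(\psi) = \frac{I_{\theta}(\theta)}{\left[g'(\theta)\right]^{2}}
```

So $I(\sigma^2)$ cannot be obtained by simply plugging $\sigma^2$ into $I(\sigma)$.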

4: 8.8.16

Suppose that $X_1, \ldots, X_n$ form a random sample from the Bernoulli distribution with unknown parameter $p$, and the prior pdf of $p$ is a positive and differentiable function over the interval $0 < p < 1$. Suppose, furthermore, that $n$ is large, the observed values of $X_1, \ldots, X_n$ are $x_1, \ldots, x_n$, and $0 < \bar{x} < 1$. Show that the posterior distribution of $p$ will be approximately a normal distribution with mean $\bar{x}$ and variance $\bar{x}(1 - \bar{x})/n$.

Note to self: this is a Bayesian problem. See the connection between Fisher Information and asymptotic / approximate Bayesian distributions in the text.
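The connection mentioned above, in the form most texts state it: for large $n$, the posterior of a parameter $\theta$ is approximately normal, centered at the MLE, with variance the reciprocal of the total Fisher information:

```latex
\theta \mid x_1, \ldots, x_n \;\approx\; N\!\left(\hat{\theta},\; \frac{1}{n\,I(\hat{\theta})}\right)
```

For the Bernoulli model the MLE is $\bar{x}$, which suggests where the mean and variance in the problem statement come from.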

5: 8.9.15

Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the pdf is as follows:

$$f(x \mid \theta) = \begin{cases} \theta x^{\theta - 1} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

where the value of $\theta$ is unknown ($\theta > 0$). Determine the asymptotic distribution of the MLE of $\theta$.
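Recall the general asymptotic normality result for MLEs, which is the tool this problem asks you to apply (under the usual regularity conditions):

```latex
\sqrt{n}\left(\hat{\theta}_n - \theta\right) \xrightarrow{\;d\;} N\!\left(0, \frac{1}{I(\theta)}\right)
```

so the work reduces to computing the Fisher information $I(\theta)$ for this pdf.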

6: 9.1.1

Let $X$ have the exponential distribution with parameter $\beta$. Suppose that we wish to test the hypotheses $H_0: \beta \geq 1$ versus $H_1: \beta < 1$.

$$f(x \mid \beta) = \beta e^{-\beta x}, \qquad x > 0$$

Consider the test procedure $\delta$ that rejects $H_0$ if $X \geq 1$.

  1. Determine the power function of the test.
  2. Compute the size of the test.
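After you derive the power function by hand, a short simulation can sanity-check your algebra. This sketch uses one arbitrary illustrative value of $\beta$; it does not print the closed form, which is for you to derive:

```r
# Empirical check of the rejection probability P(X >= 1) at one value of beta.
set.seed(152)
beta <- 2                                          # arbitrary illustrative value
x <- rexp(1e5, rate = beta)                        # draws of X under this beta
sim.power <- mean(x >= 1)                          # empirical P(reject H0)
exact <- pexp(1, rate = beta, lower.tail = FALSE)  # exact P(X >= 1)
abs(sim.power - exact) < 0.01                      # the two should agree closely
```

Repeating this for a grid of `beta` values traces out the whole power function.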

7: 9.1.6

Suppose that a single observation $X$ is to be taken from the uniform distribution on the interval $[\theta - \frac{1}{2},\, \theta + \frac{1}{2}]$, and suppose that the following hypotheses are to be tested:

$H_0: \theta \leq 3$,
$H_1: \theta \geq 4$.

Construct a test procedure $\delta$ for which the power function has the following values: $\pi(\theta \mid \delta) = 0$ for $\theta \leq 3$ and $\pi(\theta \mid \delta) = 1$ for $\theta \geq 4$.

8: 9.1.9

Assume that $X_1, \ldots, X_n$ are i.i.d. with the normal distribution that has mean $\mu$ and variance 1. Suppose that we wish to test the hypotheses:

$H_0: \mu \geq \mu_0$,
$H_1: \mu < \mu_0$.

Find a test statistic $T$ such that, for every $c$, the test $\delta_c$ that rejects $H_0$ when $T \geq c$ has power function $\pi(\mu \mid \delta_c)$ that is decreasing in $\mu$.

9: R - $N(\theta, \theta^2)$

We are going to study almost, but not exactly, the same model as from the exam. The model for this problem is normal with mean and standard deviation both $\theta$ (i.e., variance $\theta^2$, not $\theta$ as in the example from the exam). Therefore, we know $\theta \geq 0$.

The results from class about properties of MLEs are asymptotic. What happens in small samples?


The estimators of θ we wish to compare are:

  • the sample median
  • the sample mean
  • the sample standard deviation times the sign of the sample mean
  • the MLE
  1. The MLE of $\theta$ is $\hat{\theta} = -\bar{x}/2 + \sqrt{\left(\sum_i x_i^2\right)/n + \bar{x}^2/4}$. Show (with pencil / LaTeX) that the Fisher information in $\theta$ is $3/\theta^2$.

  2. Use a simulation to compare the four estimators above with respect to bias, variance, and MSE. Answer the following questions in your comparison:

    1. Which estimator is (empirically) least biased?
    2. Which estimator has lowest empirical variability? Do any of the estimators reach the CRLB (assume unbiasedness)?
    3. Which estimator has lowest empirical MSE?
    4. Are you comfortable with the idea of using a normal distribution to describe the sampling distribution for any/all of the estimators? Explain.
    5. Which estimator would you recommend for the given setting? Explain.
# Use sample size n = 15. Keep this set-up in the first 3 lines.
n.obs <- 15
n.samps <- 10000
theta <- exp(1)


means <- numeric(n.samps)
medians <- numeric(n.samps)
sds <- numeric(n.samps)
MLEs <- numeric(n.samps)


for (i in 1:n.samps){
  # generate some data from the given model
  # means[i] <- mean(the data you generated)
  # etc.
}
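As a sketch of what one iteration of the loop computes: the data generation follows directly from the stated model, and the MLE formula is the one given in part 1 (the other three estimators are one-liners left to you):

```r
# One simulated sample from N(theta, theta^2) and its MLE (part 1 formula).
set.seed(152)
n.obs <- 15
theta <- exp(1)
x <- rnorm(n.obs, mean = theta, sd = theta)        # data from the given model
mle <- -mean(x)/2 + sqrt(mean(x^2) + mean(x)^2/4)  # MLE of theta from part 1
```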

You can write alternative code for calculating and displaying the relevant characteristics of the sampling distributions, but I would put it together in a tidy framework like this:

Note: in the code below I’ve created smoothed histograms (called density plots) so as to plot the empirical distributions on top of one another.

library(tidyverse)

est.info <- data.frame(value = c(means , ___, ...), 
                       type = c(rep("mean", n.samps), ____, ...) )

est.info %>%
  group_by(type) %>%
  summarize(est.bias = mean(value) - theta, est.mean = mean(value), 
            est.var = var(value), est.sd = sd(value)) %>%
  mutate(est.mse = est.var + est.bias^2)


est.info %>%
  ggplot(aes(x=value, color = type)) + geom_density() +
  geom_vline(xintercept = exp(1))
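For the CRLB comparison in part 2, the bound implied by the information from part 1 (assuming unbiasedness) is $1/(n\,I(\theta)) = \theta^2/(3n)$; with the simulation settings above:

```r
# Cramer-Rao lower bound for an unbiased estimator of theta,
# using I(theta) = 3 / theta^2 from part 1: Var >= 1 / (n * I(theta)).
n.obs <- 15
theta <- exp(1)
crlb <- theta^2 / (3 * n.obs)
```

Compare each estimator's `est.var` against `crlb` to answer the efficiency question.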