Clicker Questions
To accompany Probability & Statistics by DeGroot and Schervish. Math 152 - Statistical Theory.
- The Central Limit Theorem (CLT) says:1
- The sample average (statistic) converges to the true average (parameter)
- The sample average (statistic) converges to some point
- The distribution of the sample average (statistic) converges to a normal distribution
- The distribution of the sample average (statistic) converges to some distribution
- I have no idea what the CLT says
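A quick R sketch (not part of the original question set) illustrating the answer: even for skewed data, the distribution of the sample mean looks increasingly normal. The exponential data and the sample size below are arbitrary choices.

```r
# Simulate the sampling distribution of the sample mean for skewed
# (exponential, rate = 1) data. By the CLT, the histogram of sample means
# is approximately N(1, 1/n) even though the data are not normal.
set.seed(47)
n <- 30                                      # sample size (arbitrary)
xbars <- replicate(10000, mean(rexp(n, rate = 1)))

hist(xbars, breaks = 50, freq = FALSE,
     main = "Sampling distribution of the sample mean (n = 30)")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, lwd = 2)
```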
- Which cab company was involved (see example 2.2 in the notes)?2
- Very likely the Blue Cab company
- Sort of likely the Blue Cab company
- Equally likely Blue and Green Cab companies
- Sort of likely the Green Cab company
- Very likely the Green Cab company
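A hedged sketch of the Bayes' rule calculation behind the answer. The numbers below are from the classic version of the cab problem (85% of cabs are Green, 15% are Blue, and the witness who reported "Blue" is correct 80% of the time), not necessarily the values in example 2.2; swap in the example's actual numbers.

```r
# P(Blue | witness says "Blue") by Bayes' rule, with assumed (classic) numbers.
prior_blue  <- 0.15
prior_green <- 0.85
p_report_blue_if_blue  <- 0.80   # witness accuracy
p_report_blue_if_green <- 0.20   # witness error rate

posterior_blue <- prior_blue * p_report_blue_if_blue /
  (prior_blue * p_report_blue_if_blue + prior_green * p_report_blue_if_green)
posterior_blue   # about 0.41, so the Green company is still "sort of likely"
```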
- Consider a continuous probability density function (pdf) given by \(f( x | \theta ).\) Which of the following is FALSE:3
- \(f( x | \theta ) = P(X = x | \theta)\)
- \(f( x | \theta )\) provides info for calculating probabilities of X.
- \(P(X = x) = 0\) if X is continuous.
- \(f( x | \theta ) = L(\theta | x)\) is the likelihood function
- To find a marginal distribution of X from a joint distribution of X & Y, you should (assume everything is continuous):4
- differentiate the joint distribution with respect to X.
- differentiate the joint distribution with respect to Y.
- integrate the joint distribution with respect to X.
- integrate the joint distribution with respect to Y.
- I have no idea what a marginal distribution is.
- A continuous pdf (of a random variable \(X\) with parameter \(\theta\)) should5
- Integrate to a constant (\(dx\))
- Integrate to a constant (\(d\theta\))
- Integrate to 1 (\(dx\))
- Integrate to 1 (\(d\theta\))
- not need to integrate to anything special.
- R / RStudio
- all good
- started, progress is slow and steady
- started, very stuck
- haven’t started yet
- what do you mean by “R”?
- In terms of the R for the homework…
- I was able to do the whole thing.
- I understood the code part, but I couldn’t get the Markdown file to compile.
- I didn’t understand the code at all.
- I couldn’t get R or RStudio installed.
- I haven’t tried to work on the homework yet.
- A beta distribution6
- has support on [0,1]
- has parameters \(\alpha\) and \(\beta\) which represent, respectively, the mean and variance
- is discrete
- has equal mean and variance
- has equal mean and standard deviation
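A short R look at a Beta density (the shape parameters are arbitrary), confirming the support is [0, 1] and that \(\alpha\) and \(\beta\) are shape parameters, not the mean and variance.

```r
# Beta(a, b) has support [0, 1]; its mean is a / (a + b), not "a = mean, b = var".
a <- 2; b <- 5                                # arbitrary shape parameters
curve(dbeta(x, a, b), from = 0, to = 1, ylab = "density", main = "Beta(2, 5)")
a / (a + b)                                   # mean
a * b / ((a + b)^2 * (a + b + 1))             # variance
```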
- What types of distributions are the following?7
- prior = marginal & posterior = joint
- prior = joint & posterior = conditional
- prior = conditional & posterior = joint
- prior = marginal & posterior = conditional
- prior = joint & posterior = marginal
- Which of these are incorrect conclusions?8
- \(\theta | \underline{X} \sim\) Beta (4,12)
- \(\xi(\theta | \underline{X}) \sim\) Beta (4,12)
- \(\xi(\theta | \underline{X}) \propto\) Beta (4,12)
- \(\xi(\theta | \underline{X}) \propto \theta^{4-1} (1-\theta)^{12-1}\)
- \(\xi(\theta | \underline{X}) = \frac{1}{B(4,12)} \theta^{4-1}(1-\theta)^{12-1}\)
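For concreteness, here is a minimal R sketch of a conjugate Beta-Binomial update that lands on a Beta(4,12) posterior. The prior and data counts are made up to hit those numbers; the point is the notation: \(\theta \mid \underline{X}\) (a random variable) has a Beta distribution, while \(\xi(\theta \mid \underline{X})\) (a function) is proportional to the Beta kernel.

```r
# Conjugate update: Beta(a0, b0) prior + s successes in n Bernoulli trials
# gives a Beta(a0 + s, b0 + n - s) posterior. Counts chosen to land on Beta(4, 12).
a0 <- 1; b0 <- 1            # Beta(1, 1) (uniform) prior, assumed for illustration
s  <- 3; n  <- 14           # hypothetical data: 3 successes in 14 trials
a1 <- a0 + s                # 4
b1 <- b0 + (n - s)          # 12

a1 / (a1 + b1)                   # posterior mean of theta
qbeta(c(0.025, 0.975), a1, b1)   # central 95% posterior interval
```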
- What is the integrating constant for the pdf, \(h(w)\)?9
- \(\frac{\Gamma(w+k)}{\Gamma(w)\Gamma(k)}\)
- 1/[\(w^k \Gamma(k)\)]
- 1 / \(\sqrt{2\pi k^2}\)
- 1/[\(\Gamma(k/2)\)]
- 1/[\(2^{k/2} \Gamma(k/2)\)]
\[h(w) \propto w^{k/2-1}e^{-w/2}, \qquad w>0\]
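A quick numerical check of the answer (a sketch, with k = 5 chosen arbitrarily): the kernel integrates to \(2^{k/2}\Gamma(k/2)\), so the constant is \(1/[2^{k/2}\Gamma(k/2)]\), i.e., \(h\) is the \(\chi^2_k\) density.

```r
# Verify numerically that the integral of w^(k/2 - 1) exp(-w/2) over (0, Inf)
# equals 2^(k/2) * Gamma(k/2), so 1 / (2^(k/2) * Gamma(k/2)) normalizes h(w).
k <- 5                                            # arbitrary degrees of freedom
kernel <- function(w) w^(k / 2 - 1) * exp(-w / 2)

integrate(kernel, lower = 0, upper = Inf)$value   # numerical integral
2^(k / 2) * gamma(k / 2)                          # closed form; the two agree
```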
- Suppose the data come from an exponential distribution with a parameter whose prior is given by a gamma distribution. The gamma prior is known to be conjugate, so the posterior distribution must be in what family?10
- exponential
- gamma
- normal
- beta
- Poisson
- A prior is improper if11
- it conveys no real information.
- it isn’t conjugate.
- it doesn’t integrate to one.
- it swears a lot.
- it isn’t on your distribution sheet.
- Given a prior: \(\theta \sim N(\mu_0, \nu_0^2)\)
And a data likelihood: \(X | \theta \sim N(\theta, \sigma^2)\)
You collect n data values; what is your best guess of \(\theta?\)12
- \(\overline{X}\)
- \(\mu_0\)
- \(\mu_1 = \frac{\sigma^2 \mu_0 + n \nu_0^2 \overline{X}}{\sigma^2 + n \nu_0^2}\)
- median of \(N(\mu_1, \nu_1^2 = \frac{\sigma^2 \nu_0^2}{\sigma^2 + n \nu_0^2})\)
- 47
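A small R sketch of the answer, with made-up numbers: the posterior mean \(\mu_1\) is a precision-weighted compromise between the prior mean \(\mu_0\) and \(\overline{X}\), and it is also the posterior median since the posterior is normal.

```r
# Posterior for a normal mean: theta ~ N(mu0, nu0^2), X_i | theta ~ N(theta, sigma^2).
# All numbers below are illustrative, not from a real data set.
mu0 <- 0; nu0 <- 2          # prior mean and prior sd
sigma <- 3                  # known data sd
n <- 25; xbar <- 1.4        # hypothetical sample size and sample mean

mu1    <- (sigma^2 * mu0 + n * nu0^2 * xbar) / (sigma^2 + n * nu0^2)
nu1_sq <- (sigma^2 * nu0^2) / (sigma^2 + n * nu0^2)
c(posterior_mean = mu1, posterior_var = nu1_sq)
# With n = 25 the posterior mean sits close to xbar but is shrunk toward mu0.
```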
- The Bayes estimator is sensitive to13
- the posterior mean
- the prior mean
- the sample size
- the data values
- some of the above
- The range (output) of the Bayesian MSE includes:14
- theta
- the data
- The range (output) of the frequentist MSE includes:15
- theta
- the data
- To find the maximum likelihood estimator, we take the derivative of the likelihood16
- with respect to \(X\)
- with respect to \(\underline{X}\)
- with respect to \(\theta\)
- with respect to \(f\)
- with respect to \(\ln(f)\)
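As a quick worked example (not one from the text): for \(X_1, \ldots, X_n\) iid Exponential(\(\theta\)) with \(f(x \mid \theta) = \theta e^{-\theta x}\), differentiating the log likelihood with respect to \(\theta\) and setting it to zero gives the MLE.

\[\ln f(\underline{x} \mid \theta) = n \ln \theta - \theta \sum_{i=1}^{n} x_i, \qquad \frac{\partial}{\partial \theta} \ln f(\underline{x} \mid \theta) = \frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; \hat{\theta} = \frac{1}{\overline{X}}.\]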
- Consider an MLE, \(\hat{\theta},\) and the related log likelihood function \(L = \ln(f).\) \(\delta(X)\) is another estimator of \(\theta\). Which statement is necessarily false:17
- L(\(\delta(X)\)) < L(\(\theta\))
- L(\(\hat{\theta}\)) < L(\(\theta\))
- L(\(\theta\)) < L(\(\delta(X)\))
- L(\(\delta(X)\)) < L(\(\hat{\theta}\))
- L(\(\theta\)) < L(\(\hat{\theta}\))
- The MLE is popular because it18
- maximizes \(R^2\)
- minimizes the sum of squared errors
- has desirable sampling distribution properties
- maximizes both the likelihood and the log likelihood
- always exists
- MOM is popular because it:19
- has desirable sampling properties
- is often straightforward to compute
- always produces values inside the parameter space (e.g., in [0,1] for a probability)
- always exists
- The Central Limit Theorem (CLT) says:20
- The sample average (statistic) converges to the true average (parameter)
- The sample average (statistic) converges to some point
- The distribution of the sample average (statistic) converges to a normal distribution
- The distribution of the sample average (statistic) converges to some distribution
- I have no idea what the CLT says
- A sampling distribution is21
- the true distribution of the data
- the estimated distribution of the data
- the distribution of the population
- the distribution of the statistic in repeated samples
- the distribution of the statistic from your one sample of data
- The distribution of a random variable can be uniquely determined by22
- the cdf: F(x)
- the pdf (pmf): f(x)
- the moment generating function (mgf), if it exists: \(\Psi(t) = E[e^{tX}]\)
- the mean and variance of the distribution
- more than one of the above (which ones??)
- A moment generating function23
- gives the probability of the RV at any value of X
- gives all theoretical moments of the distribution
- gives all sample moments of the data
- gives the cumulative probability of the RV at any value of X
- The sampling distribution is important because24
- it describes the behavior (distribution) of the statistic
- it describes the behavior (distribution) of the data
- it gives us the ability to measure how likely the observed statistic (or something more extreme) is under particular settings (i.e., the null)
- it gives us the ability to make inferences about the population parameter
- more than one of the above (which ones??)
- The following result: \(\frac{\sum_{i=1}^n (X_i - \overline{X})^2}{\sigma^2} \sim \chi^2_{n-1}\) allows us to isolate and conduct inference on what parameter?25
- \(\overline{X}\)
- \(s\)
- \(\mu\)
- \(\sigma^2\)
- \(\chi\)
- The following result: \(\frac{\overline{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}\) allows us to isolate and conduct inference on what parameter?26
- \(\overline{X}\)
- \(s\)
- \(\mu\)
- \(\sigma^2\)
- \(\chi\)
- What would you expect the standard deviation of the t statistic to be?27
- a little bit less than 1
- 1
- a little bit more than 1
- unable to tell because it depends on the sample size and the variability of the data
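A simulation sketch (the sample size is an arbitrary choice) backing up the answer: dividing by \(s\) rather than \(\sigma\) inflates the spread, and the sd of a \(t_{\nu}\) distribution is \(\sqrt{\nu/(\nu-2)} > 1\).

```r
# Simulate t statistics (xbar - mu) / (s / sqrt(n)) from normal data and
# compare their standard deviation to 1.
set.seed(152)
n <- 10; mu <- 0                       # arbitrary choices
tstats <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = 1)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})
sd(tstats)                             # a bit more than 1
sqrt((n - 1) / ((n - 1) - 2))          # theoretical sd of t with n - 1 df
```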
- You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples. What is the sample size of each bootstrap sample?28
- 50
- 1000
- You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples. How many bootstrap statistics will you have?29
- 50
- 1000
- The bootstrap distribution of \(\hat{\theta}\) is centered around the30
- population parameter
- sample statistic
- bootstrap statistic
- bootstrap parameter
- The bootstrap theory relies on31
- Resampling with replacement from the original sample.
- Resampling from the original sample, leaving one observation out each time (e.g., cross validation)
- Estimating the population using the sample.
- Permuting the data values within the sample.
- Bias of a statistic refers to32
- The difference between a statistic and the actual parameter
- Whether or not questions were worded fairly.
- The difference between a sampling distribution mean and the actual parameter.
- The mean of a sample is 22.5. The mean of 1000 bootstrapped samples is 22.491. The bias of the bootstrap mean is33
- -0.009
- -0.0045
- -0.09
- 0.009
- 0.09
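A minimal bootstrap sketch in R tying the last few questions together (the data are simulated, not real): each bootstrap sample has the original sample size, there are B bootstrap statistics, the bootstrap distribution centers near the sample statistic, and the bias estimate is the bootstrap mean minus the sample mean.

```r
# Bootstrap the sample mean: resample with replacement, same n each time.
set.seed(47)
x <- rnorm(50, mean = 22.5, sd = 3)      # hypothetical sample of size n = 50
B <- 1000                                # number of bootstrap samples

boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

length(boot_means)            # 1000 bootstrap statistics
mean(boot_means) - mean(x)    # bootstrap estimate of bias (near 0)
sd(boot_means)                # bootstrap SE of the sample mean
```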
- The following result: \(\frac{\sum_{i=1}^n (X_i - \overline{X})^2}{\sigma^2} \sim \chi^2_{n-1}\) allows us to isolate and conduct inference on what parameter?34
- \(\overline{X}\)
- \(s\)
- \(\mu\)
- \(\sigma^2\)
- \(\chi\)
- The following result: \(\frac{\overline{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}\)
allows us to isolate and conduct inference on what parameter?35
- \(\overline{X}\)
- \(s\)
- \(\mu\)
- \(\sigma^2\)
- \(\chi\)
- Consider an asymmetric confidence interval for \(\sigma\) which is derived using:
\(P(c_1 \leq \frac{\sum_{i=1}^{n}(X_i - \overline{X})^2}{\sigma^2} \leq c_2) = 0.95\)
The resulting 95% interval with the shortest width has:36
- \(c_1\) and \(c_2\) as the .025 & .975 quantiles
- \(c_1\) set to zero
- \(c_2\) set to infinity
- \(c_1\) and \(c_2\) as different quantiles than (a) but that contain .95 probability.
- Find \(c_1\) and let \(c_2 = -c_1\)
- A 90% CI for the average number of chocolate chips in a Chips Ahoy cookie is: [3.7 chips, 17.2 chips]
What is the correct interpretation?37
- There is a 0.9 prob that the true average number of chips is between 3.7 & 17.2.
- 90% of cookies have between 3.7 & 17.2 chips.
- We are 90% confident that in our sample, the sample average number of chips is between 3.7 and 17.2.
- In many repeated samples, 90% of sample averages will be between 3.7 and 17.2.
- In many repeated samples, 90% of intervals like this one will contain the true average number of chips.
- A 90% CI for the average number of chocolate chips in a Chips Ahoy cookie: [3.9 chips, \(\infty\))
What is the correct interpretation?38
- There is a 0.9 prob that the true average number of chips is bigger than 3.9
- 90% of cookies have more than 3.9 chips
- We are 90% confident that in our sample, the sample average number of chips is bigger than 3.9.
- In many repeated samples, 90% of sample averages will be bigger than 3.9
- In many repeated samples, 90% of intervals like this one will contain the true average number of chips.
- Consider a Bayesian posterior interval for \(\mu\) of the form: \(\overline{X} \pm t^*_{n-1} s / \sqrt{n}\)
What was the prior on \(\mu\)?39
- N(0,0)
- N(\(\overline{X}\),0)
- N(0, 1/0)
- N(\(\overline{X}\),1/0)
- N(1/0, 0)
Some review questions:
- If we need to find the distribution of a function of one variable, \(Y = g(X)\), the easiest route is probably:40
- find the pdf
- find the cdf
- find the MGF
- find the expected value and variance
- If we need to find the distribution of a sum of random variables, the easiest route is probably:41
- find the pdf
- find the cdf
- find the MGF
- expected value and variance
- FREQUENTIST: consider the sampling distribution of \(\hat{\theta}.\) The parameters in the sampling distribution are given by:42
- the data
- the parameters from the likelihood
- the prior parameters
- the statistic
- \(\theta\)
- BAYESIAN: consider the posterior distribution of \(\theta | \underline{X}.\) The parameters in the posterior distribution are a function of:43
- the data
- the parameters from the likelihood
- the prior parameters
- the statistic
- \(\theta\)
- A sample of size 8 had a mean of 22.5. It was bootstrapped 1000 times and the mean of the bootstrap distribution was 22.491. The standard deviation of the bootstrap was 2.334. The 95% BS SE confidence interval for the population mean is44
- 22.491 \(\pm\) z(.975) * 2.334
- 22.491 \(\pm\) z(.95) * 2.334
- 22.5 \(\pm\) z(.975) * 2.334
- 22.5 \(\pm\) z(.95) * 2.334
- 22.5 \(\pm\) z(.975) * 2.334 / \(\sqrt{8}\)
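Checking the answer's arithmetic in R: the bootstrap-SE interval is centered at the original sample statistic (22.5), not at the bootstrap mean, and uses the 0.975 normal quantile.

```r
# Bootstrap SE interval: statistic +/- z_(.975) * SE_boot
xbar    <- 22.5      # original sample mean
se_boot <- 2.334     # sd of the bootstrap distribution
xbar + c(-1, 1) * qnorm(0.975) * se_boot   # roughly (17.9, 27.1)
```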
- Which is most accurate?45
- A BS SE confidence interval
- A bootstrap-t confidence interval
- A bootstrap percentile interval
- A bootstrap BCa interval
- What is the primary reason to bootstrap a CI (instead of creating a CI from calculus)?46
- larger coverage probabilities
- narrower intervals
- more resistant to outliers
- can be done for statistics with unknown sampling distributions
- What does the Fisher Information tell us?47
- the variability of the MLE from sample to sample.
- the bias of the MLE from sample to sample.
- the variability of the data from sample to sample.
- the bias of the data from sample to sample.
- Why do we care about the variability of the MLE?48
- determines whether MOM or MLE is better.
- determines whether Bayes’ estimator or MLE is better.
- determines how precise the estimator is.
- allows us to do inference (about the population value).
- Why do we care about the sampling distribution of the MLE?49
- determines whether MOM or MLE is better.
- determines whether Bayes’ estimator or MLE is better.
- determines how precise the estimator is.
- allows us to do inference (about the population value).
- Consider an estimator, \(\hat{\theta}\), such that \(E[\hat{\theta}] = m(\theta)\).
\(\hat{\theta}\) is unbiased for \(\theta\) if:50
- \(m(\theta)\) is a function of \(\theta\).
- \(m(\theta)\) is NOT a function of \(\theta\).
- \(m(\theta)= \theta\).
- \(m(\theta)= 0\).
- \(m(\theta)\) is the expected value of \(\hat{\theta}\).
- If \(\hat{\theta}\) is unbiased, \(m'(\theta)\) is51
- zero
- one
- \(\theta\)
- \(\theta^2\)
- some other function of \(\theta\), depending on \(m(\theta)\)
- The MLE is52
- consistent
- efficient
- asymptotically normally distributed
- all of the above
- Why don’t we set up our test as: always reject \(H_0?\)53
- type I error too high
- type II error too high
- level of sig too high
- power too high
- Why do we care about the distribution of the test statistic?54
- Better estimator
- To find the rejection region / critical region
- To minimize the power
- Because we love the Central Limit Theorem
- Given a statistic T = r(X), how do we find a (good) test?55
- Maximize power when \(H_1\) is true
- Minimize type I error
- Control type I error
- Minimize type II error
- Control type II error
- We can find the probability of type II error (at a given \(\theta \in \Omega_1)\) as56
- a value of the power curve (at \(\theta)\)
- 1 – P(type I error at \(\theta)\)
- \(\pi(\theta | \delta)\)
- 1- \(\pi(\theta | \delta)\)
- we can’t ever find the probability of a type II error
- Why don’t we use the power function to also control the type II error?57 (We want the power to be big in \(\Omega_1\), so we’d control it by keeping the power from getting too small.)
- \(\inf_{\theta \in \Omega_1} \pi(\theta | \delta)\) does not exist
- \(\inf_{\theta \in \Omega_1} \pi(\theta | \delta)\) =0
- \(\inf_{\theta \in \Omega_1} \pi(\theta | \delta)\) = always really big
- \(\inf_{\theta \in \Omega_1} \pi(\theta | \delta)\) =1
- \(\inf_{\theta \in \Omega_1} \pi(\theta | \delta)\) = always really small
- With two simple hypotheses, hypothesis testing simplifies because we can now control (i.e., compute):58
- the size of the test.
- the power of the test.
- the probability of type I error.
- the probability of type II error.
- a rejection region.
- The likelihood ratio is super awesome because59
- it provides the test statistic
- it provides the critical region
- it provides the type I error
- it provides the type II error
- it provides the power
- A uniformly most powerful (UMP) test60
- has the highest possible power in \(\Omega_1\).
- has the lowest possible power in \(\Omega_1\).
- has the same power over all \(\theta \in \Omega_1\).
- has the highest possible power in \(\Omega_1\) subject to controlling \(\alpha(\delta).\)
- is a test we try to avoid.
- A monotone likelihood ratio statistic is awesome because61
- it is the MLE
- it is easy to compute
- its distribution is known
- it is unbiased
- it is monotonic with respect to the likelihood ratio
- Likelihood Ratio Test62
- gives a statistic for comparing likelihoods
- is always UMP
- works only with some types of hypotheses
- works only with hypotheses about one parameter
- gives the distribution of the test statistic
- Increasing sample size63
- Increases power (over \(\Omega_1\))
- Decreases power (over \(\Omega_1\))
- Making significance level more stringent (\(\alpha_0\) smaller)64
- Increases power (over \(\Omega_1\))
- Decreases power (over \(\Omega_1\))
- A more extreme alternative is true65
- Increases power (over \(\Omega_1\))
- Decreases power (over \(\Omega_1\))
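The three questions above can be checked numerically with base R's power.t.test() (a one-sample t-test with illustrative settings; the specific numbers are arbitrary).

```r
# Power of a two-sided one-sample t-test under a few settings.
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample")$power   # baseline
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample")$power   # larger n                 -> higher power
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.01,
             type = "one.sample")$power   # more stringent alpha     -> lower power
power.t.test(n = 20, delta = 1.0, sd = 1, sig.level = 0.05,
             type = "one.sample")$power   # more extreme alternative -> higher power
```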
- Suppose \(H_1: \mu_1 - \mu_2 \ne 0\) is TRUE. If we consider 100 CIs (for \(\mu_1 - \mu_2\)), the power of the test can be approximated by:66
- The proportion that contain the true mean.
- The proportion that do not contain the true mean.
- The proportion that contain zero.
- The proportion that do not contain zero.
- It is hard to find the power associated with the t-test because:67
- the non-central t-distribution is tricky.
- two-sided power is difficult to find.
- we don’t know the variance.
- the t-distribution isn’t integrable.
- Consider the likelihood ratio statistic: \[\Lambda(x) = \frac{\sup_{\Omega_1} f(\underline{x} | \theta)}{\sup_{\Omega_0} f(\underline{x} | \theta)}\] Why do we assume that the MLE maximizes the numerator?68
- The MLE is always in the alternative space.
- The MLE is always in the null space.
- If the MLE is in the alternative space, we won’t reject \(H_0\).
- If the MLE is in the null space, we won’t reject \(H_0\).
- If the MLE is in the alternative space, we will reject \(H_0\).
- Consider the likelihood ratio statistic:69 \[\Lambda(x) = \frac{\sup_{\Omega_1} f(\underline{x} | \theta)}{\sup_{\Omega_0} f(\underline{x} | \theta)}\]
- \(\Lambda(x) \geq 1\)
- \(\Lambda(x) \leq 1\)
- \(\Lambda(x) \geq 0\)
- \(\Lambda(x) \leq 0\)
- bounds on \(\Lambda(x)\) depend on hypotheses
- When using the chi-square goodness of fit test, the smaller the value of the chi-square test statistic, the more likely we are to reject the null hypothesis.70
- True
- False
- A chi-square test is71
- one-sided, and we only consider the upper end of the sampling distribution
- one-sided, and we consider both ends of the sampling distribution
- two-sided, and we only consider the upper end of the sampling distribution
- two-sided, and we consider both ends of the sampling distribution
- To test whether the data are Poisson, why can’t we use the Poisson likelihood instead of the multinomial?72
- Likelihood under \(H_0\) is too hard to write down
- Likelihood under \(H_1\) is too hard to write down
- Don’t know the distribution of the corresponding test statistic
- Don’t have any data to use
- The \(\chi^2\) test statistic is being used to test whether the assumption of normality is reasonable for a given population distribution. The sample consists of 5000 observations and is divided into 6 categories (intervals). What are the degrees of freedom associated with the test statistic?73
- 4999
- 6
- 5
- 4
- 3
- For a chi-square test for independence, the null hypothesis states that the two variables74
- are mutually exclusive.
- form a contingency table with r rows and c columns.
- have (r –1) and (c –1) degrees of freedom where r and c are the number of rows and columns, respectively.
- are statistically independent.
- are normally distributed.
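A minimal R illustration with a made-up 2 x 3 contingency table: chisq.test() tests the null hypothesis that the row and column variables are independent.

```r
# Chi-square test of independence on a hypothetical 2 x 3 table.
tab <- matrix(c(20, 30, 25,
                10, 15, 40), nrow = 2, byrow = TRUE)
chisq.test(tab)    # df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
```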
- You read a paper where a chi-square test produces a p-value of 0.999 (not 0.001). You think:75
- \(H_0\) is definitely true
- \(H_0\) is definitely not true
- The authors’ hypothesis is in the wrong direction.
- Maybe they falsified their data?
Footnotes
1. The distribution of the sample average (statistic) converges to a normal distribution
2. Sort of likely the Green Cab company
3. \(f( x | \theta ) = P(X = x | \theta)\)
4. integrate the joint distribution with respect to Y.
5. Integrate to 1 (\(dx\))
6. has support on [0,1]
7. prior = marginal & posterior = conditional
8. Both (b) \(\xi(\theta | \underline{X}) \sim\) Beta (4,12) and (c) \(\xi(\theta | \underline{X}) \propto\) Beta (4,12) are incorrect: (b) because the value to the left of the \(\sim\) must be a random variable, and (c) because the value to the right of the \(\propto\) must be a function.
9. 1/[\(2^{k/2} \Gamma(k/2)\)]
10. gamma
11. it doesn’t integrate to one.
12. \(\mu_1 = \frac{\sigma^2 \mu_0 + n \nu_0^2 \overline{X}}{\sigma^2 + n \nu_0^2}\)
13. some of the above (the Bayes estimator is the posterior mean; it is sensitive to the rest.)
14. the data
15. theta
16. with respect to \(\theta\)
17. L(\(\hat{\theta}\)) < L(\(\theta\))
18. has desirable sampling distribution properties, and (d) maximizes both the likelihood and the log likelihood (although (c) is really the reason it is popular)
19. is often straightforward to compute (it does not always exist, e.g., for the Cauchy distribution, and it does not always produce estimates inside the parameter space)
20. The distribution of the sample average (statistic) converges to a normal distribution
21. the distribution of the statistic in repeated samples
22. the cdf, the pdf/pmf, and the mgf
23. gives all theoretical moments of the distribution
24. (e): (a), (c), (d)
25. \(\sigma^2\) (the first two are statistics, not parameters; we can’t isolate \(\mu\) because it isn’t involved, and \(\chi\) also isn’t a parameter)
26. \(\mu\) (the first two are statistics, not parameters; we can’t isolate \(\sigma^2\) because it isn’t involved, and \(\chi\) also isn’t a parameter)
27. a little bit more than 1 (dividing by \(s\) instead of \(\sigma\) adds variability to the distribution)
28. 50 observations in each bootstrap sample
29. 1000
30. the sample statistic
31. Resampling with replacement from the original sample. Although I suppose (c) is also true.
32. The difference between a sampling distribution mean and the actual parameter.
33. -0.009. Bias is what the statistic is (on average) minus the true value. Recall that we are using the data as a proxy for the population, so the “truth” is the data. So in the bootstrap setting, the average is over the bootstrapped values, the true value is the sample mean, and the bias is 22.491 - 22.5 = -0.009.
34. \(\sigma^2\) (the first two are statistics, not parameters; we can’t isolate \(\mu\) because it isn’t involved, and \(\chi\) also isn’t a parameter)
35. \(\mu\) (the first two are statistics, not parameters; we can’t isolate \(\sigma^2\) because it isn’t involved, and \(\chi\) also isn’t a parameter)
36. \(c_2\) set to infinity
37. In many repeated samples, 90% of intervals like this one will contain the true average number of chips.
38. In many repeated samples, 90% of intervals like this one will contain the true average number of chips.
39. N(0, 1/0). Or rather, to get the frequentist result, you need the joint improper priors to have \(\mu_0 = \lambda_0 = \beta_0 = 0\) and \(\alpha_0 = -1/2\).
40. The MGF is usually easiest if g is any kind of linear combination. If not, you might need (b), find the cdf: you find the cdf to get the pdf, which you may need in order to identify the distribution. (Note: you can’t identify a distribution using only the first two moments, (d).)
41. find the MGF (note: you can’t identify a distribution using only the first two moments, (d))
42. the parameters from the likelihood
43. the data and (c) the prior parameters
44. 22.5 \(\pm\) z(.975) * 2.334
45. A bootstrap BCa interval (although of the ones we’ve covered, (b), the bootstrap-t confidence interval, is most accurate)
46. can be done for statistics with unknown sampling distributions
47. the variability of the MLE from sample to sample.
48. determines how precise the estimator is.
49. allows us to do inference (about the population value).
50. \(m(\theta)= \theta\).
51. one
52. all of the above
53. type I error too high
54. To find the rejection region / critical region
55. Control type I error
56. 1 - \(\pi(\theta | \delta)\)
57. \(\inf_{\theta \in \Omega_1} \pi(\theta | \delta)\) = always really small
58. the power of the test, or (d) the probability of type II error (they are functions of one another)
59. it provides the test statistic
60. has the highest possible power in \(\Omega_1\) subject to controlling \(\alpha(\delta).\)
61. it is monotonic with respect to the likelihood ratio
62. gives the distribution of the test statistic
63. Increases power (over \(\Omega_1\))
64. Decreases power (over \(\Omega_1\))
65. Increases power (over \(\Omega_1\))
66. The proportion that do not contain zero.
67. the non-central t-distribution is tricky.
68. If the MLE is in the null space, we won’t reject \(H_0\).
69. \(\Lambda(x) \geq 1\)
70. False
71. two-sided, and we only consider the upper end of the sampling distribution
72. Likelihood under \(H_1\) is too hard to write down (what likelihood would we use for the situation of “not Poisson”?)
73. 5
74. are statistically independent.
75. Maybe they falsified their data?