1 Initialize R

Initialize R with the following commands:

# Initialize R -------------------------------------

# I set the working directory to be the folder that includes this document
graphics.off() # clear all graphs
options(digits=6,width=80) # R console display parameters
options(contrasts=c("contr.sum","contr.poly") ) # set up coding scheme for factors
op <- par(no.readonly = TRUE) # save copy of current parameters
# usually a good idea to CLEAR WORKSPACE MEMORY
# use command in RStudio "Session" menu

2 z tests

2.1 z test for a single score

In the US, the weight of full-term Caucasian infants is distributed approximately normally with a mean of 3480 g and a standard deviation of 462 g. A full-term baby is born with a weight of 2910 g. Is this weight unusually low? To answer this question, we need to define what we mean by unusual. For this question we will define an unusual weight as one that falls in the lower or upper 2.5% of the distribution of weights.

Another way of saying this is to define a normal (i.e., not unusual) weight as one that falls within the middle 95% of the distribution of birth weights. With this definition in hand, we will use the R command pnorm to compute the probability of randomly selecting a birth weight ≤ 2910 from a normal distribution with \(\mu=3480\) and \(\sigma=462\).

( p <- pnorm(q=2910,mean=3480,sd=462,lower.tail=TRUE) ) # pnorm returns probability <= 2910
## [1] 0.108645

The probability of randomly selecting an infant from the typical population whose weight is \(\le 2910\) is 0.11. In other words, this infant’s weight falls within the bottom 11% of the population and therefore (by our definition) is not unusual.

Question: Suppose a baby’s weight is 4310 g. The following code computes the probability of randomly selecting a baby whose weight is \(\ge 4310\). Is this baby unusually heavy?

( p <- pnorm(q=4310,mean=3480,sd=462,lower.tail=FALSE) ) # pnorm returns probability >= 4310

2.2 standard (z) scores

What do these probabilities have to do with z? Any score in a sample, \(Y_i\), can be converted to a \(z\) score using the formula \(z_i = \frac{Y_i - \bar{Y}}{s}\), where \(\bar{Y}\) and \(s\) are the sample mean and standard deviation, respectively. Therefore, \(z_i\) represents the number of standard deviations that \(Y_i\) is below or above the mean \(\bar{Y}\). We can convert our observed weights of 2910 and 4310 to z scores by subtracting the population mean (3480) and dividing by the population standard deviation. What are the z scores for our two weights?
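The conversion described above is easy to do directly in R. The following is a short sketch using the population parameters from the earlier examples:

```r
# convert the two observed weights to z scores
mu <- 3480    # population mean
sigma <- 462  # population standard deviation
( z.low  <- (2910 - mu)/sigma ) # z score for the light baby
( z.high <- (4310 - mu)/sigma ) # z score for the heavy baby
```

These values match the z scores used in the next paragraph.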

If a set of scores is distributed normally, then the distribution of \(z\) scores also is normal with a mean of zero and a standard deviation of 1. Therefore, the probabilities that we calculated above are the same as the probability of randomly selecting a score from a standard normal distribution (mean = 0, sd = 1) that is below \(z=-1.2337\) or above \(z=1.7965\). We compute those probabilities here:

pnorm(-1.2337,0,1,lower.tail=TRUE)
## [1] 0.108657
pnorm(1.7965,0,1,lower.tail=FALSE)
## [1] 0.0362075

Note that these probabilities are the same as those we calculated before.

2.3 drawing a normal distribution

The following code draws a normal distribution (\(\mu=3480\) & \(\sigma=462\)) with the weights 2910 and 4310 indicated by two vertical dotted lines.

w <- seq(2000,5000,1) # will be x axis
p <- dnorm(x=w,mean=3480,sd=462) # probability density
plot(x=w,y=p,type="l",xlab="weight (g)",ylab="density") # draw a line plot
abline(v=c(2910,4310),lty=3,lwd=2)
text(x=c(2910,4310),y=c(7e-04,7e-04), pos=c(2,4), labels=c("2910","4310") )

2.4 calculating critical values that define the normal range

The previous examples calculated the probability of getting a weight, \(w\), that was at least as low as 2910, \(p(w\le 2910)\), or at least as high as 4310, \(p(w\ge 4310)\). However, we often want to compare observed weights (i.e., 2910 and 4310) to values that define the lower and upper limits of the range of weights that are not unusual.

Question: In this case we want to calculate the weights that cut off the lower 2.5% and the upper 2.5% from the middle 95%. We can use qnorm to calculate these birth weights. What are they?

lower.p.value <- 0.025
upper.p.value <- 0.975
qnorm(lower.p.value,mean=3480,sd=462) # lower cutoff value
qnorm(upper.p.value,mean=3480,sd=462) # upper cutoff value

The interval between 2574.5 and 4385.5 contains 95% of the weights in this population, and weights falling outside of this range would be declared unusual (using our definition of unusual as being in the lower or upper 2.5%).
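As a quick sanity check (not part of the original lab), pnorm confirms that the interval produced by qnorm really does contain 95% of the distribution:

```r
lower.cut <- qnorm(0.025, mean=3480, sd=462) # about 2574.5
upper.cut <- qnorm(0.975, mean=3480, sd=462) # about 4385.5
pnorm(upper.cut, mean=3480, sd=462) - pnorm(lower.cut, mean=3480, sd=462) # 0.95
```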

2.5 z test for a single mean

Consider a situation where we want to evaluate a group mean. For example, suppose we measure birth weights of 100 Native-American full-term babies. The mean of our sample is 3350 g and the standard deviation is 375 g. Is our sample mean unusually low, assuming that the babies were selected from a typical, normally distributed population (\(\mu=3480\), \(\sigma=462\))?

CORRECTION: The following paragraph contained an error: it said that the mean of the sampling distribution was equal to \(\mu=100\) when in fact it is equal to the mean of the population of birth weights, 3480. The correction is given below.

To answer this question we need to know how the means of samples of birth weights (\(n=100\)) are distributed when the weights are selected from a normal population distribution N(\(\mu=3480\), \(\sigma^2=462^2\)). It can be shown that the distribution of means will be normal with a mean of \(\mu_{\bar{Y}}=3480\) and variance of \(\sigma^2_{\bar{Y}} = \sigma^2/n\). Finally, note that the standard deviation of the distribution of means, referred to as the Standard Error of the Mean or SEM, is \(\sigma/\sqrt{n}\).
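The SEM is easy to compute for our situation, and a short simulation (an illustrative sketch, not part of the original lab) shows that the standard deviation of many simulated sample means is indeed close to \(\sigma/\sqrt{n}\):

```r
sigma <- 462
n <- 100
( SEM <- sigma/sqrt(n) ) # theoretical SEM: 46.2
set.seed(1) # arbitrary seed, for reproducibility
sim.means <- replicate(5000, mean(rnorm(n, mean=3480, sd=sigma)))
sd(sim.means) # empirical SD of sample means, close to 46.2
```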

Now that we know that the means are distributed normally (\(\mu_{\bar{Y}}=3480\), \(\sigma_{\bar{Y}}=46.2\)), we can ask if our group mean is unusual assuming it came from the typical birth weight population. As before, we will define normal as means that fall within the middle 95%.


Question: Use qnorm to calculate the birth weights that define the lower and upper boundaries of the middle 95% of a normal distribution with a mean of 3480 and a standard deviation of 46.2.

The following code plots the sampling distribution of means, two black vertical lines defining the boundaries between normal (middle 95%) and unusual sample means, and a red vertical line indicating our sample mean. In this case, the lower and upper bounds are 3389.45 g and 3570.55 g, respectively. Our observed group mean (\(\bar{Y}=3350\)) falls outside that range and therefore is deemed to be unusual.
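The plotting code itself does not appear here; the following is a reconstruction of the described figure, using the same style as the earlier density plot (the cutoffs are computed with qnorm):

```r
m <- seq(3200, 3760, 1) # x axis: possible sample means
p <- dnorm(x=m, mean=3480, sd=46.2) # density of the sampling distribution
plot(x=m, y=p, type="l", xlab="mean weight (g)", ylab="density")
( bounds <- qnorm(c(0.025, 0.975), mean=3480, sd=46.2) ) # 3389.45 and 3570.55
abline(v=bounds, lwd=2) # black lines at the middle-95% boundaries
abline(v=3350, col="red", lwd=2) # red line at our sample mean
```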

Question: Convert the birth weights that define the lower and upper boundaries of the middle 95% into z scores.

Question: In the previous question you calculated lower-bound and upper-bound z scores. Now consider a situation where you are randomly selecting a z score from a standard normal distribution. What are the probabilities of selecting a z that is less than the lower bound? Greater than the upper bound? Check your answers by using pnorm to calculate the probabilities.

Question: Convert the observed group mean to a z score (assuming it was drawn from the typical population of values). What is the z score? Does it fall within or outside the bounds calculated above? What implications does this have for our decision about whether the sample mean is unusual?

2.6 Null hypothesis testing

In the previous section we started by assuming that our sample of babies (\(n=100\)) was drawn from a population whose weights were distributed normally with a mean of 3480 and a standard deviation of 462. We derived the sampling distribution of the mean for sample sizes of 100 drawn from that population: the distribution was normal, with a mean of 3480 and a standard deviation of 46.2. Finally, we defined a typical value as any sample mean that fell within the middle 95% of that distribution (i.e., between 3389 and 3571). Our sample mean (3350) fell outside that range and therefore was declared unusual assuming the birth weights came from the population of normal-weight babies. Given this conclusion, we might want to decide that this assumption is incorrect, and that the sample of babies came from a population with a mean that is not 3480 g.

Our decision to reject the null hypothesis might be incorrect: the sample that yielded our observed group mean of 3350, which is unusual, may have come from the population of typical birth weights. If we repeatedly sampled groups of 100 babies from the typical population, we would expect to obtain a group mean outside the 3389-3571 range approximately 5% of the time. That might have happened in this case: we may have mistakenly rejected a true H0. Such an error is called a Type I error. In our case, if H0 is true then (in the long run) we would mistakenly reject it 5% of the time. Therefore, when we report our result by saying that a z test was used to reject the null hypothesis \(\mu=3480\), \(p<0.05\), the p value represents the probability that we are making a Type I error.
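The 5% long-run Type I error rate can be illustrated with a simulation (a sketch, not part of the original lab): repeatedly draw samples of 100 babies from the typical population and count how often the sample mean falls outside the 3389-3571 range even though H0 is true.

```r
set.seed(2) # arbitrary seed, for reproducibility
nsim <- 10000
sim.means <- replicate(nsim, mean(rnorm(100, mean=3480, sd=462)))
bounds <- qnorm(c(0.025, 0.975), mean=3480, sd=462/sqrt(100)) # middle-95% cutoffs
( type1.rate <- mean(sim.means < bounds[1] | sim.means > bounds[2]) ) # close to 0.05
```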

2.7 using z.test

The command z.test in the BSDA package can be used to perform a z test. First we install and load the BSDA package.

# install.packages('BSDA') # do this only if you haven't installed BSDA yet
library(BSDA) # load library
  1. Load the bweights variable.
# load bweights variable:
load(file=url("http://pnb.mcmaster.ca/bennett/psy710/labs/L1/birthweights.rda"))
  2. Calculate the number of observations in bweights, the mean, and the standard deviation.
c(length(bweights),mean(bweights),sd(bweights)) # sample size, mean, standard dev
## [1]  100 3350  375
  3. Create a boxplot. Look for outliers, skewness, spread.
boxplot(bweights,ylab="weight (g)") # check for spread, skew, and outliers
  4. Create a histogram with hist and check for skewness and normality.
hist(bweights,ylab="frequency",xlab="weight (g)") # check for skewness and normality
  5. Use qqnorm and qqline to create a quantile-quantile plot. Normally distributed data fall along a line in this type of plot.
qqnorm(bweights) # normally distributed points fall on a line
qqline(bweights) # deviation from line suggests slightly heavy tails
  6. Use shapiro.test to evaluate normality with the Shapiro-Wilk test. The null hypothesis is that the data are distributed normally. We will set our significance level (alpha, Type I error rate) to 0.05. Do we reject the null hypothesis that the data are distributed normally?
shapiro.test(bweights) # shapiro-wilks test
  7. Finally, perform a z test using z.test. Note that we use the population standard deviation because it is known. What null hypothesis is evaluated by this test? Do we reject that null hypothesis?
z.test(x=bweights,alternative="two.sided",mu=3480,sigma.x=462,conf.level=0.95)
  8. Based on the results of your z test, what are the best point and interval estimates of the true mean of the population from which we selected our sample?

3 1-sample t test

Our previous \(z\) test on birth weights used the population standard deviation \(\sigma = 462\) because it was known (from many previous studies). How would we proceed if we did not know the population standard deviation? In this case we would use the sample standard deviation, \(s\), as an estimate of the population standard deviation, \(\sigma^\prime = s\), and use that to evaluate our null hypothesis \(\mu=3480\) using a pseudo-z score, \(z^\prime=(\bar{Y}-\mu)/(\sigma^\prime/\sqrt{n})\). However, this introduces a problem: \(\sigma^\prime\) is a biased estimate of \(\sigma\): it tends to be too low, especially for small sample sizes. This means that \(\sigma^\prime/\sqrt{n}\) will tend to be less than \(\sigma/\sqrt{n}\) and, consequently, \(z^\prime\) will be inflated. In other words, \(z^\prime\) is not distributed as a standard, normal variable: extreme values of \(z^\prime\) occur more frequently than expected. Therefore our \(z\) test will be biased: it will become easier to reject H0 and our Type I error rate will be greater than expected.
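The downward bias of the sample standard deviation is easy to demonstrate by simulation (an illustrative sketch, not part of the original lab): for small samples the average sample standard deviation is noticeably less than \(\sigma\), and the bias shrinks as \(n\) grows.

```r
set.seed(3) # arbitrary seed, for reproducibility
sigma <- 462
s.n5   <- replicate(20000, sd(rnorm(5,   mean=3480, sd=sigma))) # n = 5
s.n100 <- replicate(20000, sd(rnorm(100, mean=3480, sd=sigma))) # n = 100
mean(s.n5)   # noticeably below 462
mean(s.n100) # much closer to 462
```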

Fortunately, \(z^\prime\) does follow a so-called \(t\) distribution: the distribution is symmetrical around zero and is defined by a single parameter, the so-called degrees of freedom. For a 1-sample \(t\) test, the degrees of freedom equal \(n-1\). Evaluating our null hypothesis with a \(t\) statistic instead of a \(z\) statistic eliminates the bias introduced into our calculations by using an estimate of the population standard deviation. Note that the \(t\) distribution is very similar to a \(z\) distribution when the degrees of freedom are greater than about 30, so the results of z and t tests become increasingly similar as sample size increases. In the birth weights analysis, sample size was 100 and therefore we would expect to get similar results with a \(t\) test.
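The convergence of \(t\) toward \(z\) can be seen by comparing two-sided critical values across degrees of freedom (a quick illustration):

```r
df <- c(5, 10, 30, 100, 1000)
rbind(df=df, t.crit=round(qt(0.975, df), 4)) # t critical values shrink as df grows
qnorm(0.975) # z critical value, about 1.96
```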

The following code evaluates the null hypothesis that our sample of birth weights came from a population with a mean of 3480 (H0: \(\mu=3480\)) using our sample standard deviation as an estimate of the population standard deviation. We define unusual means as being in the lower or upper 2.5% of the theoretical distribution of means.

( mu <- 3480 ) # pop mean when H0 is true
## [1] 3480
( ybar <- mean(bweights)) # sample mean
## [1] 3350
( n <- length(bweights)) # sample size
## [1] 100
( SD <- sd(bweights) ) # sample standard deviation
## [1] 375
( eSEM <- SD/sqrt(n)) # estimated SEM (stand dev of sample means)
## [1] 37.5
( t.observed <- (ybar - mu)/eSEM) # number of SEM from mu
## [1] -3.46667
( t.critical.lower <- qt(.025,df=99))
## [1] -1.98422
( t.critical.upper <- qt(.975,df=99))
## [1] 1.98422

Our observed \(t\), -3.467, falls outside the range [-1.98, 1.98] that bounds the middle 95% of our distribution, and therefore we reject the null hypothesis, \(\mu=3480\), in favor of the alternative, \(\mu\ne 3480\). We also can calculate the probability of getting a value of \(t\) that is more extreme than \(\pm 3.467\) when the null hypothesis is true: \(p(t\le -3.467) + p(t\ge 3.467) = 0.00078\).

( p.lower <- pt(t.observed,df=99,lower.tail=T)) # prob of being less than lower bound
## [1] 0.000390493
( p.upper <- pt(abs(t.observed),df=99,lower.tail=F)) # prob of being more than upper bound
## [1] 0.000390493
( p.total <- p.lower + p.upper) # total probability
## [1] 0.000780985

3.1 using t.test

Since we have the actual data set, we can perform the \(t\) test using t.test. Run the test and examine the results.

t.test(x=bweights,mu=3480,alternative="two.sided",conf.level=0.95)

3.2 one-sided t test

So far we have conducted so-called two-sided z and t tests. We rejected the null hypothesis if our observed score or mean was unusually low OR high. However, in many cases our hypotheses are directional. For example, in the birth weight tests we might suspect that our babies came from an at-risk population and therefore came from a population whose mean birth weight was less than the typical value of 3480. To test this idea, we would try to use our data to reject a null hypothesis H0: \(\mu \ge 3480\) in favor of the alternative H1: \(\mu < 3480\). In this case, we would reject the null hypothesis only if our observed mean was unusually low assuming H0 is true.

Question: How should we define an unusual sample mean in this case?

Question: One-sided \(t\) tests are performed with t.test by setting the alternative parameter to less. Perform the one-sided test. How do the values of \(t\), \(p\), and the degrees of freedom compare to the values from the two-sided \(t\) test? How does the confidence interval differ from the one returned by the two-sided \(t\) test?

t.test(x=bweights,mu=3480,alternative="less",conf.level=0.95)