Chapter 5 Testing hypotheses

5.1 formula

When testing hypotheses and building regression models, we need to specify the relations between variables. In R this is done by means of a formula, which many statistical functions require. In general, such a formula consists of a response variable, followed by the tilde symbol ~, followed by a list of independent variables and/or factors (Wilkinson and Rogers 1973). In this list, the colon : indicates an interaction effect (instead of the sequence operator), and the asterisk * is shorthand for main effects plus interactions (instead of the multiplication operator). By default, the intercept (written as 1) is included in the formula, unless it is suppressed explicitly (by adding -1). We have already encountered such a formula in the boxplot example above.

y ~ x1 + x2 # only main effects 
y ~ x1 * x2 # shorthand for x1 + x2 + (x1:x2) 

Consult the help files (e.g. help(formula)) for further information on how to specify complex models.
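
To see how R expands a formula, one can list the terms it contains with the terms function. The sketch below treats y, x1 and x2 purely as symbols, so no data are needed; the object names f.main, f.full and f.noint are made up for this illustration.

```r
# List the terms that R derives from a formula (no data required)
f.main  <- y ~ x1 + x2       # only main effects
f.full  <- y ~ x1 * x2       # main effects plus interaction
f.noint <- y ~ x1 + x2 - 1   # intercept suppressed

labels.main <- attr( terms(f.main), "term.labels" )
labels.full <- attr( terms(f.full), "term.labels" )
print(labels.main)
print(labels.full)

# The "intercept" attribute is 1 if the intercept is included, 0 if suppressed
int.full  <- attr( terms(f.full),  "intercept" )
int.noint <- attr( terms(f.noint), "intercept" )
print( c(full=int.full, noint=int.noint) )
```

The asterisk version should list the interaction term x1:x2 in addition to the two main effects.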

5.2 \(t\) test

There are three ways to use the \(t\) test.

In a one-sample \(t\) test, the sample mean is compared against an expected mean mu:

t.test( x1, mu=0.80 )
## 
##  One Sample t-test
## 
## data:  x1
## t = 1.8691, df = 99, p-value = 0.06456
## alternative hypothesis: true mean is not equal to 0.8
## 95 percent confidence interval:
##  0.7878051 1.2082871
## sample estimates:
## mean of x 
## 0.9980461
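
The \(t\) value in this output is the difference between the sample mean and the expected mean, divided by the standard error of the mean. The following minimal sketch recomputes it by hand. It uses simulated data (sim.x, with an arbitrary seed) rather than the x1 used above, so the numbers will differ from the output shown.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
sim.x <- rnorm( 100, mean=1, sd=1 )  # hypothetical data, not the x1 used above
mu0 <- 0.80                          # expected mean under H0

# t = (sample mean - expected mean) / standard error of the mean
n <- length(sim.x)
t.byhand <- ( mean(sim.x) - mu0 ) / ( sd(sim.x) / sqrt(n) )
p.byhand <- 2 * pt( -abs(t.byhand), df=n-1 )   # two-sided p-value

result <- t.test( sim.x, mu=mu0 )
print( c( byhand=t.byhand, t.test=unname(result$statistic) ) )
print( c( byhand=p.byhand, t.test=result$p.value ) )
```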

In a two-sample test with independent observations, we compare the same dependent variable, in two groups of sampling units; these groups are defined by an independent variable.

t.test( y[x1<median(x1)], y[x1>median(x1)] ) # groups by median split of x1
## 
##  Welch Two Sample t-test
## 
## data:  y[x1 < median(x1)] and y[x1 > median(x1)]
## t = -3.3044, df = 94.766, p-value = 0.001345
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.0476023 -0.2612324
## sample estimates:
## mean of x mean of y 
##  2.749787  3.404204

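This could also be achieved by specifying the dependent and independent variables in a formula. The groups are then ordered FALSE before TRUE, so the sign of \(t\) and of the confidence limits is reversed: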

t.test( y ~ (x1<median(x1)) ) # equivalent
## 
##  Welch Two Sample t-test
## 
## data:  y by x1 < median(x1)
## t = 3.3044, df = 94.766, p-value = 0.001345
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  0.2612324 1.0476023
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            3.404204            2.749787
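
The vector interface and the formula interface can be compared directly. The sketch below uses simulated data (sim.x, sim.y and the logical grouping variable lower are made up for this illustration) and checks that both calls yield the same degrees of freedom and \(p\)-value, with \(t\) values of opposite sign because the formula interface orders the groups FALSE before TRUE.

```r
set.seed(2)                              # arbitrary seed
sim.x <- rnorm(100)                      # hypothetical predictor
sim.y <- 3 + 0.5*sim.x + rnorm(100)      # hypothetical response
lower <- sim.x < median(sim.x)           # TRUE for the lower half of sim.x

t.vec <- t.test( sim.y[lower], sim.y[!lower] )  # lower half minus upper half
t.frm <- t.test( sim.y ~ lower )                # group FALSE minus group TRUE

print( c( vector=unname(t.vec$statistic), formula=unname(t.frm$statistic) ) )
print( c( vector=t.vec$p.value, formula=t.frm$p.value ) )
```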

In a two-sample test with paired observations, we compare the same construct, but observed under two conditions, which were “paired” within the same sampling units. The two observations are typically stored in two different variables.

t.test( x1, x2, paired=TRUE )
## 
##  Paired t-test
## 
## data:  x1 and x2
## t = -14.581, df = 99, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.1074340 -0.8421383
## sample estimates:
## mean of the differences 
##              -0.9747861
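
A paired \(t\) test amounts to a one-sample \(t\) test on the pairwise differences, tested against an expected mean of zero. The sketch below illustrates this with simulated data; sim.a and sim.b are hypothetical stand-ins for two measures taken on the same sampling units.

```r
set.seed(3)                                     # arbitrary seed
sim.a <- rnorm( 100, mean=1 )                   # first measure
sim.b <- sim.a + rnorm( 100, mean=1, sd=0.7 )   # second measure, same units

t.paired <- t.test( sim.a, sim.b, paired=TRUE )
t.diff   <- t.test( sim.a - sim.b, mu=0 )       # one-sample test on differences

print( c( paired=unname(t.paired$statistic), differences=unname(t.diff$statistic) ) )
```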

The small \(p\)-value reported here, p-value < 2.2e-16, is written in scientific notation and means \(p < 2.2 \times 10^{-16}\).
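
The threshold 2.2e-16 is not specific to this test. As far as we can tell, it corresponds to the machine precision stored in .Machine$double.eps, below which R's printing routines no longer display the \(p\)-value exactly. The value stored in the result object can still be inspected, as in this sketch with simulated data (sim.a and sim.b are hypothetical).

```r
eps <- .Machine$double.eps
print(eps)                     # 2.220446e-16 on standard (IEEE 754) hardware

# 2.2e-16 is just another way of writing 2.2 * 10^-16
same.number <- isTRUE( all.equal( 2.2e-16, 2.2 * 10^-16 ) )

set.seed(4)                                     # arbitrary seed
sim.a <- rnorm( 100, mean=1 )
sim.b <- sim.a + rnorm( 100, mean=1, sd=0.7 )
p.exact <- t.test( sim.a, sim.b, paired=TRUE )$p.value
print(p.exact)                 # the stored value, however small
shown <- format.pval(p.exact)  # how such a value is abbreviated when printed
print(shown)
```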

Note that the numbers of sampling units (e.g. participants) and of observations vary across these three \(t\) tests, yielding different degrees of freedom.
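
The degrees of freedom can be retrieved from the parameter component of each result. In the sketch below (simulated data; all object names are hypothetical), the one-sample and paired tests both have \(n-1\) degrees of freedom, whereas the Welch test uses the Welch-Satterthwaite approximation, which is recomputed by hand for comparison.

```r
set.seed(5)                                # arbitrary seed
sim.a <- rnorm( 100, mean=1 )
sim.b <- sim.a + rnorm( 100, mean=1 )
sim.y <- 3 + 0.5*sim.a + rnorm(100)
lower <- sim.a < median(sim.a)             # median split, 50 units per group

df.one    <- t.test( sim.a, mu=0.80 )$parameter              # n - 1
df.paired <- t.test( sim.a, sim.b, paired=TRUE )$parameter   # pairs - 1
df.welch  <- t.test( sim.y[lower], sim.y[!lower] )$parameter

# Welch-Satterthwaite approximation, computed by hand
n1 <- sum(lower)
n2 <- sum(!lower)
v1 <- var( sim.y[lower] ) / n1
v2 <- var( sim.y[!lower] ) / n2
df.byhand <- (v1 + v2)^2 / ( v1^2/(n1-1) + v2^2/(n2-1) )

print( c( one=unname(df.one), paired=unname(df.paired),
          welch=unname(df.welch), byhand=df.byhand ) )
```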

5.3 chisq.test

First, let us create two categorical variables, derived from a speaker’s age (in years) and average phrase length (in syllables), for 80 speakers in the Corpus of Spoken Dutch (talkers data set; Quené 2014). The categorical variables are created with the cut function, which here divides each variable into breaks=2 categories: age into young and old, and phrase length into short and long.

library(hqmisc)   # provides the talkers data set
data(talkers)
age.cat <- cut( talkers$age, breaks=2 )
phraselength.cat <- cut( talkers$nsyl, breaks=2 )
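
With a single number as breaks, cut divides the range of the data into that many intervals of equal width. The sketch below shows this on a small made-up vector of ages (not the talkers data), and also how the optional labels argument can replace the default interval labels with more readable ones.

```r
ages <- c( 21, 30, 45, 59 )              # hypothetical ages, not the talkers data

age.default <- cut( ages, breaks=2 )     # two equal-width intervals
print( levels(age.default) )             # default labels show the interval limits

age.named <- cut( ages, breaks=2, labels=c("young","old") )
print( table(age.named) )
```

Note that the split is at the midpoint of the observed range, not at the median, so the two categories need not contain equal numbers of observations.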

The hypothesis under study is that older speakers tend to produce shorter phrases. This hypothesis may be tested with a \(\chi^2\) (chi-square) test.

table( age.cat, phraselength.cat ) # show 2x2 table
##          phraselength.cat
## age.cat   (4.44,9] (9,13.6]
##   (21,40]       28       12
##   (40,59]       32        8
chisq.test( age.cat, phraselength.cat ) 
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  age.cat and phraselength.cat
## X-squared = 0.6, df = 1, p-value = 0.4386

The data in the table show that, as hypothesized, the odds of older talkers producing short phrases (\(32/8\) or \(4.0:1\)) are indeed higher than the odds of younger talkers producing short phrases (\(28/12\) or \(2.3:1\)). The effect is far from significant, however, and \(H_0\) is not rejected.
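
The odds and the test statistic can be reproduced from the four cell counts alone, without the talkers data. The sketch below enters the counts from the table above as a matrix; the row and column labels, and the comparison with the uncorrected test (correct=FALSE), are additions for illustration.

```r
# Cell counts copied from the 2x2 table above
counts <- matrix( c(28, 32, 12, 8), nrow=2,
                  dimnames=list( age=c("young","old"),
                                 phraselength=c("short","long") ) )

odds <- counts[ ,"short"] / counts[ ,"long"]   # odds of short phrases per age group
odds.ratio <- unname( odds["old"] / odds["young"] )
print( round(odds, 1) )
print( round(odds.ratio, 2) )

with.yates    <- chisq.test( counts )                  # default, as above
without.yates <- chisq.test( counts, correct=FALSE )   # no continuity correction
print( c( yates=unname(with.yates$statistic),
          uncorrected=unname(without.yates$statistic) ) )
```

The odds ratio is one possible way to express the size of the effect alongside the test result.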

5.4 aov

This function performs a between-subjects analysis of variance with only fixed factors (Johnson 2008). (For more complex analyses of variance with repeated measures, see Johnson 2008; for mixed-effects models, see Chapter 7 and the references cited there.) In the example below we create a response variable aa which is not normally distributed (check with hist, qqnorm, etc.).

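a1 <- rpois(20, lambda=2)   # three samples of 20 counts each,
a2 <- rpois(20, lambda=4)   # drawn from Poisson distributions
a3 <- rpois(20, lambda=6)   # with increasing means
aa <- c(a1, a2, a3)
x1 <- as.factor(rep(1:3, each=20))
# x1 corresponds with the three different Poisson distributions within aa
# (note: this overwrites the numeric x1 used in the t tests above)
x2 <- as.factor(rep( rep(1:2, each=10), 3)) # no effect expected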

Thus the dependent variable aa intentionally differs between the levels of x1, but there should be no effect of the independent variable x2, nor of the interaction between the two independent variables (\(F\) values of about 1 or lower are expected for both effects). The model is estimated and summarized in a single composite command.

summary( model1.aov <- aov(aa~x1*x2) )
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## x1           2 109.73   54.87  15.512 4.75e-06 ***
## x2           1   0.27    0.27   0.075    0.785    
## x1:x2        2   6.93    3.47   0.980    0.382    
## Residuals   54 191.00    3.54                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
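
For a balanced design like this one, the same table can also be obtained by fitting a linear model with lm and passing it to anova. The sketch below checks this on freshly simulated data (sim.resp, f1 and f2 are hypothetical names, and the seed is arbitrary), so its numbers will differ from the table above.

```r
set.seed(6)                                # arbitrary seed
sim.resp <- c( rpois(20, lambda=2), rpois(20, lambda=4), rpois(20, lambda=6) )
f1 <- as.factor( rep(1:3, each=20) )
f2 <- as.factor( rep( rep(1:2, each=10), 3 ) )

tab.aov <- summary( aov(sim.resp ~ f1*f2) )[[1]]   # ANOVA table from aov
tab.lm  <- anova( lm(sim.resp ~ f1*f2) )           # same table via lm

print( cbind( aov=tab.aov[["F value"]], lm=tab.lm[["F value"]] ) )
```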

When reporting any of the hypothesis tests in this chapter, you should always report the effect size too (Quené 2010).
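
For the analysis of variance above, one common effect size is eta squared, i.e. the proportion of the total sum of squares associated with each effect. The text does not prescribe a particular measure, so the choice of eta squared and partial eta squared in this sketch is an assumption. The sums of squares are copied from the summary table above.

```r
# Sums of squares copied from the aov summary table above
ss <- c( x1=109.73, x2=0.27, "x1:x2"=6.93, Residuals=191.00 )

# Eta squared: SS of each effect as a proportion of the total SS
eta.sq <- ss / sum(ss)

# Partial eta squared: SS of each effect relative to effect plus residual SS
partial.eta.sq <- ss[1:3] / ( ss[1:3] + ss["Residuals"] )

print( round( eta.sq[1:3], 3 ) )
print( round( partial.eta.sq, 3 ) )
```

By this measure, x1 accounts for roughly a third of the total variation in aa, whereas x2 and the interaction account for very little.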


References

Johnson, Keith. 2008. Quantitative Methods in Linguistics. Blackwell.
Quené, Hugo. 2010. “How to Design and Analyze Language Acquisition Studies.” In Experimental Methods in Language Acquisition Research, edited by Elma Blom and Sharon Unsworth, 269–287. Benjamins.
Quené, Hugo. 2014. Hqmisc: Miscellaneous Convenience Functions and Dataset. https://CRAN.R-project.org/package=hqmisc.
Wilkinson, G. N., and C. E. Rogers. 1973. “Symbolic Description of Factorial Models for Analysis of Variance.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 22 (3): 392–399.