Chapter 1 Introduction

This tutorial offers a first introduction into R. R is available as freeware from https://www.r-project.org, where one can also find a wealth of information and documentation.

This tutorial assumes that R is already properly installed on your computer. It is further assumed that the reader has some basic knowledge about statistics, equivalent to an introductory course in statistics. This tutorial introduces the R software for statistical analyses, and not the statistical analyses themselves. This tutorial occasionally mentions differences with SPSS, but the tutorial is also intended for novice users of statistical software.

1.1 What is R ?

Perhaps surprisingly, R is several things at once:

  • a program for statistical analyses
firstmodel.lm <- lm( syldur~age, data=talkers ) # linear-model regression
  • a calculator
# 440 Hz is `how many` semitones above 110 Hz?
( log(440)-log(110) ) / log(2^(1/12)) 
## [1] 24
  • a programming language
# function to convert hertz to semitones, relative to `base`, by Mark Liberman
h2st <- function( h, base=110 ) { 
  semi1 <- log(2^(1/12)) # log of frequency ratio of 1 semitone
  return( ( log(h)-log(base) ) / semi1 ) } # compare with above

The assignment operator <- is further explained in section 3.1 below.

The hash # indicates comment, which is not processed.

1.2 What is RStudio?

Most users will use R in combination with the program RStudio (https://www.rstudio.com). RStudio can be regarded as a wrapper around R, actively assisting you with many housekeeping tasks. (For users familiar with SPSS, this is somewhat similar to the SPSS graphical user interface wrapped around the SPSS “engine”). After opening, RStudio displays three or four panels or panes, with multiple tabs (in bold) within each pane.

The left panel, or lower left panel, has a tab named Console which importantly contains the current R session. You can input R commands there (try typing date() and press Enter) and see the output (warning and error messages are displayed in red).

In the top right panel, the tab Environment lists all objects in the workspace (see explanation below), and the tab History lists your previously entered R commands.

In the bottom right panel, the tab Files shows files in your current folder or directory, Plots contains plots produced by R/RStudio, and Help gives you access to help information.

You could work your way through most of this booklet using only the Console tab of RStudio, but most users find R+RStudio far easier to work with than R by itself.

1.3 Object-oriented philosophy

R works in an object-oriented way. This means that objects are the most important things in R , and not the actions we perform with these objects. Let’s use a culinary example to illustrate this. In order to obtain pancakes, a cook needs flour, milk, eggs, some mixing utensils, a pan, oil, and a fire. An object-oriented approach places primary focus on these six objects. If the relations between these are properly specified, then a good pancake will result. Provided that the necessary objects (ingredients) are available, the quasi-R syntax could be as follows:

batter <- mixed( flour,  milk/2 ) # mix flour and half of milk
batter <- mixed( batter, egg*2 ) # add 2 eggs
batter <- mixed( batter, milk/2, use=whisk) # add other half of milk
while (enough(batter)) # FALSE if insufficient for next
    pancake <- baked( batter, in=oil, with=pan, temp=max(fire) )

This example illustrates that R is indeed a full programming language (but see footnote 1). In fact, there is no recipe, in the traditional sense. This “pancake” script merely specifies the relations between the ingredients and the result. Note that some relations are recursive: batter can be both input and output of the mixing operation. Also note that the mixed relation takes an optional argument use=whisk, which will produce a fatal error message if there is no whisk in the kitchen. Such arguments, however, allow for greater flexibility of the mixed relation. Likewise, we might specify baked(in=grease) if there is no oil in the kitchen. The only requirement for the object supplied as in argument is that one can bake in it, so this object must have some attribute goodforbaking==TRUE.

For contrast, we might imagine how the pancake recipe would be formulated in a more traditional, procedure-oriented approach. Ingredients and a spoon are again assumed to be provided.

MIX batter = flour + milk/2 . # what utensil? 
MIX batter = batter + eggs . 
MIX batter = batter + milk/2 .
BAKE batter IN oil .
BAKE batter IN water . # garbage in garbage out

The programmer of this recipe has defined the key procedures MIX and BAKE, and has stipulated boundary conditions such as utensils and temperatures. Optional arguments are allowed for the BAKE command, but only within the limits set by the programmer (see footnote 2).

So far, you may have thought that the difference between the two recipes was semantic rather than pragmatic. To demonstrate the greater flexibility of an object-oriented approach, let us consider the following variant of the recipe, again in quasi-R syntax:

# batter is done
while (number(pancakes)<2) # first bake 2 pancakes
    pancake <- baked(batter,in=oil,with=pan,temp=max(fire))
feed(pancake,child) # feed one to hungry spectator
# define new function, data ’x’ split into ’n’ pieces
chopped <- function( x, n=1000 ) { return( split(x,n) ) } 
pieces <- chopped(pancake) # new data object, array of 1000 pieces
batter <- mixed(batter,pieces) # mix pancake pieces into batter
# etc

Such complex relations between objects are quite difficult to specify, if there are strong a priori limits to what one can MIX or BAKE. Thus, object-oriented programs such as R allow for greater flexibility than procedure-oriented programs.

If you are a user of the Praat software (http://www.praat.org) then you are already familiar with this basic idea. Praat has an object window, listing the known objects. These objects are the output of previous operations (e.g. Create, Read, ToSpectrum), as well as input for subsequent operations (e.g. Write, Draw). The classes or types of these objects are pre-defined (Sound, Spectrum, Periodicity, etc). R takes the same idea even further: users may create their own classes of data objects (e.g. a new class SuperData) and may create their own methods or relations to work with such objects 2.

This object-oriented philosophy results in a different behavior than observed in procedure-oriented software:

There is an important difference in philosophy between S (and hence R) and the other main statistical systems. In S a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Thus whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.

from: https://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics