Chapter 1 Introduction
This tutorial offers a first introduction to R. R is available as free software from https://www.r-project.org, where one can also find a wealth of information and documentation.
This tutorial assumes that R is already properly installed on your computer. It further assumes that the reader has some basic knowledge of statistics, equivalent to an introductory course in statistics. The tutorial introduces the R software for statistical analyses, and not the statistical analyses themselves. It occasionally mentions differences from SPSS, but it is also intended for novice users of statistical software.
1.1 What is R?
Perhaps surprisingly, R is several things at once:
- a program for statistical analyses
firstmodel.lm <- lm( syldur~age, data=talkers ) # linear-model regression
- a calculator
# 440 Hz is how many semitones above 110 Hz?
( log(440)-log(110) ) / log(2^(1/12))
## [1] 24
- a programming language
# function to convert hertz to semitones, relative to `base`, by Mark Liberman
h2st <- function( h, base=110 ) {
  semi1 <- log(2^(1/12)) # log of frequency ratio of 1 semitone
  return( ( log(h)-log(base) ) / semi1 ) } # compare with above
The assignment operator <- is further explained in section 3.1 below. The hash # indicates a comment: R ignores everything from the hash to the end of the line.
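As a brief illustration, the function h2st can be applied to a single frequency or to a whole vector of frequencies, because log() and the arithmetic operators in R work element by element. The sketch below repeats the definition so that it runs on its own; the input frequencies are arbitrary examples, not data from this tutorial.

```r
# Convert hertz to semitones relative to `base` (definition repeated from above)
h2st <- function(h, base = 110) {
  semi1 <- log(2^(1/12))       # log of the frequency ratio of 1 semitone
  (log(h) - log(base)) / semi1
}

h2st(440)                      # same computation as the calculator example
freqs <- c(110, 220, 440)      # arbitrary example frequencies, in Hz
h2st(freqs)                    # one result per element of freqs
h2st(440, base = 220)          # the default value of `base` can be overridden
```

Up to floating-point rounding, the first call yields 24 semitones, the vector call yields 0, 12 and 24 semitones, and the last call yields 12 semitones.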
1.2 What is RStudio?
Most users will use R in combination with the program RStudio (https://www.rstudio.com). RStudio can be regarded as a wrapper around R, actively assisting you with many housekeeping tasks. (For users familiar with SPSS, this is somewhat similar to the SPSS graphical user interface wrapped around the SPSS “engine”). After opening, RStudio displays three or four panels or panes, with multiple tabs (in bold) within each pane.
The left panel, or lower left panel, has a tab named Console, which contains the current R session. You can enter R commands there (try typing date() and pressing Enter) and see the output (warning and error messages are displayed in red).
In the top right panel, the tab Environment lists all objects in the workspace (see explanation below), and the tab History lists your previously entered R commands.
In the bottom right panel, the tab Files shows files in your current folder or directory, Plots contains plots produced by R/RStudio, and Help gives you access to help information.
You could work your way through most of this booklet using only the Console tab of RStudio, but most users find R+RStudio far easier to work with than R by itself.
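To get a feel for how the Console relates to the other tabs, you might try a few commands such as the following. This is only a suggested sequence; the object name x and its values are arbitrary.

```r
x <- c(2, 4, 6)    # create an object; it should now be listed under Environment
ls()               # list the objects in the workspace
getwd()            # show the current folder (working directory)
list.files()       # list the files in that folder, as shown under Files
help("mean")       # request the help page for mean(), as shown under Help
```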
1.3 Object-oriented philosophy
R works in an object-oriented way. This means that objects, and not the actions we perform with these objects, are the most important things in R. Let’s use a culinary example to illustrate this. In order to obtain pancakes, a cook needs flour, milk, eggs, some mixing utensils, a pan, oil, and a fire. An object-oriented approach places primary focus on these objects. If the relations between these are properly specified, then a good pancake will result. Provided that the necessary objects (ingredients) are available, the quasi-R syntax could be as follows:
batter <- mixed( flour, milk/2 ) # mix flour and half of milk
batter <- mixed( batter, egg*2 ) # add 2 eggs
batter <- mixed( batter, milk/2, use=whisk) # add other half of milk
while (enough(batter)) # FALSE if insufficient for next
pancake <- baked( batter, in=oil, with=pan, temp=max(fire) )
This example illustrates that R is indeed a full programming language (but see footnote 1). In fact, there is no recipe in the traditional sense. This “pancake” script merely specifies the relations between the ingredients and the result. Note that some relations are recursive: batter can be both input and output of the mixing operation. Also note that the mixed relation takes an optional argument use=whisk, which will produce a fatal error message if there is no whisk in the kitchen. Such arguments, however, allow for greater flexibility of the mixed relation. Likewise, we might specify baked(in=grease) if there is no oil in the kitchen. The only requirement for the object supplied as the in argument is that one can bake in it, so this object must have some attribute goodforbaking==TRUE.
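The pancake script is only quasi-R, but two of its points can be tried out in real R: an object can be both input and output of the same function, and an optional argument that refers to a non-existent object produces an error. The sketch below is a toy stand-in, not the pancake script itself: the function mixed, the amounts, and the utensil names are all invented for illustration, with ingredients represented as plain numbers.

```r
# Hypothetical stand-in for the `mixed` relation: ingredients are amounts
# (plain numbers), and mixing simply adds them up.
mixed <- function(a, b, use = "spoon") {
  result <- a + b
  attr(result, "utensil") <- use   # the `use` argument is looked up here
  result
}

flour <- 250                       # invented amounts
milk  <- 500

batter <- mixed(flour, milk / 2)                  # default utensil
batter <- mixed(batter, milk / 2, use = "fork")   # batter is input and output

# No object called `whisk` exists in this session, so asking for it fails;
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  mixed(batter, 0, use = whisk),
  error = function(e) conditionMessage(e)
)
msg
```

The message stored in msg reports that the object whisk could not be found, which corresponds to the missing whisk in the kitchen.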
For contrast, we might imagine how the pancake recipe would be formulated in a more traditional, procedure-oriented approach. Ingredients and a spoon are again assumed to be provided.
MIX batter = flour + milk/2 . # what utensil?
MIX batter = batter + eggs .
MIX batter = batter + milk/2 .
BAKE batter IN oil .
BAKE batter IN water . # garbage in garbage out
The programmer of this recipe has defined the key procedures MIX and BAKE, and has stipulated boundary conditions such as utensils and temperatures. Optional arguments are allowed for the BAKE command, but only within the limits set by the programmer (see footnote 2).
So far, you may have thought that the difference between the two recipes was semantic rather than pragmatic. To demonstrate the greater flexibility of an object-oriented approach, let us consider the following variant of the recipe, again in quasi-R syntax:
# batter is done
while (number(pancakes)<2) # first bake 2 pancakes
pancake <- baked(batter,in=oil,with=pan,temp=max(fire))
feed(pancake,child) # feed one to hungry spectator
# define new function, data 'x' split into 'n' pieces
chopped <- function( x, n=1000 ) { return( split(x,n) ) }
pieces <- chopped(pancake) # new data object, array of 1000 pieces
batter <- mixed(batter,pieces) # mix pancake pieces into batter
# etc
Such complex relations between objects are quite difficult to specify if there are strong a priori limits to what one can MIX or BAKE. Thus, object-oriented programs such as R allow for greater flexibility than procedure-oriented programs.
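The idea of defining a new function on the spot and feeding its result back into an existing object can also be tried in real R. The following is a minimal sketch under invented assumptions: the pancake and the batter are represented as numeric vectors, and this version of chopped (with a smaller default for n) is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: split x into n consecutive pieces of (nearly) equal size
chopped <- function(x, n = 4) {
  piece <- ceiling(seq_along(x) * n / length(x))   # piece number of each element
  split(x, piece)                                  # returns a list of pieces
}

pancake <- 1:12                  # invented stand-in for one pancake
pieces  <- chopped(pancake)      # new data object: a list of 4 pieces
length(pieces)

batter  <- c(100, 200)           # invented stand-in for the remaining batter
batter  <- c(batter, unlist(pieces, use.names = FALSE))  # "mix" the pieces back in
length(batter)
```

Nothing in R needed to anticipate that a pancake would be chopped and returned to the batter; the new function and the new object simply become part of the workspace.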
If you are a user of the Praat software (http://www.praat.org), then you are already familiar with this basic idea. Praat has an object window, listing the known objects. These objects are the output of previous operations (e.g. Create, Read, ToSpectrum), as well as input for subsequent operations (e.g. Write, Draw). The classes or types of these objects are pre-defined (Sound, Spectrum, Periodicity, etc). R takes the same idea even further: users may create their own classes of data objects (e.g. a new class SuperData) and may create their own methods or relations to work with such objects (see footnote 2).
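As a small illustration of a user-defined class, the sketch below uses S3, one of the class systems available in R. The contents of the SuperData object (a list with the elements values and label) are invented here; only the class name comes from the text above.

```r
# Create an object of a new (hypothetical) class SuperData
mydata <- structure(
  list(values = c(3, 5, 8), label = "example"),
  class = "SuperData"
)

# A method for the generic function summary(), used for SuperData objects
summary.SuperData <- function(object, ...) {
  cat("SuperData object:", object$label, "\n")
  cat("Number of values:", length(object$values), "\n")
  cat("Mean of values:  ", mean(object$values), "\n")
  invisible(object)
}

class(mydata)
summary(mydata)    # R dispatches to summary.SuperData()
```

Because the method is named after the generic function and the class, calling summary() on a SuperData object selects it automatically, while summary() continues to work as before for all other objects.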
This object-oriented philosophy results in behavior that differs from that of procedure-oriented software:
There is an important difference in philosophy between S (and hence R) and the other main statistical systems. In S a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Thus whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.
from: https://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics
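The workflow described in this quotation can be sketched as follows. The example uses the cars data set that ships with R (speed and stopping distance of cars), not the data used elsewhere in this tutorial, and the object name fit is arbitrary.

```r
fit <- lm(dist ~ speed, data = cars)   # store the regression in a fit object

fit                    # minimal output: the call and the coefficients
class(fit)             # the fit object has class "lm"
coef(fit)              # extract the estimated coefficients
head(residuals(fit))   # first few residuals
summary(fit)           # fuller report, produced only on request
```

The regression is computed once and stored; each subsequent function interrogates the same fit object for a different aspect of the result.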