Chapter 3 Basic operations
The symbol <- is the assignment operator: the expression to its right is
evaluated (if applicable) and then assigned to the object on the
left of the operator. Hence the expression
a <- 10 means that the object
a, a single
number, “gets” the value of 10, i.e. the value of 10 is assigned to
a. The symbol resembles
an arrow in the direction of assignment. The assignment may also be
in the other direction, with symbol
-> (and see note 4).
There should be no space between the two characters making up the arrow.
Use spaces or brackets to avoid ambiguities and errors:
## [1] FALSE
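The role of spacing can be sketched as follows. This is a minimal illustration, not the original example from this chapter; the object names a, b and d are chosen here for illustration only.

```r
a <- 10        # assignment: a gets the value 10
a < -1         # space inside the "arrow": a comparison, which gives FALSE
a<-1           # no spaces: parsed as an assignment, so a is now 1
(a < -1)       # spaces and brackets: unambiguously a comparison
b <- (-5)      # brackets make the intended negative value explicit
20 -> d        # assignment in the other direction: d gets 20
```

Writing the arrow without internal spaces, and with a space on either side of it, keeps assignments and comparisons visually distinct.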
The symbol # indicates a comment: everything following this symbol, on the same line of input, is ignored.
The command scan() reads a simple vector from the keyboard. Make sure to assign the result to a new object! As an exercise, read in the numbers 1 to 10, and assign them to a new object.
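Keyboard input cannot be shown on the page, so the sketch below uses the text argument of scan() as a non-interactive stand-in: the string plays the role of what would be typed. The object name x is an assumption for illustration.

```r
# scan(text = ...) parses numbers from a character string,
# in the same way as scan() parses numbers typed at the keyboard.
x <- scan(text = "1 2 3 4 5 6 7 8 9 10")
x
# At the console, x <- scan() would prompt for input instead;
# an empty line ends the input.
```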
The command objects() shows a list of all objects in memory (similar to the
contents of the
Praat Objects window). With
objects(pattern="abc") the list is filtered so that only the objects matching the pattern string
"abc" are shown.
Objects are removed forever with the command rm().
Contents of an object can be inspected with the command print(), or by just entering the name of the object, as in some examples above.
The command summary() offers a summary of an object. The result depends on the data class of the object, as illustrated above.
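The commands for listing, inspecting, summarizing and removing objects can be tried together in a short session. This is a sketch only; the object names tmp_a and tmp_b are hypothetical.

```r
tmp_a <- c(2, 4, 6)            # a numeric vector
tmp_b <- c("x", "y")           # a character vector

objects(pattern = "^tmp_")     # list only objects whose names start with tmp_
print(tmp_a)                   # same as entering tmp_a
summary(tmp_a)                 # numeric: minimum, quartiles, mean, maximum
summary(tmp_b)                 # character: length, class, mode

rm(tmp_b)                      # tmp_b is removed; there is no undo
exists("tmp_b")                # FALSE
```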
R holds its objects in memory. The whole workspace,
containing all data objects, can be stored from the
RStudio window (Session > Save Workspace As...).
This allows you to save a session, and continue your work later
(Session > Load Workspace...).
Use save() (to write) and
load() (to read) to transfer an object between memory and hard disk.
By convention, files written with save() have the extension .RData or .rda.
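A round trip from memory to disk and back might look as follows. This is a sketch under assumptions: the object name x, the temporary file, and the .RData extension are chosen here for illustration.

```r
x <- c(1.5, 2.5, 3.5)
f <- tempfile(fileext = ".RData")  # a throwaway file name, for illustration

save(x, file = f)   # write object x to disk
rm(x)               # remove it from memory
load(f)             # read it back; x reappears under its original name
x

unlink(f)           # delete the temporary file
```

Note that load() restores objects under the names they had when saved, silently overwriting any objects with the same names in memory.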
Remember that there is no
undo command, nor such a menu
option. Save your work regularly. If in doubt, work with scratch
copies of your data sets.
Subselection within an object is a very powerful tool in
R. The subselection operator
x[…] selects only those data from object
x that match the expression within square brackets. This expression can be a
single index number, a sequence or list of numbers, or an evaluated
expression, as illustrated in the following example.
In the following example, a variable contains
30 numbers, but 3 of these are
NA. Notice that
the output of
is.na is used as the input of other functions.
##
## FALSE  TRUE
##    27     3
## [1] 11 13 19
## [1] 1.015252
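The different kinds of index expression can be sketched on a small vector. The data below are made up for illustration and are not the 30 numbers of the example above.

```r
x <- c(12, 7, NA, 9, 15, NA, 4)   # hypothetical data with 2 missing values

x[2]                    # a single index number
x[c(1, 4)]              # a list of index numbers
x[2:4]                  # a sequence of index numbers
x[x > 8 & !is.na(x)]    # an evaluated (logical) expression

table(is.na(x))         # how many values are missing?
miss <- which(is.na(x)) # positions of the missing values: 3 and 6
m <- mean(x[!is.na(x)]) # mean of the non-missing values: 9.4
```

In each of the last three lines, the logical vector returned by is.na serves directly as input to another function or to the subselection operator.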
Subselection can also be achieved by using the function
subset(data, subset, select). The first
argument is the input data (set), the second argument is the selector
condition, and the optional third argument indicates which columns of a
data frame should be kept in the output.
##     id sex age region syldur  nsyl
## 1   60   1  38      W 0.1940 13.56
## 3   62   1  36      W 0.2331 11.73
## 4  112   1  33      W 0.2633 11.67
## 45 153   0  39      W 0.2676  6.36
## 50 158   1  40      W 0.2131  7.99
## 51 159   0  25      W 0.2152  8.11
## 52 160   0  26      W 0.2104  8.54
## 53 161   0  27      W 0.2459  8.89
## 55 163   0  33      W 0.2287  7.60
## 80 391   1  34      W 0.2225  8.89
This command selects the rows of data frame
talkers, which comes from a package
(see 8), that correspond to speakers who are under 45 years of age and who are from the West region.
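Since the talkers data are not reproduced here, the following sketch applies the same kind of selection to a small made-up data frame. The data frame spk and its values are hypothetical; only the column names age and region mirror the output above.

```r
# Hypothetical stand-in for the talkers data
spk <- data.frame(
  id     = 1:6,
  age    = c(38, 52, 33, 61, 25, 44),
  region = c("W", "W", "N", "S", "W", "N")
)

# Rows: under 45 and from region W; columns: keep only id and age
young_west <- subset(spk, age < 45 & region == "W", select = c(id, age))
young_west

# The same selection with the subselection operator
spk[spk$age < 45 & spk$region == "W", c("id", "age")]
```

Inside subset() the column names can be used directly, without the spk$ prefix, which makes the condition easier to read.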
3.3 Split, merge, reshape
There are useful functions available to split and merge data frames. First we create two example data frames. The first data frame has a list of English vowels, with a phonological feature for each vowel, and with the average frequency of the second formant of each vowel (Peterson and Barney 1952) spoken by male speakers. The second data frame has a partially overlapping list of vowels, with key words by John Wells.
vowelsymb <- c( "i","I","e","E","ae", "A","V","o","U","u", "@" )
v1df <- data.frame( vowel=vowelsymb,
    feat=factor( c(rep("front",5), rep("back",5), "central" ) ),
    F2=c( 2290,1990,NA,1840,1720, 1090,1190,NA,1020,870, NA ) )
v2df <- data.frame( vowel=vowelsymb[1:10],
    word=c("fleece","kit","face","dress","trap",
           "lot","strut","goat","foot","goose") )
The command split divides the data in the first argument (here, column
F2 of data frame v1df) into the groups defined by the second argument.
This is particularly helpful in combination with
lapply to apply a function to these grouped
data, e.g. to compute the mean of F2 for each vowel category:
## $back
## [1] 1042.5
##
## $central
## [1] NaN
##
## $front
## [1] 1960
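A call of the following form would give the group means shown above. It is a reconstruction under assumptions, not necessarily the exact command used; the names groups and means are chosen here for illustration.

```r
# First example data frame, as defined above
v1df <- data.frame(
  vowel = c("i","I","e","E","ae", "A","V","o","U","u", "@"),
  feat  = factor(c(rep("front",5), rep("back",5), "central")),
  F2    = c(2290,1990,NA,1840,1720, 1090,1190,NA,1020,870, NA)
)

groups <- split(v1df$F2, v1df$feat)          # list with one vector per feature
means  <- lapply(groups, mean, na.rm = TRUE) # mean per group, ignoring NA
means

sapply(groups, mean, na.rm = TRUE)           # same, returned as a named vector
```

The central category contains only a missing value, so after removing it there is nothing left to average and the result is NaN.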
The command merge merges two data frames, by common columns. Specify
all=TRUE if you want to include non-matching rows in the
output, with NA’s in the appropriate columns.
The resulting merged data frame is also sorted on the common
columns, unless argument sort=FALSE is specified.
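The effect of all=TRUE can be sketched with two small data frames that overlap only partially. The data frames v1 and v2 are shortened, hypothetical versions of the example data frames.

```r
v1 <- data.frame(vowel = c("i", "I", "@"), F2 = c(2290, 1990, NA))
v2 <- data.frame(vowel = c("i", "I", "u"), word = c("fleece", "kit", "goose"))

inner <- merge(v1, v2, by = "vowel")              # only vowels present in both
full  <- merge(v1, v2, by = "vowel", all = TRUE)  # all vowels, NA where no match

inner
full

merge(v1, v2, by = "vowel", sort = FALSE)         # keep rows unsorted
```

In the full merge, the row for "@" has no key word and the row for "u" has no F2 value, so both receive NA in the column that could not be matched.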
If you need to perform a
Repeated Measures (within-subjects) analysis of variance (RM-ANOVA)
in SPSS, your data have to be in “wide” data layout, with all
observations from one subject on a single data line.
R on the other hand uses the “long” data layout, with
one observation per line, and with all descriptors of that
observation repeated for each line. There is a convenient command
reshape to convert data between the wide
layout (of SPSS RM-ANOVA) and the long layout (of R).
To illustrate, first we read a wide data set:
The wide data show the subject id, between-subject group, and three within-subject observations, for 6 subjects (with leading row numbers):
##   subject group item1 item2 item3
## 1       1     1     2     3     4
## 2       2     1     3     4     6
## 3       3     1     1     3     6
## 4       4     2     2     4     5
## 5       5     2     4     5     6
## 6       6     2     2     5     7
These data are then reshaped to long layout with the command reshape.
The observations from the multiple columns named in argument
varying are collected into a new single column, named in argument
v.names. Each observation keeps the subject
identifiers in the column named in argument idvar. The
information contained in the multiple column names of
varying is stored in a new single column
timevar, using the values given in argument times.
Inspect the two data frames to verify this.
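The conversion can be sketched with the wide data shown above, entered by hand. The object names wide and long, and the new column names score and item, are assumptions made for illustration; the original command may have used other names.

```r
# The wide data from the table above, entered by hand
wide <- data.frame(
  subject = 1:6,
  group   = c(1, 1, 1, 2, 2, 2),
  item1   = c(2, 3, 1, 2, 4, 2),
  item2   = c(3, 4, 3, 4, 5, 5),
  item3   = c(4, 6, 6, 5, 6, 7)
)

long <- reshape(wide,
  direction = "long",
  varying   = c("item1", "item2", "item3"), # columns to be stacked
  v.names   = "score",                      # new column holding the observations
  timevar   = "item",                       # new column holding the item number
  times     = 1:3,                          # values to use in that column
  idvar     = "subject"                     # column identifying each subject
)

head(long)
nrow(long)   # 6 subjects x 3 items = 18 rows
```

Each subject now occupies three rows, one per item, with the subject and group values repeated on every row.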
Peterson, G.E., and H.L. Barney. 1952. “Control Methods Used in a Study of the Vowels.” Journal of the Acoustical Society of America 24 (2): 175–84.