Chapter 3 Basic operations

3.1 Basics

3.1.1 <-

This is the assignment operator: the expression to its right is evaluated (if applicable) and then assigned to the object on its left. Hence the expression a <- 10 means that the object a, a single number, “gets” the value 10, i.e. the value 10 is assigned to a. The symbol resembles an arrow pointing in the direction of assignment. Assignment may also run in the other direction, with the symbol -> (and see note 4). There must be no space between the two characters making up the arrow. Use spaces or brackets to avoid ambiguities and errors:

x <- 10 # assignment of atomic value 10 to object x
x < - 10 # is value of x less than -10 ?
## [1] FALSE
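
As a small sketch of the points above (the object names y and z are only illustrative), the following lines show assignment in both directions, and the use of brackets to keep a comparison with a negative number unambiguous:

```r
x <- 10      # leftward assignment: x gets the value 10
10 -> y      # rightward assignment: y gets the value 10
z <- -10     # assignment of a negative value to z
x < (-10)    # brackets make clear that this is a comparison
## [1] FALSE
```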

3.1.2 #

The # symbol indicates a comment: everything following this symbol on the same line of input is ignored.

3.1.3 scan

This command reads a simple vector from the keyboard. Make sure to assign the result to a new object! As an exercise, read in the numbers 1 to 10 and assign them to a new object.
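
At the keyboard, scan keeps reading until you enter an empty line. In a script that has to run without typing, one option is the text argument of scan, as in the sketch below; the object name nums is arbitrary.

```r
# read ten numbers from a character string instead of the keyboard
nums <- scan( text="1 2 3 4 5 6 7 8 9 10", quiet=TRUE )
length(nums) # number of values read
## [1] 10
sum(nums) # sum of the values 1 to 10
## [1] 55
```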

3.1.4 objects

This command shows a list of all objects in memory (similar to the contents of the Praat Objects window). With objects(pattern="abc") the list is filtered so that only the objects matching the pattern string "abc" are shown.

3.1.5 rm

Objects are removed forever with this command.
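
The two commands can be tried together on a few throwaway objects. This is only a sketch: the object names below are made up for the illustration, and the pattern is interpreted as a regular expression.

```r
tmp_abc1 <- 1 # three throwaway objects
tmp_abc2 <- 2
other <- 3
objects( pattern="abc" ) # lists only names matching "abc"
rm( tmp_abc1 ) # remove one object from memory
exists( "tmp_abc1" ) # is it still there?
## [1] FALSE
```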

3.1.6 print

Contents of an object can be inspected with this command, or by just entering the name of the object, as in some examples above.

3.1.7 summary

This command offers a summary of an object. The result depends on the data class of the object, as illustrated in section REF above.
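
To see how the result of summary depends on the class of its argument, compare a numeric vector with a factor. The two objects below are invented for this sketch.

```r
num <- c( 1, 2, 3, 4, 100 ) # numeric vector
fac <- factor( c("a","b","a") ) # factor with two levels
print( num ) # same as just entering: num
summary( num ) # numeric: minimum, quartiles, mean, maximum
summary( fac ) # factor: number of observations per level
```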

3.1.8 Workspace

R holds its objects in memory. The whole workspace, containing all data objects, can be stored from the RStudio window (Session > Save Workspace As...). This allows you to save a session, and continue your work later (Session > Load Workspace...).

3.1.9 save

This command writes one or more objects from memory to a file on hard disk.

3.1.10 load

This command reads objects from such a file back into memory. By convention, files holding R data objects have the extension .Rda.
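
A minimal round trip might look as follows. In base R the counterpart of save is load. The object name scores is invented for this sketch, and a temporary file is used so that nothing in your working directory is overwritten.

```r
scores <- c( 3, 1, 4 ) # example object
rda_file <- tempfile( fileext=".Rda" ) # temporary file name
save( scores, file=rda_file ) # write object to hard disk
rm( scores ) # remove object from memory
load( rda_file ) # read object back into memory
scores
## [1] 3 1 4
```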

3.1.11 No undo

Remember that there is no undo command, nor such a menu option. Save your work regularly. If in doubt, work with scratch copies of your data sets.

3.2 Subselection

Subselection within an object is a very powerful tool in R. The subselection operator x[…] selects only those data from object x that match the expression within square brackets. This expression can be a single index number, a sequence or list of numbers, or an evaluated expression, as illustrated in the following example.

In the following example, variable x contains 30 numbers, but 3 of these are NA. Notice that the output of is.na is the input of table.

# is.na() returns TRUE/FALSE for each element of 'x'.
# table() summarizes categorical data
table( is.na(x) ) 
## 
## FALSE  TRUE 
##    27     3
ok <- !is.na( x ) # exclamation mark means NOT 
which( !ok ) # which index numbers are NOT ok? inspect! 
## [1] 11 13 19
mean( x[ok] ) # select ok values, compute mean, display 
## [1] 1.015252
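
The different kinds of expression within the square brackets can be tried on a small vector. The vector y and the object valid below are made up for this sketch.

```r
y <- c( 5, NA, 7, 9, NA, 11 ) # six values, two of them missing
y[1] # single index number
y[2:4] # sequence of index numbers
y[ c(1,6) ] # vector of index numbers
valid <- !is.na( y ) # TRUE for non-missing values
y[ valid & y>6 ] # evaluated (logical) expression
## [1]  7  9 11
mean( y[valid] ) # mean of the non-missing values
## [1] 8
```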

Subselection can also be achieved by using the function subset(data, subset, select). The first argument is the input data (set), the second argument is the selector condition, and the optional third argument indicates which columns of a data frame should be kept in the output.

require(hqmisc)
data(talkers)
subset( talkers, subset=( age<45 & region=="W" ) )
##     id sex age region syldur  nsyl
## 1   60   1  38      W 0.1940 13.56
## 3   62   1  36      W 0.2331 11.73
## 4  112   1  33      W 0.2633 11.67
## 45 153   0  39      W 0.2676  6.36
## 50 158   1  40      W 0.2131  7.99
## 51 159   0  25      W 0.2152  8.11
## 52 160   0  26      W 0.2104  8.54
## 53 161   0  27      W 0.2459  8.89
## 55 163   0  33      W 0.2287  7.60
## 80 391   1  34      W 0.2225  8.89

This command selects rows from data frame talkers from the package hqmisc (see 8) corresponding to speakers who are under 45 years of age, and who are from the West region.
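
The optional third argument, select, can be sketched with a small data frame that does not require any package. The data frame spk and its values are invented for this illustration. The last line shows a roughly equivalent selection with the subselection operator.

```r
spk <- data.frame( id=1:4,
                   age=c(38,50,25,61),
                   region=c("W","W","N","W") )
# keep rows matching the condition, and only columns id and age
young_w <- subset( spk, subset=( age<45 & region=="W" ),
                   select=c(id,age) )
young_w
##   id age
## 1  1  38
# same rows and columns, selected with square brackets
spk[ spk$age<45 & spk$region=="W", c("id","age") ]
```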

3.3 Split, merge, reshape

There are useful functions available to split and merge data frames. First we create two example data frames. The first data frame has a list of English vowels, with a phonological feature for each vowel, and with the average frequency of the second formant of each vowel (Peterson and Barney 1952) spoken by male speakers. The second data frame has a partially overlapping list of vowels, with key words by John Wells.

vowelsymb <- c( "i","I","e","E","ae", "A","V","o","U","u", "@" )
v1df <- data.frame( vowel=vowelsymb,
                    feat=factor( c(rep("front",5),
                                   rep("back",5),
                                   "central" ) ),
                    F2=c( 2290,1990,NA,1840,1720, 
                          1090,1190,NA,1020,870,
                          NA ) )
v2df <- data.frame( vowel=vowelsymb[1:10], 
                    word=c("fleece","kit","face","dress","trap", 
                           "lot","strut","goat","foot","goose") )

3.3.1 split

This command divides the data in the first argument (column F2 in data frame v1df) into the groups defined by the second argument.

with( v1df, split(F2,feat) ) -> v1list 

This is particularly helpful in combination with lapply to apply a function to these grouped data, e.g. to compute the mean of F2 for each vowel category separately:

with(v1df, lapply( split(F2,feat), mean, na.rm=TRUE ) )
## $back
## [1] 1042.5
## 
## $central
## [1] NaN
## 
## $front
## [1] 1960

Also see unsplit.
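
The sketch below uses a few invented values to show that unsplit puts the groups made by split back into their original order. It also uses sapply, a base R relative of lapply that returns a vector instead of a list where possible. The object names are arbitrary.

```r
f2 <- c( 2290, 1090, 1990, 1190 ) # example values
grp <- factor( c("front","back","front","back") )
groups <- split( f2, grp ) # list with one element per group
sapply( groups, mean ) # group means, as a named vector
##  back front 
##  1140  2140
restored <- unsplit( groups, grp ) # back to the original order
all( restored == f2 )
## [1] TRUE
```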

3.3.2 merge

This command merges two data frames by their common columns. Specify argument all=TRUE if you want to include non-matching rows in the output, with NAs in the appropriate columns.

m1 <- merge(v1df, v2df)

The resulting merged data frame is also sorted on the common columns, unless you specify argument sort=FALSE.
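
The effect of all=TRUE can be seen in a reduced example. The two small data frames below are invented for this sketch, with one common column vowel.

```r
a <- data.frame( vowel=c("i","u","@"), F2=c(2290,870,NA) )
b <- data.frame( vowel=c("i","u","o"),
                 word=c("fleece","goose","goat") )
inner <- merge( a, b ) # only vowels present in both
outer <- merge( a, b, all=TRUE ) # all vowels, NA where no match
nrow( inner )
## [1] 2
nrow( outer )
## [1] 4
```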

3.3.3 reshape

If you need to perform a Repeated Measures (within-subjects) analysis of variance (RM-ANOVA) in SPSS, your data have to be in “wide” data layout, with all observations from one subject on a single data line. R, on the other hand, uses the “long” data layout, with one observation per line, and with all descriptors of that observation repeated for each line. There is a convenient command reshape to convert data between the wide layout (of SPSS RM-ANOVA) and the long layout (of R). To illustrate, first we read a wide data set:

widedata <- read.table( 
  file=url("http://www.hugoquene.nl/emlar/widedata.txt"),
  header=TRUE)

The wide data show the subject id, between-subject group, and three within-subject observations, for 6 subjects (with leading row numbers):

head(widedata)
##   subject group item1 item2 item3
## 1       1     1     2     3     4
## 2       2     1     3     4     6
## 3       3     1     1     3     6
## 4       4     2     2     4     5
## 5       5     2     4     5     6
## 6       6     2     2     5     7

These data are then reshaped to long layout with the following command:

longdata <- reshape( widedata, direction="long",
                     varying=c("item1","item2","item3"),
                     timevar="item", times=c("1","2","3"),
                     v.names="resp", idvar="subject")

The observations from the multiple columns listed in varying are collected into a single new column named by v.names, and are identified by the values in column idvar. The information contained in the column names of varying is stored in a single new column named by timevar, using the values given in times. Inspect the two data frames to verify this.
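
If the file cannot be downloaded, the conversion can still be tried offline. The sketch below rebuilds the six rows printed above by hand (the object names wide6 and long6 are arbitrary), applies the same reshape call, and checks one value.

```r
wide6 <- data.frame( subject=1:6,
                     group=c(1,1,1,2,2,2),
                     item1=c(2,3,1,2,4,2),
                     item2=c(3,4,3,4,5,5),
                     item3=c(4,6,6,5,6,7) )
long6 <- reshape( wide6, direction="long",
                  varying=c("item1","item2","item3"),
                  timevar="item", times=c("1","2","3"),
                  v.names="resp", idvar="subject")
nrow( long6 ) # 6 subjects times 3 items
## [1] 18
# response of subject 3 on item 3, compare with the wide layout
with( long6, resp[ subject==3 & item=="3" ] )
## [1] 6
```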

References

Peterson, G. E., and H. L. Barney. 1952. “Control Methods Used in a Study of the Vowels.” Journal of the Acoustical Society of America 24 (2): 175–84.