Chapter 2 Objects

2.1 Vectors

A vector is a simple, one-dimensional sequence of data, like a single column in Excel or in SPSS. Typically a single vector holds a single variable of interest. All data in a vector are of the same class: numeric, character (strings, enclosed in single or double quotes in the input, and printed with double quotes), or logical (i.e., boolean, TRUE or FALSE; these may be abbreviated to T or F, although the full names are safer because T and F are ordinary variables that can be reassigned).

  • c: Atomic data are combined into a vector by means of the c (combine, concatenate) function.

  • seq: The sequence function seq, which may be abbreviated to the colon operator : for steps of 1, creates consecutive values.
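
The two functions can be tried as in the following minimal sketch. The object names x, y and z are assumptions made for illustration, and the calls are chosen so that the last two produce output like the two lines printed below.

```r
# Combine atomic values into a numeric vector with c()
x <- c(1, 2, 3, 4, 5)

# seq() and the colon shorthand create the same consecutive values
y <- seq(1, 5)
z <- 1:5

print(x)      # display the vector
2 * (x - 1)   # computed on the whole vector; displayed but not stored
```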

## [1] 1 2 3 4 5
## [1] 0 2 4 6 8

Computations are also done on whole vectors at once, element by element, as exemplified above. In the last example, the result of the computation is not assigned to a new object. Hence the result is displayed and then lost. This can still be useful when you use R as a pocket calculator.

  • rep: Finally, the rep (repeat, replicate) function is very useful for creating repetitive sequences, e.g. for the levels of an independent variable.

##  [1] 1 1 2 2 3 3 4 4 5 5
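
The output above is consistent with a call such as the first one in this sketch. The object names levels5 and cycles are hypothetical. The second call shows the times argument for comparison, which repeats the whole sequence rather than each value.

```r
# Repeat each value of 1:5 twice, e.g. two observations per level
levels5 <- rep(1:5, each = 2)
print(levels5)

# Repeat the whole sequence twice instead
cycles <- rep(1:5, times = 2)
print(cycles)
```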

2.2 Factors

Factors constitute a special class of variables. A factor is a variable that holds categorical data. R treats the values of a factor as category labels, or levels, rather than as numbers or character strings, as illustrated in the examples below.
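
A sketch of calls that would produce output like that shown below. The object names x2, x3 and x4 are assumptions. The same values are stored first as numbers, then as character strings, and finally as a factor, and summary behaves differently for the numeric vector and the factor.

```r
x2 <- rep(1:4, each = 2)   # numeric (integer) vector
print(x2)
summary(x2)                # numeric summary: quartiles and mean

x3 <- as.character(x2)     # the same values as character strings
print(x3)

x4 <- as.factor(x2)        # the same values as category labels (levels)
print(x4)
summary(x4)                # for a factor: number of observations per level
```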

## [1] 1 1 2 2 3 3 4 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.75    2.50    2.50    3.25    4.00
## [1] "1" "1" "2" "2" "3" "3" "4" "4"
## [1] 1 1 2 2 3 3 4 4
## Levels: 1 2 3 4
## 1 2 3 4 
## 2 2 2 2

2.3 Complex objects

Simple objects, like the ones introduced above, may be combined into composite objects. For example, we may combine all pancake ingredients into a complex object of class list.
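
As an illustration, such a list might be built as follows. The ingredient names, amounts and units are hypothetical and only serve to show that a list can hold components of different classes and lengths.

```r
# A list may combine objects of different classes and lengths
pancake <- list(
  flour  = 250,                  # hypothetical amount, in grams
  milk   = 0.5,                  # hypothetical amount, in litres
  eggs   = 2L,                   # an integer count
  extras = c("salt", "butter"),  # a character vector
  vegan  = FALSE                 # a logical value
)

class(pancake)          # the class of the composite object
str(pancake)            # compact overview of all components
pancake$eggs            # extract one component by name
pancake[["extras"]][1]  # first element of the character component
```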

In R we often use a particular complex object, a data frame, to hold various data together. A data frame is a complex object like an Excel worksheet or SPSS data sheet. The columns represent variables, and the rows represent single observations; these may be “cases” or sampling units, or single measurements repeated for each sampling unit, depending on the study (see the note at the end of this section).
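
A data frame can also be built directly from vectors, as in this minimal sketch. The object name mydf, the variable names and the values are all made up for illustration. Each column is one variable and each row is one measurement.

```r
# Hypothetical data: two measurements for each of three participants
mydf <- data.frame(
  participant = rep(1:3, each = 2),
  condition   = factor(rep(c("a", "b"), times = 3)),
  score       = c(5.1, 6.3, 4.8, 6.0, 5.5, 6.9)
)

str(mydf)                       # classes of the columns
dim(mydf)                       # number of rows and columns
mydf$score                      # one column, as a vector
mydf[mydf$condition == "b", ]   # only the rows in condition "b"
```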

The easiest way to create a data frame is to read it from a plain-text (ASCII) file, using the command read.table. (Windows users must remember to use double backslashes, or single forward slashes, in the file specification string.) The optional argument header=TRUE indicates that the first line contains the names of the variables; argument sep specifies the character that separates the variables in the input file. The file argument can be a string specifying a local file, or a URL to a web-based file, or a call of function file.choose() to select a file interactively. Argument na.strings specifies the character string(s) that indicate missing values in the input file.

# on a Windows system: backslashes in the path are doubled
myexp <- read.table(
  file="f:\\temp\\mydata.txt", header=TRUE, sep="," )
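
The arguments described above can be tried without an existing data file. The following sketch first writes a small temporary file and then reads it back. The file contents, the object name mytest, and the use of -999 as a missing-value code are assumptions made for illustration.

```r
# Write a small comma-separated file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("subject,group,score",
             "1,a,5.1",
             "2,b,NA",
             "3,a,-999"), tmp)

# Read it back: first line holds the variable names, fields are
# separated by commas, and both NA and -999 mark missing values
mytest <- read.table(file = tmp, header = TRUE, sep = ",",
                     na.strings = c("NA", "-999"))
str(mytest)

unlink(tmp)   # remove the temporary file again
```

The same call should work with a path to a real file or with file.choose() in place of tmp.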

It is also possible to read so-called CSV files (comma-separated values) saved from Excel or SPSS, using read.csv. Excel or SPSS data files can also be read directly using extension packages (readxl::read_excel, foreign::read.spss, see Chapter 8).

Base R and many extension packages come with predefined datasets, ready for immediate use. To see a long(!) overview of these datasets, enter the command data().
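
As a sketch, the overview can be restricted to a single package and one of the listed datasets can then be loaded. The choice of the datasets package and of the women dataset is arbitrary, and the object name available is an assumption.

```r
# List the datasets in the standard "datasets" package
available <- data(package = "datasets")$results
nrow(available)                          # how many entries are listed
head(available[, c("Item", "Title")])    # names and short descriptions

# Load one of these datasets and inspect its structure
data(women)
str(women)
```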

  1. For repeated measures analyses, R does not require a multivariate or “wide” layout, with all repeated measures for a participant on a single row, as SPSS does. Instead, R generally uses a univariate or “long” layout, with each measurement on its own row. The reshape command converts between the two layouts; it is discussed in the next chapter.
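
To make the two layouts concrete, the following sketch converts a small wide data frame to the long layout with reshape. All object names, variable names and values are hypothetical, and the arguments shown are only one of several ways to call this function.

```r
# Wide layout: one row per participant, one column per measurement
wide <- data.frame(id = 1:3, t1 = c(5, 6, 7), t2 = c(8, 9, 10))

# Long layout: one row per measurement
long <- reshape(wide, direction = "long",
                varying = c("t1", "t2"),  # columns holding the measurements
                v.names = "score",        # name of the stacked variable
                timevar = "time",         # new variable marking the occasion
                times = 1:2,
                idvar = "id")
print(long)
```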