Chapter 2 Objects
2.1 Vectors
A vector is a simple, one-dimensional list of data, like a single column
in Excel or in SPSS. Typically a single vector holds a single variable
of interest. The data in a vector can be of various classes: numeric,
character (strings of characters, enclosed in single or double quotes), or
logical (i.e., boolean: TRUE or FALSE, which may be abbreviated to T or F).
c
Atomic data are combined into a vector by means of the c (combine, concatenate) function.
seq
The sequence function, also abbreviated as the colon operator (:), creates a sequence of consecutive values.
x <- 1:5
print(x)
## [1] 1 2 3 4 5
2*(x-1)
## [1] 0 2 4 6 8
Computations are also done on whole vectors, as exemplified above. In the last example, the result of the computation is not assigned to a new object; hence the result is displayed and then lost. This can still be useful, however, when you use R as a pocket calculator.
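As a further sketch of the points above, the lines below build one vector of each class with c and inspect its class. The object names and values are made up for illustration. The last computation is assigned to a new object, so its result is kept rather than lost.

```r
# hypothetical example data, not taken from any real study
heights  <- c(1.62, 1.75, 1.80)       # numeric vector
speakers <- c("ann", "bob", "cat")    # character vector
tall     <- heights > 1.70            # logical vector, computed element by element
class(heights)
## [1] "numeric"
class(speakers)
## [1] "character"
print(tall)
## [1] FALSE  TRUE  TRUE
heights_cm <- 100 * heights           # result is stored in a new object
print(heights_cm)
## [1] 162 175 180
```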
rep
Finally, the repeat function is very useful for creating repetitive sequences, e.g. for the levels of an independent variable.
x <- rep( 1:5, each=2 )
x
## [1] 1 1 2 2 3 3 4 4 5 5
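A few more variations may help, shown here as a minimal sketch: seq with an explicit step size, and rep with times= as opposed to each=. The condition labels "low" and "high" are hypothetical.

```r
# seq() with an explicit step size
seq(from = 0, to = 1, by = 0.25)
## [1] 0.00 0.25 0.50 0.75 1.00
# times= repeats the whole vector; each= repeats every element in turn
cond <- rep(c("low", "high"), times = 3)
print(cond)
## [1] "low"  "high" "low"  "high" "low"  "high"
cond2 <- rep(c("low", "high"), each = 3)
print(cond2)
## [1] "low"  "low"  "low"  "high" "high" "high"
```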
2.2 Factors
Factors constitute a special class of variables. A factor is a variable that holds categorical, character-like data. R realizes that variables of this class hold categorical data, and that the values are category labels or levels rather than real characters or digits, as illustrated in the examples below.
x1 <- rep( 1:4, each=2 ) # create vector of numbers
print(x1) # numeric
## [1] 1 1 2 2 3 3 4 4
summary(x1) # of numeric vector
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    1.75    2.50    2.50    3.25    4.00
x2 <- as.character(x1) # convert to char
print(x2) # character vector
## [1] "1" "1" "2" "2" "3" "3" "4" "4"
x3 <- as.factor(x1) # convert to factor
print(x3) # factor
## [1] 1 1 2 2 3 3 4 4
## Levels: 1 2 3 4
summary(x3) # compare against summary(x1) above
## 1 2 3 4
## 2 2 2 2
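Because the values of a factor are labels rather than real digits, converting a factor back to numbers needs some care. The sketch below uses hypothetical scores to show the difference between the internal level codes and the original values.

```r
# hypothetical scores stored as a factor
f <- as.factor(c(10, 20, 20, 30))
levels(f)                      # the category labels, stored as strings
## [1] "10" "20" "30"
as.numeric(f)                  # internal level codes, not the original values
## [1] 1 2 2 3
as.numeric(as.character(f))    # converting via character recovers the numbers
## [1] 10 20 20 30
```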
2.3 Complex objects
Simple objects, like the ones introduced above, may be combined into
composite objects. For example, we may combine all pancake ingredients
into a complex object of class list.
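Such a list might look as follows. This is only a sketch: the component names and amounts are invented for illustration and are not a recipe from the text.

```r
# hypothetical pancake ingredients; names and amounts are illustrative only
pancake <- list(flour_g = 250, eggs = 2, milk_ml = 500,
                toppings = c("sugar", "syrup"))
class(pancake)
## [1] "list"
pancake$eggs              # extract one component by name
## [1] 2
length(pancake$toppings)  # components may differ in class and length
## [1] 2
```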
In R we often use a particular complex object, a data frame, to hold various data together. A data frame is a complex object like an Excel worksheet or SPSS data sheet. The columns represent variables, and the rows represent single observations; these may be “cases” or sampling units, or single measurements repeated for each sampling unit, depending on the study.
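A small data frame can also be built by hand with data.frame, as sketched below. The variable names and values are hypothetical.

```r
# hypothetical data: one row per observation, one column per variable
d <- data.frame(
  subject = c("s1", "s2", "s3"),
  age     = c(21, 34, 28),
  native  = c(TRUE, TRUE, FALSE)
)
dim(d)        # number of rows (observations) and columns (variables)
## [1] 3 3
d$age         # a single column is an ordinary vector
## [1] 21 34 28
mean(d$age)
## [1] 27.66667
```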
The easiest way to create a data object is to read it from a plain-text
(ASCII) file, using the command read.table.
(Windows users must remember to use double backslashes, or forward slashes,
in the file specification string.) An optional header=TRUE
argument indicates that the first line contains the names of the
variables; argument sep specifies the
character(s) that separate the variables in the input file. The
file argument can be a string specifying a
local file, or a URL of a web-based file, or a
call of function file.choose() to select a file
interactively. Argument na.strings specifies
the character string(s) that indicate missing values in the input file.
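# on a Windows system: note the double backslashes in the path
myexp <- read.table(
  file="f:\\temp\\mydata.txt", header=TRUE, sep="," )
# from a web-based file, with two strings marking missing values
nlspkr <- read.table(
  file=url("http://www.hugoquene.nl/emlar/intra.bysubj.txt"),
  header=TRUE, na.strings=c("NA","MISSING") )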
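The two calls above depend on a local drive and on a web server. To try the arguments without either, the following sketch first writes a small, made-up data file to a temporary location and then reads it back. The file contents and the object name mydata are assumptions for illustration.

```r
# write a small, made-up data file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("subject,score",
             "s1,12",
             "s2,MISSING",
             "s3,15"), tmp)
# read it back: the first line holds variable names, commas separate the fields,
# and both "NA" and "MISSING" are treated as missing values
mydata <- read.table(file = tmp, header = TRUE, sep = ",",
                     na.strings = c("NA", "MISSING"))
print(mydata)
##   subject score
## 1      s1    12
## 2      s2    NA
## 3      s3    15
is.na(mydata$score)
## [1] FALSE  TRUE FALSE
unlink(tmp)   # remove the temporary file
```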
It is also possible to read so-called CSV (comma-separated values) files saved from Excel or SPSS (read.csv), and to read Excel or SPSS data files directly using extension packages (readxl::read_excel, foreign::read.spss; see Chapter 8).
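As a brief sketch, read.csv behaves like read.table with header=TRUE and sep="," as defaults. The example reads from a made-up string through the text argument, so that no file is needed; the column names item and rt are hypothetical.

```r
# read.csv() assumes header=TRUE and sep="," by default
csvtext <- "item,rt\na,512\nb,430"
rtdata <- read.csv(text = csvtext)
names(rtdata)
## [1] "item" "rt"
rtdata$rt
## [1] 512 430
```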
The basic R and extension packages already have many
datasets pre-defined, for immediate use.
To see a long(!) overview of these datasets, enter the command data().
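The overview can also be inspected from code, as in the sketch below, which restricts the listing to the standard datasets package and then looks at one of its datasets. The dataset women is chosen here only as an example.

```r
# collect the dataset overview of the 'datasets' package as a matrix
ds <- data(package = "datasets")$results
head(ds[, "Item"])   # names of the first few datasets
# 'women' is one of the built-in datasets and can be used immediately
str(women)
nrow(women)
## [1] 15
```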