# 8 Frequencies

## 8.1 Introduction

When analysing data, a distinction is often made between qualitative and quantitative methods. With the first method, observations (e.g. answers in interviews) are represented in words, and with the second method, observations (e.g. speech pauses in interviews) are represented in numbers. In our opinion, the difference between qualitative and quantitative methods lies in how observations are represented, and how arguments are made on the basis of these observations. Sometimes it is also possible to analyse the very same data (e.g. interviews) both qualitatively and quantitatively. The major advantages of quantitative methods are that the data can be summarised relatively straightforwardly (this is the subject of this part of the textbook), and that it is relatively simple to draw meaningful conclusions on the basis of the observations.

## 8.2 Frequencies

Quantitative data can be reported in various different ways. The most straightforward way would be to report the raw data, preferably sorted according to the observed variable’s value. The disadvantage of this is that a potential pattern in the observations will not be easily visible.

Example 8.1: Students \((N=50)\) in a first year course reported the following values for their shoe size, a variable of an interval level of measurement:

36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 42, 42, 43, 43, 44, ??.

One of the students did not provide an answer; this missing answer is shown here as ??.

It is usually more insightful and efficient to summarise observations
and report them in the form of a *frequency* for each value.
This frequency indicates the *number* of observations which have a certain value,
or which have a value in a certain interval or class. In order to get
the frequencies, we thus *count* the number of observations with a certain
value, or the number of observations in a certain interval. These
frequencies are reported in a table. We call such a table a
frequency distribution.
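As a minimal sketch of this counting step in R, the shoe sizes of Example 8.1 can be typed in directly and tallied with `table`. The object names `shoe` and `shoe_freq` are chosen here for illustration only; the missing answer is entered as `NA`.

```r
# shoe sizes from Example 8.1, built from each value and its number of repeats;
# the object name `shoe` is an illustrative choice
shoe <- c( rep( 36:44, times=c(2,6,6,19,6,5,2,2,1) ), NA )
length(shoe)                       # 50 observations, including the missing one
# count the observations per value; useNA="ifany" also counts missing answers
shoe_freq <- table( shoe, useNA="ifany" )
shoe_freq
```

The resulting counts should correspond to the raw values listed in Example 8.1, with one extra count for the missing answer.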

As a first example, Table 8.1 provides a
frequency distribution of a discrete variable of *nominal* level of measurement,
namely the phonological class of sounds in Dutch (Luyckx et al. 2007).

main.class | sub.class | count |
---|---|---|
C | plos | 585999 |
C | fric | 426097 |
C | liq | 249275 |
C | nas | 361742 |
C | glide | 146344 |
V | lang | 365887 |
V | kort | 428832 |
V | schwa | 341260 |
V | diph | 61638 |
V | rest | 1146 |

As a second example, Table 8.2 provides a
frequency distribution of a continuous variable of *interval*
level of measurement, namely the aforementioned shoe size
of first year students (Example 8.1).

Shoe size | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | ?? |
---|---|---|---|---|---|---|---|---|---|---|
Number | 2 | 6 | 6 | 19 | 6 | 5 | 2 | 2 | 1 | 1 |

However, when a numerical variable can assume a great many different values, the frequency distribution becomes large and confusing. We then group the values into intervals, and make a frequency distribution over the smaller number of intervals or classes.

Example 8.2: In her last Queen’s Speech, on 18 September 2012, Queen Beatrix of the Netherlands paused 305 times. The frequency distribution of the pause length (measured in seconds) is shown in Table 8.3.

Interval | Number |
---|---|
4.50–4.99 | 1 |
4.00–4.49 | 0 |
3.50–3.99 | 2 |
3.00–3.49 | 7 |
2.50–2.99 | 4 |
2.00–2.49 | 25 |
1.50–1.99 | 32 |
1.00–1.49 | 16 |
0.50–0.99 | 67 |
0.00–0.49 | 151 |

### 8.2.1 Intervals

For a variable of nominal or ordinal level of measurement, we generally use the original categories to make the frequency distribution (see Table 8.1), although it is possible to merge categories. For a variable of interval or ratio level of measurement, researchers can choose the number of intervals in the frequency distribution themselves. Sometimes that is not necessary, for instance because the variable has a limited number of different discrete values (see Table 8.2). In other cases, however, you as a researcher have to decide how many intervals to distinguish, and where to put the interval boundaries (see Table 8.3). In that case, the following is recommended (Ferguson and Takane 1989, Ch.2):

Ensure that all observations (i.e. the entire range) fall into roughly 10 to 20 intervals.

Ensure that all intervals are equally wide.

Make the lower limit of the first or second interval the same as the width of the intervals (see Table 8.3: every interval is 0.50 s wide, and the second interval’s lower limit is also 0.50).

Order the intervals in a frequency distribution from bottom to top in increasing order (i.e. from top to bottom in descending order; see Table 8.3).

The wider we make the intervals, the more information we lose about the precise distribution within each interval.
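The recommendations above can be sketched in R as follows. The pause lengths are hypothetical values made up for illustration (not the data of Example 8.2), and the object names `pause`, `width`, `upper`, `bounds` and `pause_freq` are likewise assumptions of this sketch.

```r
# hypothetical pause lengths in seconds (illustrative values, not the real data)
pause <- c(0.12, 0.31, 0.48, 0.50, 0.77, 1.20, 1.95, 2.10, 2.40, 4.80)
width <- 0.5                                     # all intervals equally wide
# smallest multiple of the width that is not below the maximum
upper <- ceiling( max(pause) / width ) * width
bounds <- seq( from=0, to=upper, by=width )      # interval boundaries
n_intervals <- length(bounds) - 1                # number of intervals covering the range
n_intervals
# right=FALSE makes each interval include its lower and exclude its upper boundary
pause_freq <- table( cut( pause, breaks=bounds, right=FALSE ) )
# print from top to bottom in descending order, as in Table 8.3
rev( pause_freq )
```

With `right=FALSE`, a maximum that falls exactly on the last boundary would not be counted, so in that situation one more interval would be needed.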

### 8.2.2 SPSS

`Analyze > Descriptive Statistics > Frequencies...`

Select variable (drag to the “Variable(s)” panel).

Tick: `Display frequency tables`.

Choose `Format`, choose: `Order by: Descending values`.

Confirm with `OK`.

### 8.2.3 JASP

From the top bar, choose `Descriptives`.

Select the variable(s) to summarize and place them into the field “Variables.” Check the option `Frequency tables (nominal and ordinal variables)` (below the field with Variables) to obtain frequency tables for all selected variables.

If a variable has a higher measurement level (“Scale,” i.e. interval or ratio, Chapter 4), and you nevertheless want a frequency table of that variable, then you need to change the measurement level of that variable to “Nominal” or “Ordinal.” In JASP, you can do so by clicking on the icon for measurement level, preceding the variable name in the column header in the data window.

You may also want to divide or “cut up” the continuous variable into discrete intervals. This is achieved by first creating a new empty variable, and subsequently filling that variable with the discrete interval values. The resulting variable will be of ordinal measurement level.

To create a new variable, click on the **+** button to the right of the last column name in the data tab. A “Create Computed Column” panel appears, where you can enter a name for the new variable, e.g. `troon12_ordinal`. You can also choose between `R` and a pointer. These are the two options in JASP to define the formula with which the new (empty) variable is filled: using `R` code, or manually using JASP. The paragraphs below explain how to create an interval variable using these two options. Finally, select the measurement level of the new variable (see Chapter 4); this should be `Ordinal`. Next, click on `Create Column` to create the new variable. The new variable (empty column) appears as the rightmost variable in the data set.

If the **R** option is chosen to define the new variable, a field with “#Enter your R code here :)” appears above the data. Here you can enter R code that computes the values of the new variable. Enter the R code (see below) and click on `Compute column` at the bottom of the field to fill the empty variable with the values produced by the R code.

This snippet of R code cuts up the continuous variable `troon2012` into the discrete intervals shown in Table 8.3:

`cut( troon2012, breaks = seq(0, 5, 0.5) ) `

Parse this snippet from the innermost brackets outwards:
(i) `seq`: make a sequence from 0 to 5 (units, here: seconds) in increments of 0.5 seconds,
(ii) `cut`: cut up the variable (`troon2012`) into intervals based on this sequence.

If you do not know the width and range of the intervals, but instead you wish to obtain a particular number (say 10) of equal intervals, then use the following snippet:

`cut( troon2012, 10 ) `

The first interval starts at (or just below) the lowest value, and the last interval ends at (or just above) the highest value of `troon2012`, yielding `10` intervals in total.
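A brief sketch of this second form of `cut`, using hypothetical pause lengths (made-up values and object names, for illustration only), shows how to inspect the intervals that R has chosen:

```r
# hypothetical pause lengths in seconds (illustrative values only)
pause <- c(0.12, 0.31, 0.48, 0.50, 0.77, 1.20, 1.95, 2.10, 2.40, 4.80)
pause_cut <- cut( pause, 10 )   # ten intervals of equal width
nlevels( pause_cut )            # number of intervals created
levels( pause_cut )             # interval labels, showing the boundaries R chose
table( pause_cut )              # frequency per interval; empty intervals count 0
```

Because the boundaries are derived from the data, they will generally not be round numbers; specifying `breaks` explicitly, as in the previous snippet, gives more control.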

If the **pointer** or **manual** option is chosen to define the new variable, a work sheet will appear above the data. To the left of the work sheet are the variables, above it are math symbols, and to the right of the work sheet are several functions. From those functions you can pick one to compute the values of the new variable. If something goes wrong, items on the work sheet can be erased by dragging them to the trash bin at the lower right. After you have completed the specification on the work sheet, click on the button `Compute column` under the work sheet, to fill the new variable with the ordinal values.

In order to cut up a continuous variable into equal and discrete intervals, pick the function `cut(y)` from the right. Drag the input continuous variable to the “values” field and enter the number of resulting intervals as “numBreaks” value. If you want to specify the width and number of intervals yourself, then use the R code option (see above).

Now, at last, you can make the frequency table of the ordinal variable created above, as described in the first paragraph of this subsection. Select the newly created ordinal variable(s) to summarize and place them into the field “Variables.” Check the option `Frequency tables (nominal and ordinal variables)` (below the field with Variables) to obtain frequency tables for all selected variables.

### 8.2.4 R

```
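# read the survey data (text file with a header row) from the web
enq2011 <- read.table(
  file=url("http://www.hugoquene.nl/R/enq2011.txt"),
  header=TRUE )
# frequency table of shoe size; useNA="ifany" also counts missing values
table( enq2011$shoe, useNA="ifany" )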
```

The output of the above `table` command is shown in Table 8.2.
The code `NA` (Not Available) is used in R to indicate missing data.

`table( cut( troon2012, breaks=seq(from=0,to=5,by=0.5) ) )`

Parse this task from the innermost brackets outwards:
(i) `seq`: make a sequence from 0 to 5 (units, here: seconds) in increments of 0.5 seconds,
(ii) `cut`: cut up the variable `troon2012` into intervals based on this sequence,
(iii) `table`: make a frequency distribution of these intervals.

This task’s output is shown (in edited form) in Table 8.3.
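One detail is worth checking when comparing such output with a table: by default, `cut` makes intervals that exclude their lower boundary and include their upper boundary, whereas the intervals in Table 8.3 (0.00–0.49, 0.50–0.99, etc.) include their lower boundary. The sketch below, using three hypothetical values and illustrative object names, shows how the argument `right=FALSE` changes where a value on a boundary is counted.

```r
# three hypothetical pause lengths; 0.50 lies exactly on a boundary
x <- c(0.20, 0.50, 0.70)
bounds <- seq( from=0, to=1, by=0.5 )
# default (right=TRUE): intervals (0,0.5] and (0.5,1]
default_freq <- table( cut( x, breaks=bounds ) )
# right=FALSE: intervals [0,0.5) and [0.5,1)
left_freq <- table( cut( x, breaks=bounds, right=FALSE ) )
default_freq   # 0.50 is counted in the first interval
left_freq      # 0.50 is counted in the second interval
```

Whether this matters in practice depends on how many observations fall exactly on an interval boundary.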

## 8.3 Bar charts

A bar chart is the graphical representation of the
frequency distribution of a discrete, categorical variable (of
nominal or ordinal level of measurement). A bar chart is constructed
of rectangles. All rectangles are equally wide, and the
rectangle’s height corresponds with the frequency of that category. The
surface area of each rectangle thus also corresponds with that category’s frequency.
In contrast to a histogram, the rectangles are *not*
joined up to each other along the horizontal axis, to
show that we are dealing with discrete categories.

A bar chart helps us to determine at a glance the most important
distributional characteristics of a discrete variable:
the most characteristic (most frequently occurring) value, and the
distribution across categories. For the sound frequencies
in Dutch (Figure 8.1), we
see that amongst the consonants the plosives
occur the most, that amongst the vowels the
short vowels occur the most, that diphthongs
are not used much (the sounds in Dutch *ei, ui, au*), and that
more consonants are used compared with vowels.

Tip: Avoid shading and other 3D-effects in a bar chart! These make the width and height of a rectangle less readable, and the visible surface area of a shaded rectangle or of a bar no longer corresponds well with the frequency.

## 8.4 Histograms

A histogram is the graphical representation of a frequency distribution of a continuous, numerical variable (of interval or ratio level of measurement). A histogram is constructed of rectangles. The width of each rectangle corresponds with the interval width (a rectangle can also be one unit wide) and the height corresponds with the frequency of that interval or value. The surface area of each rectangle therefore corresponds with the frequency. In contrast to a bar chart, the rectangles do join up to each other along the horizontal axis.

A histogram helps us to determine at a glance the most important distributional characteristics of a continuous variable: the most characteristic (most frequently occurring) value, the degree of dispersion, the number of peaks in the frequency distribution, the position of the peaks, and potential outliers (see §9.4.2). For the pauses in the Queen’s Speech of 2012 (Figure 8.2), we see that the majority of pauses last between 0.25 and 0.75 s (these are presumably pauses for breath), that there are two peaks in the distribution (the second peak is at 2 s), and that there is one extremely long pause (with a duration of almost 5 s).

Tip: Avoid shading and other 3D-effects in a histogram! These make the width and height of a rectangle less readable, and the visible surface area of a shaded rectangle or of a bar no longer corresponds well with the frequency.

### 8.4.1 SPSS

`Analyze > Descriptive Statistics > Frequencies...`

Select variable (drag to the “Variable(s)” panel).

Choose `Charts`, then pick `Chart type: Bar chart` for a bar chart or `Chart type: Histogram` for a histogram (see the above text for the difference between these options).

Confirm with `OK`.

### 8.4.2 JASP

From the top bar, choose `Descriptives`.

Select the variable(s) to summarize and place them into the field “Variables.” Open the “Plots” field and check the option `Distribution plots` (under the header “Basic plots”).

JASP will automatically make a bar chart if the variable is discrete (nominal or ordinal), and it will make a histogram if the variable is a scalar, i.e., numerical and continuous (interval or ratio). Ensure that the correct measurement level is specified for your variables. In JASP, you can set the measurement level by clicking on the icon for measurement level, which precedes the variable name in the column header in the data window.

### 8.4.3 R

You can make a bar chart like Figure 8.1 in R with the following commands:

```
# read data
klankfreq <- read.table( file="data/klankfreq.txt", header=TRUE )
# 20201130 column names in English
dimnames(klankfreq)[[2]] <- c("main.class","sub.class","count")
# make barplot from column `count` in dataset `klankfreq`;
# the bar midpoints returned by barplot() are stored in `klankfreq_barplot`
with( klankfreq, barplot( count, beside=TRUE,
  ylab="Frequency",
  main="Frequencies of speech sounds in Dutch (N=2968220)",
  col=ifelse(klankfreq[,1]=="V","grey40","grey20") ) ) -> klankfreq_barplot
# make labels along the bottommost horizontal axis
axis(side=1, at=klankfreq_barplot, labels=klankfreq$main.class)
axis(side=1, at=klankfreq_barplot, tick=FALSE, line=1, labels=klankfreq$sub.class )
# or simpler: with(klankfreq, barplot(count) ) # all defaults
```
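If the data file is not at hand, a comparable bar chart can be sketched from the counts in Table 8.1 typed in directly. This is a simplified variant, not the code used for Figure 8.1; the object names `sounds` and `midpoints` are illustrative assumptions.

```r
# counts copied from Table 8.1; the object name `sounds` is an illustrative choice
sounds <- data.frame(
  main.class = rep( c("C","V"), each=5 ),
  sub.class  = c("plos","fric","liq","nas","glide",
                 "lang","kort","schwa","diph","rest"),
  count      = c(585999,426097,249275,361742,146344,
                 365887,428832,341260,61638,1146) )
sum( sounds$count )     # total number of sounds counted
# one bar per subclass; names.arg puts the subclass labels under the bars
midpoints <- barplot( sounds$count, names.arg=sounds$sub.class,
  ylab="Frequency",
  col=ifelse( sounds$main.class=="V", "grey40", "grey20" ) )
```

Here `names.arg` replaces the two separate `axis` calls, at the cost of showing only the subclass labels.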

You can make a histogram like Figure 8.2 in R with the following commands:

```
# read dataset
load(file="data/pauses6.Rda")
# extract pause lengths (column 12) for the year 2012, into a separate vector `troon2012`
troon2012 <- pauses6[ pauses6$jaar==2012, 12 ] # save col_12 as single vector
# make histogram
hist( troon2012,
  breaks=seq(0, 5, by=0.25),
  col="grey80",
  xlab="Length of pause (s)", ylab="Frequency",
  main="Queen's Speech 2012 (N=305 pauses)" ) -> troonrede2012pauzes_hist
```
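The object returned by `hist` contains the frequency distribution that the histogram displays. As a minimal sketch, with hypothetical pause lengths and illustrative object names (not the data of Figure 8.2), the interval boundaries and frequencies can be inspected without drawing the plot:

```r
# hypothetical pause lengths in seconds (illustrative values only)
pause <- c(0.12, 0.31, 0.48, 0.50, 0.77, 1.20, 1.95, 2.10, 2.40, 4.80)
# plot=FALSE returns the histogram object without drawing it
pause_hist <- hist( pause, breaks=seq(0, 5, by=0.25), plot=FALSE )
pause_hist$breaks    # interval boundaries
pause_hist$counts    # frequency per interval, i.e. the height of each rectangle
# check that every observation has been counted in exactly one interval
sum( pause_hist$counts ) == length(pause)
```

Note that all values must fall within the range of `breaks`; otherwise `hist` stops with an error.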

### References

Ferguson, G. A., and Y. Takane. 1989. *Statistical Analysis in Psychology and Education*. 6th ed. New York: McGraw-Hill.

Luyckx et al. 2007. “Klankfrequenties in het Nederlands.” In *Tussen Taal, Spelling En Onderwijs: Essays Bij Het Emeritaat van Frans Daems*, edited by D. Sandra, R. Rijmenans, P. Cuvelier, and P. Van Petegem. Academia Press. https://www.academia.edu/627882/Klankfrequenties_in_het_Nederlands.