---
title: "'Middle class problems?' Social grade and the 'most important issue' in wave 13 of the British Election Study"
output: html_notebook
---

This notebook contains the workings behind the chart of 'most important issues' in wave 13 of the [British Election Study](http://www.britishelectionstudy.com/), which I have placed on [my website](http://www.danielallington.net/2017/08/middle-class-social-grade-most-important-issue-british-election-study). This was based on the answers to the question, 'As far as you're concerned, what is the SINGLE MOST important issue facing the country at the present time?'

The data for wave 13 of the BES was collected just after the June 2017 General Election. It must be imported into R before the code below will work. I converted the data for waves 1-13 into a couple of RDS files, but I'm not uploading those because the data is not mine.

The 'most important issues' are interesting because the question was free-text: respondents typed in their own answers rather than choosing from options set by the researchers.

```{r, include=FALSE}
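# tidyverse attaches ggplot2 (for the chart) and stringr (for str_replace());
# gdata provides trim()
library(tidyverse)
library(stringr)
library(gdata)

# My local RDS copies of the BES data for waves 1-13 (not distributed with this notebook):
# the free-text 'most important issue' answers, and the panel file (used here for social grade)
df.MII <- readRDS('bes_1-13_string_MII.RDS')
df.panel <- readRDS('bes_1-13_Panel.RDS')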
```

Altogether `r sum(!is.na(df.MII$MII_textW13))` people provided answers to the question. 

So let's start by finding out what the most common answers they provided were:

```{r, results='asis'}
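# Tabulate the raw free-text answers (table() leaves out NAs)
issue.counts <- as.data.frame(table(df.MII$MII_textW13))
# Drop the first row: table() sorts its levels, so this is presumably the count of blank ("") answers
issue.counts <- issue.counts[2:nrow(issue.counts),]
colnames(issue.counts) <- c('Issue', 'Count')
# Sort from most to least frequent and show the top ten
issue.counts <- issue.counts[order(issue.counts$Count, decreasing=TRUE),]
knitr::kable(issue.counts[1:10, ], row.names=FALSE)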
```
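The `[2:nrow(...), ]` step above presumably relies on the first row of the table being the blank answers. A minimal sketch on made-up answers (the `toy` vector is hypothetical, not BES data) shows why that holds, since `table()` sorts its levels and the empty string sorts before any other string, and how dropping blanks by value avoids depending on the sort order at all:

```r
# Made-up answers (hypothetical, not BES data); "" stands for a blank response
toy <- c("Brexit", "", "NHS", "brexit", "", "Brexit")

# Tabulate as the chunk above does; stringsAsFactors = FALSE keeps the labels as text
toy.counts <- as.data.frame(table(toy), stringsAsFactors = FALSE)

# table() sorts its levels and "" sorts first, so the blanks end up in row 1
print(toy.counts$toy[1] == "")

# Dropping blanks by value does not depend on the sort order
toy.counts <- toy.counts[toy.counts$toy != "", ]
colnames(toy.counts) <- c("Issue", "Count")
toy.counts <- toy.counts[order(toy.counts$Count, decreasing = TRUE), ]
print(toy.counts, row.names = FALSE)
```

Note that 'Brexit' and 'brexit' still come out as separate rows here, which is the problem the recoding below deals with.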

As the table shows, the problem with letting survey respondents type their own answers is that they often type the same answer in different ways: 'Brexit', 'brexit' and 'BREXIT', for instance, are all counted separately. So let's recode them.

```{r}
# Change everything to lower case
answers <- tolower(as.character(df.MII$MII_textW13))
# Get rid of a definite or indefinite article at the beginning of answers
answers <- str_replace(answers, '^(the|a|an) ', '')
# Get rid of a definite or indefinite article in the middle of answers
# (str_replace() only replaces the first match; str_replace_all() would remove them all)
answers <- str_replace(answers, ' (the|a|an) ', ' ')
# Get rid of leading and trailing whitespace
answers <- trim(answers)

# Define list of synonyms
syn.list <- list('brexit' = c('eu', 'leaving eu', 'getting out of eu', 'britexit', 'europe', 'britex', 'bretix', 'leaving european union', 'brixit', 'stopping brexit', 'brevet', 'hard brexit', 'brexit negotiations', 'brexit deal', 'btexit', 'brexot', 'getting out of europe', 'eu exit'),
                 'terrorism' = c('terror', 'terroism', 'terrorists', 'islamic terrorism', 'terror attacks', 'isis', 'terrorisim', 'terrorist', 'terrisom', 'terrorist attacks', 'terrorist threat', 'terriosim', 'terriorism', 'terror threat', 'terrosim'),
                 'immigration' = c('imigration', 'immagration', 'immigrants', 'immegration', 'imagration', 'immergration', 'uncontrolled immigration'),
                 'nhs' = c('health service', 'health care', 'n h s', 'nhs funding', 'nhs crisis', 'national health','health', 'healthcare', 'nhs underfunding', 'saving nhs', 'state of nhs', 'nhs cuts', 'national health service'),
                 'economy' = c('money', 'debt', 'unemployment', 'inflation', 'cost of living', 'national debt', 'employment', 'wages'),
                 'inequality' = c('social inequality', 'wealth inequality', 'poverty', 'austerity', 'gap between rich and poor', 'income inequality', 'homelessness'),
                 'tories' = c('theresa may', 'teresa may', 'conservatives',
                              'conservative party', 'conservative government',
                              'new government', 'tory party', 'tory government',
                              'prime minister', 'getting tories out'),
                 'election result' = c('election results', 'hung parliament', 'government', 'goverment',
                                       'stable government', 'unstable government', 'election',
                                       'forming government', 'lack of government', 'minority government',
                                       'coalition', 'weak government', 'stability of government',
                                       'political stability', 'political instability', 'political chaos',
                                       'general election result', 'government stability', 'state of government',
                                       'forming stable government', 'getting stable government', 'coalition government'),
                 'labour' = c('corbyn', 'jeremy corbyn', 'labour party'),
                 'muslims' = c('islam', 'islamic extremism'),
                 'dup' = c('coalition with dup'),
                 'environment' = c('climate change', 'global warming'),
                 'welfare' = c('benefits'),
                 'housing' = c('housing crisis', 'affordable housing'))

# Replace groups of common synonyms with a single term
for (syn in names(syn.list)) {
  answers[answers %in% syn.list[[syn]]] <- syn
}
```
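To see what each clean-up step does, here is a minimal sketch running the same steps on a handful of made-up answers (both `toy` and the cut-down `toy.syn` list are hypothetical). It uses base R's `trimws()` in place of `gdata::trim()`, so it needs only stringr:

```r
library(stringr)

# Made-up free-text answers (hypothetical), chosen to exercise each step
toy <- c("Brexit", "BREXIT ", "The NHS", "leaving the EU", "Terroism", "the state of the NHS")

# A cut-down synonym list in the same shape as syn.list
toy.syn <- list('brexit' = c('eu', 'leaving eu'),
                'terrorism' = c('terroism'),
                'nhs' = c('state of nhs'))

clean <- tolower(toy)                                # case folding
clean <- str_replace(clean, '^(the|a|an) ', '')      # leading article
clean <- str_replace(clean, ' (the|a|an) ', ' ')     # first article mid-answer
clean <- trimws(clean)                               # base R stand-in for gdata::trim()

# Exact-match replacement of each synonym with its canonical term
for (syn in names(toy.syn)) {
  clean[clean %in% toy.syn[[syn]]] <- syn
}
clean
table(clean)
```

'the state of the NHS' loses both of its articles only because one sits at the start (caught by the first pattern) and one in the middle (caught by the second); an answer with two articles in the middle would keep one of them.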

Having done that, we can get a clearer picture of the most commonly given 'most important issues':

```{r, results='asis'}
# Most common issues, following re-coding
recoded.issue.counts <- as.data.frame(table(answers))
# As before, drop the first row (presumably the blank answers)
recoded.issue.counts <- recoded.issue.counts[2:nrow(recoded.issue.counts),]
colnames(recoded.issue.counts) <- c('Issue', 'Count')
recoded.issue.counts <- recoded.issue.counts[order(recoded.issue.counts$Count, decreasing=TRUE),]
knitr::kable(recoded.issue.counts[1:10, ], row.names=FALSE)
```

Now let's focus on the eight most common issues (including any more makes the bar chart too hard to read), and match each answer with the social grade of the respondent who gave it.

```{r}
# Create a new data frame giving each panelist's MII and social grade, leaving out any with no given MII or with no known social grade
# (this assumes df.panel and df.MII list the same respondents in the same order)
# stringsAsFactors = TRUE keeps both columns as factors, which was the default before R 4.0.0
df.new <- data.frame(grade = as.character(df.panel$profile_socialgrade_cieW13),
                     issue = answers, stringsAsFactors = TRUE)
df.new <- df.new[!is.na(df.new$grade) & df.new$grade != 'Unknown' & df.new$grade != 'Refused' & !is.na(df.new$issue),]

# Keep the eight most commonly chosen MIIs (ranked by frequency); re-code the rest as 'other'
main.issues <- names(sort(table(df.new$issue), decreasing = TRUE))[1:8]
df.new$issue <- factor(df.new$issue, levels=c(main.issues, 'other'))
df.new[!df.new$issue %in% main.issues, 2] <- 'other'
df.new <- droplevels(df.new)
```
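The top-eight step amounts to ranking issues by frequency and lumping everything else into 'other'. A minimal base-R sketch on made-up labels (hypothetical, not BES data), keeping the top two rather than eight so the output stays short:

```r
# Made-up issue labels (hypothetical) standing in for df.new$issue
issue <- factor(c("brexit", "nhs", "brexit", "housing", "terrorism",
                  "brexit", "nhs", "welfare", "nhs", "brexit"))
n.keep <- 2   # the notebook keeps 8; 2 keeps the toy output short

# Rank the issues by frequency and keep the n.keep most common
top <- names(sort(table(issue), decreasing = TRUE))[seq_len(n.keep)]

# Re-code everything outside the top n.keep as 'other', listing 'other' last
lumped <- as.character(issue)
lumped[!lumped %in% top] <- "other"
lumped <- factor(lumped, levels = c(top, "other"))
table(lumped)
```

Putting 'other' last in the factor levels is what makes it the final segment of each bar and the last entry in the legend.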

Finally, we draw a stacked bar chart. Each bar is normalised to the same total height (`position='fill'`), so that proportions can be compared across social grades. Beyond this normalisation, the responses have not been weighted.

```{r}
# Define colours (wouldn't be necessary if I didn't specifically want 'other' to be grey)
# Evenly spaced hues as in ggplot2's default palette: stop 360/n degrees short of a full
# turn, because hues of 15 and 375 degrees would give the first and last issues the same colour
main.levels <- levels(df.new$issue)[levels(df.new$issue) != 'other']
n.main <- length(main.levels)
fill.cols <- hcl(h=seq(15, 375 - 360/n.main, length.out=n.main), c=100, l=65)
names(fill.cols) <- main.levels
fill.cols['other'] <- "grey40"

# Draw bar chart using scale_fill_manual (only necessary because of the above)
ggplot(df.new) +
  geom_bar(aes(x=grade, fill=issue), position='fill') +
  scale_fill_manual(values=fill.cols) +
  theme_minimal() +
  ylab(label='') + xlab(label='') +
  ggtitle('BES wave 13: most important issues')
```
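The height of each segment is simply that issue's share of its social grade. A minimal sketch on a made-up data frame (`toy` is hypothetical) computes those shares directly with `prop.table()`, which can be handy for reading exact values off the normalised bars:

```r
# Made-up grades and issues (hypothetical), standing in for df.new
toy <- data.frame(
  grade = c("A", "A", "A", "A", "E", "E", "E", "E", "E"),
  issue = c("brexit", "brexit", "brexit", "nhs",
            "brexit", "nhs", "nhs", "other", "other")
)

# Cross-tabulate: one row per grade, one column per issue
counts <- table(toy$grade, toy$issue)

# margin = 1 divides each row by its total, so each grade's shares sum to 1,
# just as each bar sums to 1 with position = 'fill'
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

On the real data the equivalent would be `prop.table(table(df.new$grade, df.new$issue), margin = 1)`.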

Brexit comes top in every group, but as we move across the social grades from A to D, the proportion of respondents giving it as the most important issue declines, while the proportion giving terrorism or the NHS increases; social grade E continues this trend for the NHS but reverses it for terrorism. A larger share of respondents in the 'working class' social grades (C2DE) than in the 'middle class' social grades (ABC1) give immigration as the most important issue, while a smaller share give inequality (which in the recoding scheme used here includes poverty and homelessness) or the economy (which here includes employment, unemployment, cost of living, and wages).

A larger share of respondents in the lowest social grade (E) gave 'other' responses, which makes comparisons more difficult, especially as there were fewer respondents in grades D and E than in any of the other grades. The total numbers in each social grade who answered the 'most important issue' question in this wave are as follows:

```{r}
# table() gives counts whether grade is stored as a factor or as character
table(df.new$grade)
```
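To give a rough sense of why smaller groups make comparisons harder: the standard error of a sample proportion p estimated from n respondents is sqrt(p(1 - p)/n), so it shrinks only with the square root of the group size. A minimal sketch, assuming simple random sampling (an assumption; the BES is a panel survey and no weights are used here) and a hypothetical 30% share observed in a smaller and a larger group:

```r
# Standard error of a sample proportion, assuming simple random sampling
se.prop <- function(p, n) sqrt(p * (1 - p) / n)

# Hypothetical group sizes and share, for illustration only
p <- 0.3
n.small <- 2500
n.large <- 8000
se.small <- se.prop(p, n.small)
se.large <- se.prop(p, n.large)

# Approximate 95% intervals (p +/- 1.96 standard errors)
round(p + c(-1.96, 1.96) * se.small, 3)
round(p + c(-1.96, 1.96) * se.large, 3)
```

The interval for the smaller group is wider by a factor of sqrt(8000/2500), so a difference of a couple of percentage points between a small grade and a large one is harder to tell apart from sampling noise.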

