Proto-tutorials – Daniel Allington

Importing National Student Survey data with R (or: how doing things programmatically helps to avoid mistakes)

I recently started to do some work with NSS (National Student Survey) data, which are available from the HEFCE website in the form of Excel workbooks. To get the data I wanted, I started copying and pasting, but I quickly realised how hard it would be to know that I hadn’t made any mistakes. (Full disclosure: it turns out that I did make some mistakes, e.g. once I left out an entire row because I hadn’t noticed that it wasn’t selected.) Using a programming language such as R to write an import script requires a much bigger upfront investment of time than diving straight in and beginning to copy and paste, but the payoff is that once your script works, you can use it over and over again. That is why I now have several years’ worth of NSS data covering all courses and institutions, from which I can quite easily pull out whichever numbers I want using a dplyr filter statement. I do have to be prepared to take account of irregularities, e.g. in institutions’ names from one year to the next, but that would also be necessary when doing things by point-and-click.

For example, looking at how all institutions performed in my particular discipline with regard to the four NSS questions relating to teaching quality, I can see that Media Studies at the University of the West of England managed the quite remarkable feat of rising from 68th place in 2015 to 2nd place in 2016 before falling back to 53rd place in 2017. To visualise only these four questions in relation to this subject at this institution over the whole time period for which I have data, I can filter out everything relating to other disciplines and other institutions with a single statement, and then use ggplot to represent each of the four variables that I’m interested in with a different coloured line:

Student perceptions of teaching quality in Media Studies at UWE Bristol
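
For readers who want a feel for what that filter-and-plot step involves, here is a minimal sketch; it is not the code behind the chart above. It assumes a data frame with one row per institution, subject and year, and hypothetical columns Q1 to Q4 holding the percentage of students agreeing with each teaching question. The institution names and all the numbers are invented placeholders, not NSS results.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Invented placeholder figures for illustration only (NOT real NSS results).
# Column names Q1..Q4 are hypothetical stand-ins for the four teaching questions.
nss <- data.frame(
  year        = rep(2015:2017, times = 2),
  institution = rep(c("Institution A", "Institution B"), each = 3),
  subject     = "Media Studies",
  Q1 = c(80, 92, 83, 85, 86, 87),
  Q2 = c(75, 90, 78, 82, 83, 84),
  Q3 = c(78, 91, 80, 84, 85, 85),
  Q4 = c(72, 89, 76, 80, 81, 82)
)

# One filter statement drops every other institution and subject;
# pivot_longer then stacks the four question columns so that each
# question can be mapped to its own line colour.
one_course <- nss %>%
  filter(institution == "Institution A", subject == "Media Studies") %>%
  pivot_longer(Q1:Q4, names_to = "question", values_to = "pct_agree")

# One coloured line per question, with year along the x axis.
p <- ggplot(one_course, aes(x = year, y = pct_agree, colour = question)) +
  geom_line() +
  scale_x_continuous(breaks = 2015:2017) +
  labs(x = "Year", y = "% agree", colour = "Question")

print(p)
```

Reshaping to long format before plotting is a design choice: it lets a single geom_line call draw all four lines and produces the legend automatically, rather than needing one layer per question.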

How could such a dramatic rise and fall occur? Maybe someone who still works at UWE would be better placed to explain. But the general question of what drives student perceptions of teaching quality is one that I’m interested to explore as a researcher – and I’ll be posting thoughts and findings here as and when.

In the meantime, here’s my code, presented as an example of how automating error-prone tasks can take some of the uncertainty out of the research process. You probably aren’t interested in working with this particular dataset, but you may have other datasets that you would like to deal with in the same way. Yes, it looks complicated if you’re not used to scripting, but the code is actually quite simple, and I was able to build it up iteratively: adding statements, running the script as a whole, noticing what went wrong, and then fixing whatever it was, one step at a time. (The code is very heavily commented, to give a non-coder an idea of what those steps were and what sort of thinking is typically involved in taking a code-based rather than point-and-click approach to importing data.)

Continue reading “Importing National Student Survey data with R (or: how doing things programmatically helps to avoid mistakes)”

RStudio, Jupyter, Emacs, Vim: nothing that works properly is easy to use and nothing that is easy to use works properly

EDIT: Some of the problems described below are mitigated or resolved by not saving to a network drive. That doesn’t help with all the problems, though. RStudio no longer hangs for minutes at a time and I can now use version control, but the cursor still becomes uncontrollable in long Markdown documents. Also, the university’s PCs are set up in such a way that students have to save to a network drive, which means that this is a (partial) fix for me as a researcher but not for me as a teacher.

So I am preparing to teach quantitative analysis of social media data using R, the open source language for statistical programming. I usually do anything code-related in Emacs, because I already know how to use Emacs and you can do everything code-related in Emacs and I don’t want to install and learn the quirks of loads of different IDEs. But that argument won’t make sense from the point of view of my students, firstly because they won’t need to do everything code-related, they’ll just need to create R notebooks, and secondly because they don’t already know how to use Emacs, and learning how to use Emacs is hard because Emacs is weird.


The LaTeX fetish (Or: Don’t write in LaTeX! It’s just for typesetting)

It’s that time of year when students are signing up for study skills classes. One of the skills that science students are likely to be encouraged to develop is the use of LaTeX. Other people may come to LaTeX for other reasons: people who want to typeset their own books; people who’ve heard that LaTeX may have something to do with Digital Humanities; etc. I’ve written this essay as a sort of pre-introduction to LaTeX. It won’t teach you how to use it (I’m not qualified!), but it will try to give non-users a clear understanding of what LaTeX is really for, which may help them to make up their minds about whether the effort of learning it (not to mention simply getting it to work) is really going to be worthwhile. Why such a long essay? Because many of those who evangelise for the use of LaTeX fetishise it to the extent of spreading misinformation about its true benefits, and I want to clear some of that up.