Exploring genre on SoundCloud, part I

(Cross-posted from http://www.open.ac.uk/blogs/vem/2014/06/exploring-genre-on-soundcloud-part-i/)

One of the problems you’re always going to face when studying electronic music is the need to decide what you think ‘electronic music’ means. It’s a question of genre, and as Paul DiMaggio acknowledged in one of his most influential papers, genre is at once a formal and a social concept:

Literally, a genre is a ‘kind’ or ‘type’ of art. The notion of genre presumes that some aggregation principle enables observers to sort cultural products into categories. Formalists treat genres as comprising works that share conventions of form or content… Art historians also define genres in terms of shared conventions, but focus as well on social relations among producers in identifying ‘schools’ or ‘artistic movements’… Although students of popular culture and literary theorists of the ‘reader-response’ school consider formal similarities, they acknowledge that genres are partially constituted by the audiences that support them (DiMaggio 1987, p. 441)

Of the three approaches, the formalist one is the hardest to make headway with – and electronic music is a case in point. Defining electronic music as ‘music made with electronic instruments’ might seem straightforward, for example, but it doesn’t get you very far because pop and rap are also made with electronic instruments, yet – culturally speaking – are considered to be very different things. So one of the first things the project team started to look at was what SoundCloud users do – or rather, used to do – with the ‘genre’ free text field.[1]

One of our first findings was that, while it was not uncommon for tracks to be assigned to the ‘electronic’ genre, it was far more common for them to be assigned to a genre that would generally be categorised as a form of electronic music, for example ‘techno’, ‘dubstep’, or ‘house’. The question was, how to identify the set of ‘electronic’ genres. At first, we approached the problem by looking at the genres that were most commonly used and relying on our cultural knowledge to decide whether they were likely to be electronic or not. ‘Trance’ yes, ‘classical’ no. But having so much data to work with (even in this small initial sample) raised the possibility of approaching the problem quantitatively. And that’s where things got really interesting.

It turned out that, while SoundCloud only allowed a user to assign a single genre to a track, users often gave tracks multiple genres by informal means, for example by separating genre terms with slashes, backslashes, or commas (‘and’ and ‘&’ are also used in this way, but less often, and they also appear within genre names such as ‘drum and bass’ or ‘rock and roll’). SoundCloud recognised these multi-genre strings as unique genres – that is, it treated ‘pop’, ‘rap’, and ‘pop / rap’ as three different genres – and did the same with alternative spellings of a single genre: ‘hip hop’ and ‘hip-hop’, for example, were different genres from SoundCloud’s point of view.

Using a short Python program, we took all the ‘genre’ strings from all the tracks that had been uploaded by an initial snowball sample of 1500 users and cut them up into smaller strings wherever the most common separators (e.g. ‘/’) were present. We then changed the word ‘and’ to the ampersand (so that ‘drum and bass’ and ‘drum & bass’ could be treated as a single genre) and removed spaces and hyphens (so that ‘hip hop’, ‘hip-hop’, and ‘hiphop’ all became ‘hiphop’). Next, we identified the genre terms most commonly used by each user, and created a matrix showing how often each pair of the 50 most common genre terms in the sample as a whole appeared together among the three most common genre terms used by a single individual. Here’s the part of that matrix covering the five most common terms overall:

            house  techno  hiphop  deephouse  electronic
house           -     359      39        484         139
techno        359       -       9        141          97
hiphop         39       9       -         11          50
deephouse     484     141      11          -          42
electronic    139      97      50         42           -

Reading the first row shows how many times the term ‘house’ appeared alongside ‘techno’, ‘hiphop’, ‘deephouse’, or ‘electronic’ among the three most common genre terms in a single SoundCloud user’s uploads. For example, there were 359 SoundCloud users in our initial sample whose top three genre terms included both ‘house’ and ‘techno’: no surprise there, because these were the two most common genre terms overall. But the co-occurrence of terms does not appear to be random, as becomes clear when we look at how the first and second most common genre terms (‘house’ and ‘techno’) relate to the third and fourth (‘hiphop’ and ‘deephouse’). ‘House’ co-occurs considerably more often with ‘deephouse’ (484 users) than with ‘techno’ (359), even though ‘deephouse’ was the less common term overall, while ‘hiphop’ was among the top three terms of very few of the people who commonly used ‘house’ (39) or ‘techno’ (9). Instead, ‘hiphop’ co-occurred most often with the eighth most common genre term, ‘rap’. Altogether, there were 562 users in our sample whose three most commonly used genre terms included both ‘rap’ and ‘hiphop’: the highest rate of co-occurrence in the sample.
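For readers who want to try something similar, here is a minimal sketch of how the cleaning and counting steps described above might look in Python. It is not the program the project team used: the function names, the sample data, the exact set of separators, and the lowercasing step are all assumptions made for illustration, and the sketch leaves out the restriction to the 50 most common terms.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed separator set: slash, backslash, and comma.
SEPARATORS = re.compile(r"[/\\,]")


def normalise(genre_string):
    """Split one free-text genre string into cleaned genre terms."""
    terms = []
    # Lowercasing is an assumption; the post does not mention case handling.
    for part in SEPARATORS.split(genre_string.lower()):
        # Turn the standalone word 'and' into an ampersand.
        part = re.sub(r"\band\b", "&", part)
        # Drop spaces and hyphens so 'hip hop' and 'hip-hop' both become 'hiphop'.
        part = re.sub(r"[\s-]+", "", part)
        if part:
            terms.append(part)
    return terms


def top_terms(genre_strings, n=3):
    """Return the n genre terms a single user applied most often."""
    counts = Counter(t for s in genre_strings for t in normalise(s))
    return [term for term, _ in counts.most_common(n)]


def cooccurrence(genres_by_user, n=3):
    """Count the users whose top-n terms include both members of a pair."""
    pairs = Counter()
    for genre_strings in genres_by_user.values():
        # Sorting gives each pair one canonical (alphabetical) key.
        for a, b in combinations(sorted(top_terms(genre_strings, n)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical users and genre strings, invented for this example.
sample = {
    "user_a": ["House / Techno", "deep house", "house"],
    "user_b": ["hip-hop, rap", "Hip Hop", "rap"],
    "user_c": ["Deep-House", "house", "techno/house"],
}

counts = cooccurrence(sample)
print(counts[("deephouse", "house")])  # 2
print(counts[("hiphop", "rap")])       # 1
print(counts[("hiphop", "techno")])    # 0
```

Each count is the number of users whose top three terms include both members of the pair, which is how the cells of the matrix above are defined. Keys are stored in alphabetical order, so a pair should be looked up as ('deephouse', 'house') rather than the other way round.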

Already, then, we start to see patterns in the usage of the SoundCloud ‘genre’ field that suggest associations and disassociations between musical genres. These in turn hint at relationships among SoundCloud users (and, one might conjecture, among music-makers offline), that is, at DiMaggio’s genre-defining ‘social relations among producers’ (quoted above). For example, it seems plausible that ‘rap’ and ‘hiphop’ co-occur so frequently because rapping is such a prominent feature of hip-hop music, while ‘hiphop’ and ‘techno’ co-occur so rarely because techno is a form of electronic music produced by a different group of people from those who make hip-hop. Eminem (2002) once rapped that ‘nobody listens to techno’, but it might have been more accurate to say that, generally speaking, rappers don’t listen to techno. What the above figures begin to suggest is that they don’t tend to upload it to SoundCloud either.

In part II, we’ll take a look at what happens when the co-occurrence matrix is visualised as a network.

[1] This feature was recently removed from the SoundCloud uploader.


DiMaggio, Paul (1987). ‘Classification in art’. American Sociological Review 52 (4): 440-455.
Eminem (2002). ‘Without me’. New York: Shady Records / Aftermath Entertainment.
