Thursday 19 June 2014

How to make a word cloud using R

Recently I have been using R for some basic data visualisations, producing outputs like word clouds and heat maps. I don't have a programming background, so at first glance R's command-line environment seemed a little daunting. However, the ease with which I have been able to create some pretty amazing outputs with very little code has surprised me. In this post I will share the steps of a simple process, along with the small amount of code that is needed.

1. RStudio + Packages

First of all, you will need to install RStudio (available across platforms). The program gives the user a nice interface to operate within. The code can be typed in the window to the top left of the program, useful particularly if you want to save your code as a script. The code can be sent to the command line from there, or you can simply start typing the code into the Console.
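As a rough sketch of the package side of this step: the three packages below are the ones listed at the end of this post, and the CRAN mirror URL is an assumption you can swap for any mirror you prefer.

```r
# Install the packages once per machine.
# The mirror URL is an assumption; any CRAN mirror will do.
install.packages(
  c("tm", "wordcloud", "SnowballC"),
  repos = "https://cloud.r-project.org"
)

# Load them at the start of every R session.
library(tm)         # text mining: corpora and cleaning functions
library(wordcloud)  # draws the cloud
library(SnowballC)  # stemming support
```

You only need the install.packages() call the first time; after that the library() lines are enough.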


2. Load the text 

This is the point where you load the text from which you would like to create your word cloud. For this example I am using JFK's 'We choose to go to the Moon' speech.
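One way this step might look, as a sketch rather than the exact code behind this post. The file name jfk_moon_speech.txt is hypothetical, and so are the variable names; the fallback line from the speech is only there so the example runs even without the file.

```r
library(tm)

# Hypothetical file name: a plain-text copy of the speech saved in the
# working directory. Change the path to wherever your text lives.
speech_file <- "jfk_moon_speech.txt"

if (file.exists(speech_file)) {
  speech_text <- readLines(speech_file, warn = FALSE)
} else {
  # Fallback so the sketch still runs: one well-known line from the speech.
  speech_text <- paste(
    "We choose to go to the moon in this decade and do the other things,",
    "not because they are easy, but because they are hard."
  )
}

# Wrap the character vector in a corpus, the structure tm works on.
speech_corpus <- VCorpus(VectorSource(speech_text))
```

With readLines() each line of the file becomes its own document in the corpus, which is fine for a word cloud because the word counts are summed across documents anyway.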


3. Format and clean the text 

The commands in this step remove things like punctuation and English words you aren't particularly interested in for the cloud, such as conjunctions. They also set the case of the text. I am going with lowercase, but you can run various combinations of these arguments, including ones not listed here.
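A minimal sketch of the cleaning described above, using tm's tm_map() with its built-in transformations. The one-sentence corpus is a stand-in so the block runs on its own, and the order of the steps is my assumption rather than a fixed rule.

```r
library(tm)

# Stand-in corpus (one line from the speech) so the block is self-contained;
# in practice use the corpus built from your own text.
speech_corpus <- VCorpus(VectorSource(paste(
  "We choose to go to the moon in this decade and do the other things,",
  "not because they are easy, but because they are hard."
)))

# Lowercase first: tm's English stop word list is lowercase, so "We"
# would otherwise slip through.
speech_corpus <- tm_map(speech_corpus, content_transformer(tolower))

# Remove common English words (conjunctions, pronouns and so on).
speech_corpus <- tm_map(speech_corpus, removeWords, stopwords("english"))

# Remove punctuation after the stop words, so entries that contain an
# apostrophe (such as "aren't") are matched before it is stripped.
speech_corpus <- tm_map(speech_corpus, removePunctuation)

# Collapse the gaps left behind by the removed words.
speech_corpus <- tm_map(speech_corpus, stripWhitespace)

# Inspect the cleaned text.
print(content(speech_corpus[[1]]))
```

Other transformations such as removeNumbers and stemDocument can be added with the same tm_map() pattern.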


4. Word Cloud Time!

Time to produce the word cloud. Run the wordcloud() command from the wordcloud package and watch RStudio populate the 'Plots' window to the right of the console.
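A sketch of how the plotting step could be written. It counts the words explicitly with a term-document matrix and passes them to wordcloud(); the short cleaned_text string is a made-up stand-in for the cleaned speech, and the argument values are illustrative rather than the settings used for the image in this post.

```r
library(tm)
library(wordcloud)

# Made-up stand-in for the cleaned text, so the block runs on its own.
cleaned_text <- "choose moon decade things easy hard choose moon space moon"
speech_corpus <- VCorpus(VectorSource(cleaned_text))

# Count how often each word occurs across the corpus.
# Note: by default tm drops words shorter than three characters here.
tdm <- TermDocumentMatrix(speech_corpus)
word_freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# The layout is random, so fix the seed to get the same picture each run.
set.seed(1962)

wordcloud(
  words = names(word_freq),
  freq = word_freq,
  min.freq = 1,          # keep every word; raise this for long texts
  max.words = 100,       # cap on the number of words drawn
  random.order = FALSE   # most frequent words are placed first, in the centre
)
```

For a full speech you would probably raise min.freq so that words appearing only once or twice do not clutter the plot.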

5. The Result


Voilà! Where there is a 'will' there is a way. It's hard to imagine a current leader of a western country announcing such a bold and expensive policy now - despite the world increasing its wealth substantially in the last half century of rapid development and therefore being even more capable than it was in the 60s.

Back to the technical side. For further information and the optional arguments you can use to customise the word cloud, please see the following:

  • tm - the text mining package
  • wordcloud - the cloud generator, also does commonality clouds, e.g. comparing two political speeches for common themes
  • SnowballC - multi-language stemming algorithm package

The full script and output can be found at my GitHub repo. I learnt to do this from a fair amount of Googling, but the most helpful blog I came across was Georeferenced, so credit where credit is due. I learnt this method because Wordle is blocked at my workplace, so of course use the website if you want to keep your hands clean!