Thursday 19 June 2014

How to make a word cloud using R

Recently I have been using R for some basic data visualisations, such as word clouds and heat maps. I don't have a programming background, so at first glance R's command-line environment can seem a little daunting. However, I have been surprised by how easily I could create some pretty amazing outputs with very little code. In this post I will share the steps of a simple process, along with the small amount of code that is needed.

1. RStudio + Packages

First of all, you will need to install R itself and then RStudio (both are available across platforms). RStudio gives you a nice interface to work within. Code can be typed into the editor window at the top left of the program, which is particularly useful if you want to save your code as a script. From there it can be sent to the Console, or you can simply type code straight into the Console. You will also need the three packages used in this walkthrough: tm, wordcloud and SnowballC.
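Here is a minimal sketch of the package set-up. It assumes R is already installed, because RStudio runs on top of it. The code installs tm, wordcloud and SnowballC from CRAN only if they are missing, then loads them for the current session. The CRAN mirror URL is simply the standard cloud mirror; any mirror will work.

```r
# Packages used in this walkthrough
pkgs <- c("tm", "wordcloud", "SnowballC")

# Install only the packages that are not already present
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  install.packages(not_installed, repos = "https://cloud.r-project.org")
}

# Load them into the current session
library(tm)
library(wordcloud)
library(SnowballC)
```

Installing is a one-off job. In later sessions you only need the library() calls.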


2. Load the text 

This is the point where you load the text you would like to turn into a word cloud. For this example I am using JFK's 'We choose to go to the Moon' speech.
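Here is a minimal sketch of loading the speech. It assumes you have saved the text as a plain-text file in your working directory. The file name jfk_moon_speech.txt is hypothetical. So that the example runs on its own, it writes a one-sentence excerpt to that file if the file does not exist yet.

```r
# Hypothetical file name: save the speech text here before running
speech_file <- "jfk_moon_speech.txt"

# Demo fallback: write a short excerpt if the file isn't there
if (!file.exists(speech_file)) {
  writeLines(
    "We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard.",
    speech_file
  )
}

# Read every line of the file and join them into a single string
speech <- paste(readLines(speech_file, warn = FALSE), collapse = " ")

# Quick check: number of characters loaded
nchar(speech)
```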


3. Format and clean the text 

The cleaning commands remove things you aren't particularly interested in for the cloud, such as punctuation and common English words like conjunctions (so-called stop words). They also convert the case of the text; I am going with lowercase. You can run various combinations of these transformations, including ones not covered here.
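Here is a minimal sketch of the cleaning step using tm's transformations. A one-sentence excerpt stands in for the full speech loaded in step 2. I've used VCorpus as the corpus type. The stemming line relies on SnowballC and is optional, because stems such as 'decad' can look odd in a cloud.

```r
library(tm)
library(SnowballC)  # provides the stemmer behind stemDocument()

# Stand-in for the full speech text
speech <- "We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard."

# Wrap the text in a corpus so tm's transformations can be applied
corpus <- VCorpus(VectorSource(speech))

corpus <- tm_map(corpus, content_transformer(tolower))      # lowercase first: removeWords is case-sensitive
corpus <- tm_map(corpus, removePunctuation)                  # drop commas, full stops, etc.
corpus <- tm_map(corpus, removeNumbers)                      # drop digits such as dates
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop "the", "and", "because", ...
corpus <- tm_map(corpus, stripWhitespace)                    # collapse the gaps left behind

# Optional: reduce words to their stems, e.g. "things" -> "thing"
corpus <- tm_map(corpus, stemDocument)

# Inspect the cleaned text
content(corpus[[1]])
```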


4. Word Cloud Time!

Time to produce a word cloud: run the wordcloud command and watch RStudio populate the 'Plots' pane to the right of the Console.
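Here is a minimal sketch of the plotting step. A one-sentence excerpt is cleaned as in step 3, with stemming skipped so that whole words appear. The code counts word frequencies with a term-document matrix and passes them to wordcloud(). The argument values are illustrative and are not the settings behind the cloud in this post.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)  # colour palettes; installed alongside wordcloud

# Stand-in text, cleaned as in step 3 (no stemming)
speech <- "We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard."
corpus <- VCorpus(VectorSource(speech))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Count how often each remaining word appears
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Word placement is random, so fix the seed for a repeatable layout
set.seed(42)
wordcloud(
  words        = names(freq),
  freq         = freq,
  min.freq     = 1,      # the default (3) would hide words used only once
  max.words    = 100,    # cap on the number of words drawn
  random.order = FALSE,  # plot in decreasing frequency, so the biggest words sit near the centre
  rot.per      = 0.2,    # proportion of words drawn vertically
  colors       = brewer.pal(8, "Dark2")
)
```

With the full speech, raising min.freq back towards its default keeps the cloud readable.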

5. The Result


Voilà! Where there is a 'will' there is a way. It's hard to imagine the leader of a Western country announcing such a bold and expensive policy today. This is despite the world growing substantially wealthier over the last half century of rapid development, which makes it even more capable than it was in the 1960s.

Back to the technical side: for further information, and for the optional arguments you can use to customise the word cloud, see the documentation for the following packages:

  • tm - the text mining package
  • wordcloud - the cloud generator; it also produces commonality clouds, e.g. comparing two political speeches for common themes
  • SnowballC - multi-language stemming algorithm package
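Here is a minimal sketch of a commonality cloud. The two short texts below are hypothetical placeholders, not real speeches; swap in the two speeches you want to compare. commonality.cloud() takes a term-document matrix with one column per document and draws only the words that appear in every document.

```r
library(tm)
library(wordcloud)

# Hypothetical placeholder texts; substitute the two speeches you want to compare
speeches <- c(
  speech_a = "Space exploration needs science, investment and national effort.",
  speech_b = "Economic growth needs science education, investment and national effort."
)

# Clean both texts the same way
corpus <- VCorpus(VectorSource(speeches))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# One row per word, one column per speech
term_matrix <- as.matrix(TermDocumentMatrix(corpus))
colnames(term_matrix) <- names(speeches)

# Words used in both speeches (the ones the commonality cloud will show)
shared <- rownames(term_matrix)[rowSums(term_matrix > 0) == ncol(term_matrix)]
print(shared)

set.seed(42)
commonality.cloud(term_matrix, max.words = 50)
```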

The full script and output can be found in my GitHub repo. I learnt to do this through a fair amount of Googling, but the most helpful blog I came across was Georeferenced, so credit where credit is due. I only learnt this method because Wordle is blocked at my workplace, so of course use that website if you want to keep your hands clean!

Friday 2 May 2014

The National Commission of Audit

I have feared the release of this report for quite some time. However, this post is not about lambasting the report. Instead I am looking at the 5% of its content that would be good for the country and should be considered by both sides of the political divide. If it were up to me, I would enjoy a good bonfire with 95% of the 1,200-page report. Most of it is an ideological hunt for cuts to government spending and operations, in ways that favour the monopolistic business community and the Tea Party wannabes.

The sliver of good...


Taking on the Pharmacy Guild - A highly protected industry whose market structure keeps medicine prices higher than they need to be.

Including the home in the pension means test - Does anyone think that people who own multi-million dollar properties should be receiving government payments? Really?

Family Tax Benefit (FTB) - This is poor policy. Why? If you want to deliver a payment to Australian families, deliver it; don't obfuscate the qualification for that payment through tax returns. The current FTB allows many people to claim the payment either falsely or mistakenly. The inefficient and costly onus then falls on the Tax Office to catch the false claims. A better system would provide payments through the existing payment arm of Government, Centrelink, and put the onus on claimants to prove they qualify before they receive the payment. Ex ante, not ex post. Centrelink's current operations could also easily change how often the payment is made. If the payment were delivered throughout the year rather than as a lump sum, it would likely be more useful to families that struggle on a week-to-week basis.

Paid Parental Leave (PPL) reduced from the inequality-reinforcing $150k cap - Quite simply, why should those earning $150k receive a government payment to replace their $150k wage? I think the government could find more productive and equitable ways to use that money. The reduction is a good thing.

Raising the preservation and pension age - Life expectancies have been going up, but the age at which people stop working has not. It is a shame the report exempts the baby boomers from this recommendation. Of course, for this to be equitable, we do need to consider exceptions. Should Aboriginal and Torres Strait Islander (ATSI) people have a different age, considering their lower life expectancy? Should those in hard manual jobs be expected to keep working to such an age if they are unable to find less physically demanding work?

These are the headline highlights of the report. There may be more sound recommendations, but without reading the whole 1,200-page report I will stop here.

Political Strategy of the Government

Progressives will flinch at reading this, but the Coalition Government is operating exceptionally well on the political strategy front. They didn't outline their harsh plans to the electorate, as this would have damaged their chances of winning. Instead, after winning government they set up a smoke screen, 'The National Commission of Audit', which lends credence to their deeply held agenda and does the talking for them. The Coalition have set the terms on which the battles will be fought.

Labor needs to start forging its own narrative of how it wants to shape the country. Wealth and income inequality are major issues. One reason is the empirical evidence of their socially damaging impacts*. Another is that they stifle productive economic investment in things Australia really needs and that Labor should stand for: think, for example, of a proper National Broadband Network or high-speed rail infrastructure. Perhaps the Labor Party could learn from the Coalition's strategy. Set up a 'National Commission of Inequality of Income and Wealth', don't be shy with the terms of reference, and let it go full bore. Stack it with academics like Prof. Frank Stilwell or Prof. Bill Mitchell, a genuinely pro-worker former MP like Greg Combet, homelessness advocates, environmental experts and so on. Then have the Commission present its findings to the public before the budget. Lay the groundwork for a progressive budget. We'll need one after the Coalition are finished.

Labor needs to think about how it can shift the terms of the political war, as well as match the Coalition's tactics.

*Read The Spirit Level for an engaging meta-study of the detrimental impacts of inequality.