Shut up and Hack, 25 March

This week there was definitely hacking, and more than a bit of yacking. New people and regulars were part of the crowd.

Playing with words

A few weeks ago, there was a presentation on the data made available by the Australian Charities and Not-for-profits Commission (ACNC Registered Charities, ACNC Annual Information Statement 2013). The ACNC has released the data reported by more than 40,000 registered organisations. This data set contains a mix of data types (text, numbers and booleans), and not all records are complete or clean.

The text components of the ACNC data set were an opportunity to try out an online word cloud generator, Tagxedo. It is similar to Wordle, but offers more controls over how the text data is used.

Before doing anything with text, especially free text, it needs to be cleaned. Cleaning involves removing common words (stop words) and grouping words that share a common stem (stemming and lemmatisation), which can take up a lot of time with not much to show for it. Stopping, stemming and lemmatisation are all components of natural language processing, a whole area of study at many universities.
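As a rough illustration of stopping and stemming, the Python sketch below lower-cases text, drops stop words and strips a few plural suffixes. The stop-word list and suffix rules are hypothetical and deliberately tiny, and the naive stemmer is only a crude stand-in for a proper algorithm such as Porter's. It is not how Tagxedo does its own cleaning.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "for", "to", "inc", "ltd"}

# Naive (suffix, replacement) rules, checked in order: a crude stand-in
# for a real stemmer such as Porter's, not a full implementation.
RULES = [("sses", "ss"), ("ies", "y"), ("ches", "ch"), ("ss", "ss"), ("s", "")]


def tokenise(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Apply the first matching suffix rule, keeping stems of 3+ letters."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            # Reject the rule if it would leave a very short stem.
            return stem if len(stem) >= 3 else word
    return word


def clean(text):
    """Remove stop words, then reduce each remaining word to a crude stem."""
    return [naive_stem(w) for w in tokenise(text) if w not in STOP_WORDS]


print(clean("The Friends of the Churches and Schools Trust Inc"))
# prints ['friend', 'church', 'school', 'trust']
```

Real tools use much longer stop-word lists and more careful stemming or lemmatisation rules, which is part of why this step takes so much effort.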

What is a word cloud and why use one?

A word cloud is a visual representation of text data: the text is broken into individual words, and the frequency of each word is shown using different sizes and/or colours.
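One simple way to turn word frequencies into sizes is to count each word and scale the counts linearly between a minimum and a maximum font size. This is a minimal sketch, assuming linear scaling and hypothetical `min_size`/`max_size` values; Tagxedo and Wordle may scale sizes differently.

```python
from collections import Counter


def font_sizes(words, min_size=10.0, max_size=60.0):
    """Map each word's frequency linearly onto a font-size range (points)."""
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for word, n in counts.items():
        if hi == lo:
            # Every word is equally frequent, so give them all the same size.
            sizes[word] = max_size
        else:
            sizes[word] = min_size + (n - lo) * (max_size - min_size) / (hi - lo)
    return sizes


words = ["school", "church", "school", "trust", "school", "church"]
print(font_sizes(words))
# prints {'school': 60.0, 'church': 35.0, 'trust': 10.0}
```

The most frequent word gets the largest size and the least frequent gets the smallest, which is why the biggest words in a cloud are the ones to read first.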

Analysis results are usually displayed as tables and charts. Open any annual report, white paper, government report, research paper, article or thesis: tables and charts galore. A word cloud can add interest to an otherwise boring report, particularly when it is difficult to add a photo or image, and it will help the work stand out from other assignments, reports and presentations.

What’s in a name?

Below is a word cloud of the names of the charities, created using Tagxedo. It shows the words that appear most often in charity names. The words appear in different sizes, which indicates how often each word appears: more frequent words are larger.

[Image: word cloud of charity names]

The stopping and stemming were performed by Tagxedo. The word list was cleaned further by removing additional words, such as ‘St’, that would not add value. The number of words displayed was restricted to the top 100, which was easily configured in the word | layout options menus.
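The extra clean-up and the top-100 limit can be sketched in Python with `collections.Counter`: drop any words on a custom exclusion list, then keep the `n` most frequent. The exclusion set `EXTRA_EXCLUDE` and the function `top_words` are hypothetical names for illustration, not Tagxedo settings.

```python
from collections import Counter

# Hypothetical extra words to drop after the standard stop words,
# e.g. the abbreviation 'st' (as in 'St Mary's').
EXTRA_EXCLUDE = {"st"}


def top_words(words, n=100, exclude=EXTRA_EXCLUDE):
    """Return the n most frequent (word, count) pairs, skipping excluded words."""
    counts = Counter(w for w in words if w.lower() not in exclude)
    # most_common orders ties by first appearance.
    return counts.most_common(n)


names = ["st", "school", "St", "church", "school", "trust"]
print(top_words(names, n=2))
# prints [('school', 2), ('church', 1)]
```

The same function with `n=200` and a different exclusion set (for example, adding ‘including’) would cover the activity-description cloud as well.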

Charity names often also describe what the organisation is or does. The word cloud shows that a large number of charities are schools and churches, and different Christian denominations can be seen. Foundations, associations, trustees and trusts also feature among the most frequent words.

How much do charities and not-for-profits care?

The next word cloud was generated from the descriptions charities provided of their activities.

[Image: word cloud of how charities pursued their activities]

This word cloud shows the top 200 words used by charities and not-for-profits to describe how they pursued their main activities. After Tagxedo removed stop words, additional words (e.g. ‘including’) were removed.

So, how much do charities and not-for-profits care? Judging by word frequency, ‘care’ features less prominently than activities, services, community, education, people, meetings and support.

Resources

https://data.gov.au/dataset/acnc-register

http://www.tagxedo.com

http://www.wordle.net/

Sally Pryor

@pryor365
