Update: Part 2 of this article can be found here.
Using spreadsheets to analyze numerical or well-categorized data is relatively straightforward. It might not be easy necessarily, but at least you normally know exactly what to do. If you have ever been faced with open-ended text responses, perhaps from a survey, emailed questions or feedback forms, you know how tricky it can be to make sense of it.
The problems are many. Non-standard formatting, having to manually read each response to understand its content, variable length, and those are just the first that come to mind.
What we need is some way to drill down automatically to see if there are any common patterns, and therefore have an immediate starting point to start interpreting the responses.
A Potential Solution
Say we had a spreadsheet containing all the website feedback from the last few weeks.
What if we could count occurrences of certain words or phrases? We could then see if there were issues most prevalent on the minds of our customers.
First of all we need to make a list of all the words contained in the feedback. We can do this using a hash table or “Dictionary”. This is like an array, but instead of the items in the array being held against a number they instead are indexed using a keyword. Using this system we can store a word, such as the word “Data”, and then count how many times this word occurs.
By “Splitting” the text in the cells wherever a space is found we can crudely grab all the words used. Obviously this is not a pure approach; there will be badly formatted responses or other delimiters that would need to be cleaned. For our purposes though it will work well enough.
First Results
As you can see, while this approach is encouraging in that it worked as intended, the results are not yet useful. We need to exclude superfluous words and get to the “meat”. The traditional approach to this kind of indexing issue is to compile a “stop list” of words and phrases that can be safely ignored.
Armed with this list we ought to be able to now generate a clean set of data from which we can begin to draw conclusions.
Full Solution
So we have learned that for a full solution we need to create a macro that goes over the response text in our first data sheet and it should create a list of words, taking care to ignore words we determine to be irrelevant, listed in a “Stop List” sheet. Then for each word our macro needs to count how many times each word appears. This word list and associated word count should appear in another sheet we will call the “Issue List”. That list should be sorted in descending order so we can clearly see which issues are most important to our customers.
Here is what our data set looked like when we ran the macro against a set of feedback. We had to run it a couple of times to build up our specific stop list. I imagine this will be a common theme when you try it yourself.
Probably not surprising that in this particular result we see Data and Excel score highly, but I am intrigued by the issues around the Security responses.
The Finished Word Counting Macro
Here is the completed macro so you can use it in your own projects.
Sub WordCount() Dim vArray As Variant Dim lngLoop, lngLastRow As Long Dim rngCell, rngStoplist As Range With CreateObject("Scripting.Dictionary") 'declaring object array, Items can be any form of data, and are stored in this object array. Worksheets(1).Activate For Each rngCell In Worksheets("Sheet1").Range("A1", Cells(Rows.Count, "A").End(xlUp)) vArray = Split(rngCell.Value, " ") For lngLoop = LBound(vArray) To UBound(vArray) If Application.WorksheetFunction.CountIf(Sheets("Stoplist").Range("A1:A" & Sheets("Stoplist").UsedRange.Rows.Count), vArray(lngLoop)) > 0 Then Else If Not .exists(vArray(lngLoop)) Then 'if the word is not in the array then its true. .Add vArray(lngLoop), 1 'adds the word to object array. Else .Item(vArray(lngLoop)) = .Item(vArray(lngLoop)) + 1 'if the item or the word already exist in the array it will just increase the keys which is the number of word count. End If End If Next lngLoop Next rngCell Worksheets("Issue").Range("A2").Resize(.Count).Value = Application.Transpose(.keys) 'this line shows the retrieval of the word count stored in the array. Worksheets("Issue").Range("B2").Resize(.Count).Value = Application.Transpose(.items) 'this line shows the retrieval of the word added in the array. End With lngLastRow = Worksheets("Issue").Cells(Rows.Count, 1).End(xlUp).Row Range("A1:B" & lngLastRow & "").Select ActiveWorkbook.Worksheets("Issue").Sort.SortFields.Clear ActiveWorkbook.Worksheets("Issue").Sort.SortFields.Add Key:=Range("B2:B" & lngLastRow & ""), _ SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal With ActiveWorkbook.Worksheets("Issue").Sort .SetRange Range("A1:B" & lngLastRow & "") .Header = xlYes .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With Worksheets("Issue").Activate Cells(2, 1).Select End Sub
Summary
By no means am I saying this is a silver-bullet solution. You still must read the responses and deal with them appropriately. What you gain by doing this is an overview of the kinds of issues your customers are talking about, and perhaps a way to “score” your feedback.
Perhaps you can spot trends over time, or overlay “sentiment” by analyzing the balance of positive and negative words, and correlations between them? I wonder if it is possible to turn this data into one of those pretty keyword charts used in Web2.0 sites?
How might you use an approach like this in your work? We would love to know your ideas…
About the author
Yoav Ezer co-authors the technology and productivity blog Codswallop. He is also the CEO of a company that produces PDF to Excel conversion software.
For more Excel tips from Yoav, join him on Facebook or Twitter
Update: Part 2 of this article can be found here.
About Rich
Rich is the owner and creator of Windows Guides; he spends his time breaking things on his PC so he can write how-to guides to fix them.
- Web
- |
- |
- |
- |
- Google+
- |
- More Posts (1019)
thank you Mr. Rich for the education on using excel to analyze feedback, complains and survey. you can never fully understand what it means to people in business, particularly in Nigeria.
This indeed looks like just the information I was looking for however there are few steps which leave me lost.
1st how to do I get a list of all words used in the survey. You mentioned hash table or “Dictionary” but I am clueless as to what you mean (I did a google search and only confused me more). I suppose I will finish there and await a reply.
Riley, I’ve informed the author of your comment.
Hi Rileywashington
In the Macro, I used the following command to define a scripting object which works as a hash table:
With CreateObject(“Scripting.Dictionary”)
I then use .exist to verify if a word already exists within the dictionary and .add to add that word if it doesn’t.
It’s just quicker than using an array and scanning it over and over again to find out if a word was previously encountered.
You can check the dictionary object methods and functions here:
http://msdn.microsoft.com/en-us/library/x4k5wbx4%28VS.85%29.aspx
I hope that helps
This is a pretty nifty DYI solution for quick text analysis.
Though there’s a free add-in for excel provided by a text and sentiment analysis company called Semantria. The add-in installs directly into your Excel ribbon and lets you perform really robust text and sentiment analysis.
While the add-in is free, the service itself isn’t. But when I registered, they gave me 10 000 free credits. 1 credit lets you process one document. Pretty nifty!
Yes Semantria is a great service. A document in their terms is a row of data containing text to analyse.