Extracting social meaning and sentiment: Initial set-up
I hope the class will be dominated by hands-on exploration of sentiment data. To this end, I've compiled several datasets into CSV format and written R code for working with them.
I'm not assuming that anyone has any experience with R. I've worked to make it so that we can all use R essentially as a visualization and analysis tool. (If you already know R, then you'll be able to extend and improve these tools.)
If you plan to have a laptop with you during classes, please follow the instructions below to install R and get the data and code for the first day. It will definitely be more fun if you can hack around in class. (If you won't have a laptop, don't worry: the class should still be rewarding, and the code and data will be available at this site for the long term.)
- Download and install R, which is typically very easy: http://www.r-project.org/
- Make sure R is working: open R and paste in the following code. If all is going well, you should get a plot window with a pretty arc for 'awesome'.
    ratings = seq(1,10)
    counts = c(1324, 604, 783, 881, 1404, 2031, 3800, 6468, 7484, 21142)
    totals = c(25395214, 11755132, 13995838, 14963866, 20390515, 27420036,
               40192077, 48723444, 40277743, 73948447)
    relfreq = counts/totals
    plot(ratings, relfreq, main="Relative frequency of 'awesome' in IMDB")
- Install the following packages. To do this, use the Packages & Data > Package Installer window. Be sure to click Install dependencies when you install packages.
- binom
- Hmisc
- plyr
- tsne
- lme4
- plotrix
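If the Package Installer menu is not available in your R interface, the same installation can probably be done from the R console. The following is a minimal sketch rather than part of the course code: the variable names `pkgs` and `to_install` are made up for illustration, and R may ask you to pick a download mirror the first time you run it.

```r
# Package names taken from the list above.
pkgs <- c("binom", "Hmisc", "plyr", "tsne", "lme4", "plotrix")

# Skip anything that is already installed.
to_install <- setdiff(pkgs, rownames(installed.packages()))

# dependencies = TRUE plays the role of the "Install dependencies" checkbox.
if (length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}

# Report which packages can now be found (TRUE means installed).
sapply(pkgs, requireNamespace, quietly = TRUE)
```

Any package that still shows FALSE in the final line did not install correctly.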
- Download the course data-and-code distribution: nasslli2012-sentiment-datacode.zip
- Unpack the above archive. You might designate a folder for the code and data for this course — there will be other things to add as we go.
- Back in R, use the Misc > Change Working Directory option to move R to the folder where you unpacked the files.
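The working directory can also be set from the console, which helps if your R interface has no Misc menu. This is only a sketch: the folder path in `course_dir` is a hypothetical placeholder that you should replace with wherever you unpacked the archive.

```r
# Hypothetical location; replace with the folder where you unpacked the files.
course_dir <- "~/nasslli2012-sentiment-datacode"

# Only move there if the folder actually exists.
if (dir.exists(course_dir)) {
  setwd(course_dir)
}

# Show where R is now, and check that the first-day files are visible.
getwd()
file.exists(c("ep3-context.csv", "ep3-context-tokencounts.csv", "ep.R"))
```

If the last line prints FALSE for any file, R is not yet pointed at the right folder.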
- Then paste in the following code. You should get a nice plot window. If not, it would be great if you copied out whatever error message you got and sent it to me via email. (Note: you need to have installed at least binom and Hmisc at this stage.)
- Load the data files; these are large and thus might be slow to load:
    ep = read.csv('ep3-context.csv')
    eptok = read.csv('ep3-context-tokencounts.csv')
- Load the code:
    source('ep.R')
- Try out a visualization:
    epPlot(ep, eptok, 'awesome', probs=TRUE)
- Perhaps try out some other adjectives in place of 'awesome' ...
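One way to try several adjectives in a row is sketched below. It assumes that `ep`, `eptok`, and `epPlot` have already been loaded as described above, and it reuses the same call with only the word changed. The word list is purely illustrative; some of these words may not occur in the data, in which case `epPlot` may report an error.

```r
# Illustrative word list; swap in any adjectives you are curious about.
words <- c("good", "bad", "terrible")

# Only run if the course code and data have been loaded as described above.
if (exists("epPlot") && exists("ep") && exists("eptok")) {
  for (word in words) {
    # Same call as for 'awesome', with a different word each time.
    epPlot(ep, eptok, word, probs = TRUE)
  }
}
```

Each new plot will likely replace the previous one in the plot window, so running the call for one word at a time may be easier to follow.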