Extracting social meaning and sentiment: Initial set-up

I hope the class will be dominated by hands-on exploration of sentiment data. To this end, I've compiled a number of datasets into CSV format and written R code for working with them.

I'm not assuming that anyone has any experience with R. I've set things up so that we can all use R essentially as a visualization and analysis tool. (If you already know R, then you'll be able to extend and improve these tools.)

If you plan to have a laptop with you during classes, it would be great if you followed the instructions below for getting R and getting the data and code for the first day. It will definitely be more fun if you can hack around in class. (If you won't have a laptop, don't worry: the class should still be rewarding, and the code and data will remain available at this site for the long term.)

Download and install R, which is typically very easy: http://www.r-project.org/
Make sure R is working: open R and paste in the following code. If all is going well, you should get a plot window with a pretty arc for 'awesome'.
ratings = seq(1,10)
counts = c(1324, 604, 783, 881, 1404, 2031, 3800, 6468, 7484, 21142)
totals = c(25395214, 11755132, 13995838, 14963866, 20390515, 27420036, 40192077, 48723444, 40277743, 73948447)
relfreq = counts/totals
plot(ratings, relfreq, main="Relative frequency of 'awesome' in IMDB")
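If the plot window does not appear, the same numbers can be inspected directly in the console. The following is a minimal sketch in base R that reuses the vectors from the snippet above; the per-million scaling and the reading of totals as overall word counts per rating category are my assumptions, made only for readability.

```r
# Same vectors as in the sanity check above
ratings <- seq(1, 10)
counts <- c(1324, 604, 783, 881, 1404, 2031, 3800, 6468, 7484, 21142)
totals <- c(25395214, 11755132, 13995838, 14963866, 20390515,
            27420036, 40192077, 48723444, 40277743, 73948447)

# Division is element-wise: one relative frequency per rating category
relfreq <- counts / totals

# Show each rating next to its relative frequency, scaled per million
# (assumption: totals are overall word counts for each rating category)
print(data.frame(rating = ratings, per_million = round(relfreq * 1e6, 1)))

# Rating category in which 'awesome' is relatively most frequent
best_rating <- ratings[which.max(relfreq)]
print(best_rating)
```

The relative frequency should peak at the 10-star end, which is the shape the plot is meant to show.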
Install the following packages. To do this, use the Packages & Data > Package Installer window. Be sure to click Install dependencies when you install packages.
- binom
- Hmisc
- plyr
- tsne
- lme4
- plotrix
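If your R installation does not have the Package Installer menu (it varies by platform), the packages can also be installed from the console. This is a sketch using the standard install.packages function; the variable names are mine, and the interactive() guard is just a precaution so that nothing is downloaded when the code is run as a script.

```r
# Packages named in the list above
pkgs <- c("binom", "Hmisc", "plyr", "tsne", "lme4", "plotrix")

# Which of them are not yet installed on this machine
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Install only what is missing, and only when pasted into an interactive
# console; dependencies = TRUE also pulls in the packages these depend on
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

R may ask you to choose a download mirror the first time; any nearby one should do.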
Download the course data-and-code distribution: nasslli2012-sentiment-datacode.zip
Unpack the above archive. You might designate a folder for the code and data for this course — there will be other things to add as we go.
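The archive can also be unpacked from within R. The sketch below assumes the zip file sits in R's current working directory and uses a hypothetical folder name, nasslli2012-sentiment in your home directory; adjust both to match your machine.

```r
# Name of the archive as given above; assumed to be in the current directory
zip_path <- "nasslli2012-sentiment-datacode.zip"

# Hypothetical folder for the course materials; change to taste
course_dir <- file.path(path.expand("~"), "nasslli2012-sentiment")

if (file.exists(zip_path)) {
  # List the contents first to see whether files land in a subfolder
  print(unzip(zip_path, list = TRUE))
  # Then unpack into the course folder (created if it does not exist)
  unzip(zip_path, exdir = course_dir)
} else {
  message("Archive not found at: ", normalizePath(zip_path, mustWork = FALSE))
}
```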
Back in R, use the Misc > Change Working Directory option to move R to the folder where you unpacked the files.
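The same step can be done from the console with setwd, which is handy if your version of R lacks the Misc menu. In this sketch the folder path is hypothetical; the three file names are the ones used in the code further down, and checking for them up front is my own suggestion.

```r
# Hypothetical location of the unpacked files; replace with your own path
course_dir <- file.path(path.expand("~"), "nasslli2012-sentiment")

# Move R to that folder if it exists
if (dir.exists(course_dir)) {
  setwd(course_dir)
}

# Confirm where R is now looking
print(getwd())

# Check that the files needed for the first day are visible from here
expected <- c("ep3-context.csv", "ep3-context-tokencounts.csv", "ep.R")
found <- file.exists(expected)
print(data.frame(file = expected, found = found))
```

If any file shows FALSE, the archive probably unpacked into a subfolder, and the working directory should point there instead.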
Then paste in the following code. You should get a nice plot window. If not, it would be great if you copied out whatever error message you got and sent it to me via email. (Note: you need to have installed at least binom and Hmisc at this stage.)
# Load the data files; these are large and thus might be slow to load:
ep = read.csv('ep3-context.csv')
eptok = read.csv('ep3-context-tokencounts.csv')
# Load the code:
source('ep.R')
# Try out a visualization:
epPlot(ep, eptok, 'awesome', probs=TRUE)
# Perhaps try out some other adjectives in place of 'awesome' ...
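If something goes wrong, the exact error text is the most useful thing to send. Here is one way to capture it, sketched with base R's tryCatch; error_message is a hypothetical helper of mine and not part of ep.R, and the epPlot call is copied unchanged from the code above.

```r
# Hypothetical helper: evaluate an expression and return the error message
# as a string, or NULL if it ran without error
error_message <- function(expr) {
  tryCatch({
    expr   # the expression is evaluated here, inside tryCatch
    NULL
  }, error = function(e) conditionMessage(e))
}

# Wrap the visualization call from above
msg <- error_message(epPlot(ep, eptok, 'awesome', probs = TRUE))

if (!is.null(msg)) {
  # Text to copy into an email, along with the R version in use
  cat("Error message:", msg, "\n")
  cat("R version:", R.version.string, "\n")
}
```

When the call succeeds, the plot is drawn as usual and nothing is printed.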