Extracting social meaning and sentiment: Initial set-up

I hope the class will be dominated by hands-on exploration of sentiment data. To this end, I've compiled a bunch of datasets into CSV format and written a bunch of R code for working with them.

I'm not assuming that anyone has any experience with R. I've worked to make it so that we can all use R essentially as a visualization and analysis tool. (If you already know R, then you'll be able to extend and improve these tools.)

So, if you plan to have a laptop with you during classes, it would be great if you followed the instructions below to install R and get the data and code for the first day. It will definitely be more fun if you can hack around in class. (If you won't have a laptop, don't worry: the class should still be rewarding, and the code and data will remain available at this site for the long term.)

  1. Download and install R, which is typically very easy: http://www.r-project.org/
  2. Make sure R is working: open R and paste in the following code. If all is going well, you should get a plot window with a pretty arc for 'awesome'.
    ratings = seq(1, 10)
    counts = c(1324, 604, 783, 881, 1404, 2031, 3800, 6468, 7484, 21142)
    totals = c(25395214, 11755132, 13995838, 14963866, 20390515, 27420036, 40192077, 48723444, 40277743, 73948447)
    relfreq = counts/totals
    plot(ratings, relfreq, main="Relative frequency of 'awesome' in IMDB")
  3. Install the following packages. To do this, use the Packages & Data > Package Installer window. Be sure to click Install dependencies when you install packages.
  4. Download the course data-and-code distribution: nasslli2012-sentiment-datacode.zip
  5. Unpack the above archive. You might designate a folder for the code and data for this course — there will be other things to add as we go.
  6. Back in R, use the Misc > Change Working Directory option to move R to the folder where you unpacked the files.
  7. Then paste in the following code. You should get a nice plot window. If not, it would be great if you copied out whatever error message you got and sent it to me via email. (Note: you need to have installed at least binom and Hmisc at this stage.)
    # Load the data files; these are large and thus might be slow to load:
    ep = read.csv('ep3-context.csv')
    eptok = read.csv('ep3-context-tokencounts.csv')
    # Load the code:
    source('ep.R')
    # Try out a visualization:
    epPlot(ep, eptok, 'awesome', probs=TRUE)
    # Perhaps try out some other adjectives in place of 'awesome' ...
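
If you prefer the R console to the menus, the package and working-directory steps above can be checked with something like the following. This is only a rough sketch, not part of the course distribution: missing_packages and missing_files are hypothetical helper names, the package vector holds just the two packages named in step 7 (the full list may be longer), and the setwd path is a placeholder you would need to edit.

```r
# Hypothetical helper: returns the names in 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: returns the names in 'files' that are absent from 'dir'.
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

# Only the packages named in step 7; step 3 may call for more.
needed_pkgs <- c("binom", "Hmisc")

# The files that step 7 reads or sources.
needed_files <- c("ep3-context.csv", "ep3-context-tokencounts.csv", "ep.R")

# Uncomment to install whatever is missing (needs an internet connection):
# to_install <- missing_packages(needed_pkgs)
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)

# Uncomment and edit the path (a placeholder) to move R to your course folder:
# setwd("~/nasslli2012-sentiment")

# Prints any needed files not found in the current working directory.
print(missing_files(needed_files))
```

If the last line prints character(0), the files from step 7 are all where R expects them; any names it does print point to a wrong working directory or an incomplete unpacking.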