2015-01-28

This accompanies the ScatterBoxBarPlots lecture.

I'm testing out a slide-making framework, R Markdown.

In case you want to see how this is done, review the file "ScatterBoxBarPlots-1-discussion.Rmd".

That Rmd file gets converted to "ScatterBoxBarPlots-1-discussion.md", which in turn gets converted into HTML or PDF output.

I'm not a fan of this yet, but I'm trying.

Scatterplots

Statistics in the News

  • The New England Patriots are accused of deflating footballs so that their players can handle them more easily.

  • Warren Sharp's blog, "The New England Patriots Prevention of Fumbles is Nearly Impossible" http://www.sharpfootballanalysis.com/blog/2015

Here's a peculiar scatterplot

Fumbles

Critique that(?!)

  • The line is ridiculous

  • I'd like to see a Histogram

  • What distribution should we expect if everybody plays with the same balls?
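One way to approach that last question is simulation. The sketch below assumes every team fumbles with the same probability on each play, so each team's fumble count is binomial. The number of teams, the plays per team, and the fumble probability are made-up values for illustration, not figures from the blog post.

```r
## Null model: every team has the same per-play fumble probability.
## All numbers here are hypothetical, chosen only for illustration.
set.seed(1234)
n_teams  <- 32     # assumed number of teams
n_plays  <- 5000   # assumed plays per team
p_fumble <- 0.02   # assumed common fumble probability

## One fumble count per team, all drawn from the same binomial distribution
fumbles <- rbinom(n_teams, size = n_plays, prob = p_fumble)
plays_per_fumble <- n_plays / fumbles

## Histogram of the simulated team-level values
hist(plays_per_fumble, breaks = 10,
     main = "Simulated plays per fumble, identical teams",
     xlab = "Plays per fumble")
```

Even with identical teams the histogram has some spread, so the question becomes whether one team sits further out than this kind of chance variation would allow.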

Replies Appear!

A Q-Q plot appears in Matthews and Lopez!

Fumbles 1
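For anyone who has not made a Q-Q plot in R, here is a minimal sketch. It draws a normal Q-Q plot of simulated fumble rates; the data are made up, and the plot in the reply may well be constructed differently.

```r
## Normal Q-Q plot of simulated team fumble rates (hypothetical data).
set.seed(42)
rates <- rbinom(32, size = 5000, prob = 0.02) / 5000

## qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(rates, main = "Normal Q-Q plot, simulated fumble rates")
qqline(rates)   # reference line through the first and third quartiles
```

Points that fall far from the reference line are the ones that look unusual compared with a normal distribution.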

Change from fumbles per play to plays per fumble

Fumbles 2
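The change of scale is just a reciprocal. A small sketch with made-up rates (the team names and values are hypothetical):

```r
## Hypothetical fumble rates, in fumbles per play
fumbles_per_play <- c(teamA = 0.010, teamB = 0.020, teamC = 0.025)

## The reciprocal gives plays per fumble
plays_per_fumble <- 1 / fumbles_per_play
print(plays_per_fumble)
```

The ordering of the teams flips, and differences among the low fumble rates get stretched out, which can make one team look more extreme on the new scale.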

And a Barplot

Fumbles 3

Cross Tabulation Tables

rockchalk development version 1.8.90

library(rockchalk)
dat <- data.frame(x1 = sample(c("A", "B", "C"), size = 200, replace = TRUE),
                  y1 = sample(c("Denver", "Kansas City", "St. Louis"), size = 200, replace = TRUE))
t1 <- pctable(y1 ~ x1, data = dat)
## Count (column %)
##              x1
## y1            A         B         C         Sum
##   Denver      20(29.4%) 22(34.4%) 22(32.4%) 64 
##   Kansas City 24(35.3%) 26(40.6%) 23(33.8%) 73 
##   St. Louis   24(35.3%) 16(25%)   23(33.8%) 63 
##   Sum         68        64        68        200

That produced "defaults"

  • I (almost always) want column percents
  • Includes missing values in the display by default, unlike R's table(). To undo that, include exclude = NA as an argument.
  • This adds marginal totals; they come in handy
  • I'm printing the raw count and percent in each cell (different from my past idea)
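To see what those defaults amount to, here is a sketch in base R only. It is not the pctable code; it re-enters the counts from the output above and uses prop.table() and addmargins(). The short vector x is made up to show the missing-value behavior of table().

```r
## Counts copied from the pctable output above (filled column by column)
counts <- as.table(matrix(c(20, 24, 24,
                            22, 26, 16,
                            22, 23, 23),
                          nrow = 3,
                          dimnames = list(y1 = c("Denver", "Kansas City", "St. Louis"),
                                          x1 = c("A", "B", "C"))))

## Column percents: each column sums to 100
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
print(col_pct)

## Marginal totals for rows and columns
with_totals <- addmargins(counts)
print(with_totals)

## table() drops NA by default; useNA = "ifany" keeps it
x <- c("A", "B", NA, "A")   # tiny made-up vector
print(table(x))
print(table(x, useNA = "ifany"))
```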

Variations

summary(t1, rowpct = TRUE, colpct = FALSE)
## Count (row %)
##              x1
## y1            A         B         C         Sum
##   Denver      20(31.2%) 22(34.4%) 22(34.4%) 64 
##   Kansas City 24(32.9%) 26(35.6%) 23(31.5%) 73 
##   St. Louis   24(38.1%) 16(25.4%) 23(36.5%) 63 
##   Sum         68        64        68        200
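The row percents can be checked by hand in base R. This sketch is not the rockchalk implementation; it re-enters the same counts, divides each cell by its row total, and pastes count and percent into one label per cell.

```r
## Counts copied from the table output above (filled column by column)
counts <- as.table(matrix(c(20, 24, 24,
                            22, 26, 16,
                            22, 23, 23),
                          nrow = 3,
                          dimnames = list(y1 = c("Denver", "Kansas City", "St. Louis"),
                                          x1 = c("A", "B", "C"))))

## Row percents: each row sums to 100
row_pct <- 100 * prop.table(counts, margin = 1)

## Combine count and percent in one label per cell
cells <- matrix(sprintf("%d(%.1f%%)", as.integer(counts), as.vector(row_pct)),
                nrow = 3, dimnames = dimnames(counts))
print(noquote(cells))
```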

Having both is admittedly frustrating

summary(t1, rowpct = TRUE, colpct = TRUE)
## Count (row %)
## column %
##              x1
## y1            A         B         C         Sum
##   Denver      20(31.2%) 22(34.4%) 22(34.4%) 64 
##               29.4%     34.4%     32.4%        
##   Kansas City 24(32.9%) 26(35.6%) 23(31.5%) 73 
##               35.3%     40.6%     33.8%        
##   St. Louis   24(38.1%) 16(25.4%) 23(36.5%) 63 
##               35.3%     25%       33.8%        
##   Sum         68        64        68        200

Features

  • rounded = TRUE rounds the counts, as "privacy protection"

  • Arguments intended for table will be passed through
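The pass-through idea is the usual R "..." mechanism. Here is a minimal sketch with a hypothetical wrapper named count_table(); it is not the rockchalk source, only an illustration of how extra arguments reach table().

```r
## Hypothetical wrapper (not rockchalk code): extra arguments in ...
## are handed on to table() unchanged.
count_table <- function(x, ...) {
    table(x, ...)
}

x <- c("A", "B", NA, "A")                # made-up data with one missing value
print(count_table(x))                    # table() default: NA is dropped
print(count_table(x, useNA = "ifany"))   # useNA reaches table() through ...
```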

Can make fancy tables that can go into documents

  • The tables package by Duncan Murdoch provides some convenience features that we can ride.
  • For people who use LaTeX, the example demonstrates how.

html output

html(as.tabular(summary(t1)))
             x1
y1           A         B         C         Sum
Denver       20(29.4%) 22(34.4%) 22(32.4%) 64
Kansas City  24(35.3%) 26(40.6%) 23(33.8%) 73
St. Louis    24(35.3%) 16(25%)   23(33.8%) 63
Sum          68        64        68        200