Chapter 1 contains a nice example of the importance of checking for group balance after randomization (test for “happy” randomization).

Specifically the study groups kids into 20 different block groups, and it randomizes blocks, not kids.

In the study they have collected academic performance scores for all of the kids and they have aggregated the scores by block groups. Instead of randomly assigning each block group to one of the four study groups (T1, T2, T3, T4) they assign them in tranches.

The start with the top 4 blocks and assign them each to one study group, then work with the group of the 4 next highest performers and randomly assign them next, etc.

Why not just randomly assign all of the blocks to the 4 study groups and not worry about tranches? Since we are using randomization for assignment it should work out, right?

In this case since we have small sample sizes - only 5 blocks per group - there is a high likelyhood that pure randomization will fail to produce balanced groups. This can be shown through a simulation of random assignment. Note that in the actual study they DO achieve group balance using the tranches method.

The take-away is that randomization is not a silver bullet that solves all problems. Depending upon the group structure and sample sizes it is not guaranteed that random assignment will produce group balance. The test for group equivalence can be used to ensure randomization was “happy” or that manual group construction approaches like matching have worked as expected.

The lab this week walks you through the process of testing for study group equivalency.

# group academic performance prior to the program start
# (groups are NOT balanced) 
  
study.group   ave.y
group1          51
group2          47
group3          71
group4          33
# assign 5 students to each block,
# assign 5 blocks to each of 4 study groups

# study population of 100 students with pre-study
# academic performance measured in percentiles:
y <- 1:100  
     
# 5 students in each block
blocks <- rep( LETTERS[1:20], each=5 )
 
# randomize order of blocks then
# assign 5 blocks to each group:

x <- sample( blocks, 20 )
study.group <- NULL
study.group[ blocks %in% x[1:5] ]   <- "group1"
study.group[ blocks %in% x[6:10] ]  <- "group2"
study.group[ blocks %in% x[11:15] ] <- "group3"
study.group[ blocks %in% x[16:20] ] <- "group4"

# preview data 
d <- data.frame( block, y, study.group )
head( d, 20 )
   block  y study.group
1      A  1      group4
2      A  2      group4
3      A  3      group4
4      A  4      group4
5      A  5      group4
6      B  6      group3
7      B  7      group3
8      B  8      group3
9      B  9      group3
10     B 10      group3
11     C 11      group4
12     C 12      group4
13     C 13      group4
14     C 14      group4
15     C 15      group4
16     D 16      group2
17     D 17      group2
18     D 18      group2
19     D 19      group2
20     D 20      group2
 
d %>% group_by( study.group ) %>% summarize( ave.y=mean(y) )
# study.group   ave.y
# group1         51
# group2         47
# group3         71
# group4         33