Wednesday, May 4, 2011

Was the CrossFit Open fair?

If you cruise across many CrossFit webpages, it’s pretty easy to find people grumbling over the Open workouts. Some decried the number of body weight exercises (the double unders of 11.1, the box jumps of 11.2, and the burpees of 11.4), while others (myself included) scowled at the squat clean and jerk weight of 11.3. The fact is, designing the Open is probably more difficult than most people are willing to give credit for. Sure, thinking up different workouts isn’t especially hard; we’ve all done it. But there are so many factors in the Open that it’s a challenge to satisfy them all. I’m speculating a bit here, but I assume HQ had a few broad goals in mind for the Open: (1) the workouts should separate the fit from the unfit; (2) the workouts should be relatively accessible to everybody; (3) there should be equal opportunity to perform well no matter your body type. Designing the workouts to satisfy all three requirements is no easy task and requires careful thought and planning.

As many who have read the blog know, I’ve been commenting on some aspects of the third point in recent weeks. How did overall performance depend on biometrics such as weight and height? Was the Open fair in this regard? After looking at all the data, I would argue that the answer is more ‘yes’ than ‘no’. I’m not saying the programming was perfect, but overall the workouts seemed pretty balanced.

Let’s look at data from the main division for men (I’ll do ladies soon enough) for competitors who completed all six workouts. First, I take each competitor’s overall rank score and convert it into a number between zero and 100, where Dan Bailey represents 100 and some 17-year-old named Thomas Thompson represents zero. As some may recall, I previously called this scaled score the ‘overall percentile’. I have stopped doing that here because it’s not quite right. Thanks to the weird rule about ties, the overall rank scores are skewed towards better scores, so the scaled score that actually represents the 50th percentile is around 55. Also for reference, a scaled score of 30 represents the 20th percentile. I may change the scaled score metric to a percentile, but roll with me for now.
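For readers who want to see the idea in code, here is a minimal sketch of one way to build such a scaled score. It assumes a simple linear (min-max) rescaling of the overall rank score, which the post does not spell out, and the function names and the rank scores in the example are made up for illustration. It also shows why a scaled score is not the same thing as a percentile.

```python
def scaled_scores(rank_scores):
    """Map overall rank scores (lower is better) onto 0-100 (higher is better).

    Assumption: a linear min-max rescaling, so the best rank score maps to 100
    and the worst maps to 0.
    """
    best = min(rank_scores)
    worst = max(rank_scores)
    if best == worst:
        # Everyone tied: there is no spread to rescale.
        return [100.0 for _ in rank_scores]
    span = worst - best
    return [100.0 * (worst - s) / span for s in rank_scores]


def percentile_of(value, values):
    """Percent of entries in values that are strictly below value."""
    below = sum(1 for v in values if v < value)
    return 100.0 * below / len(values)


# Hypothetical rank scores for five athletes (not real Open data).
example = [12, 40, 55, 70, 300]
scaled = scaled_scores(example)
middle_athlete_percentile = percentile_of(scaled[2], scaled)
```

In this toy example the middle athlete has a scaled score of about 85 but sits at only the 40th percentile, because one very poor rank score stretches the scale. The real data shows the same kind of mismatch, just less extreme.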

Here I’ve presented the average scaled score (color) grouped by height (y-axis) and weight (x-axis). Let’s take an example to explain further. If you took all the male athletes from about 175 to 180 pounds who were 6’2 to 6’3 (here labeled with a pink dot to aid the eye), you would find that the average overall scaled score was around 45-50.

Overall Open performance broken down by weight and height. Between BMI 22.5 and BMI 30.5, performance is relatively similar, in contrast to the individual workouts shown below.
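The grouping behind a plot like this can be sketched as follows. This is only an illustration: the 5-pound and 1-inch bin widths are assumptions inferred from the 175-180 pound, 6’2-6’3 example, and the athletes listed are invented.

```python
from collections import defaultdict


def bin_floor(value, width):
    """Lower edge of the bin of the given width that contains value."""
    return int(value // width) * width


def mean_score_by_bin(athletes, weight_width=5, height_width=1):
    """Average scaled score for each (weight bin, height bin) cell.

    athletes: iterable of (weight_lb, height_in, scaled_score) tuples.
    Returns a dict keyed by (weight bin lower edge, height bin lower edge).
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of scores, count]
    for weight, height, score in athletes:
        key = (bin_floor(weight, weight_width), bin_floor(height, height_width))
        totals[key][0] += score
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical athletes (not real Open data): weight in pounds, height in inches.
athletes = [
    (176, 74.2, 44.0),
    (179, 74.9, 52.0),
    (181, 74.5, 60.0),
    (150, 66.0, 70.0),
]
grid = mean_score_by_bin(athletes)
```

Here `grid[(175, 74)]` is the average for athletes weighing 175 to 180 pounds who stand 6’2 to 6’3, which is the kind of cell marked by the pink dot. Each cell's average would then be mapped to a color.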

Let’s back up, though, to define what I’m thinking about when I evaluate ‘fairness’. In a perfectly egalitarian world, we would expect to see uniform color across this plot, i.e., the average performance for every given height and weight would be similar. The larger the deviation from uniform color, the greater the inequality. Could we imagine an Open where we have uniform color? I could definitely imagine it, but practically speaking, it may be difficult, especially keeping in mind the broad goals of the Open discussed above.

Let’s discuss the plot. Broadly, the plot has a skewed look that roughly travels along similar body ‘thicknesses’. To illustrate this, I have plotted iso-BMI lines on the charts, denoting the heights and weights where BMIs are the same. In between the two extreme BMI lines on each chart is something reassuring to see: the performance of the average athlete is relatively similar (reds and yellows), and there is little weight bias.
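An iso-BMI line is straightforward to compute. The sketch below uses the standard imperial form of the BMI formula, BMI = 703 × weight (lb) / height (in)², and solves it for weight. The post does not say how the lines on the charts were generated, so treat the function names and the chosen height range as assumptions.

```python
def bmi(weight_lb, height_in):
    """Body mass index from pounds and inches (standard imperial form)."""
    return 703.0 * weight_lb / height_in ** 2


def iso_bmi_weight(target_bmi, height_in):
    """Weight in pounds that yields target_bmi at the given height in inches."""
    return target_bmi * height_in ** 2 / 703.0


# Points along the 26.5 iso-BMI line for heights from 5'4 to 6'6 (assumed range).
heights = range(64, 79, 2)
curve = [(h, round(iso_bmi_weight(26.5, h), 1)) for h in heights]
```

At 5’10 (70 inches), for example, the 26.5 line passes through roughly 185 pounds. Plotting `curve` with weight on the x-axis and height on the y-axis gives one of the lines of constant ‘thickness’.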

For now, set aside the topic of the ideal CrossFit ‘thickness’, which many have expressed interest in (I’ll address that in another post).

What’s the point here? The overall plot shows that for any given height there is an optimal weight range, and that performance within that range is similar from one height to the next. This is huge. As we’ll see in the breakdowns of the individual workouts below, this definitely did not have to be the case. In the overall plot, performance stays relatively constant as you slide from left to right along the 26.5 iso-BMI curve. For many workouts, especially 11.2 and 11.3, performance is not constant along an iso-BMI curve, indicating a weight bias even when controlling for body thickness.

Thus, either by accident or good design, HQ managed to design a program where many body thicknesses performed on par with one another.   I think they should be applauded for that.

To be clear, inequalities do exist. Performance rapidly falls off at BMIs below 22.5 and above 30.5. Does this data imply that individuals in these low-performing regions should be packing in some weight gain 2000, or considering some good ol’ fen-phen? If doing better at CrossFit is your primary goal, sure. Perhaps, though, you play sports where being agile (low thickness) might be helpful, or other sports (say football) where being thicker is better. That’s up to the individual. The Open data can only speak to what’s optimal for the Open, and predictions about other activities may not hold.

Below I have included the same height/weight plots as above, for each individual workout.  I think it’s fun to take a look at them, but I won’t go completely crazy discussing them.

Individual workouts, broken down by height and weight.  They sure are pretty, no?

Couple of major points or questions:

0) On the color scale for each plot, a label such as 240 (40.6) gives the raw score followed, in parentheses, by the percentile score (following the Open method of breaking ties).

1) We often think of the exercises involved in order to ask which workouts are similar to others.  Do these plots graphically agree with our intuition?  For me, I guess I’m relatively surprised at how completely flipped 11.3 and 11.6 are, given the similarities of the thruster/squat-clean movements.  The central tendencies are almost completely rotated.

2)  11.3 is wild to look at.  Iso-BMI lines have equal performance until you hit a BMI of around 30.5, where more weight is better.

3)  11.4 is one of the few workouts that shows a strong height effect.  Perhaps wallballs? 

4)  11.1 is intrinsically the fairest workout, maybe because being bad at double-unders affects people of all body shapes.

5)  11.2 and 11.6 look moderately redundant, at least in terms of their graphics.


  1. Are you keeping the sample the same for all the individual WOD graphs in this post?

  2. Yup, only individuals who posted scores (and height and weight) to all six workouts were included in this analysis.

  3. Is your data available? Or where did you get it all? Did you have to grab it manually or is there a download of scores and ranks?

  4. Good stuff. Sorry, I had too much to say to fit here. In general:
    11.1-I'm surprised to see such an even distribution. But I assume that's because the double unders were so heavily weighted in this one rep-wise that it was the main driver. You can see from the fallout that being heavy wasn't much of an advantage (which it would have been if it was more snatch-based scoring-wise).

  5. 11.2-favored the short and light as expected. The weights were too light to favor a larger individual with 2 other bodyweight movements in there.

    11.3-We knew this would favor the heavier and shorter individual. They had more muscle to lift with and had to move the weight less distance. The heavyweight would have probably been favored even more if it were a power clean instead of squat clean. It would have been interesting to see how the height might have changed then though.

  6. 11.4 I figured with so many burpees and muscle ups it would favor the lightweights but thought the OHS might make up for it. But it looks like the scoring doesn't even go past 1 round? So really there were so few people making it past one round that it didn't get its own breakout?

  7. 11.5 - Looks like the high number of wallballs allowed the taller folks to do well here as long as they weren't heavy enough to make the toes to bar horrible ;)

    11.6 - We've seen the same thing happen with Fran's numbers in Beyond the Whiteboard. When bodyweight goes up the thrusters get easier but the pullups start getting so bad that it negates the benefit to the thrusters. That was probably just compounded by having CTB pullups.

  8. Nice work. Would love to see you do similar statistics for the Regionals and the Finals.

  9. Good graphic analysis. However, your third basic assumption in the first paragraph is completely incorrect, leaving much of your analysis off-base.

  10. I hope to see more analysis in the future.