Wednesday, May 4, 2011

Was the CrossFit Open fair?


If you cruise across many CrossFit webpages, it’s pretty easy to find people grumbling over the Open workouts.  Some decried the number of body weight exercises – the double unders of 11.1, the box jumps of 11.2, and the burpees of 11.4 – while others (myself included) scowled at the squat clean and jerk weight of 11.3.  The fact is, designing the Open is probably more difficult than most people are willing to give credit for.  Sure, thinking up different workouts isn’t especially hard (we’ve all done it), but there are so many factors in the Open that it’s a challenge to satisfy them all.  I’m speculating a bit here, but I assume HQ had a few broad goals in mind for the Open:  (1) the workouts should separate the fit from the unfit, (2) the workouts should be relatively accessible to everybody, and (3) there should be equal opportunity to perform well no matter your body type.  Designing the workouts to satisfy all three requirements is no easy task and requires careful thought and planning.

As many who have read the blog know, I’ve been commenting on some aspects of the third point in recent weeks.  How did overall performance depend on biometrics such as weight and height?  Was the Open fair in this regard?  After looking at all the data, I would argue that the answer is more ‘yes’ than ‘no’.  I’m not saying the programming was perfect, but overall the workouts seemed pretty balanced.

Let’s look at data from the main division for men (I’ll do ladies soon enough) for competitors that completed all six workouts.   First, I take each competitor’s overall rank score and convert it into a number between zero and 100, where Dan Bailey represents 100 and some 17 year old named Thomas Thompson represents zero.  As some may recall, I called this scaled score by a different name - ‘overall percentile’.  I have stopped doing that here because it’s not quite right.  Thanks to the weird rule about ties, the overall rank scores have a skew towards better scores, so the scaled score that actually represents the 50th percentile is around 55. Also for reference, a scaled score of 30 represents the 20th percentile.  I may change the scaled score metric to percentile, but roll with me for now. 
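To see why a scaled score and a percentile are not the same thing, here is a minimal sketch. The ten scaled scores are made up for illustration (they are not Open data), and `percentile_of` is a hypothetical helper, not the code behind the plots. When scores bunch toward the high end, the scaled score that sits at the 50th percentile ends up above 50.

```python
def percentile_of(scaled_scores, value):
    """Percent of athletes whose scaled score falls strictly below `value`."""
    below = sum(1 for s in scaled_scores if s < value)
    return 100.0 * below / len(scaled_scores)

# Hypothetical scaled scores for ten athletes, bunched toward the high end.
scaled = [0, 20, 30, 40, 50, 60, 65, 70, 80, 100]

midpoint_percentile = percentile_of(scaled, 50)  # 40.0: a scaled score of 50 is below the median
median_percentile = percentile_of(scaled, 60)    # 50.0: the median sits at a scaled score of 60
```

With this toy data a scaled score of 60 marks the 50th percentile, the same kind of offset described above for the real leaderboard.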

Here I’ve presented the average scaled score (color) of athletes grouped by height (y-axis) and weight (x-axis).  Let’s take an example to explain further.  If you took all the male athletes from about 175 to 180 pounds who were 6’2 to 6’3 (here labeled with a pink dot to aid the eye), you would find that their average overall scaled score was around 45-50.

Overall open performance broken down by weight and height.  Between BMI 22.5 and BMI 30.5, performance is relatively similar -  in contrast to the individual workouts shown below.  
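The grouping behind a height/weight plot like this one can be sketched as follows. This is an illustration under assumptions, not the code used for the figure: the 5 pound and 1 inch bin widths, the function names, and the three sample athletes are all hypothetical.

```python
from collections import defaultdict

def bin_floor(value, width):
    """Lower edge of the bin of the given width that contains `value`."""
    return int(value // width) * width

def mean_score_by_cell(athletes, weight_width=5, height_width=1):
    """Average scaled score for each (weight bin, height bin) cell.

    `athletes` is an iterable of (weight_lb, height_in, scaled_score).
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [score total, athlete count]
    for weight_lb, height_in, score in athletes:
        cell = (bin_floor(weight_lb, weight_width), bin_floor(height_in, height_width))
        sums[cell][0] += score
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}

# Hypothetical athletes: two in the 175-180 lb, 6'2" (74 in) cell, one elsewhere.
sample = [(176, 74.2, 40.0), (179, 74.9, 55.0), (200, 70.0, 60.0)]
cells = mean_score_by_cell(sample)
```

Each cell's mean would then be mapped to a color; cells with very few athletes are noisy and are probably best left blank.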

Let’s back up though, to define what I’m thinking about when I evaluate ‘fairness’.  In a perfect egalitarian world, we would expect to see uniform color across this plot, i.e., the average performance for every given height and weight is similar.  Deviations from uniform color indicate increasing degrees of inequality.  Could we imagine an Open where we have uniform color?  I could definitely imagine it, but practically speaking, it may be difficult, especially keeping in mind the broad goals of the Open discussed above.

Let’s discuss the plot.  Broadly, the plot has a skewed look that roughly travels along similar body ‘thicknesses’.  To illustrate this, I have plotted iso-BMI lines on the charts, denoting the heights and weights where BMIs are the same.  In between the two extreme BMI lines on each chart is something reassuring to see – the performance of the average athlete is relatively similar (reds and yellows), and there is little weight bias.
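An iso-BMI line is just the set of height/weight pairs that share one BMI. A minimal sketch, using the standard imperial formula BMI = 703 × weight (lb) / height (in)²; the function names are hypothetical:

```python
def bmi(weight_lb, height_in):
    """Body mass index from pounds and inches (703 is the imperial conversion factor)."""
    return 703.0 * weight_lb / height_in ** 2

def iso_bmi_weight(target_bmi, height_in):
    """Weight in pounds that gives `target_bmi` at the given height."""
    return target_bmi * height_in ** 2 / 703.0

# Points on the BMI 26.5 line for heights from 5'6" (66 in) to 6'4" (76 in).
line_26_5 = [(h, iso_bmi_weight(26.5, h)) for h in range(66, 77, 2)]
weight_at_70in = iso_bmi_weight(26.5, 70)  # roughly 185 lb at 5'10"
```

Because weight grows with the square of height along each line, the lines curve rather than run straight across the chart.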

For now, set aside the topic of the ideal CrossFit ‘thickness’, which many have expressed interest in (I’ll address that in another post).

What’s the point here?  The overall plot shows that for a given height, there exists an optimal weight range, and performance in that range is similar to performance at the optimal weight for other heights.  This is huge.  As we’ll see in the breakdowns of the individual workouts below, this definitely did not have to be the case.  In the overall plot, as you travel along the 26.5 iso-BMI curve, performance is relatively constant as you slide from left to right.  For many workouts, and I’m thinking especially of 11.2 and 11.3, performance along a BMI curve is not equal as you shift, indicating a weight bias even when controlling for body thickness.

Thus, either by accident or good design, HQ managed to design a program where many body thicknesses performed on par with one another.   I think they should be applauded for that.

Be clear that inequalities do exist.  Performance rapidly falls off at BMIs below 22.5 and above 30.5.  Does this data imply that individuals in these low performing regions should be packing in some weight gain 2000, or considering some good ol’ fen-phen?  If doing better at CrossFit is your primary goal – sure.  Perhaps, though, you play sports where being agile (low thickness) might be helpful, or other sports (say football) where being thicker is better.  It’s up to the person for sure.  The Open data can only speak about what’s optimal for the Open, and predictions about other activities may not hold.

Below I have included the same height/weight plots as above, for each individual workout.  I think it’s fun to take a look at them, but I won’t go completely crazy discussing them.

Individual workouts, broken down by height and weight.  They sure are pretty, no?

Couple of major points or questions:

0) On the color scale for each plot, a label such as 240(40.6) gives the raw score followed by the percentile score in parentheses (following the Open method of breaking ties).

1) We often think of the exercises involved in order to ask which workouts are similar to others.  Do these plots graphically agree with our intuition?  For me, I guess I’m relatively surprised at how completely flipped 11.3 and 11.6 are, given the similarities of the thruster/squat-clean movements.  The central tendencies are almost completely rotated.

2)  11.3 is wild to look at.  Iso-BMI lines have equal performance until you hit a BMI of around 30.5, where more weight is better.

3)  11.4 is one of the few workouts that shows a strong height effect.  Perhaps wallballs? 

4)  11.1 is intrinsically the fairest workout, maybe because being bad at double-unders affects people of all body shapes.

5)  11.2 and 11.6 look moderately redundant, at least in terms of their graphics.

Saturday, April 30, 2011

Good luck in the final weekend!

Apologies for being a little quiet on the blog!  Week 5 brought the superfast new leaderboard and also broke my old extraction code.  Thanks go to the very generous and talented Greg Perkins, who is helping me collect the data in his spare time when he's not actively competing in the master's division.  Thanks, Greg!

Once the final data is in, I'll continue analyzing the data and post my results here.

For those still competing, best wishes for the final weekend! 

Tuesday, April 19, 2011

Age and crossfit open performance

The age versus performance question has been a popular request among readers.  Is the relationship as strong as the weight effect?

Quick mention on the methods - I've grouped the data into bins two years wide, in order to make the data a little smoother.  This means the 24 and 25 year olds were lumped together, the 26 and 27 year olds, and so on.  Master's folks - you make an appearance in the data!  I collected the data from the first two masters divisions (since they do the same Rxed workout), and plotted them here.  At the extreme ends, data isn't plotted because there aren't enough athletes.  Note:  I also apologize for the rough plots and text, I'll try to clean things up a bit better and add the percentile scores on the charts a little later.
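The two-year grouping can be sketched like this. It is an illustration only: the minimum of three athletes per bin is an assumed cutoff (the post does not say what threshold was used), and the records are made up.

```python
from collections import defaultdict
from statistics import median

def age_bin(age):
    """Lower edge of a two-year bin, so 24 and 25 both map to 24."""
    return age - age % 2

def median_by_age_bin(records, min_count=3):
    """Median score per two-year age bin, skipping bins with too few athletes.

    `records` is an iterable of (age, score); `min_count` is an assumed cutoff.
    """
    groups = defaultdict(list)
    for age, score in records:
        groups[age_bin(age)].append(score)
    return {b: median(s) for b, s in sorted(groups.items()) if len(s) >= min_count}

# Hypothetical (age, score) records; the lone 58 year old is too sparse to plot.
records = [(24, 60), (25, 70), (24, 80), (26, 50), (27, 55), (26, 65), (58, 20)]
medians = median_by_age_bin(records)
```

The same grouping with a top-decile mean in place of the median would give an 'elite' curve.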

Looking at the plots below for each workout (for men), the first striking point is how similar the curves all are to each other, from the general shape to where the peak of performance sits.  In men, across every workout, peak performance is near the age of 24, in both elites (blue) and median athletes (red).  If you look closer at some plots though, you can begin to see subtle differences between workouts.  The 'peak' of workout 4 in the elites is especially prominent.  I surmise the muscle up becomes harder with age more quickly than other exercises do.  Does that suggest power (heavily required for the muscle up), rather than strength, is the first aspect of fitness that we lose as we age?  Or is the data explained by some other reason?

Male performances for CrossFit open workouts 11.1-4 across different ages.  Elites (blue), medians (red).  Singing:  "... if I could turn back time...".  Okay, really lame out of place Cher reference.  Sorry.

In contrast,  overall the plots for female athletes seem less dramatic, or less 'peaky' for some reason.  Looking back at workout 4, the elite females are fairly flat across age, maybe because the muscle up was so difficult, that only the very best performers could muster the muscle up.  Even with that workout as an exception, the curves have a much flatter appearance.  I wonder if the physiological effects of age on women are somewhat dampened compared to men.  Hey, who's to say which sex ages more gracefully?  


Are things less 'peaky' over here?


On a last point, I'd like to address an issue that I haven't done a good job of in previous posts  - that a vast majority of performance cannot be explained by all the biometrics - height, weight, age... etc.  My suspicion is that even if I had more biologic measurements (leg/torso ratio, arm length, noggin size to toss out a couple of ridiculous possibilities), we could not explain all the variance, mainly because the main factor determining differences in athletes is pretty simple - fitness.

The performance plot below serves to demonstrate.  Here, I've selected upon the most common male athlete in the open (175 pounds, 5'10", and 28 years old) and plotted the distribution of their overall rank percentiles.  Notice there is still a large range of possible scores!

The bigger picture of crossfit shouldn't be forgotten among the comparison charts.  Get out there and improve yourself!  Stop reading this nerdy blog... okay okay... continue to read this blog.  Get stronger, work on lifts, feel good that you're accomplishing something you couldn't do before!  For most people out there, certainly myself included, the downward slope on the age curve shouldn't be scary - we still have a lot of upward potential.



Biometrics can't explain everything. There's still huge variation in performance (that is, fitness) among male athletes around 180 pounds, 5'10", aged 28. 

Saturday, April 16, 2011

What's a muscle up worth?

Not much, compared to the last OHS!

Thanks to BJ's observation below (see comments), the analysis of the original post is pretty much wrong.  Even though the muscle up should have been worth quite a bit (considering the percentile chart below), HQ is awarding the highest ranking points to tied individuals instead of the lowest.  So if you score a 90 (ie where the big vertical line is in both men and women on the plot below), you are awarded the percentile score at the top of the vertical line.  I had incorrectly assumed that the rank score for 90 would be at the bottom of the vertical line.

The corollary of this: the last OHS is one of the most important reps you can complete!
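Here's a small sketch of why the tie rule flips which rep matters. The six scores are made up, `ranks` is a hypothetical helper, and the two tie rules are modeled on the description above (tied athletes all get either the best or the worst rank in their group).

```python
def ranks(scores, ties="best"):
    """Rank scores from high to low (1 = best).

    ties="best": everyone in a tied group gets the group's best rank.
    ties="worst": everyone in a tied group gets the group's worst rank.
    """
    ordered = sorted(scores, reverse=True)
    result = []
    for s in scores:
        if ties == "best":
            result.append(ordered.index(s) + 1)
        else:
            result.append(len(ordered) - ordered[::-1].index(s))
    return result

# Hypothetical field: a pile-up at 90, one athlete a rep ahead, one a rep short.
scores = [91, 90, 90, 90, 90, 89]

best_rule = ranks(scores, ties="best")    # [1, 2, 2, 2, 2, 6]
worst_rule = ranks(scores, ties="worst")  # [1, 5, 5, 5, 5, 6]
```

Under the "best" rule, going from 89 to 90 jumps four places while going from 90 to 91 gains only one. Under the "worst" rule it is the other way around, which is the assumption the original post made.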

Thanks again, BJ, for pointing this out.  I have left the evidence of the previous post so people can make fun of me.


The muscle-up is the most valuable rep in the open thus far, gaining 13 percentile pts on the men's side, and a staggering 20 percentile pts on the women's side.  To put this in absolute terms, if all the competitors from week 3 continue through week 4, a male competitor completing a muscle up will beat an additional ~1150 competitors, while on the women's side the same will beat another ~950 competitors.

The first muscle-up is the single most valuable rep in the open thus far. The way HQ is scoring the open, the last OHS is one of the most valuable reps.  In fact, if you take everything that I previously said about "the first muscle up" and replace it with "the last OHS", the post would be all good.


Since the percentile score at the point of the muscle-up starts around the 70th percentile for women, this presents a critical strategy for the gals. A single muscle up may mean the difference between getting to regionals and not (also see chart for percentile scores of the top 100 athletes in each region).  If I were a combination of things:  (1) a woman (2) on the edge of making regionals (3) able to get through the first two phases of the workout (all things I am not even remotely close to), I would do my very best to rest all muscles after the OHS and give the muscle-up the best shot in the last 20 seconds of the workout.  The second muscle up has little value compared to the first, and it doesn't make any sense to possibly expend energy if you have a few minutes left.


For the men, the same things hold, but the end result has far fewer implications since the muscle-up starts around the 42nd percentile.  The muscle-up would clearly be for pure glory!

Also, by popular request I have started to look into the age vs performance question, and yikes... the plots are pretty grim, especially on the men's side.  Stay tuned...


This post discusses the fourth 2011 crossfit open workout, described here.

Wednesday, April 13, 2011

Week 3 Dropout Statistics

Thanks to everybody who's commented so far on the blog!  There are some open questions to be decided for sure, in particular the effect of height on performance.  Rest assured it's being looked at and will be presented in the future.

Erg.  I wanted to be able to cite specific numbers regarding the dropouts from week 2 to week 3.  I have learned, however, that HQ isn't showing a perfect leaderboard each week, which probably doesn't surprise a few folks given the comments I've seen on some of the pages.  For example, when I gathered the week 3 data for the men, there were a few hundred names that were not present in the week 2 data set.  I presume these folks actually had scores for week 2, but for whatever reason were not posted on the leaderboard at the end of week 2.  That alone would be fine, but in general, I think it means I can't totally trust that week 3's leaderboard contains all the continuing athletes.  Thus, I think the histograms below might be off by a few percent.

Overall the dropout percentage was similar across M/F boundaries - 23% (Men), and 24% (Women).  I estimate that the true number is +/- a few percent at most.

The first plot below shows, for each 5 pound female weight class, what percentage of athletes dropped in week 3.  Remember, this plot does not represent all athletes, just athletes that bothered to fill out their weight information.  While the absolute numbers might be slightly off, my prediction from earlier seems to have held.  Almost 50% of athletes under 110 pounds did not complete week 3.  Cuts were felt all around though, and even the lowest drop percentage was still around 15%.
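A dropout-by-weight-class tally along these lines could be sketched as below. The sample athletes are invented, and treating "dropped" as a simple true/false flag per athlete is an assumption for illustration.

```python
from collections import defaultdict

def dropout_rate_by_weight(athletes, width=5):
    """Percent of athletes in each weight class who did not finish week 3.

    `athletes` is an iterable of (weight_lb, finished_wk3) for athletes who
    reported a weight; classes are `width` pounds wide, keyed by lower edge.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [dropped, total]
    for weight_lb, finished_wk3 in athletes:
        weight_class = int(weight_lb // width) * width
        counts[weight_class][1] += 1
        if not finished_wk3:
            counts[weight_class][0] += 1
    return {c: 100.0 * dropped / total for c, (dropped, total) in sorted(counts.items())}

# Hypothetical athletes: (weight in pounds, finished week 3?)
sample = [(105, False), (107, True), (130, True), (131, True), (133, True), (134, False)]
rates = dropout_rate_by_weight(sample)  # {105: 50.0, 130: 25.0}
```

Classes with only a handful of athletes will swing wildly, so a real plot would want a note about low counts at the extremes.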

From a performance standpoint, how did these athletes do?  This next plot shows the scatter of week 2 versus week 1 scores.  Blue dots represent athletes who finished week 3, and red dots are athletes who dropped (wk3 score = 1, or nothing).  A quick inspection reveals many of the drops occurring in the lower scores (lower left), but a surprising number are in the middle.

Week 3 heavily pruned female athletes that were light (left), and who received lower scores in weeks 1 and 2 (right).  Note: I am not confident that the red dots in the highest performing areas (upper right) of the scatter chart are real dropouts.  They might be absent from the leaderboard, as discussed above.

The same plots for the men are less dramatic.  Interestingly, while the dropout percentage has a strong trend downward, the drops by wk1 and wk2 performance seem more scattered.  I can only conclude that some drops might result from people just not having the time to do the workout as directed, rather than some limitation in performance.

Sadly, I couldn't muster one rep for week3's WOD.   Our workout group had just started power cleans a month ago, and I had recently managed to clean my weight (~140).  In a fairly dumb move, I attempted a squat clean at 165 and not only managed to fail miserably, but also managed to sprain my wrist. 


Dropouts followed a similar weight trend as females (left), but surprisingly their performances seemed to be fairly uniform (right), except maybe in the very elite categories.  Ignore the big bar at 275, it's from low number stats.

Tuesday, April 12, 2011

Top 100 regional athletes through week 3

I thought I'd generate a plot so folks can easily compare regions.  One thing HQ did was generate an overall rank and rank score on the leaderboard.   I was going to do that, but they saved me a few bits of code.  For the following plot, the y-axis needs some explanation.   Someone correct me if I'm wrong, but HQ is determining the overall rank for each athlete by summing that athlete's rank for each specific WOD and using that as a 'total' score.  This actually makes perfect sense to me, but the actual number doesn't make sense to a whole lot of people.  I've normalized this score to a percentile, where your total score is converted to a number between 0 and 100.

If you were a superstar and finished first in all three WODs for a total rank score of '3', you'd have a percentile score of 100.   In contrast, if your total rank score is the highest (lowest performing) for all the athletes left, you'd have a percentile score of 0.  In theory, I think if you scored a consistent 80th percentile for all 3 exercises so far, your total rank percentile should be around 80 on my plot.  I should double check this.
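Here is a quick sketch of that double check, under the assumption that the total score is the plain sum of per-WOD ranks and that the normalization is linear between the best and worst totals. The three athletes and the field size of 1000 are made up, and the function names are hypothetical.

```python
def total_rank_scores(per_wod_ranks):
    """Sum each athlete's rank across workouts (lower is better)."""
    return {name: sum(ranks) for name, ranks in per_wod_ranks.items()}

def normalize(totals):
    """Map totals linearly to 0-100: best total -> 100, worst total -> 0."""
    best, worst = min(totals.values()), max(totals.values())
    return {name: 100.0 * (worst - t) / (worst - best) for name, t in totals.items()}

# Hypothetical field of 1000: rank 200 of 1000 is the 80th percentile in each WOD.
per_wod_ranks = {
    "first": [1, 1, 1],
    "steady": [200, 200, 200],
    "last": [1000, 1000, 1000],
}
normalized = normalize(total_rank_scores(per_wod_ranks))
steady_score = normalized["steady"]  # about 80.1
```

In this toy case the consistent 80th percentile athlete does land near 80, but only because somebody finished last in every workout. If the worst total is less extreme than that, the same athlete's normalized score comes out lower.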


Alright, I'll say it.  Adjusted for the number of competitors, it looks like Southern California has the fittest athletes.
I had previously assumed that in order to get into regionals, one had to shoot for the top ten percent in each exercise.   I would say it's more the exception than the rule, and (not hoping to start any flame wars) it depends quite a bit on each region.

Friday, April 8, 2011

Week 3 - destroyer of lightweights

We all knew week 3 was a big boy's (or girl's) workout.  One big difference though from previous weeks - some athletes are finding it near impossible to do the workout at all!  This is most dramatic with the female athletes.  Below is the distribution of weights from athletes who completed week 2, versus those who have submitted scores for week 3.

Is this shift temporary, or a permanent result of week three's workout?

The bar height is the fraction of total female athletes in any particular 5 pound interval.  The green distribution represents a snapshot of the athletes after week 2, while the red distribution represents what it looks like right now.  If workout 3 was doable for everyone, we would expect both distributions to look fairly similar.   What we actually see, is a shift to the right, marked by areas where you see green peaking over the red in the lighter weight classes.   Keep in mind the height of the green bar relative to the red bars will tell you how much of the weight class is growing or shrinking.

There are two possible explanations for what we see now.  Either the lightweights are waiting to submit their scores, or there's a fraction of athletes that can't do the workout and won't end up submitting.  We'll see after the end of the week, but if trends continue, we could see nearly half the female athletes weighing 110 and under registering a zero score.

Is this fair?  For example, can there be workouts that are near impossible to complete if you're too heavy?  I've got one for next year - 36" box jumps anyone?


Wednesday, April 6, 2011

Women's body weight performances wk1 and wk2

Considering that a third of the athletes still in the competition are women, I feel pretty bad about leaving them out so far.  Let's check in.

First, some caveats about the data.  One, while the number of female athletes right now is large (>6000), ahem, for whatever reason there's a big difference between the men and women in filling out the optional biologic data.  ~71% of men filled out weight and height information, while only ~47% of women did.  Consequently, the data for the women's weight class plots is a bit sparse. There's still some interesting information, though, so let's get to it!


First, in week 1:  I think the individual variation in performance in wk 1 was large enough to dominate any strong body weight relation.   You might say that the 100 and 105 pounders were crushed, and there may be some truth to that, but I'm not greatly confident here because the numbers are pretty sparse.  I could generally see how it makes sense though.  The power snatch Rx weight for Wk1 was 55 lbs, which might have been difficult to sustain for the lighter ladies.



For week 2 -  Interestingly, average workout 2 performance had a  gradual downward trajectory from left to right, which suggests to me that the box jumps/push ups were, compared to the boys, more difficult than the deadlift part of the workout.   I'm not too sure what to make of the elites, which were up and down in seemingly no real pattern.

Overall, through wk 2 I'd say the open has been fair throughout the weight classes on the women's side.  Now through week 3, I'd guess the circumstances are about to change drastically!

Tuesday, April 5, 2011

An ideal CrossFit weight (for men)?

Ooooo, good stuff today.   All the athletes have submitted scores for week 2, which means tons more data.  Logistically, I should be able to make plots a little quicker because I shouldn't have to reacquire all the personal data (weight, height, region) from everybody again.

The crossfit open doesn't use weight classes, and the argument put forth has been something along the lines of, "The workouts have all been balanced so heavy weights (easier for heavy people) are balanced with body weight exercises (easier for lighter folks)."  Is that true?  Let's look at some data.

The blue line represents the 'elite' or the top 10% of athletes for a given weight (5 lb increments).  The red line is the overall average athlete for a given weight.


Plotted is the performance of athletes across weights for workouts 1 and 2.  To explain further - I have grouped athletes according to their weight (in 5 lb increments), and plotted the mean performance of those athletes.  'Average' athletes in a given weight are plotted in red, and 'elite' athletes, or the top ten percent of athletes in a given weight, are plotted in blue.  For reference, I have also included the cumulative percentile score chart on the right, since that information isn't immediately obvious on the open website.  If you wanted to quickly see where a score of 200 ranked on workout 1, you would go directly up from the bottom axis at 200 until it crossed the 'S' shaped curve.  From that point, going directly left to the axis will give the percentile score, which in this case is around the 30th percentile.
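The 'average' and 'elite' lines can be sketched as follows. This is a guess at one reasonable way to do it, not the code behind the plot: rounding the elite group up to at least one athlete is an assumption, and the ten sample scores are invented.

```python
import math
from collections import defaultdict

def average_and_elite_by_weight(athletes, width=5, elite_fraction=0.10):
    """Per weight class: (mean score of everyone, mean score of the top fraction).

    `athletes` is an iterable of (weight_lb, score), higher score = better.
    The elite group is rounded up so every class has at least one elite athlete.
    """
    groups = defaultdict(list)
    for weight_lb, score in athletes:
        groups[int(weight_lb // width) * width].append(score)
    result = {}
    for weight_class, scores in sorted(groups.items()):
        scores.sort(reverse=True)
        n_elite = max(1, math.ceil(elite_fraction * len(scores)))
        result[weight_class] = (sum(scores) / len(scores), sum(scores[:n_elite]) / n_elite)
    return result

# Hypothetical: ten athletes in the 180-185 lb class with scores 10, 20, ..., 100.
sample = [(180 + i % 5, s) for i, s in enumerate(range(10, 110, 10))]
lines = average_and_elite_by_weight(sample)  # {180: (55.0, 100.0)}
```

With few athletes in a class, the elite mean rests on one or two people, which is one reason the ends of the blue line can jump around.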

For workout 1, the curves go up from a weight of 140, peak at 185, then head downward.  Does this make a huge difference?  I think for average athletes the answer is yes.  An 'average' 185 open athlete bested 20 percent more people than his 'average' comrade weighing in at 230 lbs, translating to an overall rank difference of over 2000 people!

What about for elite athletes?  The mean percentile scores for 140, 185, and 230 pound elites were all in the top 10% of total scores, so I think weight wasn't too much of an issue for those folks.  The overall pattern is the same though.

As an aside, why should we care about the 'elite' category of athlete?  Given that there are 17 regions, with 50 athletes from each moving on to regionals (850 total), I would argue that the top 10% of athletes (~1100) should stand a reasonable chance at moving on to the next round of competition.  Understanding patterns in these athletes should hopefully tell us in the future what's important for the open.


It's good to be a 180 pounder so far!


The circumstances change a bit when you look at workout 2.  Everybody, say over 220 pounds, should have screamed a collective Bender favorite, "We're boned!"  Yikes, the heavier people were punished on this exercise, no matter if you were an average athlete or an elite one.  In the elite category, the difference between an average athlete at 180 lbs and one at 230 was a crazy 19 percentile points.  Another stat: no person weighing above 220 lbs cracked the top 5% of scores.  I don't know how that will translate into a real effect at the end of the open, but if this trend continues I wouldn't be surprised if there weren't many 220 pounders that make it to Regionals.

So to answer the question, is there an ideal CrossFit weight for men?  So far in the open, the athletes around 180 pounds have had it pretty good.  There are still 4 workouts to go, so circumstances could change...

Analysis for the women will hopefully come next!

Friday, April 1, 2011

April 1st - First analysis

So one might ask, how does my week 1 performance predict my week 2 performance?   From a correlation standpoint it's actually pretty good, as one would expect.  The elites are still elites and less elite (such as myself) are still not elite.  If you haven't done wk2's exercise quite yet, you can use the plot to plan some sort of pace.  Of course you should try as hard as you can, but the plot might allow you to set some goals from the outset.


Correlation is around 0.7, for people wanting to know.
Part of doing this analysis was wondering if my meager weight might have something to do with my pitiful performance in wk1.  While we can't all be Chris Spealler, I figured being a buck40 didn't do me any favors.  Not so.  There's almost zero relationship between weight and wk1 performance.  What correlation there is mostly comes from the much heavier competitors.  Probably those double unders.

Correlation: -0.05

The weight issue, though, in week 2 becomes interesting.  While the data is all over the place, it looks like performance increases from low to about 180, but performance takes a hit after that.  Maybe the box jumps get harder for heavier folks?

Because of the larger penalty as things get heavier, correlation is -0.2.  Clearly the relationship is more complicated if you look at the plot though.
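A toy example of why a single correlation number can undersell a peaked relationship like this one. The five weights and scores are invented for illustration, and `pearson` is a plain textbook implementation rather than whatever produced the numbers above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

weights = [140, 160, 180, 200, 220]

# Hypothetical scores with a symmetric peak at 180: weight clearly matters,
# yet the rise and fall cancel and the correlation is zero.
symmetric_peak = [40, 55, 60, 55, 40]
r_symmetric = pearson(weights, symmetric_peak)

# Same peak, but with a steeper penalty on the heavy side: mildly negative.
heavy_penalty = [40, 55, 60, 50, 30]
r_heavy = pearson(weights, heavy_penalty)
```

The correlation only measures the straight-line part of a relationship, so a rise-then-fall pattern with a bigger drop on the heavy side shows up as a small negative number even when weight has a real effect.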



Thursday, March 31, 2011

I am a xfit nerd

The CrossFit Open is awesome.  Most crossfitters would claim that the open is foremost an international fitness competition where everybody with some basic equipment and a video camera can compete for the chance to be a world champion.  Perhaps secondly, and realistically for most people, the open gives a chance for people who've become fit to compare themselves to other folks working on their own fitness.  And we shouldn't forget about the exposure the open gives to CrossFit.  If you're like me, you at least told your closest circle of coworkers and friends what you've been up to in the last few weeks.

For me, the Open also represents a fantastic opportunity to collect a great data set.  It's not only the numbers - something close to 13500 participants for week 1 - but the type of data and the quality of it.  Let me try and explain further - quite rarely can you get large fitness data on 'normal' individuals, across many different fitness modalities, and also expect it to be accurate and reflective of a person's true fitness.

I say 'normal' here, in the sense that most people participating in the Open have other lives and don't do this in a professional sense, or for a living.  Clearly the average Open participant has above average fitness relative to the general population.  I think it's possible, though, with a little training for most healthy individuals to at least perform close to the average Open performance level.

It's rare to get information on different fitness modalities for one individual.  The Open measures performance across broad categories - endurance, strength, power, coordination, just as we would expect from the Open.  While we have enormous population data for run times like the mile, 5K, etc., and probably an equal amount of data for certain track and field events, individuals usually train for only one particular event, so their results cannot be compared directly across events.  The closest thing in track and field that resembles the Open is the decathlon, which is hardly accessible to the average individual.


Hypothetically, in order to get a dataset like the Open, one would have to ask 10000 volunteers to perform a set of defined exercises to their best ability, and do so in the exact same manner.  Sound familiar?  Actually the only thing that comes to mind is the ol' Presidential Fitness Award test back in school.  A couple decades ago, the US measured and published the performance of children across many different ages on some basic exercises like the mile run, pullups, situps, shuttle run, and sit-and-reach.  What about WODclub.com, you ask?  Their website is great and they do have statistics on some of the more popular workouts, but there's no oversight of how people do exercises or whether they are bumping their scores a bit.  The Open workouts are meticulously described, and submission requires video evidence or submission through an authorized CrossFit affiliate.

Well, that's the idea why the data is pretty cool.  Some graphs to follow soon.