Saturday, April 30, 2011

Good luck in the final weekend!

Apologies for being a little quiet on the blog! Week 5 brought the superfast new leaderboard and also broke my old extraction code. Thanks go to the very generous and talented Greg Perkins, who is helping me collect the data in his spare time when he's not actively competing in the master's division. Thanks, Greg!

Once the final data is in, I'll continue analyzing the data and post my results here.

For those still competing, best wishes for the final weekend! 

Tuesday, April 19, 2011

Age and crossfit open performance

The age versus performance question has been a popular request among readers.  Is the relationship as strong as the weight effect?

A quick mention of the methods - I've grouped the data into bins two years wide to make the curves a little smoother. This means the 24 and 25 year olds were lumped together, then the 26 and 27 year olds, and so on. Master's folks - you make an appearance in the data! I collected the data from the first two masters divisions (since they do the same Rx'd workout), and plotted them here. At the extreme ends, data isn't plotted because there aren't enough athletes. Note: I also apologize for the rough plots and text. I'll try to clean things up and add the percentile scores to the charts a little later.
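For the curious, here is a minimal sketch of that two-year grouping. It is not my actual plotting code: the `min_athletes` cutoff, the function names, and the sample scores are all made up for illustration.

```python
from collections import defaultdict
from statistics import median

def bin_by_age(records, width=2):
    """Group (age, score) pairs into bins `width` years wide.

    Bins start on even ages, so 24 and 25 share a bin, as do 26 and 27.
    """
    bins = defaultdict(list)
    for age, score in records:
        start = (age // width) * width
        bins[start].append(score)
    return dict(bins)

def summarize(bins, min_athletes=3):
    """Median score per bin, skipping bins with too few athletes.

    The cutoff of 3 is an arbitrary assumption for this toy example.
    """
    return {start: median(scores)
            for start, scores in sorted(bins.items())
            if len(scores) >= min_athletes}

# Made-up (age, score) pairs, for illustration only.
sample = [(24, 300), (25, 320), (24, 310),
          (26, 290), (27, 280), (26, 285),
          (60, 150)]
bins = bin_by_age(sample)
print(summarize(bins))  # the lone 60 year old is dropped as too sparse
```

The same idea applies to the elite curve: take the top slice of each bin instead of the median.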

Looking at the plots below for each workout (for men), the first striking point is how similar the curves are to each other, from the general shape to where the peak of performance sits. In men, across every workout, peak performance is near the age of 24, in elites (blue) and median athletes (red). If you look closer at some plots though, you can begin to see subtle differences between workouts. The 'peak' of workout 4 in the elites is especially prominent. I surmise the muscle up gets harder with age more quickly than other exercises do. Does that suggest power (heavily required for the muscle up), rather than strength, is the first aspect of fitness that we lose as we age? Or is the data explained by some other reason?

Male performances for CrossFit open workouts 11.1-4 across different ages.  Elites (blue), medians (red).  Singing:  "... if I could turn back time...".  Okay, really lame out of place Cher reference.  Sorry.

In contrast, the plots for female athletes seem less dramatic overall, or less 'peaky' for some reason. Looking back at workout 4, the elite females are fairly flat across age, maybe because the muscle up was so difficult that only the very best performers could muster one. Even with that workout as an exception, the curves have a much flatter appearance. I wonder if the physiological effects of age on women are somewhat dampened compared to men. Hey, who's to say which sex ages more gracefully?

Are things less 'peaky' over here?

On a last point, I'd like to address an issue that I haven't covered well in previous posts - that the vast majority of performance cannot be explained by the biometrics (height, weight, age, etc.). My suspicion is that even if I had more biologic measurements (leg/torso ratio, arm length, noggin size, to toss out a couple of ridiculous possibilities), we could not explain all the variance, because the main factor determining differences between athletes is pretty simple - fitness.

The performance plot below serves to demonstrate. Here, I've selected the most common male athlete in the open (175 pounds, 5'10", and 28 years old) and plotted the distribution of their overall rank percentiles. Notice there is still a large range of possible scores!

The bigger picture of crossfit shouldn't be forgotten among the comparison charts. Get out there and improve yourself! Stop reading this nerdy blog... okay okay... continue to read this blog. Get stronger, work on lifts, feel good that you're accomplishing something you couldn't do before! For most people out there, certainly myself included, the downward slope of the age curve shouldn't be scary - we still have a lot of upward potential.

Biometrics can't explain everything. There's still huge variation in performance (that is, fitness) among male athletes around 180 pounds, 5'10", aged 28. 

Saturday, April 16, 2011

What's a muscle up worth?

Not much, compared to the last OHS!

Thanks to BJ's observation below (see comments), the analysis of the original post is pretty much wrong. Even though the muscle up should have been worth quite a bit (considering the percentile chart below), HQ is awarding tied individuals the best rank in the tie instead of the worst. So if you score a 90 (i.e., where the big vertical line is for both men and women on the plot below), you are awarded the percentile score at the top of the vertical line. I had incorrectly assumed that the rank score for 90 would be at the bottom of the vertical line.

The corollary of this: the last OHS is one of the most important reps you can complete!
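To make the tie rule concrete, here is a small sketch of the two ways a tied score can be ranked. The scores are invented, and the percentile formula (100 for first place, 0 for last) is my own convention rather than anything HQ has published.

```python
def percentile_for_score(scores, score, ties="best"):
    """Percentile (100 = first place, 0 = last) awarded to `score`.

    ties="best":  everyone tied gets the best rank in the tied group.
    ties="worst": everyone tied gets the worst rank in the tied group.
    """
    n = len(scores)
    higher = sum(1 for s in scores if s > score)
    tied = sum(1 for s in scores if s == score)
    if ties == "best":
        rank = higher + 1
    else:
        rank = higher + tied
    return 100.0 * (n - rank) / (n - 1)

# Made-up field of 11 athletes; five of them are stuck at 90 reps.
field = [150, 120, 100, 90, 90, 90, 90, 90, 60, 45, 30]

top_of_line = percentile_for_score(field, 90, ties="best")
bottom_of_line = percentile_for_score(field, 90, ties="worst")
print(top_of_line, bottom_of_line)  # 70.0 30.0
```

With the "best" rule, everybody in the big tie sits at the top of the vertical line, so the rep that gets you into the tie is the valuable one, and the rep that gets you out of it adds comparatively little.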

Thanks again, BJ, for pointing this out.  I have left the evidence of the previous post so people can make fun of me.

The muscle-up is the most valuable rep in the open thus far, gaining 13 percentile pts on the men's side, and a staggering 20 percentile pts on the women's side.  To put this in absolute terms, if all the competitors from week 3 continue through week 4, a male competitor completing a muscle up will beat an additional ~1150 competitors, while on the women's side the same will beat another ~950 competitors.

The first muscle-up is the single most valuable rep in the open thus far. The way HQ is scoring the open, the last OHS is one of the most valuable reps. In fact, if you take everything that I previously said about "the first muscle up" and replace it with "the last OHS", the post would be all good.

Since the percentile score at the point of the muscle-up starts around the 70th percentile for women, this presents a critical strategy for the gals. A single muscle up may mean the difference between getting to regionals and not (also see the chart of percentile scores for the top 100 athletes in each region). If I were a combination of things: (1) a woman, (2) on the edge of making regionals, and (3) able to get through the first two phases of the workout (all things I am not even remotely close to), I would do my very best to rest all muscles after the OHS and give the muscle-up the best shot in the last 20 seconds of the workout. The second muscle up has little value compared to the first, so it doesn't make sense to spend energy on early attempts if you have a few minutes left.

For the men, the same things hold, but the end result has far fewer implications since the muscle-up starts around the 42nd percentile.  The muscle-up would clearly be for pure glory!

Also, by popular request I have started to look into the age vs performance question, and yikes... the plots are pretty grim, especially on the men's side.  Stay tuned...

This post discusses the fourth 2011 crossfit open workout, described here.

Wednesday, April 13, 2011

Week 3 Dropout Statistics

Thanks to everybody who's commented so far on the blog! There are some open questions to be decided for sure, in particular the relationship between height and performance. Rest assured it's being looked at and will be presented in the future.

Erg.  I wanted to be able to cite specific numbers regarding the dropouts from week 2 to week 3.  I have learned, however, that HQ isn't showing a perfect leaderboard each week, which probably doesn't surprise a few folks given the comments I've seen on some of the pages.  For example, when I gathered the week 3 data for the men, there were a few hundred names that were not present in the week 2 data set.  I presume these folks actually had scores for week 2, but for whatever reason were not posted on the leaderboard at the end of week 2.  That alone would be fine, but in general, I think it means I can't totally trust that week 3's leaderboard contains all the continuing athletes.  Thus, I think the histograms below might be off by a few percent.

Overall the dropout percentage was similar across M/F boundaries - 23% (Men), and 24% (Women).  I estimate that the true number is +/- a few percent at most.
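Here is roughly how a dropout percentage like this can be computed from two leaderboard snapshots. This is a sketch with made-up names, not my scraping code, and it assumes athlete names are unique enough to match across weeks.

```python
def dropout_summary(prev_week, this_week):
    """Compare two leaderboard snapshots (lists of athlete names).

    Returns the dropout percentage, the athletes who disappeared, and the
    athletes who show up this week without having been listed last week.
    """
    prev, current = set(prev_week), set(this_week)
    dropped = prev - current
    unexpected = current - prev
    rate = 100.0 * len(dropped) / len(prev)
    return rate, sorted(dropped), sorted(unexpected)

# Hypothetical snapshots, for illustration only.
week2 = ["ann", "bea", "cat", "dee"]
week3 = ["ann", "bea", "cat", "eve"]

rate, dropped, unexpected = dropout_summary(week2, week3)
print(rate, dropped, unexpected)  # 25.0 ['dee'] ['eve']
```

The `unexpected` list is the warning sign: if people can be missing from one week's snapshot, some of the `dropped` names may simply be missing from the other, which is why I put a few percent of slop on the numbers.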

The first plot below shows, for each 5 pound female weight class, what percentage of athletes dropped in week 3.  Remember, this plot does not represent all athletes, just athletes that bothered to fill out their weight information.  While the absolute numbers might be slightly off, my prediction from earlier seems to have held.  Almost 50% of athletes under 110 pounds did not complete week 3.  Cuts were felt all around though, and even the lowest drop percentage was still around 15%.

From a performance standpoint, how did these athletes do? This next plot shows the scatter of week 2 versus week 1 scores. Blue dots represent athletes who finished week 3, and red dots are athletes who dropped (wk3 score = 1, or nothing). A quick inspection reveals many of the drops occurring in the lower scores (lower left), but a surprising number are in the middle.

Week 3 heavily pruned female athletes that were light (left), and who received lower scores in weeks 1 and 2 (right).  Note: I am not  confident that the red dots in the highest performing areas (upper right) of the scatter chart are real dropouts.  They might be absent from the leaderboard, as discussed above.

The same plots for the men are less dramatic.  Interestingly, while the dropout percentage has a strong trend downward, the drops by wk1 and wk2 performance seem more scattered.  I can only conclude that some drops might result from people just not having the time to do the workout as directed, rather than some limitation in performance.

Sadly, I couldn't muster one rep for week3's WOD.   Our workout group had just started power cleans a month ago, and I had recently managed to clean my weight (~140).  In a fairly dumb move, I attempted a squat clean at 165 and not only managed to fail miserably, but also managed to sprain my wrist. 

Male dropouts followed a weight trend similar to the females' (left), but surprisingly their performances seemed to be fairly uniform (right), except maybe in the very elite categories. Ignore the big bar at 275; it comes from a very small number of athletes.

Tuesday, April 12, 2011

Top 100 regional athletes through week 3

I thought I'd generate a plot so folks can easily compare regions.  One thing HQ did was generate an overall rank and rank score on the leaderboard.   I was going to do that, but they saved me a few bits of code.  For the following plot, the y-axis needs some explanation.   Someone correct me if I'm wrong, but HQ is determining the overall rank for each athlete by summing that athlete's rank for each specific WOD and using that as a 'total' score.  This actually makes perfect sense to me, but the actual number doesn't make sense to a whole lot of people.  I've normalized this score to a percentile, where your total score is converted to a number between 0 and 100.

If you were a superstar and finished first in all three WODs for a total rank score of '3', you'd have a percentile score of 100.   In contrast, if your total rank score is the highest (lowest performing) for all the athletes left, you'd have a percentile score of 0.  In theory, I think if you scored a consistent 80th percentile for all 3 exercises so far, your total rank percentile should be around 80 on my plot.  I should double check this.
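As a sketch of what I mean, here is one way to turn summed WOD ranks into a 0 to 100 percentile. The athletes and ranks are invented, and placing tied totals at the better position is an assumption on my part.

```python
def overall_percentiles(wod_ranks):
    """Sum each athlete's per-WOD ranks, then convert to a percentile.

    The lowest total (best athlete) maps to 100, the highest total to 0.
    Needs at least two athletes.
    """
    totals = {name: sum(ranks) for name, ranks in wod_ranks.items()}
    n = len(totals)
    percentiles = {}
    for name, total in totals.items():
        # Place = 1 + number of athletes with a strictly better (lower) total.
        place = 1 + sum(1 for other in totals.values() if other < total)
        percentiles[name] = 100.0 * (n - place) / (n - 1)
    return totals, percentiles

# Hypothetical ranks for three WODs.
ranks = {
    "A": [1, 1, 1],
    "B": [2, 5, 3],
    "C": [10, 4, 6],
    "D": [20, 15, 10],
    "E": [30, 20, 10],
}
totals, percentiles = overall_percentiles(ranks)
print(totals["A"], percentiles["A"])  # 3 100.0
print(totals["E"], percentiles["E"])  # 60 0.0
```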

Alright, I'll say it.  Adjusted for the number of competitors, it looks like Southern California has the fittest athletes.

I had previously assumed that in order to get into regionals, one had to shoot for the top ten percent in each exercise.   I would say it's more the exception than the rule, and (not hoping to start any flame wars) it depends quite a bit on each region.

Friday, April 8, 2011

Week 3 - destroyer of lightweights

We all knew week 3 was a big boy's (or girl's) workout. One big difference from previous weeks, though - some athletes are finding it nearly impossible to do the workout at all! This is most dramatic with the female athletes. Below is the distribution of weights from athletes who completed week 2, versus those who have submitted scores for week 3.

Is this shift temporary, or a permanent result of week three's workout?

The bar height is the fraction of total female athletes in any particular 5 pound interval.  The green distribution represents a snapshot of the athletes after week 2, while the red distribution represents what it looks like right now.  If workout 3 was doable for everyone, we would expect both distributions to look fairly similar.   What we actually see, is a shift to the right, marked by areas where you see green peaking over the red in the lighter weight classes.   Keep in mind the height of the green bar relative to the red bars will tell you how much of the weight class is growing or shrinking.

There are two possible explanations for what we see now. Either the lightweights are waiting to submit their scores, or there's a fraction of athletes that can't do the workout and won't end up submitting. We'll see after the end of the week, but if trends continue, we could see nearly half the female athletes weighing 110 and under registering a zero score.

Is this fair? For example, can there be workouts that are nearly impossible to complete if you're too heavy? I've got one for next year - 36" box jumps anyone?

Wednesday, April 6, 2011

Women's body weight performances wk1 and wk2

Considering that a third of the athletes still in the competition are women, I feel pretty bad about leaving them out so far.  Let's check in.

First, some caveats about the data. While the number of female athletes right now is large (>6000), ahem, for whatever reason there's a big difference between the men and women in filling out the optional biologic data. ~71% of men filled out weight and height information, while only ~47% of women did. Consequently, the data for the women's weight class plots is a bit sparse. There's still some interesting information, though, so let's get to it!

First, in week 1:  I think the individual variation in performance in wk 1 was large enough to dominate any strong body weight relation.   You might say that the 100 and 105 pounders were crushed, and there may be some truth to that, but I'm not greatly confident here because the numbers are pretty sparse.  I could generally see how it makes sense though.  The power snatch Rx weight for Wk1 was 55 lbs, which might have been difficult to sustain for the lighter ladies.

For week 2 - Interestingly, average workout 2 performance had a gradual downward trajectory from left to right, which suggests to me that, compared to the boys, the women found the box jumps/push ups more difficult than the deadlift part of the workout. I'm not too sure what to make of the elites, which were up and down in seemingly no real pattern.

Overall, through wk 2 I'd say the open has been fair throughout the weight classes on the women's side.  Now through week 3, I'd guess the circumstances are about to change drastically!

Tuesday, April 5, 2011

An ideal CrossFit weight (for men)?

Ooooo, good stuff today.   All the athletes have submitted scores for week 2, which means tons more data.  Logistically, I should be able to make plots a little quicker because I shouldn't have to reacquire all the personal data (weight, height, region) from everybody again.

The crossfit open doesn't use weight classes, and the argument put forth has been something along the lines of, "The workouts have all been balanced so heavy weights (easier for heavy people) are balanced with body weight exercises (easier for lighter folks)."  Is that true?   Let's look at some data.

The blue line represents the 'elite' or the top 10% of athletes for a given weight (5 lb increments).  The red line is the overall average athlete for a given weight.

Plotted is the performance of athletes across weights for workouts 1 and 2. To explain further - I have grouped athletes according to their weight (in 5 lb increments), and plotted the mean performance of those athletes. 'Average' athletes in a given weight are plotted in red, and 'elite' athletes, or the top ten percent of athletes in a given weight, are plotted in blue. For reference, I have also included the cumulative percentile score chart on the right, since that information isn't immediately obvious on the open website. If you wanted to quickly see where a score of 200 ranked on workout 1, you would go directly up from the bottom axis at 200 until you hit the 'S' shaped curve. From that point, going directly left to the axis will give the percentile score, which in this case is around the 30th percentile.
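In code form, the grouping behind the red and blue lines looks something like the sketch below. The athlete data is invented and the helper names are hypothetical; taking at least one athlete as 'elite' in tiny weight classes is an assumption I've added so the example doesn't divide by zero.

```python
from collections import defaultdict

def weight_class(weight_lb, width=5):
    """Round a weight down to the start of its 5 lb class (183 -> 180)."""
    return int(weight_lb // width) * width

def class_means(athletes, elite_fraction=0.10):
    """Map weight class -> (mean percentile, mean percentile of the elites).

    `athletes` is a list of (weight in pounds, percentile score) pairs.
    Elites are the top `elite_fraction` of each class, with a minimum of one.
    """
    groups = defaultdict(list)
    for weight, pct in athletes:
        groups[weight_class(weight)].append(pct)

    means = {}
    for wc, pcts in sorted(groups.items()):
        pcts = sorted(pcts, reverse=True)
        n_elite = max(1, int(len(pcts) * elite_fraction))
        elite = pcts[:n_elite]
        means[wc] = (sum(pcts) / len(pcts), sum(elite) / len(elite))
    return means

# Made-up (weight, percentile) pairs, for illustration only.
athletes = [(181, 90), (183, 70), (184, 50), (180, 30),
            (232, 40), (230, 20)]
means = class_means(athletes)
print(means)  # {180: (60.0, 90.0), 230: (30.0, 40.0)}
```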

For workout 1, the curves go up from a weight of 140, peak at 185, then head downward.  Does this make a huge difference?  I think for average athletes the answer is yes.  An 'average' 185 open athlete bested 20 percent more people than his 'average' comrade weighing in at 230 lbs, translating to an overall rank difference of over 2000 people!

What about for elite athletes?  The mean percentile scores for 140, 185, and 230 pound elites were all in the top 10% of total scores, so I think weight wasn't too much of an issue for those folks.  The overall pattern is the same though.

As an aside, why should we care about the 'elite' category of athlete? Given that there are 17 total regions, and 50 athletes from each move on to the regionals (850 total), I would argue that the top 10% of athletes (~1100) should stand a reasonable chance at moving on to the next round of competition. Understanding patterns in these athletes should hopefully tell us in the future what's important for the open.

It's good to be a 180 pounder so far!

The circumstances change a bit when you look at workout 2. Everybody, say over 220 pounds, should have screamed a collective Bender favorite, "We're boned!" Yikes, the heavier people were punished on this exercise, no matter if you were an average athlete or an elite one. In the elite category, the difference between an average athlete at 180 lbs and one at 230 was a crazy 19 percentile points. Another stat: no person weighing above 220 lbs cracked the top 5% of scores. I don't know how that will translate into a real effect at the end of the open, but if this trend continues I wouldn't be surprised if not many athletes over 220 pounds make it to Regionals.

So to answer the question, is there an ideal CrossFit weight for men? So far in the open, the athletes around 180 pounds have had it pretty good. There are still 4 workouts to go, so circumstances could change...

Analysis for the women will hopefully come next!

Friday, April 1, 2011

April 1st - First analysis

So one might ask, how does my week 1 performance predict my week 2 performance?   From a correlation standpoint it's actually pretty good, as one would expect.  The elites are still elites and less elite (such as myself) are still not elite.  If you haven't done wk2's exercise quite yet, you can use the plot to plan some sort of pace.  Of course you should try as hard as you can, but the plot might allow you to set some goals from the outset.

Correlation is around 0.7, for people wanting to know.

Part of doing this analysis was wondering if my meager weight might have something to do with my pitiful performance in wk1. While we can't all be Chris Spealler, I figured being a buck40 didn't do me any favors. Not so. There's almost zero relationship between weight and wk1 performance. What correlation there is mostly comes from the much heavier competitors. Probably those double unders.

Correlation: -0.05

The weight issue, though, in week 2 becomes interesting.  While the data is all over the place, it looks like performance increases from low to about 180, but performance takes a hit after that.  Maybe the box jumps get harder for heavier folks?

Because of the larger penalty as things get heavier, correlation is -0.2.  Clearly the relationship is more complicated if you look at the plot though.
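A quick aside on why a single correlation number can undersell a relationship like this one. The sketch below computes a plain Pearson correlation on made-up data with a peak at 180 pounds. The numbers are invented to make the point and are not from the leaderboard.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up scores that rise to a peak at 180 lbs and fall off symmetrically.
weights = [140, 160, 180, 200, 220]
scores = [60, 70, 80, 70, 60]

r_peak = pearson(weights, scores)
print(r_peak)  # 0.0, even though weight clearly matters here
```

The up-slope and the down-slope cancel each other out, so the correlation comes out at zero. When the penalty on the heavy side is steeper than the gain on the light side, the number tips slightly negative instead, which would fit the small negative correlation above.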