Wednesday, February 29, 2012

WOD 12.1 - Early plots and analysis

Thanks to miked and killers411, newcomers to the CrossFit analysis scene, we have some early plots and discussion points!  Thanks, guys!

Miked has plotted both the actual and the fitted distributions of WOD 12.1 scores for all athletes, including masters.  He's also very nicely calculated the mean and standard deviation of each group and included them in the legend.  Note: the masters distributions are hard to see because the y-axis is the number of athletes, and the non-masters groups are, of course, far larger than the masters groups.
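For anyone who wants to reproduce these summaries, here is a minimal Python sketch.  The post doesn't say which distribution miked fitted, so I'm assuming a normal fit built from each group's mean and standard deviation; `summarize_by_group`, `fitted_density` and `density_histogram` are hypothetical helper names, and scores are assumed to arrive as (group, score) pairs.  The last function is one way around the masters visibility problem: plotting each group's histogram as a density (fraction of that group per unit of score) instead of raw athlete counts puts small and large groups on the same y-axis.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def summarize_by_group(rows):
    """Return {group: (n, mean, std)} from (group, score) pairs.

    std is the sample standard deviation (n - 1 denominator), so each
    group needs at least two scores.
    """
    groups = defaultdict(list)
    for group, score in rows:
        groups[group].append(score)
    return {g: (len(s), mean(s), stdev(s)) for g, s in groups.items()}


def fitted_density(mu, sigma, xs):
    """Assumed normal fit evaluated at each x in xs, as a density (area 1)."""
    dist = NormalDist(mu, sigma)
    return [dist.pdf(x) for x in xs]


def density_histogram(scores, bin_width):
    """Return {bin start: fraction of the group in that bin / bin_width}.

    Dividing by the group's own size means a small group (e.g. masters)
    is drawn on the same vertical scale as a much larger open division.
    """
    counts = defaultdict(int)
    for s in scores:
        counts[math.floor(s / bin_width)] += 1
    n = len(scores)
    return {b * bin_width: c / (n * bin_width) for b, c in sorted(counts.items())}
```

Overlaying `fitted_density` on `density_histogram` for each group keeps the fitted curve and the real distribution in the same units.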

Also, he's plotted performance against height and weight, and done some modeling of workout performance and workload.  Outstanding stuff and it'll be fun to see some more.

Here's the link from miked.

Killers411 has done a nice comparison of participant numbers in each region, which suggests the Open has grown 2.5-fold since last year!  The fight for regionals will certainly be tougher with all these newcomers.
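The growth comparison boils down to a ratio of registrant counts per region.  A small Python sketch, assuming you already have per-region counts for each year in dictionaries (the function names are my own, not killers411's):

```python
def growth_by_region(counts_2011, counts_2012):
    """Ratio of 2012 to 2011 registrants for each region present in both years."""
    return {
        region: counts_2012[region] / n_2011
        for region, n_2011 in counts_2011.items()
        if n_2011 > 0 and region in counts_2012
    }


def overall_growth(counts_2011, counts_2012):
    """Ratio of total 2012 registrants to total 2011 registrants."""
    return sum(counts_2012.values()) / sum(counts_2011.values())
```

Regions that appear in only one year are skipped by `growth_by_region` but still count toward `overall_growth`, so the two numbers can differ if regions were added or redrawn.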

...and the link from killers411.

Saturday, February 25, 2012

Group thought: Questions to ask during the 2012 Open...

Jeff is doing a great job providing a complete dataset this year (see the last post for a link), which will let many people play with the data and come up with interesting analyses.  It's a ton of hard work, and we're very fortunate to have him donating his time to this effort.

With that in mind, what are some of the burning questions people want to ask of the dataset?  Hopefully, by getting the questions out there, we can get folks discussing the data and the questions together.

Here are some of the things I'd like to see during the games.

1)  I'm fond of the height/weight plot for each workout, as I think it gives a quick, informative picture of what each WOD favors.  By week 3 or week 4, is it possible to predict what the last WOD will be?  One of my standing predictions is that wall balls will be part of the Open.  I think it's one of the few movements that helps taller athletes.  Rowing on the erg likely falls into that category too, but I have trouble believing it would be required.
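One way to put a number on the height/weight plot for each workout is a least-squares line plus a correlation coefficient.  A minimal Python sketch, assuming each athlete is a dict with a numeric `height` (or `weight`) and a numeric `score`, with `None` for anything not reported; the field names and the function name are hypothetical.  For rep-count workouts like 12.1, a positive slope would mean taller athletes tended to score more; for timed workouts, where lower is better, the sign flips.

```python
import math


def body_score_fit(athletes, attr="height", score_key="score"):
    """Least-squares line and Pearson r for score vs. a body measurement.

    Athletes missing the measurement or the score are skipped, since
    height/weight are self-reported and often blank.
    Returns (slope, intercept, r, n_used).
    """
    pairs = [(a[attr], a[score_key]) for a in athletes
             if a.get(attr) is not None and a.get(score_key) is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two athletes with both values")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, my - slope * mx, r, n
```

Running this for height and weight on every workout gives a compact table of which WODs lean toward bigger or taller athletes, which is the kind of pattern that might hint at the last WOD.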

2)  At the end of the games, will it be possible to answer how much a year's worth of training helps?  This is a bit tricky, because we'll need some way of understanding whether the overall performance of Open participants has changed over the years.  Have CrossFit's emergence and popularity shifted the performance curve down?
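One rough way to approach the year-over-year question is to follow athletes who competed in both Opens and see how their standing changed.  This Python sketch assumes each year's results are a dict mapping athlete ID to overall rank within a single division, covering that whole division; the names are hypothetical.  It measures relative standing only: if the whole field gets fitter (or much bigger), a returning athlete's percentile can drop even though they improved, which is exactly the shifting-curve problem above.

```python
def percentile(rank, field_size):
    """Share of the field this athlete finished ahead of or tied with (1.0 = first)."""
    return 1.0 - (rank - 1) / field_size


def returning_athlete_shift(ranks_2011, ranks_2012):
    """Mean percentile change for athletes present in both years.

    Returns (mean_change, n_returning); mean_change is None if nobody returned.
    Field size for each year is taken as the number of athletes in that dict.
    """
    n_2011, n_2012 = len(ranks_2011), len(ranks_2012)
    shared = ranks_2011.keys() & ranks_2012.keys()
    if not shared:
        return None, 0
    deltas = [percentile(ranks_2012[a], n_2012) - percentile(ranks_2011[a], n_2011)
              for a in shared]
    return sum(deltas) / len(deltas), len(shared)
```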

Feel free to post your own questions, about a specific WOD or about the games overall.

Tuesday, February 21, 2012

The CrossFit Open 2011 dataset, for download

At some point I thought I would write something up more formally about the 2011 Open, but that moment has certainly passed.  Many folks have asked for the data from the 2011 Open, and here it is!

The funny thing is, I suspect the 2011 Open dataset might be better than the 2012 dataset for the height/weight analyses.  Why?  Part of it is the registration process.  In 2011, athletes were asked for their height and weight as soon as they registered.  In 2012, these questions were dropped from registration, although folks can still enter this information voluntarily in their profiles.  As a result, I expect far fewer athletes to report their height and weight.  Then again, with how big the Open is getting, maybe it won't matter?

Click here for a .csv file of the 2011 Open dataset, courtesy of work done by Greg Perkins.  The dataset includes every athlete, including those who did not finish all six workouts.  The columns should be, in order:
- athlete ID, nameURL, age, sex&division, height, weight, overall-points, overall-rank, score1, rank1, score2, rank2, score3, rank3, score4, rank4, score5, rank5, score6, rank6
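If you're not using matlab, a file with these columns can be loaded in a few lines of Python.  This is a sketch under assumptions: the snake_case column names are my own renaming of the headers above, I'm not certain whether the file includes a header row (so a row starting with "athlete" is skipped if present), and blank fields such as unreported height/weight or unfinished workouts become `None`.  Everything is kept as a string, since scores may be times or rep counts depending on the workout.

```python
import csv
import io

# Hypothetical snake_case names for the 20 columns listed above, in order.
COLUMNS = [
    "athlete_id", "name_url", "age", "sex_division", "height", "weight",
    "overall_points", "overall_rank",
] + [f"{kind}{i}" for i in range(1, 7) for kind in ("score", "rank")]


def load_open_2011(text):
    """Parse the 2011 Open CSV text into a list of dicts keyed by COLUMNS.

    Empty fields become None; all other values stay as stripped strings.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record or record[0].strip().lower().startswith("athlete"):
            continue  # skip blank lines and a header row, if there is one
        values = [field.strip() or None for field in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows
```

To read from disk, pass in the file contents, e.g. `load_open_2011(open(path, newline="").read())`.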

Click here for some helpful matlab scripts, including one that breaks the overall dataset into separate structures for each competition category.

And here for some information about the .csv file and descriptions of the matlab scripts.
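For anyone working in Python instead, here is a rough analogue of splitting the overall dataset by competition category (a sketch, not a translation of the matlab code).  It assumes records are dicts with a `sex_division` field and an `overall_rank` that is either a numeric string or `None`; both field names are hypothetical.

```python
from collections import defaultdict


def split_by_category(rows, key="sex_division"):
    """Group athlete records into one list per competition category.

    Within each category, athletes are ordered by overall rank; athletes
    with no overall rank (blank field) go last.
    """
    def rank_key(row):
        rank = row.get("overall_rank")
        return (rank is None, int(rank) if rank is not None else 0)

    categories = defaultdict(list)
    for row in rows:
        categories[row[key]].append(row)
    return {cat: sorted(members, key=rank_key) for cat, members in categories.items()}
```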

Thursday, February 16, 2012

Calling all fellow xfit nerds - help!

So the Open is just around the corner... and so is my PhD thesis defense!  I defend April 12th, and the next month or two is going to be jam-packed for me.  I'm trying to assemble one more manuscript before I finish, and I have yet to start writing.  Yikes.

I'm almost certain that I won't be able to maintain the blog as well as I did last year.  And let's be clear: Greg Perkins was a HUGE help in acquiring the data in the last couple of weeks of the 2011 Open.  Without him I probably would have spent many more hours just trying to mine the data.

So at this point, I'm trying to start a discussion among people who would be willing and able to help out this year.  I can host the 2012 blog, or somebody else can if they prefer and want to make it look all fancy (instead of the terrible book background I've been using).  I can share the matlab scripts I used to generate the plots last year, which can be easily adapted to other languages.

I'm just brainstorming now, but we need people who can do the following:

1)  Understand how HQ is posting the workout scores on the website, and figure out a way to dump these scores along with all the other athlete information (name, region, age, height, weight, etc.).  Greg, are you still around these days?  Do you think your nid trick will work like last year?

2)  Take the data dumps in near real time, play around with the data, pose a few interesting questions, and make some rough plots.  Ideally, this person would also write something to go along with their plots.  I can help facilitate this a bit and coordinate the crowd.

3)  Folks who can take the plots and clean them up a bit for presentation.

I think that's all for now.  Please comment if you think you have the skill and desire to help!