# Running a marathon

Last week I ran the 2012 Marine Corps Marathon. It was my first marathon, I ran it as a vegan, and it was a great experience. My finish time was 3h54m17s, much better than I expected. To get to that point I trained hard, focused on my diet and my stride, read papers on strategy, and geeked out on the data. I want to talk about all that.

I can proudly say that the marathon was a challenge, but not an ordeal. I ran the last mile with a smile on my face and finished with a strong 15-second full sprint to the finish line. Two days later I barely had any soreness in my legs, and none of the usual problems: my toenails are intact, and I have no blisters, injuries, chafing or particular pain. I attribute this success to four things: Stride, Fitness, Nutrition, Motivation.

###1. Stride

“Every year as many as eighty percent of runners suffer an injury that will keep them off the streets for a month”, to quote Christopher McDougall. That doesn’t make much sense to me. Running should be natural, so why do we get so many injuries? This kind of thinking led me to consider the Vibram FiveFingers minimal shoes. I read a couple of articles, I followed a couple of threads, and that was enough for me to switch to the forefoot strike.

The key points to me were:

• The running man hypothesis makes sense to me.
• I’ve seen and stretched an Achilles tendon on a dead body (at the Faculty of Medicine). It really has some elasticity. I can see how part of the landing weight is absorbed by that elasticity and released when you push off, minimizing the energy needed to run. Incidentally, you can see in the image above a fellow runner with a prosthetic limb using the same principle with the arched metal.
• Landing on the ball of the foot really uses the feet that otherwise are quite passive.
• The stride changes your posture and makes you run straighter and more stable above the waist, landing your feet right under the center of mass, so there is no force vector pointing backwards. You are just pushing back all the time.
• The posture also makes you land with a slight bend in the knee, minimizing the impact on the knees and hips. This point was very important, as I would usually get pain there after long runs.
• Quite literally, the impact is absorbed by the muscles and tendons, not the bones and joints.
• It just feels smoother and lighter.

The only drawback is that, since we always wear running shoes, we have never run like that, and our muscles and tendons are not prepared. You need to transition very slowly or you will injure yourself (like I did last year when I went cold turkey to the Vibrams). At first it will feel awkward and it will be hard on your soleus, gastrocnemius, Achilles tendon, the soles of your feet and your thighs. It helps to walk barefoot at home. It soon pays off: I really believe this stride contributed to my running longer, faster and with less pain or soreness afterwards.

I did not run barefoot, like at least one person I saw, or in minimal shoes, like a few others I saw, but I would not rule it out. As I run, I barely use the heel, so the extra cushioning isn’t really involved in my stride. However, having that option helps, and on a few occasions I switched to a heel strike to release tension and change the cadence as I felt like it.

As a side note, a few minutes after I finished I started to feel my feet cramping. I took off my shoes and walked in my socks. It worked immediately, relaxing my feet. They just wanted to get free from the shoes and stretch out a little bit, just like my legs.

###2. Fitness

Together with the stride, you need the physical fitness to endure the 26 miles. For that you just need to bank as many miles as possible, without overtraining. I followed a training plan from Adidas MiCoach, adjusting it whenever I didn’t feel good or had any problems. I didn’t really pay much attention to the suggested changes of pace; I just kept a comfortable pace of around 10 km/h. Before this training the longest I had run was a half marathon in March, before changing my stride, and that was my absolute maximum. I felt awful when I finished, with a lot of pain in my left hip and soreness for several days. In the 100 days I trained for this marathon, I went from 5 km/week in the first week to 70 km/week 4 weeks before tapering for the race.

From what I could learn, one of the keys to endurance training is understanding glycogen dynamics. Glycogen is basically the main energy storage of the muscles. The other one, less efficient but more compact, is lipids. Some glycogen is stored directly in the muscles for immediate use and most of the rest in the liver, where, on demand, it is broken down into glucose and sent through the blood elsewhere. So the more your muscles store, the better. Training helps muscles store more glycogen. When we run out of it, we start to use the fat reserves. When we deplete glycogen, we hit the wall, typically around mile 20. That’s why it is helpful to replenish glucose as you run. My choice was GU and Clif Shots every 45 minutes or so. With too many of either you get tired of their taste. Even so, my stomach was not too happy after so many of those. I think next time I’ll try a peanut butter sandwich, a falafel roll or something more… food-like.

Another factor here, the metabolic rate, is very important. When there is enough oxygen, the preferred cell respiration to get ATP in the slow-twitch fibers (the ones for endurance) is aerobic. You want that for low intensity over long periods of time. If you need more energy than the aerobic metabolism can provide, the body uses the anaerobic pathways, which help you get that sprint (mostly in the fast-twitch fibers), but also produce lactic acid at a higher rate than the body can clear, which will make you cramp and even vomit. You want to keep the anaerobic rate down for as long as possible. There is a rough relation between heart rate (oxygen blood flow) and metabolic regime; for my age (31) aerobic exercise is below ~150 bpm. You want to keep it below that level, or you will start to consume your precious glycogen anaerobically while your lactic acid builds up. I used a heart rate band for some runs, until I learned to recognize which levels of effort meant which heart rate, and then I was pretty good at keeping it around 150, which also meant around 10 km/h. See my heart rate progress on the marathon run:

It was actually quite quick to get there, just 250 m:

Unfortunately my phone battery died 2 miles before the end (…some call it progress…) so I don’t have the pace or bpm during the 15 seconds or so of the sprint. From previous runs I think it would be around 25 km/h and closing in on 180 bpm or more when I stopped. I would have loved to see the recovery time back to the resting 60ish bpm (perhaps 10 minutes? no idea). From other runs I know I can go down from 170ish to 110 in one minute, but I presume the decay is proportional to the bpm, so from there to 60 perhaps another 5 minutes or more.
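As a rough check on that guess, here is a minimal sketch of an exponential recovery. It assumes (this is my assumption, not a measurement) that the excess of the heart rate over a resting 60 bpm shrinks by a constant fraction each minute, calibrated on the 170-to-110 drop in one minute mentioned above. The function names and the 5 bpm tolerance are hypothetical choices.

```python
import math

RESTING_BPM = 60.0  # assumed resting heart rate ("60ish")

def decay_rate(start_bpm, end_bpm, minutes, resting=RESTING_BPM):
    """Rate k (per minute) such that excess(t) = excess(0) * exp(-k * t)."""
    return math.log((start_bpm - resting) / (end_bpm - resting)) / minutes

def minutes_to_recover(start_bpm, k, tolerance=5.0, resting=RESTING_BPM):
    """Minutes until the heart rate is within `tolerance` bpm of resting."""
    return math.log((start_bpm - resting) / tolerance) / k

# Calibrate on the observed drop: 170 bpm -> 110 bpm in one minute
k = decay_rate(170, 110, 1.0)
# Estimate the recovery after a sprint that ends at 180 bpm
print(round(minutes_to_recover(180, k), 1))
```

Under these assumptions the recovery from 180 bpm to within 5 bpm of resting comes out at roughly four minutes, the same order of magnitude as my guess. A different tolerance or resting rate would shift the number.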

My guess is that a key to running faster marathons is to train your body to run faster while staying on the aerobic side.

###3. Nutrition

Last Christmas I decided to try to be vegan. No meat (yes, chicken is also meat), eggs, milk, cheese or honey. Nothing from an animal. I did it mostly for environmental reasons (although I keep finding more reasons), and switched back to vegetarian or even meat eater whenever it was too difficult (e.g. when I went to Rio de Janeiro, where they eat meat with everything!). All in all I eat meat roughly once a month. I went to the doctor before becoming vegan by preference, and again 3 months later, for blood and physical tests. Everything was great, I felt better and I had lost weight. But my doctor did not approve. When I told him I was planning to train for a marathon as a vegan, he called me stupid. Literally.

I find more and more reasons to reduce my meat intake as much as I can, but I wasn’t sure about running a marathon on plants. After reading some more, and learning that many ultramarathoners are indeed vegan, I decided to try, while monitoring my health. I am not an expert in nutrition so I keep it simple: tons of cereals, many legumes and veggies, some fruit and a bit of olive oil. This is my grocery shopping for a couple of days, 90% from the produce section:

In particular I enjoyed the extra intake of quinoa, beets, nuts… I plan to continue like this. Please read “In Defense of Food”, it’s a great book. I don’t think being vegan makes training for a marathon any harder, perhaps the contrary. And it makes you more conscious of what you eat, which is good. And Mint says there was no significant impact on my food budget.

###4. Motivation

Probably the first lesson you learn when you train for a marathon is that commitment is a big part of it. The training will take over your life. Forget about going out or drinking the night before a long run. The long run itself will make you wake up early on the weekends and will keep you busy for at least half of that day alone. I was doing 8 hours of actual running a week. You need to watch what you eat, how not to get injured, what to wear, … Your friends will tell you that you are obsessed with running. Turns out it’s addictive! My life during the last month or so was pretty much running, sleeping and working.

You really need motivation, especially in the first days. Running 40 minutes can feel very long in those early days. But as you train more and more, you start to enjoy and look forward to that feeling of achievement and fitness at the end of the run. After a while, the running itself becomes the reward and, believe me, you will really enjoy running. You stop thinking about the time and the steps and you get lost in your thoughts, the music you hear, the nature around you.

Try to find various sources of motivation. What really worked for me in those early days was the music I was playing via Pandora, and then later Spotify (btw this is my running playlist). And of course the mapping. I was mapping every run with RunKeeper, and knowing I was going to be able to monitor the track, the speed or the heart rate was a motivation in itself. I even made plans to make tracks of particular paths or shapes, like running the borders of the DC rhomboid. During the long runs I would post a live link so that friends or family could track it in real time. Knowing someone was following my run made me push harder. Getting those likes and comments on Facebook really helped me push through weak moments.

Since I started using RunKeeper in November 2011, I’ve logged 1,440 km, roughly 718 km of them for this marathon training. I’ve written before on how to make a map in TileMill with all your runs as a heatmap. I did it again, this time with the marathon itself in black, for contrast.

Using the same process I used for the heatmap, this is the result. Pan and zoom at will [full screen here]:

##The day of the Marathon

There should be nothing new on the day of the race. Don’t try anything for the first time on that day; the slightest deviation could put at risk the performance you have trained for over 100 days or more. New shirt, new shoes, more coffee? Nope.

The day of the race I woke up at 4:30am to have the usual pre-long-run breakfast: granola, soy milk, a banana, and a slice of bread with peanut butter. Get to the first metro around 5:15am and head to the starting line, at the Pentagon. More than 30,000 people are expected, and there are several steps to go through, so you don’t want to miss the 8am start.

On the street there are a few Halloween walks of shame to cheer you up, and the metro has only runners. Get off at the Pentagon station and head to the parking lot. Go through the security lines (it’s the Pentagon after all), gently apply Body Glide or Vaseline on chafing areas, visit one of the thousands of portajohns, drop your bag at the UPS trucks, get some water, … and wait. Everything was much faster than expected, and there you have an increasing number of runners, all ready, sitting quietly in the parking lot, still before dawn. Most of them are wearing plastic bags to avoid getting cold. Around 7:20am you head to your corral. People organize themselves in groups, corrals according to the expected finishing time. I chose one on the slow side of my expectation (4h30m), to help me push along if things went wrong, and to get that extra boost of overtaking people as I got stronger.

When the starting gun fires, it still takes a few minutes for your corral to start moving, and for the first 5 miles or so you run in a dense multipede of people snaking around the bends and turns of the course. Getting to an overpass or a downhill is quite a view of just people running, for miles. One of the common mistakes is to start too strong, so I tried to concentrate on the music and the feeling of any other long run. Slow but not walking on the uphills, easy but not letting go on the downhills. Some spots have a lot of people who cheer you up, and honestly carry you on with their support. Some spots have no spectators, so you just enjoy the beautiful autumn day, and look around at fellow runners. Along with the music of Pandora, RunKeeper tells me the pace every 5 minutes. On my forearm I wrote my boundaries. Keep it between 9:11 min/mile (for a 4h marathon) and 9:55 min/mile (for a 4h30m one). I was always a bit faster than 9, so I was holding back most of the time. Every 45 minutes or so, eat your GU. Keep running and remember you are running a marathon.
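For anyone who wants to work out their own boundaries, here is a small sketch of the arithmetic: target time divided by distance. It assumes an even pace over the official 42.195 km (about 26.2188 miles); the function names are my own. The exact values depend on the distance you plug in, so they will not necessarily match the figures I scribbled on my arm.

```python
MARATHON_MILES = 26.2188  # 42.195 km expressed in miles

def pace_per_mile(hours, minutes=0, miles=MARATHON_MILES):
    """Return the even pace as (minutes, seconds) per mile for a target time."""
    total_seconds = (hours * 60 + minutes) * 60
    seconds_per_mile = round(total_seconds / miles)
    return divmod(seconds_per_mile, 60)

def format_pace(pace):
    """Format a (minutes, seconds) pair as 'M:SS min/mile'."""
    m, s = pace
    return f"{m}:{s:02d} min/mile"

print(format_pace(pace_per_mile(4)))      # pace for a 4h marathon
print(format_pace(pace_per_mile(4, 30)))  # pace for a 4h30m marathon
```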

Until the half marathon mark, I was just running. Then I realized I was a bit slow, and had enough energy. But my stomach was acting up a bit. Too much GU with caffeine? A technical stop for 60 seconds, and back to the race. The first part of the second half is where most of the people are watching and cheering, so I let myself be carried along by their cheers. Mile 20, the famous milestone for many runners, coincided with the bridge over the Potomac, the wind, no people to cheer you, seeing many fellow runners give up and walk, sit down or even lie down… I turned the music on just as Thievery Corporation started. 6 miles didn’t seem so short any more after 20. At that moment I thought of my cousin, who suffers from a degenerative motor disorder and is in a wheelchair. I focused my thoughts on my luck in being able to get this far, and I pushed on faster. Nearing mile 24 was when I realized it was almost over, and I saw the 4h pacer (a guy who runs from the 4h corral and keeps the pace, to help others), so I took my last GU (probably unnecessary and not good for the stomach) and kept the pace, now more relaxed and even smiling. I knew I had the energy to finish, so as soon as I thought I was less than 20 seconds away at a full sprint, I gave it my all and dashed to the finish line. Vamos!

The geek fun continues after the marathon. I got hold of all the runners’ data and started playing around with it. This is the second part:

[Update: Added this SpeakerDeck with the slides I used for the GedoDC Meetup talk on Dec 2012]

##How I got the data

Scraping the data

The official website lists the results up to 100 per page, but it won’t let you download them or get all the pages at once. Thankfully, just by hacking the URL you can request “the first 50,000 results” for the first page. It takes a while to load, as it is indeed a huge table with all the data. This is public information, so I don’t see any problem with taking the public data and mining it.

Converting the table into csv

To convert the HTML table into workable data, I tried several ways, and ended up just selecting all the text in the page via “Select All”, “Copy” and then pasting it into an Excel workbook. It takes a while, but it works. Just clean up the extra lines and columns, and save it as a csv file. From the 34 MB of the HTML or Excel file, we go down to a 4 MB csv file.
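For the record, the copy-and-paste step could probably be scripted. This is not what I did, just a hedged sketch using only Python’s standard library; the `TableParser` class, the `table_to_csv` function and the sample table are hypothetical, and a real results page may need extra cleaning.

```python
import csv
import io
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def table_to_csv(html_text):
    """Return the table rows found in html_text as csv text."""
    parser = TableParser()
    parser.feed(html_text)
    out = io.StringIO()
    csv.writer(out, lineterminator='\n').writerows(parser.rows)
    return out.getvalue()

# Made-up sample, not the real results table
sample = """<table>
<tr><th>Bib</th><th>Location</th></tr>
<tr><td>101</td><td>Alexandria, VA</td></tr>
</table>"""
print(table_to_csv(sample))
```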

Cleaning and geocoding

I then used Google Refine to clean the data. In particular I did some Clustering to fix misspelled Location names, like “Alexxandria”. I also joined the “City” with “State” so that the geolocation would work better.

One important step for later is the geocoding of the “City, State” column. As mentioned in this post, you can use the Yahoo API to request the (latitude, longitude) for any location. To geocode your locations with Google Refine, you need to make a new column based on the Location column using URL fetch, with the Yahoo geocode API:

    "https://where.yahooapis.com/geocode?q=" + value.replace(/\s+/, "%20") + "&appid=[appID]"


The “replace” part converts spaces into “%20” for the URL (like New York,NY into New%20York,NY). I reduced the throttle delay to 1 millisecond and it still took a good 15 minutes.

Then parse the extra column you just made to extract the (latitude,longitude) tuples from the XML:

    forEach(value.split("<Result>"), v, v.partition("<latitude>")[2].partition("</latitude>")[0] + "," + v.partition("<longitude>")[2].partition("</longitude>")[0])[1]
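The same two steps can be sketched in Python, which makes the logic a bit easier to read. This is only an illustration under assumptions: `geocode_url` and `first_lat_lng` are hypothetical names, the sample XML contains just the tags the expression above relies on, and no request is actually sent.

```python
from urllib.parse import quote

def geocode_url(location, app_id):
    """Build the request URL, percent-encoding spaces but keeping commas."""
    return ("https://where.yahooapis.com/geocode?q="
            + quote(location, safe=',')
            + "&appid=" + app_id)

def first_lat_lng(xml_text):
    """Return 'lat,lng' from the first <Result> block of the response text."""
    # split()[1] is the text right after the first "<Result>" tag;
    # it raises IndexError when the response has no result at all
    result = xml_text.split("<Result>")[1]
    lat = result.partition("<latitude>")[2].partition("</latitude>")[0]
    lng = result.partition("<longitude>")[2].partition("</longitude>")[0]
    return lat + "," + lng

# Made-up response fragment for illustration
sample = ("<ResultSet><Result><latitude>38.80</latitude>"
          "<longitude>-77.05</longitude></Result></ResultSet>")
print(geocode_url("New York,NY", "APPID"))
print(first_lat_lng(sample))
```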


Then just export the data as csv for the next step, mining the data.

Mining the data with Python

Disclaimer: This is my very first Python code, so it is not very… pythonic

To import the data in Python I used:

    import csv

    runner = []
    f = open('MCM.csv', 'rt')
    try:
        reader = csv.DictReader(f)
        for row in reader:
            runner.append(row)
    finally:
        f.close()

With this I can get some insights on the Marathon:

• There were 23517 finishers.
• Runners came from 4274 different locations.
• 42.5% were women, 57.5% men.
• The most common runner was a male, 38 years old, from Washington DC, finishing in ~4h24m.
• The most common woman was ~20m slower than the most common man.
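Counts like these can be obtained with a few `Counter` objects. Here is a minimal sketch on made-up rows; it assumes each runner dictionary has the `Location`, `Sex` and `Age` keys used elsewhere in this post, and it takes “most common” to mean the most frequent value. The `summarize` function is a hypothetical helper, not my original code.

```python
from collections import Counter

def summarize(runners):
    """Basic counts from a list of runner dictionaries."""
    finishers = len(runners)
    locations = Counter(r['Location'] for r in runners)
    sexes = Counter(r['Sex'] for r in runners)
    ages = Counter(int(r['Age']) for r in runners)
    return {
        'finishers': finishers,
        'locations': len(locations),
        'top_location': locations.most_common(1)[0][0],
        'share_by_sex': {s: 100.0 * n / finishers for s, n in sexes.items()},
        'most_common_age': ages.most_common(1)[0][0],
    }

# Made-up sample rows, not real results
sample = [
    {'Location': 'Washington, DC', 'Sex': 'M', 'Age': '38'},
    {'Location': 'Washington, DC', 'Sex': 'F', 'Age': '29'},
    {'Location': 'Alexandria, VA', 'Sex': 'M', 'Age': '38'},
    {'Location': 'Arlington, VA', 'Sex': 'M', 'Age': '45'},
]
stats = summarize(sample)
print(stats)
```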

But let’s break it down more. You can see my whole code to mine the data and make the plots in python here.

Histogram of participation by gender

This histogram shows the distribution of Ages, according to gender (mouse over for tooltips):

Remember this is not a random sample of the population. These are all marathon finishers (a small subset of the population… 1%??). At any age, there are more men than women, except between 25 and 30. Perhaps women are more physically active then. After that women’s participation declines while men’s increases. I wonder if having your first kid is reflected here. I can imagine it’s hard to train or run when you are pregnant or a new mum. According to the statistics the mean age for having a first kid in the USA is 24, so it is plausible that the effect shows here.

Men’s participation peaks at 40-45, after which the decline is similar for both genders, with always more men. Remarkably, 10% of the participants were above 53, and another 10% below 26. As I will show later, above 5 hours of finishing time there is not much difference between genders. Tell me another sport where this variety of ages can challenge themselves side by side. Running is a natural thing for humans.
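The counting behind a histogram like this one can be sketched in a few lines. The 5-year bin width, the `age_histogram` name and the sample rows are my assumptions for illustration; the `Sex` and `Age` keys are the ones used in the aggregation code later in this post.

```python
from collections import Counter

def age_histogram(runners, bin_width=5):
    """Count runners per (sex, age bin); a bin is labelled by its lower edge."""
    counts = Counter()
    for r in runners:
        lower = (int(r['Age']) // bin_width) * bin_width
        counts[(r['Sex'], lower)] += 1
    return counts

# Made-up sample rows, not real results
sample = [
    {'Sex': 'F', 'Age': '27'},
    {'Sex': 'F', 'Age': '29'},
    {'Sex': 'M', 'Age': '42'},
    {'Sex': 'M', 'Age': '38'},
]
hist = age_histogram(sample)
for (sex, lower), n in sorted(hist.items()):
    print(sex, '%d-%d' % (lower, lower + 4), n)
```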

Histogram of time by gender

Now, let’s compare finishing times by gender, regardless of age. Men peak around ~4h30m, with another peak just below 4h (me!). Women peak more smoothly around ~4h50m.

Interestingly, the women’s distribution is close to a Gaussian (and you see similar values around the peak on both sides). Now, male runners faster than 5h seem to be able to squeeze out better times than females of the same age. The difference is highest just below 4h. I know this time is kind of a mental goal for many runners (me included). However, after 5 hours, both genders decay in the same manner, with women always slightly better than men.

What about time AND age, by gender?

Combining both histograms one can plot a Scatter diagram where the horizontal is Age and vertical is Finishing time. Different color for males and females.

Both genders show a similar distribution, quite broad both in age and time. That means neither variable strongly determines the other. Youngsters can be slow, and seniors fast, regardless of gender. There is a slight convergence towards more time as age increases. There seems to be a slight slope of less than 1 hour every 20 years for the best runners. I think the race course closed after 7 hours, hence the flat upper boundary.
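One way to put numbers on a statement like this is a least-squares line and a Pearson correlation of finishing time against age. I read the slope off the plot by eye, so take this only as a sketch of how it could be computed; the `linear_fit` helper and the five data points are made up for illustration.

```python
def linear_fit(xs, ys):
    """Least-squares slope, intercept and Pearson r for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Made-up ages (years) and finishing times (hours), not real results
ages = [20, 30, 40, 50, 60]
hours = [4.0, 4.1, 4.2, 4.3, 4.4]
slope, intercept, r = linear_fit(ages, hours)
print(slope * 20)  # extra hours per 20 years of age in this toy data
```

A correlation close to 0 would back the “broad cloud” reading, while the slope times 20 gives the hours gained per 20 years.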

However, there is a blue edge on the fast side, all across ages. Faster men tend to be faster than women of the same age, especially between 30 and 60 years old.

By the way, I am flabbergasted by the right side of the scatter: runners above 60 running in the same time band as people 40 years younger. Respect!

Pacing yourself

One can also plot the Excess time. This is the total time minus twice the half-marathon time. If you keep a constant pace, this Excess time will be 0. If you hit the wall in the second half and need 1 hour more to finish, then your Excess time will be 1 hour. I am plotting this against age, to see how it depends on age, and also with different colors for men and women:
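In code the Excess time is a one-liner once the times are converted to hours. A minimal sketch, assuming both times come as “H:MM:SS” strings; the function names are hypothetical.

```python
def to_hours(hms):
    """Convert 'H:MM:SS' into hours as a float."""
    h, m, s = (int(part) for part in hms.split(':'))
    return h + m / 60 + s / 3600

def excess_time(half_time, finish_time):
    """Finish time minus twice the half-marathon split, in hours."""
    return to_hours(finish_time) - 2 * to_hours(half_time)

print(excess_time('2:00:00', '4:00:00'))  # constant pace
print(excess_time('2:00:00', '5:00:00'))  # second half one hour slower
```

A negative value means the second half was faster than the first.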

As the figure shows, very few people are actually faster in the second half (that is, to the left of the vertical line at 0). Above 60 virtually everyone is slower in the second half.

The graph is pretty much a packed vertical shape, but with some convergence at the top. Older runners seem to be able to keep their pace better, especially women.

Again you can see a blue edge across all ages, mostly men who need more than one extra hour in the second half. I would say they “hit the wall” hard. In that sense, it seems men hit the wall more than women, especially runners below 60. Combined with the previous graph, faster men are faster than faster women of the same age, but also more men end up hitting the wall.

Mapping the clusters of runners

For the next visualization, I wanted to map all runners on the US map. You can indeed upload the csv file with just the names of their cities to Google Fusion Tables (the file is too big for Google Docs), and Google will recognize the location names and map them for you. Pretty awesome, quick, and easy.

There are several problems here. For one, I want to have some control to make the map more beautiful and customize the looks (and I dig open source). More importantly, any place with more than one runner will simply stack everyone on the same spot. We need to aggregate the data.

To aggregate the data I again use Python, and this little piece of code:

    from collections import Counter

    # runner is the list with every runner as a dictionary
    # Get a list of cities and count the unique ones
    Places = [r['Location'] for r in runner if 'Location' in r]
    Places_counts = Counter(Places)
    print(len(Places_counts))             # number of distinct locations
    print(Places_counts.most_common(10))  # the ten most common locations

    # Make a dictionary with Places, their location, and number of runners
    Places_dic = {}
    for r in runner:
        if 'Location' not in r:
            continue  # skip rows without a location
        P = r['Location']
        if P in Places_dic:
            # not the first time, add info to this location
            Places_dic[P]['runners'] += 1
            Places_dic[P]['Time'].append(int(r['ChipTimeSeconds']))
            Places_dic[P]['Ages'].append(int(r['Age']))
            Places_dic[P]['Gender'].append(r['Sex'])
            # fill in coordinates that were missing in earlier rows
            if Places_dic[P]['lat'] == '':
                Places_dic[P]['lat'] = r['lat']
                print('gotcha')
            if Places_dic[P]['lng'] == '':
                Places_dic[P]['lng'] = r['lng']
                print('gotcha')
        else:
            # first runner from this place
            Places_dic[P] = {}
            Places_dic[P]['runners'] = 1
            Places_dic[P]['Time'] = [int(r['ChipTimeSeconds'])]
            Places_dic[P]['Ages'] = [int(r['Age'])]
            Places_dic[P]['Gender'] = [r['Sex']]
            Places_dic[P]['lat'] = r['lat']
            Places_dic[P]['lng'] = r['lng']
            print('new place:', P)


With a little more Python magic, you can aggregate the times and ages to get the percentiles by gender, and save everything to a neat csv file with all the info you need. You can see the whole Python code here.
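As a rough idea of what that extra step could look like (this is a sketch, not my original script), the snippet below takes a dictionary shaped like `Places_dic`, computes a nearest-rank percentile of the chip times for each gender, and writes one csv row per place. The helper names, the output column names and the sample values are assumptions.

```python
import csv
import io

def percentile(values, p):
    """Nearest-rank percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil(n * p / 100)
    return ordered[int(rank) - 1]

def place_rows(places_dic, p=50):
    """One row per place: name, coordinates, runners, time percentile by sex."""
    rows = []
    for name, info in places_dic.items():
        row = {'Location': name, 'lat': info['lat'], 'lng': info['lng'],
               'runners': info['runners']}
        for sex in sorted(set(info['Gender'])):
            times = [t for t, g in zip(info['Time'], info['Gender']) if g == sex]
            row['time_p%d_%s' % (p, sex)] = percentile(times, p)
        rows.append(row)
    return rows

def write_csv(rows, handle):
    """Write the rows with a header; missing columns are left empty."""
    fields = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(handle, fieldnames=fields, restval='')
    writer.writeheader()
    writer.writerows(rows)

# Made-up sample with the same structure as Places_dic
sample = {'Alexandria, VA': {'runners': 3,
                             'Time': [14000, 15000, 16000],
                             'Ages': [30, 40, 35],
                             'Gender': ['M', 'F', 'M'],
                             'lat': '38.80', 'lng': '-77.05'}}
rows = place_rows(sample)
out = io.StringIO()
write_csv(rows, out)
print(out.getvalue())
```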

Now you can go to TileMill, import the file on a new project and have the map styled, up and running in less than 10 minutes.

And this is the result; pan and zoom at will (full screen here):

Overtakers or overtaken?

When I was talking with another marathoner about the draft of this post, he said it would be interesting to see a scatter plot of how many people you pass, by gender or age. This involves not only your pace, but also which corral you chose (your expectation). We only have data for Bib numbers with Gun Time, Chip Time, and partial Chip Time splits for 5K, 10K, 15K, ... 40K. Can we get the data we want?

1. Ordering Bib numbers by “Gun Time” at any landmark (5K, 10K, ...) gives you the order of arrival at that landmark, regardless of their position at the start. It’s just the time it took them to get to the landmark from the Gun Time.
2. Ordering Bib numbers by “Gun Time” minus Chip Time gives you an estimate of the distance between the runner and the start line at the Gun Time. That is a rough estimate of the order at the start.
3. Hence the difference between 1 and 2 for each landmark gives you the net change. Using the landmarks you can even check which Bib numbers changed position in between.

Helpers in python to get there:

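• You can convert a time like 1:34:10 into seconds with:

    import numpy as np

    def count_sec(lista):
        # "1:34:10" -> 1*3600 + 34*60 + 10 seconds
        return np.sum(np.array([int(a) for a in lista.split(':')]) * [60*60, 60, 1])

• Get the rough order estimates with:

    for r in runner:
        # seconds between the gun and crossing the start line
        r['Start_order_seconds'] = count_sec(r['ClockTime']) - count_sec(r['ChipTime'])
    Start = sorted(runner, key=lambda k: k['Start_order_seconds'])

• Get the change in order for each Bib with:

    from operator import itemgetter

    # End is assumed to be the same list sorted by arrival order at the
    # landmark (step 1); it is not defined in this snippet
    start_bibs = list(map(itemgetter('Bib'), Start))
    end_bibs = list(map(itemgetter('Bib'), End))
    for r in runner:
        index_s = start_bibs.index(r['Bib'])
        index_e = end_bibs.index(r['Bib'])
        r['Change'] = index_s - index_e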
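Putting the three steps together, a self-contained sketch on made-up data could look like this. It uses the finish line as the landmark and dictionary lookups instead of `.index()`; the `Bib`, `ClockTime` and `ChipTime` keys are the ones used above, while the function names and the three sample runners are hypothetical.

```python
def to_seconds(hms):
    """Convert 'H:MM:SS' into seconds."""
    h, m, s = (int(part) for part in hms.split(':'))
    return h * 3600 + m * 60 + s

def net_places_gained(runners):
    """Map each Bib to (estimated start position - finish position)."""
    # Step 2: gun time minus chip time is the delay to reach the start line
    by_start = sorted(
        runners,
        key=lambda r: to_seconds(r['ClockTime']) - to_seconds(r['ChipTime']))
    # Step 1: gun (clock) time gives the order of arrival at the finish
    by_finish = sorted(runners, key=lambda r: to_seconds(r['ClockTime']))
    start_pos = {r['Bib']: i for i, r in enumerate(by_start)}
    finish_pos = {r['Bib']: i for i, r in enumerate(by_finish)}
    # Step 3: positive means the runner moved up, negative means overtaken
    return {bib: start_pos[bib] - finish_pos[bib] for bib in start_pos}

# Made-up sample: B starts at the back but arrives first
sample = [
    {'Bib': 'A', 'ClockTime': '4:00:00', 'ChipTime': '3:59:00'},
    {'Bib': 'B', 'ClockTime': '3:50:00', 'ChipTime': '3:45:00'},
    {'Bib': 'C', 'ClockTime': '4:30:00', 'ChipTime': '4:28:00'},
]
changes = net_places_gained(sample)
print(changes)
```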

… but I’ll stop the train here. There are surely many more things one can do. It has been an amazing experience, and many more will come. I enjoyed the training and learnt a great deal from it. Running the marathon was a challenge but not an ordeal, and I really enjoyed doing it. Playing around with this data has also been great fun!

Now I don’t know if I want to try a trail marathon, an ultra, or a triathlon :)