The cycling world has a wealth of data analysis and visualization tools; I have experience with two of them, Strava and TrainingPeaks. In fact, cycling is probably one of the most data-mature consumer industries, which is ironic given that, at face value, cycling brings up images of rugged mountain trails, not computers and sensors.
The data we are working with here comes from a GPS head unit, usually a Garmin, although Wahoo, Lezyne, and others are gaining market share. These units bring together a wealth of data. The most important is power (watts, a direct measure of effort), but at more than $500 for the hardware, it is reserved mostly for serious cyclists. Heart rate is also extremely useful for quantifying the human cost of effort. Beyond that are various terrain and movement measures such as speed and elevation, cadence (pedal rotations per minute), and even options like current gear ratio with eTap and Dura-Ace electronic groupsets. In short, there’s lots of data that can be collected.
Behind the scenes, a Garmin records all of this data every second in a .fit file, which is a compact binary data format. In addition to the per-second data logs, these files also have an extensive header with a wealth of information like unit serial number, battery charge, standing device records, and so on. Most users don’t need to look too closely at these files, as the data is accessible through manufacturer (Garmin) or third-party (Strava) software.
But I’m not a normal person, and I wanted to get a closer look at the data.
To do that, I first needed to get the data out of the .fit file and into a more standard format like csv. Luckily, the maintainer of this data standard (the ANT+ Alliance, run by Garmin Canada) provides a set of development tools for this purpose. It would also be relatively straightforward, albeit much slower, to write your own code to convert between formats.
.fit to .csv conversion
Using the development tools found here on the ANT+ Alliance website and discussed here, .fit files can be batch processed with the included Java .bat file, for example by dragging and dropping the files onto the .bat icon. You’ll also need the development version of Java (the JDK) installed and listed in the system path, if you don’t have that already.
If you only want to view one file, then you are done here. I, however, wanted to analyze all of the data across time. My next step was to merge all the csv files into a single large csv file. The fastest way to do this is on the command line.
cd to the folder with your files (and only your ride csv files), then run one of the following.
On macOS or Linux:
cat *.csv > combined.csv
On Windows:
copy *.csv combined.csv
And voilà, you have a massive file of all the individual rides combined into one. It’s definitely messy, with headers splattered throughout, so it still needs lots of cleaning.
You can view my data processing in R on the associated GitHub page.
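If the stray headers get in the way, the merge can also be done in a few lines of code. The following is a minimal Python sketch, not the R code used for this project. The function name `combine_ride_csvs` is made up for illustration, and it assumes that each converted file starts with a header row whose first cell is the literal text `Type`. It drops those header rows and prefixes every remaining row with its source file name, so rides can still be told apart after merging.

```python
import csv
from pathlib import Path


def combine_ride_csvs(folder, out_name="combined.csv"):
    """Merge every ride csv in `folder` into one file, one source-tagged row per line.

    Returns the number of rows written.
    """
    folder = Path(folder)
    out_path = folder / out_name
    # Sort for a stable order, and skip the output file itself on re-runs.
    sources = sorted(p for p in folder.glob("*.csv") if p.name != out_name)
    rows_written = 0
    with out_path.open("w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sources:
            with path.open(newline="") as in_file:
                for row in csv.reader(in_file):
                    if not row:
                        continue  # ignore blank lines
                    # Assumption: header rows begin with the literal "Type".
                    if row[0] == "Type":
                        continue
                    # Prefix the file name so each row keeps its ride of origin.
                    writer.writerow([path.name] + row)
                    rows_written += 1
    return rows_written
```

Calling `combine_ride_csvs("rides")` on a hypothetical `rides` folder would write `rides/combined.csv`. Rows are written as-is, so files with different numbers of columns still merge without errors.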
Some general pointers:
- Timestamps are seconds since 1989-12-31 00:00:00 UTC
- Latitude and longitude need to be converted from semicircles to degrees (semicircles * (180 / 2^31))
- It’s useful to create an id for each ride. I did so by combining the ‘time created’ and ‘serial number’
- Apart from the first couple of columns, column positions are not consistent. For example, the ‘power’ data could be in column 8 or column 12, depending on how many sensors are connected for that particular ride. The field name will tell you which is which, but you need to search all columns for it. I used a bulky but effective `ifelse` command in R.
- Keep only the “data” and “record” rows when analyzing the ride data directly.
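The pointers above can be sketched in code. This is a hedged Python illustration, not the R processing from the GitHub page. It assumes each row starts with three columns (type, local number, message name) followed by repeating field name, value, units triplets; adjust `first_field_col` if your layout differs. All function names and the sample row are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# FIT timestamps count seconds from this epoch.
FIT_EPOCH = datetime(1989, 12, 31, tzinfo=timezone.utc)


def fit_time_to_datetime(seconds):
    """Convert a FIT timestamp (seconds since 1989-12-31 UTC) to a datetime."""
    return FIT_EPOCH + timedelta(seconds=int(seconds))


def semicircles_to_degrees(semicircles):
    """Convert a latitude or longitude from semicircles to degrees."""
    return int(semicircles) * (180 / 2**31)


def is_ride_record(row):
    """True for rows holding per-second ride data (assumed column layout)."""
    return len(row) > 2 and row[0] == "Data" and row[2] == "record"


def row_fields(row, first_field_col=3):
    """Map field name -> value by scanning the (name, value, units) triplets.

    Looking fields up by name avoids relying on column position, which
    changes with the number of connected sensors.
    """
    fields = {}
    for i in range(first_field_col, len(row) - 1, 3):
        name = row[i]
        if name:
            fields[name] = row[i + 1]
    return fields


def make_ride_id(time_created, serial_number):
    """Build a per-ride id from the creation time and device serial number."""
    return f"{serial_number}-{time_created}"


# Hypothetical sample row, for illustration only.
row = ["Data", "0", "record", "timestamp", "86400", "s", "power", "250", "watts"]
if is_ride_record(row):
    fields = row_fields(row)
    print(fit_time_to_datetime(fields["timestamp"]), fields["power"])
```

Missing fields simply do not appear in the dictionary, so `fields.get("power")` returns `None` for rides recorded without a power meter.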
*Note that not all of my sister’s data is in .fit format. Some is in .pwx and .srm files from track cycling or watt bikes. I didn’t bother with those, since they don’t contain any of the GPS position and movement data, which was part of my interest in the project, and because it would be a lot more work.