Thursday, October 23, 2014

Wrangling F1 Data With R - Living Book Release

Earlier this year, I started drafting a book on "Wrangling F1 Data With R". In part this was to explore self-publishing production workflows, and in part to try to pull together the various notes and doodlings I've done over the last few years - and to act as a home for further tinkerings.

As projects such as this tend to do, it stalled. But in an attempt to restart it, I've published what I've done to date over on Leanpub in the hope that it'll provoke me to do more...

The book is available in a couple of forms:
  • as a paid-for item: if you actually buy the book, the Leanpub model means you'll get access to any and all updates and revisions to it. There is a minimum price point and a pay-what-you-like-over-the-minimum price point.
  • as a preview item: I'm going to be randomly changing the free preview chapters (though I don't know how frequently), so over time every chapter will appear there. On occasion, the whole book to date will appear as a free preview item. If I do blogposts around particular topics (and I'm hoping to start blogging here again, though perhaps not significantly till next year) and those topics are book topics, the corresponding chapters will probably appear in the preview around the time of the post and for a week or two after.
You can find the free preview - and a place to buy access to the living book - here: Wrangling F1 Data With R.

Note that the chapters in the preview and the actual book may still be a bit ragged and in draft or incomplete form. That's just the way it is... (If nothing else, it'll give you some hints about how a particular chapter might develop...)

At the moment the price is set at the minimum amount to enrol it in the affiliate marketing program. Affiliates get paid half the minimum price of the book. If an affiliate is responsible for the sale of a book at the minimum price, they get paid more than me.

Leanpub also allows coupon-based marketing. So here are a couple of offers...
If you think you're deserving of a coupon, let me know...

This is all something of an experiment - in fact, several experiments - so any and all comments and feedback welcome... And purchases, of course;-)

Thursday, January 3, 2013

The Best F1 Driver of 2012???

I've been having a dabble with a database downloaded from the Ergast Motor Racing Database and came up with this ad hoc model for trying to get a feel for who the best F1 "racing" driver of 2012 was, based on the difference between their grid position and final race classification.

The model I used was to calculate the following quantity: sum of ((gridPosition-racePosition)/racePosition) over all races

The numerator (top term), gridPosition-racePosition, gives the number of positions gained (positive), or lost (negative).

These values are scaled by dividing through by the final racePosition in each corresponding race, which means that position changes at the head of the field count for more than those at the back of the field. (There is a certain balancing effect going on here though, because cars that start at the back of the field may gain several places through non-finishers amongst the cars that started ahead of them). Here are the resulting rankings...
I then tweaked the model to add in a term that accounted for holding position, equal to 1/racePosition, where (gridPosition==racePosition) counts as 1 if a driver finishes where they started and 0 otherwise. So the equation became sum of ( (gridPosition==racePosition)/racePosition + (gridPosition-racePosition)/racePosition ) over all races. Here's the result this time:
A problem with that approach is that there is no differentiation between gaining a single place and holding position: both score 1/racePosition. We can account for this using the following refinement, which also awards the bonus term when places are gained: sum of ( (gridPosition>=racePosition)/racePosition + (gridPosition-racePosition)/racePosition ) over all races
Here's a final chart, this time using the product of the grid and final positions as the denominator (that is, calculating sum of ( (gridPosition==racePosition)/(racePosition*gridPosition) + (gridPosition-racePosition)/(racePosition*gridPosition) ) over all races).
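For anyone wanting to play along, here's a rough Python sketch of the four scoring variants (not the R code behind the charts). It assumes each driver's season is a list of hypothetical (gridPosition, racePosition) pairs with grid positions of 1 or more; a dataset that codes a pit-lane start as grid 0 would need special handling in the product variant.

```python
from typing import Iterable, Tuple

# (gridPosition, racePosition) for one race; sample values below are made up.
Race = Tuple[int, int]


def positions_score(races: Iterable[Race]) -> float:
    """Basic model: positions gained (or lost), scaled by final position."""
    return sum((grid - finish) / finish for grid, finish in races)


def hold_bonus_score(races: Iterable[Race]) -> float:
    """Adds 1/racePosition when a driver finishes where they started."""
    return sum((int(grid == finish) + (grid - finish)) / finish for grid, finish in races)


def gain_or_hold_score(races: Iterable[Race]) -> float:
    """Bonus also applies when places are gained (gridPosition >= racePosition)."""
    return sum((int(grid >= finish) + (grid - finish)) / finish for grid, finish in races)


def product_score(races: Iterable[Race]) -> float:
    """Hold-position variant with gridPosition * racePosition as the denominator."""
    return sum(
        (int(grid == finish) + (grid - finish)) / (finish * grid) for grid, finish in races
    )


if __name__ == "__main__":
    # Hypothetical seasons for two made-up drivers, just to show the ranking step.
    seasons = {"AAA": [(3, 1), (1, 1), (5, 6)], "BBB": [(10, 4), (8, 9)]}
    for name, score in (("basic", positions_score), ("gain or hold", gain_or_hold_score)):
        ranking = sorted(seasons, key=lambda d: score(seasons[d]), reverse=True)
        print(name, ranking)
```

Sorting drivers by score in descending order gives the sort of ranking the charts show.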

So do any of these charts make (non)sense? (And if so, which make most (or least) sense and why?)

Friday, December 21, 2012

F1 2012 Race Review Charts App

I just realised I haven't posted any links to my graphical race review app, so here's a link now: F1 2012 Race Review Charts App...



There's also a brief write-up at More Shiny Goodness – Tinkering With the Ergast Motor Racing Data API.

Saturday, November 24, 2012

F1 2012 Brazil Third Practice Summary

A quick visual summary of P3 at the 2012 Brazilian Formula One Grand Prix...

Here are the deltas between best lap time and ultimate laptime (so who's not driving to the limit?;-):
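The delta itself is simple to compute. Here's a minimal Python sketch, assuming "ultimate laptime" means the sum of a driver's best individual sector times across the session, that a lap time is the sum of its three sector times, and using made-up times:

```python
from typing import Sequence, Tuple

# (sector1, sector2, sector3) times in seconds for one lap
Lap = Tuple[float, float, float]


def ultimate_lap(laps: Sequence[Lap]) -> float:
    """Sum of the best time in each sector, possibly taken from different laps."""
    return sum(min(lap[i] for lap in laps) for i in range(3))


def personal_best(laps: Sequence[Lap]) -> float:
    """Fastest complete lap, taking lap time as the sum of its sectors."""
    return min(sum(lap) for lap in laps)


def delta_to_ultimate(laps: Sequence[Lap]) -> float:
    """How far the best actual lap was from the ultimate lap (never negative)."""
    return personal_best(laps) - ultimate_lap(laps)


if __name__ == "__main__":
    laps = [(30.0, 40.0, 20.5), (30.5, 39.5, 20.0)]  # hypothetical laps
    print(delta_to_ultimate(laps))
```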



The ultimate vs. personal best laptime chart shows similar information in an alternative form - the further away from the line, the further off the ultimate lap:



How does each driver fare by sector? 

We can also compare sectors separately:
Or like this....



Deltas again...



If we compare sector rankings with overall classifications, we can get a feel for drivers who are particularly quick - or slow - in a sector: does something go wrong for WEB in sector 3, while GRO is good in sector 2 and DIR in sector 3? Is ALO losing out in sector 1?
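One way to put a number on that comparison is to rank drivers within a sector and subtract that rank from their overall classification. A minimal Python sketch with hypothetical driver codes and times; ties are broken by input order, which is an assumption rather than how official timing handles them:

```python
from typing import Dict, List


def rank_by_time(times: Dict[str, float]) -> Dict[str, int]:
    """Rank drivers by time, 1 = fastest (ties keep input order)."""
    ordered = sorted(times, key=times.get)
    return {driver: rank for rank, driver in enumerate(ordered, start=1)}


def sector_vs_classification(
    sector_times: Dict[str, float], classification: List[str]
) -> Dict[str, int]:
    """Positive: driver ranks higher in this sector than in the overall classification."""
    sector_rank = rank_by_time(sector_times)
    return {
        driver: position - sector_rank[driver]
        for position, driver in enumerate(classification, start=1)
    }


if __name__ == "__main__":
    s1 = {"AAA": 25.1, "BBB": 24.9, "CCC": 25.3}  # hypothetical sector 1 times
    print(sector_vs_classification(s1, ["AAA", "BBB", "CCC"]))
```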



How do deltas compare by sector against sector classification?



How do the deltas compare in each sector (delta from fastest sector time)? Again, let's order by overall classification so we can see folk who are out of sorts:
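The per-sector deltas can be sketched in Python like this, assuming a dict of hypothetical (sector1, sector2, sector3) best times per driver and a list giving the overall classification order:

```python
from typing import Dict, List, Tuple

SectorTimes = Tuple[float, float, float]


def sector_deltas(
    best_sectors: Dict[str, SectorTimes], classification: List[str]
) -> List[Tuple[str, List[float]]]:
    """Gap to the fastest time in each sector, listed in classification order."""
    fastest = [min(times[i] for times in best_sectors.values()) for i in range(3)]
    return [
        (driver, [best_sectors[driver][i] - fastest[i] for i in range(3)])
        for driver in classification
    ]


if __name__ == "__main__":
    best = {"AAA": (25.1, 30.0, 20.2), "BBB": (24.9, 30.4, 20.0)}  # made-up times
    for driver, deltas in sector_deltas(best, ["AAA", "BBB"]):
        print(driver, [round(d, 3) for d in deltas])
```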


By the by, how do speeds compare?


And sector rank v straightline speed?

Sector delta v speed?

Friday, November 23, 2012

F1 2012 VET vs. ALO in Practice and Qualifying

A couple of quick sketches to summarise how Vettel and Alonso have compared over the season to date in practice and qualifying.

First up, the time delta between their best times in each session:

This doesn't take into account the lap length, though, so we can normalise by dividing through by the fastest laptime recorded across VET and ALO for each session:
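As a rough sketch of that normalisation (in Python rather than R, with hypothetical session names and times), the raw delta is VET's best time minus ALO's, and dividing by the faster of the two gives the gap as a fraction of the lap:

```python
from typing import Dict, Tuple


def session_deltas(
    vet: Dict[str, float], alo: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Map session -> (raw delta, delta / fastest time); positive means VET was slower."""
    result = {}
    for session in vet.keys() & alo.keys():  # only sessions where both set a time
        delta = vet[session] - alo[session]
        fastest = min(vet[session], alo[session])
        result[session] = (delta, delta / fastest)
    return result


if __name__ == "__main__":
    vet = {"P3": 101.2, "Q1": 100.0}  # made-up best times in seconds
    alo = {"P3": 100.8, "Q1": 99.0}
    for session, (raw, normalised) in sorted(session_deltas(vet, alo).items()):
        print(session, round(raw, 3), f"{100 * normalised:.2f}%")
```

Multiplying the normalised value by 100 gives a percentage gap, which can be compared across circuits of different lengths.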
Casting a quick eye over the charts, I think we could at least make an anecdotal claim that progression continues through qualifying... so if VET improves over ALO going from Q1 to Q2, do we then see a further increase going into Q3?

How do P3/third practice times compare with qualifying session times?
VET seems to straddle his P3 time with the Q1 and Q2 times? (Below the line shows the qualifying session time beat the third practice time.)

How about ALO?
Is there a tendency there for ALO to perform better in qualifying than in P3? This is where "proper" statistics come in, not just graphical stats "by eye"... 

So, what statistical test should we use to see how qualifying session times compare with practice times?
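One simple candidate (not necessarily the best one) is a paired sign test: for each race weekend, note whether a driver's best qualifying time beat their P3 time, and ask how likely that split would be if each outcome were a coin toss. A minimal Python sketch with made-up times; a Wilcoxon signed-rank test (for example scipy.stats.wilcoxon) would also take the size of the differences into account:

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(p3: Sequence[float], quali: Sequence[float]) -> Tuple[int, int, float]:
    """Exact two-sided sign test on paired (P3, qualifying) times.

    Returns (races where qualifying was faster, races used, p-value).
    Races where the two times are equal are dropped.
    """
    diffs = [q - p for p, q in zip(p3, quali) if q != p]
    n = len(diffs)
    faster = sum(1 for d in diffs if d < 0)
    # Under H0 each race is a fair coin flip, so `faster` ~ Binomial(n, 0.5).
    tail = min(faster, n - faster)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return faster, n, min(1.0, 2 * one_sided)


if __name__ == "__main__":
    p3 = [90.1, 85.4, 101.2, 78.9, 95.0]  # hypothetical P3 best times
    quali = [89.8, 85.1, 101.5, 78.2, 94.6]  # hypothetical best qualifying times
    print(sign_test(p3, quali))
```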

Thursday, November 22, 2012

F1 2012 Championship Going in to Brazil

A quick round up of some of the elements of the races to date...

Intra-team comparison: the dots show the position of the two drivers ("driver 1" in the team is blue, the "second" driver is orange; the bar shows the difference between positions in the team. A bar to the left of the centre line shows that driver 1 was higher placed):

As a side-effect, the lack of a green bar denotes that at most one car finished for the team. The chart therefore immediately makes it clear that Ferrari had good reliability, but the McLarens had quite a few races without a double finish. ALO and VET are also seen to dominate their respective team rankings. HUL has had a good run in his team in the latter part of the season.

Here's another take on that chart - but this time we reverse the classification axis so that 1st is at x=23 (i.e. better placed is further to the right; the winner is at the extreme right).
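The quantity behind the intra-team charts is just a difference of positions. Here's a minimal Python sketch with hypothetical team names, using None for a car that wasn't classified; the reversed axis in the second chart is assumed here to be x = 24 - position for a 24-car field:

```python
from typing import Dict, Optional, Tuple

# (driver 1 position, driver 2 position); None means that car wasn't classified
TeamResult = Tuple[Optional[int], Optional[int]]


def intra_team_gaps(results: Dict[str, TeamResult]) -> Dict[str, Optional[int]]:
    """Driver 1 position minus driver 2 position; negative means driver 1 finished ahead.

    None means at most one of the team's cars finished, so there is no bar to draw.
    """
    return {
        team: None if p1 is None or p2 is None else p1 - p2
        for team, (p1, p2) in results.items()
    }


def reversed_axis(position: int, field_size: int = 24) -> int:
    """Assumed mapping onto an axis where the winner sits furthest right (1st -> 23)."""
    return field_size - position


if __name__ == "__main__":
    race = {"Team A": (2, 7), "Team B": (11, 4), "Team C": (None, 15)}  # made-up results
    print(intra_team_gaps(race))
    print([reversed_axis(p) for p in (1, 2, 24)])
```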



Reasons for non-finishes (no data for United States?):
Pit stop performance - delta from best pit times:

Here are the pit deltas from best pit time by race, with a LOESS best fit line to try and pull out any trends:
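For the curious, the idea behind a LOESS line is to fit a weighted straight line around each point, with weights that fall off with distance. Here's a simplified Python sketch of that idea: local linear fits with tricube weights, without the optional robustness iterations some LOESS implementations offer. The frac span parameter and the sample deltas are assumptions:

```python
from typing import List, Sequence


def loess(xs: Sequence[float], ys: Sequence[float], frac: float = 0.5) -> List[float]:
    """Simplified local linear smoother with tricube weights (a LOESS-style sketch).

    frac is the share of points used in each local neighbourhood.
    """
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        distances = sorted(abs(x - x0) for x in xs)
        h = distances[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        weights = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        total = sum(weights)  # >= 1, since x0 itself has weight 1
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_y = sum(w * y for w, y in zip(weights, ys)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


if __name__ == "__main__":
    race_index = list(range(1, 11))  # hypothetical race numbers
    pit_delta = [1.2, 0.8, 1.5, 0.9, 0.7, 1.1, 0.6, 0.9, 0.5, 0.8]  # made-up deltas (s)
    print([round(v, 2) for v in loess(race_index, pit_delta)])
```

A local linear fit reproduces a straight line exactly, which makes a handy sanity check.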
Here's a close-up of fast pit stops by driver number:


Saturday, November 17, 2012

F1 2012 United States Qualifying Summary

A quick summary of qualifying at the 2012 Formula One US Grand Prix.

Here's how the practice and qualifying sessions went:


Does that table need to have the y-axis reversed so that first is at the top? I think it does, doesn't it?



How about progress through the actual qualifying sessions?




How close to ultimate lap times did each driver get?



In response to a comment from MartinB, here's that same chart with the laptime axes reversed so the fastest laps are top right:



Does that work? Convention is to read axes as increasing value, so we break that here... but maybe the chart does make more sense? If anything, I guess it shows how you really would need to take care when reading axis titles and labels... Below the line is now slower than ultimate lap. Would it be more useful if Ultimate and personal best axes were flipped? Or maybe we should just focus on deltas between personal bests and personal ultimates?

How about this - looking at deltas vs the personal best time?
The different axis scales really make a big difference here - VET was "way off" his ultimate laptime compared to HAM. But a problem with this chart is that we can't easily tell whether HAM's ultimate lap would have beaten VET's personal best lap.

How about with a flipped x-axis, so the fastest is to the right:

Any good?




If drivers had driven ultimate laps during qualifying, would the qualification classification have been any different? RAI would have moved up a place, it seems...
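Checking for that kind of shuffle is a matter of ranking drivers twice, once on personal best and once on ultimate lap. A minimal Python sketch with made-up times; it ignores the knockout structure of qualifying, which is a simplification:

```python
from typing import Dict


def ranks(times: Dict[str, float]) -> Dict[str, int]:
    """Rank drivers by time, 1 = fastest."""
    return {d: r for r, d in enumerate(sorted(times, key=times.get), start=1)}


def ultimate_rank_changes(
    best: Dict[str, float], ultimate: Dict[str, float]
) -> Dict[str, int]:
    """Places gained (positive) or lost (negative) if everyone had driven their ultimate lap."""
    by_best, by_ultimate = ranks(best), ranks(ultimate)
    return {d: by_best[d] - by_ultimate[d] for d in best if by_best[d] != by_ultimate[d]}


if __name__ == "__main__":
    best = {"AAA": 90.0, "BBB": 90.2, "CCC": 90.5}  # hypothetical personal bests
    ultimate = {"AAA": 89.9, "BBB": 90.0, "CCC": 89.8}  # hypothetical ultimate laps
    print(ultimate_rank_changes(best, ultimate))
```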


Here's how the best sector time deltas stacked up - drivers are ordered by qualifying classification, so it should be easy enough to spot anyone who drove good sectors on different laps, but didn't hook them up in a single qualifying lap:






Are any drivers performing particularly well or badly in a given sector? MAL and ALO are making up ground in sector 1 compared to their overall classification, but MSC is losing places. In sector 2, HUL does well but WEB is down. In sector 3, MAS is up, while GRO and HUL are down.

We can also look at deltas to the fastest time in each sector, again arranged by overall classification (rather than rank in sector). WEB and GRO both lose time in sector 2.
How do the session times vary from the ultimate laptime, and did anyone fare better in an earlier session than a later one? It seems BUT did better in Q1 than Q2... and ALO did better in Q2 than Q3:
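Both questions can be answered from each driver's best time per session. A minimal Python sketch, assuming a hypothetical nested dict of session times (drivers knocked out early simply lack the later sessions) and a per-driver ultimate laptime:

```python
from typing import Dict, List, Tuple

SESSIONS = ("Q1", "Q2", "Q3")


def deltas_to_ultimate(
    session_times: Dict[str, Dict[str, float]], ultimate: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Gap between each session's best time and the driver's ultimate laptime."""
    return {
        driver: {s: t - ultimate[driver] for s, t in times.items()}
        for driver, times in session_times.items()
    }


def slower_in_later_session(
    session_times: Dict[str, Dict[str, float]]
) -> List[Tuple[str, str, str]]:
    """(driver, earlier, later) wherever a driver was quicker in the earlier of two consecutive sessions."""
    flagged = []
    for driver, times in session_times.items():
        ran = [s for s in SESSIONS if s in times]
        for earlier, later in zip(ran, ran[1:]):
            if times[earlier] < times[later]:
                flagged.append((driver, earlier, later))
    return flagged


if __name__ == "__main__":
    times = {  # made-up session bests in seconds
        "AAA": {"Q1": 90.0, "Q2": 90.3},
        "BBB": {"Q1": 91.0, "Q2": 90.5, "Q3": 90.7},
    }
    print(slower_in_later_session(times))
```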


How did speeds compare?
How slow are the Red Bulls?!

Any obvious (to the eye) relationship between speed and ultimate laptime?
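To go beyond the eye test, a correlation coefficient is one quick check. A minimal Python sketch of Pearson's r with made-up speed and ultimate lap values; which speed measurement to use is an assumption, and r only picks up a linear relationship:

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r; near -1 would mean higher speeds go with lower (faster) laptimes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    speeds = [310.2, 305.8, 312.4, 301.9]  # hypothetical speed trap readings (km/h)
    ultimate = [95.1, 95.6, 95.3, 96.0]  # hypothetical ultimate laptimes (s)
    print(round(pearson(speeds, ultimate), 3))
```

On Python 3.10 or later, statistics.correlation computes the same coefficient.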


I guess the next thing to do is to start trying to tie these data views into session reports, such as this one from f1fanatic: Vettel takes pole as Alonso struggles to fourth row. What we really need to support session write-ups of that form is the timing data for each qualifying lap. This is available from the FIA media centre, but I've been laying off the PDF scraping for the second half of this season...