The Union Cycliste Internationale (UCI), cycling’s international governing body, publishes a list of bike frames legal for use in the WorldTour, the highest level of road racing. This list is a good way to keep an eye on upcoming top-end road bikes, as new frames are added to it days or weeks before their official media release. It’s also a rough guide to the best bikes, since newer bikes are generally a bit more advanced than older ones. However, it is published as a PDF that can’t be sorted, and scrolling through its many pages every week to hunt for a new release isn’t practical.
To make monitoring this list easier, I’m using Python (somewhat crudely) to extract the PDF into a collection of tables. The result isn’t particularly clean: the extraction is sensitive to table rows that span multiple lines, and it often puts data in the wrong column. But all I really want is a date and a name in some form, and that’s what this gives me.
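For illustration, here is a minimal sketch of that kind of extraction. It is not necessarily the exact script I run. It assumes the third-party pdfplumber library for reading the tables and assumes dates are printed as dd.mm.yyyy; the function names and the file name in the usage note are made up for the example. Because columns can't be trusted, it looks for a date in any cell and treats whatever text is left in the row as the name.

```python
import re
from datetime import date

# Assumption: dates appear as dd.mm.yyyy (also accepts / or - as separators).
DATE_RE = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")


def rows_from_pdf(path):
    """Yield every table row in the PDF as a list of cell strings (or None)."""
    import pdfplumber  # third-party; imported here so the rest works without it

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                yield from table


def date_and_name(row):
    """Return (date, name) for one row, or None if the row has no valid date.

    The date may sit in any column; the name is the remaining text of the row.
    """
    # Drop empty cells and collapse line breaks inside multi-line cells.
    cells = [" ".join(c.split()) for c in row if c and c.strip()]
    found = None
    rest = []
    for cell in cells:
        match = DATE_RE.search(cell) if found is None else None
        if match:
            day, month, year = map(int, match.groups())
            try:
                found = date(year, month, day)
            except ValueError:
                # Looked like a date but isn't one (e.g. 40.13.2024).
                rest.append(cell)
                continue
            # Keep any text that shared the cell with the date.
            leftover = (cell[:match.start()] + cell[match.end():]).strip()
            if leftover:
                rest.append(leftover)
        else:
            rest.append(cell)
    if found is None:
        return None
    return found, " | ".join(rest)


def newest_first(rows):
    """Return all (date, name) pairs found in rows, most recent date first."""
    entries = [e for e in map(date_and_name, rows) if e is not None]
    return sorted(entries, key=lambda e: e[0], reverse=True)
```

With a local copy of the list saved as, say, `frames.pdf` (a hypothetical file name), `newest_first(rows_from_pdf("frames.pdf"))[:20]` would give the twenty most recent entries. Header rows and rows without a recognizable date are simply skipped, which is crude but enough for spotting new releases.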