I created Hayfevr.ly (GH repo to come) to solve a problem of my own. Two local allergy clinics and a TV news station each posted daily pollen readings on their websites, so every morning I had to check all three sites to find out what was making me sneeze.
I wrote a web scraper in Python with Selenium that checks these sites for new readings, updates the daily pollen counts as new readings come in, and summarizes the data in one convenient place on Hayfevr.ly.
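The check, update, and summarize cycle can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Hayfevr.ly code: each scrape is assumed to yield hypothetical `Reading` records keyed by source, day, and pollen type; an in-memory dict stands in for MySQL; and a reading counts as new only when its count differs from the stored one. `apply_readings`, `summarize`, the source labels, and the sample numbers are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Reading:
    source: str        # hypothetical label, e.g. "clinic_a" or "tv_station"
    day: date          # the day the reading applies to
    pollen_type: str   # e.g. "tree", "grass", "weed"
    count: int         # count as published by the source

# Assumption: an in-memory dict stands in for MySQL here.
# Maps (source, day, pollen_type) -> latest published count.
Store = dict[tuple[str, date, str], int]

def apply_readings(store: Store, readings: list[Reading]) -> list[Reading]:
    """Insert or update scraped readings; return only the ones that were new or changed."""
    changed = []
    for r in readings:
        key = (r.source, r.day, r.pollen_type)
        if store.get(key) != r.count:
            store[key] = r.count
            changed.append(r)
    return changed

def summarize(store: Store, day: date) -> dict[str, dict[str, int]]:
    """Collect one day's counts in one place, grouped by pollen type and then by source."""
    summary: dict[str, dict[str, int]] = {}
    for (source, d, pollen_type), count in sorted(store.items()):
        if d == day:
            summary.setdefault(pollen_type, {})[source] = count
    return summary

# Usage with made-up data: a second scrape repeats one reading and revises another.
store: Store = {}
today = date(2024, 4, 15)
apply_readings(store, [Reading("clinic_a", today, "tree", 120),
                       Reading("tv_station", today, "tree", 95)])
changed = apply_readings(store, [Reading("clinic_a", today, "tree", 120),
                                 Reading("tv_station", today, "tree", 110)])
print(len(changed))             # 1
print(summarize(store, today))  # {'tree': {'clinic_a': 120, 'tv_station': 110}}
```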
- Selenium for the scraper
- Tesseract for OCR to extract pollen counts published only as graphics
- MySQL for storing current and historical readings
- AWS S3 for hosting the website
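Because Tesseract returns plain text, the counts still have to be picked out of its output. The sketch below assumes the OCR step has already run (for example with the pytesseract wrapper's `image_to_string`) and that each reading appears on a line like `Grass: 15 Low`. That layout, the `parse_ocr_counts` name, and the digit fix-ups are hypothetical, since the real graphics aren't described here.

```python
import re

# Assumed, conservative set of characters OCR often misreads inside numbers.
_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# Hypothetical line layout: "<pollen type>: <count> [level]", e.g. "Grass: 15 Low".
_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*([0-9OolI,]+)(?:\s|$)")

def parse_ocr_counts(text: str) -> dict[str, int]:
    """Pull {pollen type: count} pairs out of OCR text, skipping lines that don't fit."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # headers, legends, and OCR noise
        name = match.group(1).strip().lower()
        digits = match.group(2).translate(_DIGIT_FIXES).replace(",", "")
        if digits.isdigit():  # reject tokens that were only commas
            counts[name] = int(digits)
    return counts

# Made-up OCR output with typical misreads ("l" for 1, "O" for 0).
ocr_text = "Daily Pollen Report\nTree: l20 High\nGrass: 15 Low\nMold: 2,45O Moderate\n"
print(parse_ocr_counts(ocr_text))  # {'tree': 120, 'grass': 15, 'mold': 2450}
```

Skipping lines that don't match, rather than raising an error, suits a scraper that runs unattended.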