If you've tried scraping "any website," you've met chaos: inconsistent markup, dynamic content, random anti-bot tricks, and that one page that puts the year in an image alt attribute for no sane reason. A single BeautifulSoup script works for one site. I wanted something that generalizes:
- Learn where data lives on a site (selectors, patterns).
- Run deterministically and fast using those learned rules.
- Use an LLM only when it actually helps (strict, validated JSON).
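
To make those three goals concrete, here is a minimal sketch of how they might fit together. It is not the actual implementation: the rule table, field names, and the `llm` callable are all hypothetical, and regexes stand in for learned selectors so the example runs on the standard library alone.

```python
import json
import re
from typing import Callable, Optional

# Hypothetical "learned" rules: field name -> regex with one capture group.
# Real rules would more likely be CSS selectors; regexes keep this stdlib-only.
LEARNED_RULES = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "year": r'<img[^>]*\balt="(\d{4})"',
}

# Assumed schema: the fields we expect and their types.
REQUIRED_FIELDS = {"title": str, "year": str}


def extract_with_rules(html: str, rules: dict) -> dict:
    """Deterministic pass: apply each learned pattern, skip fields that miss."""
    out = {}
    for field, pattern in rules.items():
        match = re.search(pattern, html, flags=re.S)
        if match:
            out[field] = match.group(1).strip()
    return out


def validate_llm_json(raw: str, schema: dict) -> Optional[dict]:
    """Accept LLM output only if it is a JSON object with exactly the
    expected keys and value types; otherwise return None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(schema):
        return None
    if not all(isinstance(data[key], typ) for key, typ in schema.items()):
        return None
    return data


def extract(html: str, llm: Optional[Callable[[str], str]] = None) -> dict:
    """Run the rules first; call the (hypothetical) LLM only to fill gaps."""
    result = extract_with_rules(html, LEARNED_RULES)
    if set(result) == set(REQUIRED_FIELDS) or llm is None:
        return result
    validated = validate_llm_json(llm(html), REQUIRED_FIELDS)
    if validated is not None:
        for key, value in validated.items():
            result.setdefault(key, value)  # deterministic values win
    return result
```

The design choice worth noting is the ordering: the cheap, repeatable rules always run first, and anything the LLM returns is discarded unless it parses and matches the schema exactly, so a malformed response can never overwrite a value the rules already found.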