Webscraping with LLMs

Up-to-date web scraping combined with LLMs.

LLM for Devs

Recently added

Lesson 4

Advanced Web Crawling Extraction with Firecrawl /extract

This lesson demonstrates efficient web scraping techniques using the Firecrawl library, focusing on extracting AI model pricing data from multiple websites concurrently. By comparing individual and batch extraction methods, it highlights the significant speed improvements achieved through asynchronous programming and parallel processing.

15 min · Feb 5, 2025

Member

Lesson 3

Firecrawl /extract vs scrape + LLM extract

Learn web data extraction with Firecrawl's `/extract` feature, which uses AI to turn websites into structured JSON from a simple prompt and schema. Explore the playground and the API, compare `/extract` with scraping followed by a separate LLM extraction step, and validate your data with Pydantic.

13 min · Feb 3, 2025

Free

Lesson 11

Scrape any website with OpenAI Functions & LangChain

This lesson teaches you to build a Python system for web scraping and AI-powered data extraction. It covers Playwright for efficient scraping, LangChain for LLM-powered information extraction, and Pydantic for schema-based data validation, resulting in a robust and adaptable system.

24 min · Jan 11, 2025

Free

Lesson 10

Batch Scrape URLs instead of one at a time

This lesson demonstrates efficient web scraping of Anthropic's job postings using the `firecrawl` library, extracting key details like job titles, skills, and salary ranges. The asynchronous approach, leveraging `asyncio`, allows for concurrent processing of multiple URLs, significantly improving the speed and efficiency of data extraction.

6 min · Dec 4, 2024

Member

Lesson 9

Agentically scrape the web with Firecrawl & LangGraph (LangChain)

This lesson teaches you to build efficient web scraping agents with the Python libraries `langgraph` and `firecrawl-py`. By combining agent-based design with a state machine, you'll learn to extract specific product information from websites like Canada Goose while handling errors and optimizing for speed.

15 min · Nov 3, 2024

Free

All lessons

Lesson 1

Why is web scraping popular with LLMs now?

This lesson explores AI-powered web scraping tools, comparing user-friendly platforms like ScrapeGraphAI and Tavily with the more advanced, customizable Firecrawl API. Key features, pricing models, and integrations with LLMs are discussed to help viewers choose the best tool for their data extraction needs.

5 min · Oct 29, 2024

Member

Lesson 2

Baby step: scrape your first website and pipe to an LLM

This lesson teaches you how to use the Firecrawl Python package to scrape pricing data from Stripe and Paddle, then pass that data to a large language model to compare their features and costs. The tutorial covers web scraping techniques, LLM integration, and a detailed analysis of Stripe's and Paddle's pricing models, highlighting key differences and advantages.

11 min · Oct 31, 2024

Member

Lesson 3

Firecrawl /extract vs scrape + LLM extract

13 min · Feb 3, 2025

Free

Lesson 4

Advanced Web Crawling Extraction with Firecrawl /extract

15 min · Feb 5, 2025

Member

Lesson 5

Onboard new users by scraping their sites and extracting the required info with an LLM

This lesson showcases Firecrawl, a web scraping SaaS that integrates with Python and LLMs for data extraction. Learn how to onboard customers quickly by automatically retrieving data from their websites, such as Whit's Custard, while taking advantage of Firecrawl's ease of use and frequent updates.

24 min · Oct 29, 2024

Member

Lesson 6

Scrape interactive sites like Perplexity or Meetup.com with Actions from Firecrawl

This lesson teaches you how to use Firecrawl for web scraping, focusing on extracting event data from websites like Meetup, Amazon, and Perplexity. It also demonstrates how to combine web scraping with LLMs for OCR and data analysis, overcoming challenges like unreliable website structures and improving data extraction accuracy.

16 min · Oct 29, 2024

Member

Lesson 7

Don't wait and poll crawl jobs: use webhooks to get notified when they're done

This lesson shows how to use Firecrawl webhooks instead of polling to learn when a crawl job has finished. It builds a FastAPI server that receives Firecrawl's job-completion notifications in real time, so the scraped data can be processed and analyzed asynchronously, for example by comparing Stripe and Paddle pricing with an LLM.

7 min · Oct 31, 2024

Member

Lesson 8

Evaluate several software solutions at once with Firecrawl's crawl and an LLM for judging

This lesson teaches you how to evaluate software vendors rapidly with Python, using the Firecrawl API to automate web scraping and either Groq or OpenAI to analyze the gathered data against predefined criteria (such as SOC 2 compliance and F500 testimonials). This is much faster than manual research. The examples focus on platforms like Drata, Vanta, and Secureframe.

9 min · Nov 1, 2024

Member