Crawling
Crawl mode lets you systematically browse and extract content from multiple pages of a website starting from a single URL.
Options
You can customize the extraction process with the following options:
- AI Refinement: Uses a generative AI model to clean up the extracted Markdown. This can help fix formatting issues, remove redundant whitespace, and improve overall readability.
- No Cache Mode: Forces the scraper to re-fetch the content from the website, ignoring any previously cached versions.
- Custom Body Selector: Specify a CSS selector (e.g., main, #content, or .article-body) to target a specific part of the page for extraction. This is useful for noisy pages where you only want the main content.
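To make the No Cache Mode behavior concrete, here is a minimal sketch of a cache-aware fetch. It is not the tool's actual implementation: the function name fetch_with_cache, the fetch callable, the dictionary cache, and the no_cache flag are all hypothetical, and storing the fresh copy back into the cache after a forced re-fetch is an assumption.

```python
from typing import Callable, Dict


def fetch_with_cache(
    url: str,
    fetch: Callable[[str], str],
    cache: Dict[str, str],
    no_cache: bool = False,
) -> str:
    """Return page content, using the cached copy unless no_cache is set.

    All names here are illustrative; `fetch` stands in for whatever
    actually downloads the page.
    """
    # Serve from the cache only when the caller has not asked to bypass it.
    if not no_cache and url in cache:
        return cache[url]

    content = fetch(url)
    # Assumption: a forced re-fetch also refreshes the cached copy.
    cache[url] = content
    return content
```

With no_cache left at its default, a second request for the same URL is served from the dictionary without calling fetch again. Passing no_cache=True always calls fetch.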
Crawl-Specific Options
- Crawl Max Pages: Limits the total number of pages the crawler will process.
- Crawl Depth: Defines how many "clicks" away from the starting page the crawler is allowed to go. A depth of 1 crawls only the pages linked directly from the start page.
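The way these two limits interact can be sketched as a breadth-first traversal. This is an illustration under stated assumptions, not the crawler's real code: the crawl function, the in-memory links mapping (standing in for the links found on each fetched page), and the choice to count the start page toward the page limit are all hypothetical. It also assumes max_pages is at least 1.

```python
from collections import deque
from typing import Dict, List


def crawl(
    start: str,
    links: Dict[str, List[str]],
    max_pages: int,
    max_depth: int,
) -> List[str]:
    """Breadth-first crawl over an in-memory link graph.

    `links` maps each URL to the URLs it links to (a stand-in for
    fetching and parsing real pages). Returns URLs in visit order.
    """
    visited = [start]           # assumption: the start page counts toward max_pages
    seen = {start}
    queue = deque([(start, 0)])  # (url, clicks away from the start page)

    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue            # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical site: "/" links to "/a" and "/b"; "/a" links to "/c".
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}
print(crawl("/", site, max_pages=10, max_depth=1))
```

In this sketch, a depth of 1 reaches "/a" and "/b" but not "/c", which is two clicks from the start page. Lowering max_pages stops the crawl early even when the depth limit would allow more pages.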
Results
After an extraction job is complete, the results are displayed in a tabbed interface.
- URL List: Shows a list of all processed URLs and their status. You can click on any URL to view its content.
- Preview: Renders the extracted Markdown as formatted text, giving you a clean reading experience.
- Code: Shows the raw Markdown code, which you can easily copy.
You can use the Copy and Download buttons to save the content of the currently viewed result.