crawl
Crawl a website without running analysis
The crawl command crawls a website and stores the data without running audit rules. Use this to separate crawling from analysis, or to crawl first and analyze later.
Usage
squirrel crawl <url> [options]
Arguments
| Argument | Description |
|---|---|
| url | The URL to crawl (required) |
Options
| Option | Alias | Description | Default |
|---|---|---|---|
| --max-pages | -m | Maximum pages to crawl | 500 |
| --refresh | -r | Ignore cache, fetch all pages fresh | false |
| --resume | | Resume interrupted crawl | false |
Examples
Basic Crawl
squirrel crawl https://example.com
Crawl More Pages
squirrel crawl https://example.com -m 1000
Fresh Crawl (Ignore Cache)
squirrel crawl https://example.com --refresh
Resume Interrupted Crawl
squirrel crawl https://example.com --resume
Crawl Behavior
The crawl command:
- Fetches and stores HTML content for each page
- Extracts and follows internal links
- Respects robots.txt and sitemaps
- Deduplicates URLs automatically
- Caches page content locally
Output
Crawling: https://example.com
Max pages: 500
✓ Crawled 42 pages in 12.3s
Crawl ID: a7b3c2d1
After crawling, use squirrel analyze to run audit rules on the stored data.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid URL, crawl failed, etc.) |
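Because failure is reported through a non-zero exit code, a script can branch on the result of a crawl. The following is a minimal sketch in POSIX shell, relying only on the exit codes listed above; the function name `crawl_or_report` is hypothetical and not part of squirrel.

```shell
#!/bin/sh
# Hypothetical helper (not part of squirrel): run a crawl and report the
# outcome based on the exit code of `squirrel crawl`.
crawl_or_report() {
    url="$1"
    if squirrel crawl "$url"; then
        # Exit code 0: the crawl completed and the data was stored.
        echo "crawl succeeded: $url"
    else
        # Non-zero exit code: invalid URL, crawl failed, etc.
        status=$?
        echo "crawl failed with exit code $status: $url" >&2
        return "$status"
    fi
}
```

Calling `crawl_or_report https://example.com` prints a one-line summary and returns the same exit code as the crawl, so it can be chained with `&&` or used in CI steps.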
Configuration
The crawl command respects settings from squirrel.toml:
[crawler]
max_pages = 100
delay_ms = 200
timeout_ms = 30000
include = ["/blog/*"]
exclude = ["/admin/*"]
See Configuration for all options.
Workflow
# 1. Crawl the site
squirrel crawl https://example.com
# 2. Analyze the crawl
squirrel analyze
# 3. View the report
squirrel report
This workflow is useful when:
- You want to crawl once and analyze multiple times
- You are testing different rule configurations
- Crawling is slow and you want to iterate on analysis
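The crawl-once, analyze-many pattern can be wrapped in two small shell functions. This is only a sketch: the names `crawl_site` and `reanalyze` are hypothetical, and it assumes that `squirrel analyze` and `squirrel report` return a non-zero exit code on failure, as `squirrel crawl` does.

```shell
#!/bin/sh
# Hypothetical helpers (not part of squirrel) for the workflow above.

# Step 1: crawl the site and store the data. Run this once.
crawl_site() {
    squirrel crawl "$1"
}

# Steps 2 and 3: analyze the stored crawl, then view the report.
# Run this as often as needed, for example after changing rule configuration.
# Assumption: analyze exits non-zero on failure, so the report is skipped.
reanalyze() {
    squirrel analyze && squirrel report
}
```

A typical session would call `crawl_site https://example.com` once, then call `reanalyze` after each change to the rule configuration.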
Related
- analyze - Analyze stored crawl
- audit - Crawl + analyze in one command
- Configuration - Config file options