Crawlability

Robots.txt, sitemaps, and crawl directives

Rules

Checks for sitemap URLs returning 4XX status codes

Lists all pages blocked from indexing so they can be reviewed manually

Checks for redirect chains on canonical URLs

Checks HTML document size against Googlebot crawl limits

Identifies pages blocked from search engine indexing

Detects conflicting signals between robots.txt and robots meta tags or HTTP headers

Checks for noindexed pages listed in sitemap

Checks that paginated pages have proper canonicals

Checks linked PDF sizes against Googlebot's 64MB truncation limit

Detects multi-hop redirect chains that waste crawl budget

Detects conflicts between robots meta tags and robots.txt

Checks if robots.txt exists and is properly configured

Detects pages with rich result schema that are blocked from indexing

Checks for indexable pages that are not in the sitemap

Checks that all sitemap URLs belong to the expected domain

Checks if XML sitemap exists and is referenced in robots.txt

Validates sitemap structure and URL limits
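Several of the sitemap rules above (4XX entries, expected domain, structure and URL limits) reduce to one pass over a parsed `<urlset>` file. The sketch below is illustrative only and is not squirrel's implementation. `audit_sitemap` is a hypothetical name. Status codes are assumed to have been collected separately, because the sketch does no network I/O. The 50,000-URL and 50 MB (52,428,800-byte, uncompressed) limits are the sitemaps.org protocol's published per-file values.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000       # per-file URL limit in the sitemaps.org protocol
MAX_BYTES = 52_428_800  # 50 MB uncompressed per-file size limit


def audit_sitemap(xml_bytes: bytes, expected_host: str,
                  status_codes: Optional[Dict[str, int]] = None) -> List[str]:
    """Returns human-readable problems found in a <urlset> sitemap (hypothetical helper).

    `status_codes` maps each URL to the HTTP status it returned when fetched
    separately (assumed input). Sitemap index files are reported as unsupported.
    """
    problems: List[str] = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, over the {MAX_BYTES}-byte limit")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != NS + "urlset":
        return problems + [f"unexpected root element {root.tag}"]

    # Collect every <loc> value, skipping empty elements.
    locs = [loc.text.strip() for loc in root.iter(NS + "loc") if loc.text]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs, over the {MAX_URLS}-URL limit")

    codes = status_codes or {}
    for url in locs:
        # urlsplit().hostname is already lowercased.
        if urlsplit(url).hostname != expected_host.lower():
            problems.append(f"off-domain URL: {url}")
        status = codes.get(url)
        if status is not None and 400 <= status < 500:
            problems.append(f"{url} returned {status}")
    return problems
```

Passing status codes in as data keeps the structural checks testable offline. A real crawler would fill that map from its own fetches.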
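Three rules compare sets of URLs: noindexed pages listed in the sitemap, indexable pages missing from it, and whether robots.txt references a sitemap at all. The sketch below shows one way to express those comparisons. It assumes the crawl has already classified each page as indexable or noindexed. `sitemap_coverage` and `sitemaps_in_robots` are hypothetical names. `RobotFileParser.site_maps()` is part of Python's standard library (3.8+) and returns `None` when robots.txt has no `Sitemap:` lines.

```python
from typing import Dict, Iterable, List
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> List[str]:
    """Returns the URLs declared on Sitemap: lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when none are declared


def sitemap_coverage(sitemap_urls: Iterable[str], indexable_urls: Iterable[str],
                     noindex_urls: Iterable[str]) -> Dict[str, List[str]]:
    """Compares sitemap entries against the crawled pages' indexability (hypothetical helper)."""
    in_sitemap = set(sitemap_urls)
    return {
        # Pages that ask not to be indexed but are still submitted in the sitemap.
        "noindex_in_sitemap": sorted(in_sitemap & set(noindex_urls)),
        # Indexable pages that the sitemap does not mention.
        "missing_from_sitemap": sorted(set(indexable_urls) - in_sitemap),
    }
```

In practice the URLs would need consistent normalization (scheme, host case, trailing slashes) before the set operations. Otherwise equivalent URLs count as different.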
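The robots.txt conflict rules above cover one common case: a URL that robots.txt disallows while the page itself carries `noindex`. A crawler that obeys robots.txt never fetches such a page, so it never sees the `noindex`. The sketch below detects that case with the standard library's `urllib.robotparser` and `html.parser`. It is not squirrel's implementation, and `is_noindex` and `has_robots_conflict` are hypothetical names. It reads only `<meta name="robots">` and `<meta name="googlebot">` plus a plain `X-Robots-Tag` value, and does not parse user-agent-scoped header values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Directives that prevent indexing; "none" is shorthand for "noindex, nofollow".
BLOCKING = {"noindex", "none"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page's meta tags or its X-Robots-Tag header value block indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {part.strip().lower() for part in x_robots_tag.split(",")}
    return bool((parser.directives | header) & BLOCKING)


def has_robots_conflict(robots_txt: str, url: str, html: str,
                        x_robots_tag: str = "", user_agent: str = "Googlebot") -> bool:
    """Flags a URL that robots.txt disallows but whose page also says noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    blocked = not rules.can_fetch(user_agent, url)
    return blocked and is_noindex(html, x_robots_tag)
```

The same `is_noindex` check can feed the "blocked from indexing" rules by itself, without the robots.txt comparison.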
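The two redirect-chain rules above both come down to counting the hops between a starting URL and its final response. The sketch below assumes the redirects have already been collected into a dictionary, for example from each response's `Location` header. `redirect_chain` and `is_multi_hop` are hypothetical names, and the default hop cap of 10 is an assumption, not a squirrel setting.

```python
from typing import Dict, List


def redirect_chain(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follows redirects from `start` and returns every URL visited, in order.

    `redirects` maps a URL to its redirect target; a URL absent from the map
    is treated as a final (non-redirect) response.
    """
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:
            raise ValueError("redirect loop: " + " -> ".join(chain))
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} redirect hops from {start}")
        seen.add(target)
    return chain


def is_multi_hop(start: str, redirects: Dict[str, str]) -> bool:
    """True when reaching the final URL takes two or more redirects."""
    return len(redirect_chain(start, redirects)) - 1 >= 2
```

For a canonical URL the expected chain has length 1, meaning no redirects at all, so any redirect on a canonical target is worth reporting.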

To disable every rule in this category, add the following to squirrel.toml:

```toml
[rules]
disable = ["crawl/*"]
```