Crawlability
Robots.txt, sitemaps, and crawl directives
Rules
Checks for sitemap URLs returning 4XX status codes
Lists all pages blocked from indexing so they can be reviewed manually
Checks for redirect chains on canonical URLs
Checks HTML document size against Googlebot's crawl size limits
Identifies pages blocked from search engine indexing
Detects conflicting signals between robots.txt and meta/headers
Checks for noindexed pages listed in sitemap
Checks that paginated pages have proper canonicals
Checks linked PDF sizes against Googlebot's 64 MB truncation limit
Detects multi-hop redirect chains that waste crawl budget
Detects conflicts between robots meta tags and robots.txt
Checks if robots.txt exists and is properly configured
Detects pages with rich result schema that are blocked from indexing
Checks for indexable pages that are not in the sitemap
Checks that all sitemap URLs belong to the expected domain
Checks if an XML sitemap exists and is referenced in robots.txt
Validates sitemap structure and URL limits (the sitemap protocol allows at most 50,000 URLs and 50 MB uncompressed per file)
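Several of the rules above (robots.txt presence, pages blocked from indexing, and conflicts between robots.txt and robots meta tags) come down to asking whether a crawler may fetch a URL. The sketch below approximates that check with Python's standard `urllib.robotparser`. It is not the tool's implementation: the sample robots.txt, the URLs, the `PAGES` data, and the `audit_robots` helper are hypothetical, and the `noindex` flags are assumed to come from an earlier crawl. `urllib.robotparser` applies the first matching rule by path prefix and does not implement Google's `*`/`$` wildcards or longest-match precedence, so its answers can differ from Googlebot's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

# Hypothetical crawl data: URL -> True if the page carries a noindex
# robots meta tag or X-Robots-Tag header.
PAGES = {
    "https://example.com/blog/post": False,
    "https://example.com/private/report": True,
    "https://example.com/private/draft": False,
}


def audit_robots(robots_txt: str, pages: dict[str, bool],
                 agent: str = "Googlebot") -> dict:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read

    # URLs the given user agent is not allowed to fetch.
    blocked = [url for url in pages if not parser.can_fetch(agent, url)]

    # A noindex on a disallowed URL is never fetched, so the crawler cannot see it.
    conflicts = [url for url in blocked if pages[url]]

    # Sitemap: lines declared in robots.txt (site_maps() returns None if absent).
    sitemaps = parser.site_maps() or []

    return {"blocked": blocked, "conflicts": conflicts, "sitemaps": sitemaps}


if __name__ == "__main__":
    print(audit_robots(ROBOTS_TXT, PAGES))
```

Here `/private/report` is reported as a conflict: a crawler that obeys robots.txt never requests the page, so its `noindex` directive is never seen. An empty `sitemaps` list would correspond to the "sitemap not referenced in robots.txt" case.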
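The sitemap rules (structure and size limits, off-domain URLs, noindexed URLs listed in the sitemap, and indexable pages missing from it) can be illustrated by comparing a parsed `<urlset>` with crawl results. This is a minimal sketch, assuming a single sitemap file (not a sitemap index) and hypothetical `noindexed` and `indexable` URL sets supplied by a crawl; the `audit_sitemap` name and the issue keys are illustrative, not the tool's. The 50,000-URL and 50 MB limits come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000               # sitemaps.org per-file URL limit
MAX_BYTES = 50 * 1024 * 1024    # 50 MB (52,428,800 bytes) uncompressed


def audit_sitemap(xml_text: str, expected_host: str,
                  noindexed: set[str], indexable: set[str]) -> dict:
    """Return only the issues found; an empty dict means no problems."""
    issues: dict = {}
    if len(xml_text.encode("utf-8")) > MAX_BYTES:
        issues["too_large"] = True

    root = ET.fromstring(xml_text)
    if root.tag != NS + "urlset":
        issues["bad_root"] = root.tag  # e.g. a <sitemapindex>, not handled here
        return issues

    # Collect <loc> values; entries with an empty <loc> are skipped.
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
    if len(locs) > MAX_URLS:
        issues["too_many_urls"] = len(locs)

    # Exact host match: www.example.com and example.com count as different hosts.
    wrong = [u for u in locs if urlparse(u).hostname != expected_host]
    if wrong:
        issues["wrong_domain"] = wrong

    listed_noindex = [u for u in locs if u in noindexed]
    if listed_noindex:
        issues["noindexed_in_sitemap"] = listed_noindex

    missing = sorted(indexable - set(locs))
    if missing:
        issues["missing_from_sitemap"] = missing
    return issues


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page</loc></url>
  <url><loc>https://cdn.example.net/file</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(audit_sitemap(
        SAMPLE, "example.com",
        noindexed={"https://example.com/old-page"},
        indexable={"https://example.com/", "https://example.com/pricing"},
    ))
```

Returning only non-empty findings keeps "no issues" distinguishable from "checked and clean" with a simple truthiness test.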
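The two redirect-chain rules (multi-hop chains in general, and canonical URLs that point at a redirect) both reduce to following a URL through a redirect map until it stops redirecting. The sketch below assumes the crawl has already recorded each redirecting URL and its `Location` target in a dict; `redirect_chain`, `find_multi_hop`, and `canonicals_that_redirect` are hypothetical helpers, and `max_hops=10` is an arbitrary safety limit chosen for the sketch, not a documented threshold of the tool.

```python
def redirect_chain(start: str, redirects: dict[str, str],
                   max_hops: int = 10) -> list[str]:
    """Follow redirects from start; return every URL visited, final URL last."""
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            raise ValueError("redirect loop: " + " -> ".join(chain + [current]))
        chain.append(current)
        seen.add(current)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} hops from {start}")
    return chain


def find_multi_hop(redirects: dict[str, str]) -> dict[str, list[str]]:
    """Report each redirecting URL that needs more than one hop to resolve."""
    report = {}
    for source in redirects:
        chain = redirect_chain(source, redirects)
        if len(chain) > 2:  # start + at least two further URLs = 2+ hops
            report[source] = chain
    return report


def canonicals_that_redirect(canonicals: dict[str, str],
                             redirects: dict[str, str]) -> dict[str, list[str]]:
    """Map each page whose canonical target redirects to that target's chain."""
    return {page: redirect_chain(target, redirects)
            for page, target in canonicals.items() if target in redirects}


# Hypothetical crawl data.
REDIRECTS = {
    "http://example.com/a": "https://example.com/a",
    "https://example.com/a": "https://example.com/a/",
    "https://example.com/b": "https://example.com/c",
}
CANONICALS = {"https://example.com/page": "https://example.com/b"}

if __name__ == "__main__":
    print(find_multi_hop(REDIRECTS))
    print(canonicals_that_redirect(CANONICALS, REDIRECTS))
```

Here only `http://example.com/a` is a multi-hop chain (HTTP to HTTPS, then a trailing-slash redirect). Pointing links and canonicals straight at the final URL removes the intermediate fetches.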
Disable All Crawlability Rules
[rules]
disable = ["crawl/*"]
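The `crawl/*` entry reads like a shell-style glob over rule IDs that share a `crawl/` prefix. Assuming that interpretation (the tool's actual matching behavior may differ), the effect of the setting can be modeled with Python's `fnmatch`. The rule IDs below are hypothetical placeholders, not the tool's real identifiers, and `active_rules` is an illustrative helper.

```python
from fnmatch import fnmatchcase

# Hypothetical rule IDs; the tool's real identifiers are not listed here.
RULE_IDS = [
    "crawl/robots-txt",
    "crawl/sitemap-exists",
    "content/title-length",
]


def active_rules(rule_ids: list[str], disable: list[str]) -> list[str]:
    """Return the rule IDs not matched by any pattern in `disable`."""
    return [rid for rid in rule_ids
            if not any(fnmatchcase(rid, pat) for pat in disable)]


# Mirrors the config above: [rules] disable = ["crawl/*"]
print(active_rules(RULE_IDS, ["crawl/*"]))  # ['content/title-length']
```

Note that in `fnmatch`, `*` also matches `/`, so under this model `crawl/*` would disable nested IDs as well.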