Filtering

--match=PATTERN / --ignore=PATTERN

Regular expressions applied to every discovered URL before it is fetched. Each pattern has a 1-second timeout to guard against ReDoS (regular expression denial of service).

# Only check internal links
deadfinder sitemap https://www.example.com/sitemap.xml \
  --match='^https://(www\.)?example\.com/'

# Skip media files
deadfinder url https://www.example.com \
  --ignore='\.(png|jpg|gif|webp|mp4)$'

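When both are given, --match is applied first, then --ignore: a URL is checked only if it matches the --match pattern and does not match the --ignore pattern.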
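
The ordering can be approximated outside the tool with two grep -E passes. This is only a sketch: filter_urls is a hypothetical helper, not part of deadfinder, and grep's extended regex dialect may differ from the engine deadfinder uses, so treat it as a way to preview a pattern pair rather than an exact emulation.

```shell
#!/bin/sh
# Hypothetical helper (not part of deadfinder): emulates the filter order.
# $1 = match pattern, $2 = ignore pattern; URLs are read from stdin.
filter_urls() {
  # Keep lines matching $1, then drop lines matching $2.
  grep -E -- "$1" | grep -Ev -- "$2"
}

printf '%s\n' \
  'https://www.example.com/about' \
  'https://www.example.com/logo.png' \
  'https://cdn.other.net/app.js' |
  filter_urls '^https://(www\.)?example\.com/' '\.(png|jpg|gif|webp|mp4)$'
# prints: https://www.example.com/about
```

The external URL is removed by the match step and the PNG by the ignore step, so only the internal page would be fetched.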

--include30x

By default, 3xx redirects are treated as healthy (the destination is what matters). Enable this flag to mark them as dead too:

deadfinder url https://www.example.com --include30x

Use this when your policy is "redirects are technical debt" rather than "follow the redirect chain".
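
The effect of the flag on a single response can be sketched as a small status classifier. classify_status is a hypothetical function, not deadfinder's actual code, and it assumes that 4xx and 5xx responses are always counted as dead, which the text above implies but does not state.

```shell
#!/bin/sh
# Hypothetical classifier (not deadfinder's implementation).
# $1 = HTTP status code, $2 = "yes" when --include30x is set.
# Assumption: 4xx and 5xx responses are always dead.
classify_status() {
  if [ "$1" -ge 400 ]; then
    echo dead
  elif [ "$2" = "yes" ] && [ "$1" -ge 300 ]; then
    echo dead
  else
    echo ok
  fi
}

for mode in no yes; do
  for code in 200 301 404; do
    echo "include30x=$mode status=$code -> $(classify_status "$code" "$mode")"
  done
done
```

Only the 301 line changes between the two modes: it is reported as ok without the flag and as dead with it.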

--limit=N

Cap the number of URLs scanned per invocation (useful for quick smoke tests of a large sitemap):

deadfinder sitemap https://www.example.com/sitemap.xml --limit=50

The limit applies to the input list (file lines, STDIN lines, or sitemap <loc> entries), not to the child links discovered on each page.
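
For line-based input, the cap behaves roughly like truncating the list before it is scanned. The sketch below uses a hypothetical limit_inputs helper and assumes the first N entries are kept in their original order, which the description above does not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of deadfinder): keep the first N input lines.
# Assumption: the cap keeps entries in input order.
limit_inputs() {
  head -n "$1"
}

printf '%s\n' \
  'https://www.example.com/a' \
  'https://www.example.com/b' \
  'https://www.example.com/c' |
  limit_inputs 2
```

Each URL that survives the cap is still scanned in full, so the number of links checked on its page is not bounded by N.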

--concurrency=N / --timeout=N

Not filters per se, but the other knobs you'll reach for:

  • --concurrency=50 (default) — number of parallel workers.
  • --timeout=10 (default, seconds) — per-request connect + read timeout.

Lower concurrency for rate-limited targets; raise it for fast internal scans.