Bot rules are the core of Anubis policy configuration. They define how to identify and respond to different types of traffic.
Rule Structure
A bot rule consists of matchers and an action:
bots:
  - name: amazonbot
    user_agent_regex: Amazonbot
    action: DENY
Required Fields
| Field | Type | Description |
|---|---|---|
| name | string | Unique identifier for the rule |
| action | string | Action to take: ALLOW, DENY, CHALLENGE, WEIGH |
Matchers
Rules must include at least one matcher:
| Matcher | Type | Description |
|---|---|---|
| user_agent_regex | regex string | Match User-Agent header |
| path_regex | regex string | Match request path |
| headers_regex | map[string]regex | Match arbitrary headers |
| remote_addresses | []CIDR | Match client IP ranges |
| expression | CEL expression | Advanced matching logic |
Matcher Types
User Agent Matching
- name: googlebot
  user_agent_regex: "Googlebot"
  action: ALLOW
Path Matching
- name: api-endpoints
  path_regex: "^/api/.*$"
  action: ALLOW
Header Matching
- name: cloudflare-workers
  headers_regex:
    CF-Worker: ".*"
  action: DENY
IP Range Matching
- name: internal-network
  action: ALLOW
  remote_addresses:
    - 10.0.0.0/8
    - 192.168.0.0/16
    - 100.64.0.0/10
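To make the CIDR notation concrete, the following Python sketch checks whether an address falls inside the ranges from the internal-network rule. It only illustrates the containment test; the function name `in_ranges` is hypothetical and is not part of Anubis.

```python
import ipaddress

# Ranges copied from the internal-network rule above.
INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("100.64.0.0/10"),
]


def in_ranges(remote_address, ranges=INTERNAL_RANGES):
    """Return True when the address lies inside any listed CIDR block."""
    ip = ipaddress.ip_address(remote_address)
    return any(ip in net for net in ranges)


print(in_ranges("10.1.2.3"))         # inside 10.0.0.0/8
print(in_ranges("100.127.255.255"))  # last address of 100.64.0.0/10
print(in_ranges("8.8.8.8"))          # outside every range
```

Note that 100.64.0.0/10 ends at 100.127.255.255, so 100.128.0.0 would not match.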
Combined Matching
Combine IP ranges with other matchers:
- name: qwantbot
  user_agent_regex: "\\+https\\://help\\.qwant\\.com/bot/"
  action: ALLOW
  remote_addresses:
    - 91.242.162.0/24
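The sketch below models the combined rule in Python, assuming that a rule with several matchers applies only when every matcher agrees (the usual reading of "combine", but an assumption here). The function name `matches_qwantbot` is hypothetical.

```python
import ipaddress
import re

# The user_agent_regex from the qwantbot rule, after YAML unescapes "\\" to "\".
UA_PATTERN = re.compile(r"\+https\://help\.qwant\.com/bot/")
QWANT_RANGE = ipaddress.ip_network("91.242.162.0/24")


def matches_qwantbot(user_agent, remote_address):
    """Assumed semantics: both the User-Agent and the IP range must match."""
    ua_ok = UA_PATTERN.search(user_agent) is not None
    ip_ok = ipaddress.ip_address(remote_address) in QWANT_RANGE
    return ua_ok and ip_ok


ua = "Mozilla/5.0 (compatible; Qwantbot/2.1; +https://help.qwant.com/bot/)"
print(matches_qwantbot(ua, "91.242.162.10"))  # claimed identity, expected network
print(matches_qwantbot(ua, "8.8.8.8"))        # same User-Agent from elsewhere
```

Pairing the two matchers means a client that merely copies the Qwant User-Agent string from another network does not receive the ALLOW action.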
CEL Expressions
For advanced matching, use Common Expression Language (CEL) expressions:
Single Expression
- name: no-user-agent
  action: DENY
  expression: userAgent == ""
Multiple Conditions (all)
All conditions must be true:
- name: api-json-requests
  action: ALLOW
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"] == "application/json"'
      - 'path.startsWith("/api/")'
Multiple Conditions (any)
At least one condition must be true:
- name: banned-ips
  action: DENY
  expression:
    any:
      - remoteAddress == "8.8.8.8"
      - remoteAddress == "1.1.1.1"
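The difference between all and any can be sketched in Python, with plain callables standing in for CEL expressions. This is a model of the logic only, not how Anubis evaluates CEL; the `evaluate` helper and the request dictionary layout are hypothetical.

```python
def evaluate(expression, request):
    """Evaluate a single condition, or an {"all": [...]} / {"any": [...]} group."""
    if callable(expression):  # stands in for a single CEL expression
        return bool(expression(request))
    if "all" in expression:
        return all(cond(request) for cond in expression["all"])
    if "any" in expression:
        return any(cond(request) for cond in expression["any"])
    raise ValueError("expression needs an 'all' or 'any' list")


# Conditions mirroring the api-json-requests and banned-ips rules above.
api_json = {"all": [
    lambda r: "Accept" in r["headers"],
    lambda r: r["headers"]["Accept"] == "application/json",
    lambda r: r["path"].startswith("/api/"),
]}
banned = {"any": [
    lambda r: r["remoteAddress"] == "8.8.8.8",
    lambda r: r["remoteAddress"] == "1.1.1.1",
]}

request = {
    "headers": {"Accept": "application/json"},
    "path": "/api/users",
    "remoteAddress": "1.1.1.1",
}
no_accept = {"headers": {}, "path": "/api/users", "remoteAddress": "9.9.9.9"}

print(evaluate(api_json, request), evaluate(banned, request))
print(evaluate(api_json, no_accept), evaluate(banned, no_accept))
```

In this Python model, the membership check listed first keeps the header lookup from failing when the Accept header is absent, which is why the example rule tests for the header before reading it.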
Available Variables
| Variable | Type | Example |
|---|---|---|
| remoteAddress | string | "1.2.3.4" |
| userAgent | string | "Mozilla/5.0..." |
| path | string | "/api/users" |
| method | string | "GET", "POST" |
| host | string | "example.com" |
| headers | map[string]string | {"User-Agent": "..."} |
| query | map[string]string | {"page": "1"} |
| contentLength | int64 | 1024 |
| load_1m | double | 2.5 (system load average) |
| load_5m | double | 3.1 |
| load_15m | double | 2.8 |
DNS Functions
# Verify Forward-Confirmed Reverse DNS
- name: require-fcrdns
  action: DENY
  expression: "!verifyFCrDNS(remoteAddress)"
# Check reverse DNS pattern
- name: googlebot
  action: ALLOW
  expression:
    all:
      - 'userAgent.matches("Googlebot")'
      - 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'
Available DNS functions:
- reverseDNS(ip): Get PTR records
- lookupHost(hostname): Get A/AAAA records
- verifyFCrDNS(ip): Verify FCrDNS
- verifyFCrDNS(ip, pattern): Verify FCrDNS with regex pattern
- arpaReverseIP(ip): Convert to ARPA notation
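Forward-confirmed reverse DNS means that an IP address resolves to a hostname which, in turn, resolves back to the same IP address. The Python sketch below shows that round trip. It is an approximation of the idea, not the Anubis implementation: the function names, the injectable `reverse` and `forward` parameters, and the lookup tables are all hypothetical, and the tables use made-up records so the example runs without network access.

```python
import re
import socket


def reverse_dns(ip):
    """PTR names for ip; an empty list when the lookup fails."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        return []
    return [name, *aliases]


def lookup_host(hostname):
    """A/AAAA addresses for hostname; an empty list when the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return []
    return sorted({info[4][0] for info in infos})


def verify_fcrdns(ip, pattern=None, reverse=reverse_dns, forward=lookup_host):
    """True when some PTR name (optionally matching pattern) resolves back to ip."""
    for name in reverse(ip):
        if pattern is not None and not re.search(pattern, name):
            continue
        if ip in forward(name):
            return True
    return False


# Made-up records for an offline demonstration.
PTR = {
    "66.249.66.1": ["crawl-66-249-66-1.googlebot.com"],
    "203.0.113.7": ["fake.googlebot.com"],  # PTR claims Google, forward disagrees
}
ADDR = {
    "crawl-66-249-66-1.googlebot.com": ["66.249.66.1"],
    "fake.googlebot.com": ["198.51.100.1"],
}


def fake_reverse(ip):
    return PTR.get(ip, [])


def fake_forward(hostname):
    return ADDR.get(hostname, [])


google = r"\.googlebot\.com$"
print(verify_fcrdns("66.249.66.1", google, fake_reverse, fake_forward))
print(verify_fcrdns("203.0.113.7", google, fake_reverse, fake_forward))
```

The second address fails because anyone who controls reverse DNS for their own IP range can publish a PTR record naming another domain; only the forward lookup, controlled by the domain owner, confirms the claim.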
Helper Functions
# Check for missing headers
- name: old-chrome
  action: WEIGH
  weight:
    adjust: 10
  expression:
    all:
      - 'userAgent.matches("Chrome/[1-9][0-9]?\\.0\\.0\\.0")'
      - 'missingHeader(headers, "Sec-Ch-Ua")'
# Random behavior (use sparingly)
- name: deny-sometimes
  action: DENY
  expression: 'randInt(4) == 0' # 25% chance
# Path segments
- name: deep-paths
  action: WEIGH
  weight:
    adjust: 5
  expression: 'size(segments(path)) > 5'
Rule Actions
ALLOW
Bypass all checks and forward to backend:
- name: health-check
  path_regex: "^/health$"
  action: ALLOW
DENY
Block with a deceptive success page:
- name: scrapers
  user_agent_regex: "(?i:scraper|crawler)"
  action: DENY
CHALLENGE
Present a proof-of-work challenge:
- name: browsers
  user_agent_regex: "Mozilla"
  action: CHALLENGE
  challenge:
    algorithm: fast
    difficulty: 2
WEIGH
Adjust request suspicion score:
# Remove suspicion
- name: session-cookie
  action: WEIGH
  expression: 'headers["Cookie"].contains("session=")'
  weight:
    adjust: -5
# Add suspicion
- name: high-load
  action: WEIGH
  expression: 'load_1m >= 10.0'
  weight:
    adjust: 20
Rule Evaluation Order
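Rules are evaluated in the order they appear in the policy file. The first matching ALLOW, DENY, or CHALLENGE rule determines the action. Matching WEIGH rules only adjust the request's weight, and evaluation continues.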
bots:
  # Specific rules first
  - name: googlebot-verified
    user_agent_regex: "Googlebot"
    expression: 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'
    action: ALLOW
  # Generic rules last
  - name: all-bots
    user_agent_regex: "(?i:bot)"
    action: DENY
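A small Python model shows why ordering matters under first-match evaluation. It is a simplification: only user_agent_regex is modeled, the DNS check from the googlebot-verified rule is left out, and the `first_match` helper is hypothetical.

```python
import re

SPECIFIC_FIRST = [
    {"name": "googlebot", "user_agent_regex": r"Googlebot", "action": "ALLOW"},
    {"name": "all-bots", "user_agent_regex": r"(?i:bot)", "action": "DENY"},
]
# The same rules with the generic rule placed first.
GENERIC_FIRST = list(reversed(SPECIFIC_FIRST))


def first_match(rules, user_agent):
    """Return (name, action) of the first rule whose pattern matches, else None."""
    for rule in rules:
        if re.search(rule["user_agent_regex"], user_agent):
            return rule["name"], rule["action"]
    return None


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(first_match(SPECIFIC_FIRST, ua))
print(first_match(GENERIC_FIRST, ua))
```

With the generic rule first, the Googlebot User-Agent is caught by all-bots and the specific ALLOW rule is never reached.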
Weight-Based Rules
Weight rules accumulate. All matching WEIGH rules apply:
- name: has-session
  action: WEIGH
  expression: '"Cookie" in headers && headers["Cookie"].contains("session_id")'
  weight:
    adjust: -10
- name: high-load
  action: WEIGH
  expression: 'load_1m >= 8.0'
  weight:
    adjust: 15
# Final weight determines threshold action
thresholds:
  - name: low-suspicion
    expression: 'weight < 5'
    action: ALLOW
  - name: moderate-suspicion
    expression:
      all:
        - weight >= 5
        - weight < 15
    action: CHALLENGE
    challenge:
      algorithm: metarefresh
      difficulty: 1
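The accumulation and the threshold lookup can be traced with a short Python sketch. It mirrors the two WEIGH rules and two thresholds above, and it assumes a starting weight of 0, which the text does not state. The request layout and function names are hypothetical.

```python
# (name, predicate, adjust) triples mirroring the WEIGH rules above.
WEIGH_RULES = [
    ("has-session", lambda r: "session_id" in r["cookie"], -10),
    ("high-load", lambda r: r["load_1m"] >= 8.0, 15),
]
# (name, predicate on weight, action) triples mirroring the thresholds above.
THRESHOLDS = [
    ("low-suspicion", lambda w: w < 5, "ALLOW"),
    ("moderate-suspicion", lambda w: 5 <= w < 15, "CHALLENGE"),
]


def total_weight(request, start=0):
    """Sum the adjustments of every matching WEIGH rule (start of 0 is assumed)."""
    return start + sum(adj for _, matches, adj in WEIGH_RULES if matches(request))


def threshold_action(weight):
    """Return the action of the first matching threshold, else None."""
    for _, matches, action in THRESHOLDS:
        if matches(weight):
            return action
    return None  # the example defines no threshold for weight >= 15


busy_with_session = {"cookie": "session_id=abc", "load_1m": 9.0}
weight = total_weight(busy_with_session)
print(weight, threshold_action(weight))
```

Here both rules match, so the weight is -10 + 15 = 5 and the request lands in the moderate-suspicion band. A request without a session cookie under the same load reaches 15, which neither example threshold covers, so a complete policy would need a further threshold for that range.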
Regular Expression Syntax
Anubis uses Go’s regexp package (RE2 syntax):
# Case-insensitive
user_agent_regex: "(?i:bot|crawler|scraper)"
# Character classes
path_regex: "^/[a-z0-9]+$"
# Anchors
user_agent_regex: "^curl/" # Starts with
path_regex: "\\.json$" # Ends with
# Escaping special characters
user_agent_regex: "example\\.com" # Literal dot
Test regex at regex101.com (select Golang flavor).
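For a quick offline check, the patterns above can also be exercised with a few lines of Python. Treat this as a rough sanity test only: Python's re module is not RE2, and it accepts constructs such as lookarounds and backreferences that Go's regexp package rejects. The sample inputs and the `mismatches` helper are hypothetical.

```python
import re

# (pattern, sample input, expected match) using the patterns listed above.
CASES = [
    (r"(?i:bot|crawler|scraper)", "MyCrawler/1.0", True),
    (r"^/[a-z0-9]+$", "/page42", True),
    (r"^/[a-z0-9]+$", "/Page/42", False),
    (r"^curl/", "curl/8.5.0", True),
    (r"\.json$", "/data/items.json", True),
    (r"example\.com", "exampleXcom", False),  # the escaped dot is literal
]


def mismatches(cases):
    """Return the (pattern, text) pairs whose result differs from the expectation."""
    return [
        (pattern, text)
        for pattern, text, expected in cases
        if (re.search(pattern, text) is not None) != expected
    ]


print(mismatches(CASES))
```

The patterns are written here as raw strings, so each `\\` from the YAML double-quoted form appears as a single backslash.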
Common Patterns
Allow Static Assets
- name: static-assets
  path_regex: "\\.(css|js|jpg|png|gif|svg|woff2?)$"
  action: ALLOW
Block Known Bad Actors
- name: scrapers
  user_agent_regex: "(?i:scraper|download|extract|harvest)"
  action: DENY
Protect POST Endpoints
- name: post-requests
  action: CHALLENGE
  expression:
    all:
      - 'method == "POST"'
      - '!verifyFCrDNS(remoteAddress)'
  challenge:
    algorithm: fast
    difficulty: 3
Dynamic Load Protection
- name: high-load-stricter
  action: WEIGH
  expression: 'load_1m >= 16.0'
  weight:
    adjust: 30
- name: low-load-lenient
  action: WEIGH
  expression: 'load_15m <= 2.0'
  weight:
    adjust: -15
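Because the two load rules read different averages, both can match at once, for example during a sudden spike on an otherwise idle host. The Python sketch below adds up the adjustments for a given pair of load values; the function name `load_adjustment` is hypothetical, and the sketch assumes the adjustments of matching WEIGH rules are simply summed.

```python
def load_adjustment(load_1m, load_15m):
    """Sum the adjustments of the two load rules above that match."""
    adjust = 0
    if load_1m >= 16.0:   # high-load-stricter
        adjust += 30
    if load_15m <= 2.0:   # low-load-lenient
        adjust -= 15
    return adjust


print(load_adjustment(20.0, 1.5))  # sudden spike on a quiet host: both rules match
print(load_adjustment(20.0, 10.0))  # sustained load: only the stricter rule matches
```

On Unix-like systems, `os.getloadavg()` returns the 1, 5, and 15 minute averages if you want to try the function with live values.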
Best Practices
- Order matters: Place specific ALLOW rules before generic DENY rules
- Test expressions: Use --debug-benchmark-js to test without blocking
- Use FCrDNS: Verify bot IP addresses with verifyFCrDNS()
- Prefer CHALLENGE over DENY: Legitimate users can solve challenges
- Monitor metrics: Track rule matches via Prometheus metrics
- Use weights: Build gradual suspicion instead of binary decisions
Generating Rules from robots.txt
Anubis includes the robots2policy tool to automatically convert robots.txt files into Anubis policy rules.
Usage
# Convert local robots.txt file
robots2policy -input robots.txt -output policy.yaml
# Convert from URL
robots2policy -input https://example.com/robots.txt -format json
# Read from stdin
curl https://example.com/robots.txt | robots2policy -input -
Options
| Flag | Default | Description |
|---|---|---|
| -input | (required) | Path to robots.txt file, URL, or - for stdin |
| -output | stdout | Output file path or - for stdout |
| -format | yaml | Output format: yaml or json |
| -action | CHALLENGE | Default action for disallowed paths |
| -deny-user-agents | DENY | Action for blocked user agents |
| -name | robots-txt-policy | Name for the generated policy |
| -crawl-delay-weight | 0 | Weight adjustment based on crawl-delay |
Example Output
Input robots.txt:
User-agent: GPTBot
Disallow: /
User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /api/private/
Generated policy:
bots:
  - name: robots-txt-gptbot
    action: DENY
    expression:
      all:
        - userAgent.contains("GPTBot")
  - name: robots-txt-admin
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/admin/")
  - name: robots-txt-api-private
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/api/private/")
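The mapping from robots.txt directives to rules can be sketched in Python. This is a simplified, hypothetical model inferred only from the example above, not the robots2policy implementation: it handles fully disallowed user agents and paths disallowed for all agents, ignores Crawl-delay and Allow, and uses the userAgent variable from the Available Variables table.

```python
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /api/private/
"""


def robots_to_rules(text, action="CHALLENGE", deny_action="DENY"):
    """Hypothetical, simplified mapping; not the robots2policy implementation."""
    rules, agents, group_has_directive = [], [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if group_has_directive:  # a new group begins
                agents, group_has_directive = [], False
            agents.append(value)
            continue
        group_has_directive = True
        if key != "disallow" or not value:
            continue  # Crawl-delay, Allow, and empty Disallow are ignored here
        for agent in agents:
            if agent != "*" and value == "/":
                # A fully disallowed agent becomes a user-agent rule.
                rules.append({
                    "name": f"robots-txt-{agent.lower()}",
                    "action": deny_action,
                    "expression": {"all": [f'userAgent.contains("{agent}")']},
                })
            elif agent == "*":
                # A path disallowed for every agent becomes a path rule.
                slug = value.strip("/").replace("/", "-") or "root"
                rules.append({
                    "name": f"robots-txt-{slug}",
                    "action": action,
                    "expression": {"all": [f'path.startsWith("{value}")']},
                })
    return rules


for rule in robots_to_rules(SAMPLE):
    print(rule["name"], rule["action"], rule["expression"]["all"][0])
```

The `action` and `deny_action` parameters play the role of the -action and -deny-user-agents flags in the options table; how the real tool names rules or treats partially disallowed agents may differ.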