Bot Rules - Anubis

Bot rules are the core of Anubis policy configuration. They define how to identify and respond to different types of traffic.

Rule Structure

A bot rule consists of a name, one or more matchers, and an action:

bots:
  - name: amazonbot
    user_agent_regex: Amazonbot
    action: DENY

Required Fields

Field	Type	Description
`name`	string	Unique identifier for the rule
`action`	string	Action to take: ALLOW, DENY, CHALLENGE, WEIGH

Matchers

Rules must include at least one matcher:

Matcher	Type	Description
`user_agent_regex`	regex string	Match User-Agent header
`path_regex`	regex string	Match request path
`headers_regex`	map[string]regex	Match arbitrary headers
`remote_addresses`	[]CIDR	Match client IP ranges
`expression`	CEL expression	Advanced matching logic

Matcher Types

User Agent Matching

- name: googlebot
  user_agent_regex: "Googlebot"
  action: ALLOW

Path Matching

- name: api-endpoints
  path_regex: "^/api/.*$"
  action: ALLOW

Header Matching

- name: cloudflare-workers
  headers_regex:
    CF-Worker: ".*"
  action: DENY

IP Range Matching

- name: internal-network
  action: ALLOW
  remote_addresses:
    - 10.0.0.0/8
    - 192.168.0.0/16
    - 100.64.0.0/10
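The containment check behind `remote_addresses` can be sketched with Go's `net/netip` package. `inRanges` is a hypothetical helper. It calls `Unmap` so that an IPv4-mapped IPv6 address such as `::ffff:10.1.2.3` is treated as IPv4. That is a design choice of this sketch, not documented Anubis behavior. Note that the example list does not include 172.16.0.0/12, so addresses in that private range are not matched.

```go
package main

import (
	"fmt"
	"net/netip"
)

// inRanges reports whether ip falls inside any of the CIDR ranges.
// Hypothetical helper illustrating remote_addresses matching.
func inRanges(ip string, cidrs []string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, err
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d as the IPv4 address a.b.c.d
	for _, c := range cidrs {
		prefix, err := netip.ParsePrefix(c)
		if err != nil {
			return false, err
		}
		if prefix.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	internal := []string{"10.0.0.0/8", "192.168.0.0/16", "100.64.0.0/10"}
	for _, ip := range []string{"10.20.30.40", "100.127.255.1", "172.16.0.1"} {
		ok, err := inRanges(ip, internal)
		if err != nil {
			panic(err)
		}
		fmt.Println(ip, ok) // true, true, false
	}
}
```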

Combined Matching

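IP ranges can be combined with other matchers. Here the rule applies only to requests that carry Qwant's User-Agent and also come from Qwant's address range: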

- name: qwantbot
  user_agent_regex: "\\+https\\://help\\.qwant\\.com/bot/"
  action: ALLOW
  remote_addresses:
    - 91.242.162.0/24

CEL Expressions

For advanced matching, use Common Expression Language (CEL) expressions:

Single Expression

- name: no-user-agent
  action: DENY
  expression: userAgent == ""

Multiple Conditions (all)

All conditions must be true:

- name: api-json-requests
  action: ALLOW
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"] == "application/json"'
      - 'path.startsWith("/api/")'

Multiple Conditions (any)

At least one condition must be true:

- name: banned-ips
  action: DENY
  expression:
    any:
      - remoteAddress == "8.8.8.8"
      - remoteAddress == "1.1.1.1"

Available Variables

Variable	Type	Example
`remoteAddress`	string	`"1.2.3.4"`
`userAgent`	string	`"Mozilla/5.0..."`
`path`	string	`"/api/users"`
`method`	string	`"GET"`, `"POST"`
`host`	string	`"example.com"`
`headers`	map[string]string	`{"User-Agent": "..."}`
`query`	map[string]string	`{"page": "1"}`
`contentLength`	int64	`1024`
`load_1m`	double	`2.5` (1-minute system load average)
`load_5m`	double	`3.1` (5-minute system load average)
`load_15m`	double	`2.8` (15-minute system load average)

DNS Functions

# Verify Forward-Confirmed Reverse DNS
- name: require-fcrdns
  action: DENY
  expression: "!verifyFCrDNS(remoteAddress)"

# Check reverse DNS pattern
- name: googlebot
  action: ALLOW
  expression:
    all:
      - 'userAgent.matches("Googlebot")'
      - 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'

Available DNS functions:

reverseDNS(ip) - Get PTR records
lookupHost(hostname) - Get A/AAAA records
verifyFCrDNS(ip) - Verify FCrDNS
verifyFCrDNS(ip, pattern) - Verify FCrDNS with regex pattern
arpaReverseIP(ip) - Convert to ARPA notation

Helper Functions

# Check for missing headers
- name: old-chrome
  action: WEIGH
  weight:
    adjust: 10
  expression:
    all:
      - 'userAgent.matches("Chrome/[1-9][0-9]?\\.0\\.0\\.0")'
      - 'missingHeader(headers, "Sec-Ch-Ua")'

# Random behavior (use sparingly)
- name: deny-sometimes
  action: DENY
  expression: 'randInt(4) == 0'  # 25% chance

# Path segments
- name: deep-paths
  action: WEIGH
  weight:
    adjust: 5
  expression: 'size(segments(path)) > 5'

Rule Actions

ALLOW

Bypass all checks and forward the request to the backend:

- name: health-check
  path_regex: "^/health$"
  action: ALLOW

DENY

Block the request, serving a deceptive page that looks like a success to the client:

- name: scrapers
  user_agent_regex: "(?i:scraper|crawler)"
  action: DENY

CHALLENGE

Present a proof-of-work challenge:

- name: browsers
  user_agent_regex: "Mozilla"
  action: CHALLENGE
  challenge:
    algorithm: fast
    difficulty: 2

WEIGH

Adjust the request's suspicion score (its weight). Negative values lower suspicion; positive values raise it:

# Remove suspicion
- name: session-cookie
  action: WEIGH
  expression: '"Cookie" in headers && headers["Cookie"].contains("session=")'
  weight:
    adjust: -5

# Add suspicion
- name: high-load
  action: WEIGH
  expression: 'load_1m >= 10.0'
  weight:
    adjust: 20

Rule Evaluation Order

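Rules are evaluated in the order they appear in the policy file. Among ALLOW, DENY, and CHALLENGE rules, the first matching rule determines the action, so put specific rules before generic ones. WEIGH rules are the exception: a matching WEIGH rule adjusts the request's weight and evaluation continues (see Weight-Based Rules).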

bots:
  # Specific rules first
  - name: googlebot-verified
    user_agent_regex: "Googlebot"
    expression: 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'
    action: ALLOW
  
  # Generic rules last
  - name: all-bots
    user_agent_regex: "(?i:bot)"
    action: DENY
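Here is a minimal Go sketch of first-match evaluation using the two rules above. `verifiedGooglebot` stands in for the result of `verifyFCrDNS`, which needs live DNS. WEIGH rules are left out. Reversing the list shows why order matters: with `all-bots` first, even a verified Googlebot is denied.

```go
package main

import (
	"fmt"
	"regexp"
)

// request holds only what these two rules inspect.
type request struct {
	userAgent         string
	verifiedGooglebot bool // stand-in for verifyFCrDNS(remoteAddress, ...)
}

type rule struct {
	name   string
	action string
	match  func(request) bool
}

// firstMatch returns the first rule that matches, in file order.
// ok is false when no rule matches.
func firstMatch(rules []rule, r request) (name, action string, ok bool) {
	for _, ru := range rules {
		if ru.match(r) {
			return ru.name, ru.action, true
		}
	}
	return "", "", false
}

var (
	googlebotUA = regexp.MustCompile("Googlebot")
	anyBotUA    = regexp.MustCompile("(?i:bot)")
)

// policy lists the rules in the same order as the YAML above.
var policy = []rule{
	{"googlebot-verified", "ALLOW", func(r request) bool {
		return googlebotUA.MatchString(r.userAgent) && r.verifiedGooglebot
	}},
	{"all-bots", "DENY", func(r request) bool { return anyBotUA.MatchString(r.userAgent) }},
}

func main() {
	ua := "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
	fmt.Println(firstMatch(policy, request{ua, true}))  // googlebot-verified ALLOW true
	fmt.Println(firstMatch(policy, request{ua, false})) // all-bots DENY true
}
```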

Weight-Based Rules

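WEIGH rules accumulate: every matching WEIGH rule adds its `adjust` value to the request's weight, and the final weight is then compared against the `thresholds` list: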

- name: has-session
  action: WEIGH
  expression: '"Cookie" in headers && headers["Cookie"].contains("session_id")'
  weight:
    adjust: -10

- name: high-load
  action: WEIGH  
  expression: 'load_1m >= 8.0'
  weight:
    adjust: 15

# Final weight determines threshold action
thresholds:
  - name: low-suspicion
    expression: 'weight < 5'
    action: ALLOW
  
  - name: moderate-suspicion
    expression:
      all:
        - weight >= 5
        - weight < 15
    action: CHALLENGE
    challenge:
      algorithm: metarefresh
      difficulty: 1
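The Go sketch below traces the weight arithmetic for the two WEIGH rules and the low- and moderate-suspicion thresholds above. The starting weight of 0 is an assumption, and `totalWeight` and `threshold` are hypothetical helpers. With session and high load, a request lands at -10 + 15 = 5 and gets the metarefresh challenge. The same load without a session reaches 15, which neither threshold in this example covers, so a real policy should define what happens at that weight.

```go
package main

import "fmt"

// weighRule is a hypothetical model of a WEIGH rule: when match is true,
// adjust is added to the request's weight.
type weighRule struct {
	name   string
	adjust int
	match  func(hasSession bool, load1m float64) bool
}

// exampleRules mirrors has-session (-10) and high-load (+15).
var exampleRules = []weighRule{
	{"has-session", -10, func(hasSession bool, _ float64) bool { return hasSession }},
	{"high-load", 15, func(_ bool, load1m float64) bool { return load1m >= 8.0 }},
}

// totalWeight adds up adjust for every matching rule, starting from 0.
func totalWeight(rules []weighRule, hasSession bool, load1m float64) int {
	weight := 0
	for _, r := range rules {
		if r.match(hasSession, load1m) {
			weight += r.adjust
		}
	}
	return weight
}

// threshold checks the two example thresholds in order and returns empty
// strings when neither expression is true.
func threshold(weight int) (name, action string) {
	switch {
	case weight < 5:
		return "low-suspicion", "ALLOW"
	case weight >= 5 && weight < 15:
		return "moderate-suspicion", "CHALLENGE"
	}
	return "", ""
}

func main() {
	for _, tc := range []struct {
		hasSession bool
		load1m     float64
	}{{true, 2.0}, {true, 9.0}, {false, 9.0}} {
		w := totalWeight(exampleRules, tc.hasSession, tc.load1m)
		name, action := threshold(w)
		fmt.Printf("session=%v load=%.1f weight=%d threshold=%q action=%q\n",
			tc.hasSession, tc.load1m, w, name, action)
	}
}
```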

Regular Expression Syntax

Anubis uses Go’s regexp package (RE2 syntax):

# Case-insensitive
user_agent_regex: "(?i:bot|crawler|scraper)"

# Character classes
path_regex: "^/[a-z0-9]+$"

# Anchors
user_agent_regex: "^curl/"  # Starts with
path_regex: "\\.json$"       # Ends with

# Escaping special characters
user_agent_regex: "example\\.com"  # Literal dot

Test regex at regex101.com (select Golang flavor).
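The same patterns can also be checked directly with Go's `regexp` package, which implements RE2. The sketch shows two consequences. Without `^` or `$` a pattern matches anywhere in the string. RE2 also rejects lookaround and backreferences at compile time, because it guarantees linear-time matching. In Go raw strings the YAML `\\.` is written `\.`; `matches` is a small illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches compiles pattern with Go's RE2-based regexp package and reports
// whether it matches anywhere in s.
func matches(pattern, s string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(s), nil
}

func main() {
	fmt.Println(matches(`(?i:bot|crawler|scraper)`, "ExampleBOT/1.0")) // true <nil>: case-insensitive
	fmt.Println(matches(`curl/`, "libcurl/8.5.0"))                      // true <nil>: unanchored search
	fmt.Println(matches(`^curl/`, "libcurl/8.5.0"))                     // false <nil>: ^ pins the start
	fmt.Println(matches(`example\.com`, "exampleXcom"))                 // false <nil>: \. is a literal dot

	// Lookaround and backreferences are not RE2 syntax, so Compile fails.
	for _, p := range []string{`bot(?!tle)`, `(a)\1`} {
		_, err := regexp.Compile(p)
		fmt.Println(p, "rejected:", err != nil) // rejected: true
	}
}
```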

Common Patterns

Allow Static Assets

- name: static-assets
  path_regex: "\\.(css|js|jpg|png|gif|svg|woff2?)$"
  action: ALLOW
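Here is a quick Go check of the static-assets pattern, assuming `path_regex` is applied to the path without the query string. Two details are worth knowing. `.json` does not match, because `$` requires the extension to end the path. Matching is also case-sensitive, so `/logo.PNG` is missed unless the pattern uses `(?i:...)`.

```go
package main

import (
	"fmt"
	"regexp"
)

// staticAssets is the static-assets path_regex after YAML unescaping.
var staticAssets = regexp.MustCompile(`\.(css|js|jpg|png|gif|svg|woff2?)$`)

func main() {
	for _, p := range []string{"/app.js", "/fonts/inter.woff2", "/data.json", "/logo.PNG"} {
		fmt.Println(p, staticAssets.MatchString(p)) // true, true, false, false
	}
}
```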

Block Known Bad Actors

- name: scrapers
  user_agent_regex: "(?i:scraper|download|extract|harvest)"
  action: DENY

Protect POST Endpoints

- name: post-requests
  action: CHALLENGE
  expression:
    all:
      - 'method == "POST"'
      - '!verifyFCrDNS(remoteAddress)'
  challenge:
    algorithm: fast
    difficulty: 3

Dynamic Load Protection

- name: high-load-stricter
  action: WEIGH
  expression: 'load_1m >= 16.0'
  weight:
    adjust: 30

- name: low-load-lenient
  action: WEIGH
  expression: 'load_15m <= 2.0'
  weight:
    adjust: -15
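Because WEIGH rules accumulate, both load rules can fire on the same request. This Go sketch adds up their adjustments; `loadAdjustment` is a hypothetical helper that only restates the two expressions. A sudden spike after a quiet period (high `load_1m`, low `load_15m`) nets +15 rather than +30.

```go
package main

import "fmt"

// loadAdjustment restates the two Dynamic Load Protection rules; both
// adjustments apply when both expressions are true.
func loadAdjustment(load1m, load15m float64) int {
	adjust := 0
	if load1m >= 16.0 { // high-load-stricter
		adjust += 30
	}
	if load15m <= 2.0 { // low-load-lenient
		adjust -= 15
	}
	return adjust
}

func main() {
	fmt.Println(loadAdjustment(0.8, 0.5))   // -15: quiet server
	fmt.Println(loadAdjustment(20.0, 12.0)) // 30: sustained overload
	fmt.Println(loadAdjustment(20.0, 1.5))  // 15: sudden spike after a quiet period
	fmt.Println(loadAdjustment(5.0, 5.0))   // 0: neither rule matches
}
```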

Best Practices

Order matters: Place specific ALLOW rules before generic DENY rules
Test expressions: Use --debug-benchmark-js to test without blocking
Use FCrDNS: Verify bot IP addresses with verifyFCrDNS()
Prefer CHALLENGE over DENY: Legitimate users can solve challenges
Monitor metrics: Track rule matches via Prometheus metrics
Use weights: Build gradual suspicion instead of binary decisions

Generating Rules from robots.txt

Anubis includes the robots2policy tool to automatically convert robots.txt files into Anubis policy rules.

Usage

# Convert local robots.txt file
robots2policy -input robots.txt -output policy.yaml

# Convert from URL
robots2policy -input https://example.com/robots.txt -format json

# Read from stdin
curl https://example.com/robots.txt | robots2policy -input -

Options

Flag	Default	Description
`-input`	(required)	Path to robots.txt file, URL, or `-` for stdin
`-output`	stdout	Output file path or `-` for stdout
`-format`	`yaml`	Output format: `yaml` or `json`
`-action`	`CHALLENGE`	Default action for disallowed paths
`-deny-user-agents`	`DENY`	Action for blocked user agents
`-name`	`robots-txt-policy`	Name for the generated policy
`-crawl-delay-weight`	`0`	Weight adjustment based on crawl-delay

Example Output

Input robots.txt:

User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /api/private/

Generated policy:

bots:
  - name: robots-txt-gptbot
    action: DENY
    expression:
      all:
        - userAgent.contains("GPTBot")
  
  - name: robots-txt-admin
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/admin/")
  
  - name: robots-txt-api-private
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/api/private/")
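To illustrate the kind of mapping shown above, here is a deliberately small Go converter. It is not robots2policy's parser or output format. It reads only `User-agent` and `Disallow` lines. It turns `Disallow: /` for a named agent into a DENY on `userAgent.contains(...)` and turns other disallowed paths into CHALLENGE rules on `path.startsWith(...)`. The combined expression used for a named agent with a partial Disallow is a hypothetical choice. `Crawl-delay` is ignored, matching the example output, where the default `-crawl-delay-weight` of 0 adds no rule for it.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// group is one robots.txt record: the agents it names and their Disallow paths.
type group struct {
	agents    []string
	disallows []string
}

// rule is a hypothetical generated rule: an action plus a CEL-style expression.
type rule struct{ action, expr string }

// parseRobots reads only User-agent and Disallow; consecutive User-agent
// lines share one group. Comments and other fields are skipped.
func parseRobots(text string) []group {
	var groups []group
	lastWasAgent := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line, _, _ := strings.Cut(sc.Text(), "#") // drop comments
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)
		switch key {
		case "user-agent":
			if !lastWasAgent || len(groups) == 0 {
				groups = append(groups, group{}) // start a new record
			}
			g := &groups[len(groups)-1]
			g.agents = append(g.agents, value)
			lastWasAgent = true
		case "disallow":
			if len(groups) > 0 && value != "" { // an empty Disallow allows everything
				g := &groups[len(groups)-1]
				g.disallows = append(g.disallows, value)
			}
			lastWasAgent = false
		default:
			lastWasAgent = false
		}
	}
	return groups
}

// toRules maps each (agent, disallowed path) pair to one rule.
func toRules(groups []group) []rule {
	var out []rule
	for _, g := range groups {
		for _, agent := range g.agents {
			for _, path := range g.disallows {
				pathExpr := fmt.Sprintf("path.startsWith(%q)", path)
				agentExpr := fmt.Sprintf("userAgent.contains(%q)", agent)
				switch {
				case agent == "*":
					out = append(out, rule{"CHALLENGE", pathExpr})
				case path == "/":
					out = append(out, rule{"DENY", agentExpr})
				default:
					out = append(out, rule{"CHALLENGE", agentExpr + " && " + pathExpr})
				}
			}
		}
	}
	return out
}

func main() {
	robots := "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nCrawl-delay: 10\nDisallow: /admin/\nDisallow: /api/private/\n"
	for _, r := range toRules(parseRobots(robots)) {
		fmt.Println(r.action, r.expr)
	}
}
```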

Next Steps

Challenge Configuration - Configure proof-of-work settings
Policy Configuration - Complete policy file structure
