Bot rules are the core of Anubis policy configuration. They define how to identify and respond to different types of traffic.

Rule Structure

A bot rule consists of a unique name, one or more matchers, and an action:
bots:
  - name: amazonbot
    user_agent_regex: Amazonbot
    action: DENY

Required Fields

| Field | Type | Description |
|-------|------|-------------|
| name | string | Unique identifier for the rule |
| action | string | Action to take: ALLOW, DENY, CHALLENGE, WEIGH |

Matchers

Rules must include at least one matcher:
| Matcher | Type | Description |
|---------|------|-------------|
| user_agent_regex | regex string | Match User-Agent header |
| path_regex | regex string | Match request path |
| headers_regex | map[string]regex | Match arbitrary headers |
| remote_addresses | []CIDR | Match client IP ranges |
| expression | CEL expression | Advanced matching logic |

Matcher Types

User Agent Matching

- name: googlebot
  user_agent_regex: "Googlebot"
  action: ALLOW

Path Matching

- name: api-endpoints
  path_regex: "^/api/.*$"
  action: ALLOW

Header Matching

- name: cloudflare-workers
  headers_regex:
    CF-Worker: ".*"
  action: DENY

IP Range Matching

- name: internal-network
  action: ALLOW
  remote_addresses:
    - 10.0.0.0/8
    - 192.168.0.0/16
    - 100.64.0.0/10

Combined Matching

Combine IP ranges with other matchers:
- name: qwantbot
  user_agent_regex: "\\+https\\://help\\.qwant\\.com/bot/"
  action: ALLOW
  remote_addresses:
    - 91.242.162.0/24

CEL Expressions

For advanced matching, use Common Expression Language (CEL) expressions:

Single Expression

- name: no-user-agent
  action: DENY
  expression: userAgent == ""

Multiple Conditions (all)

All conditions must be true:
- name: api-json-requests
  action: ALLOW
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"] == "application/json"'
      - 'path.startsWith("/api/")'

Multiple Conditions (any)

At least one condition must be true:
- name: banned-ips
  action: DENY
  expression:
    any:
      - remoteAddress == "8.8.8.8"
      - remoteAddress == "1.1.1.1"

Available Variables

| Variable | Type | Example |
|----------|------|---------|
| remoteAddress | string | "1.2.3.4" |
| userAgent | string | "Mozilla/5.0..." |
| path | string | "/api/users" |
| method | string | "GET", "POST" |
| host | string | "example.com" |
| headers | map[string]string | {"User-Agent": "..."} |
| query | map[string]string | {"page": "1"} |
| contentLength | int64 | 1024 |
| load_1m | double | 2.5 (system load average) |
| load_5m | double | 3.1 |
| load_15m | double | 2.8 |

DNS Functions

# Verify Forward-Confirmed Reverse DNS
- name: require-fcrdns
  action: DENY
  expression: "!verifyFCrDNS(remoteAddress)"

# Check reverse DNS pattern
- name: googlebot
  action: ALLOW
  expression:
    all:
      - 'userAgent.matches("Googlebot")'
      - 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'

Available DNS functions:
  • reverseDNS(ip) - Get PTR records
  • lookupHost(hostname) - Get A/AAAA records
  • verifyFCrDNS(ip) - Verify FCrDNS
  • verifyFCrDNS(ip, pattern) - Verify FCrDNS with regex pattern
  • arpaReverseIP(ip) - Convert to ARPA notation
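
For readers unfamiliar with Forward-Confirmed Reverse DNS, the general technique can be sketched in Python as follows: look up the PTR names for the address, optionally filter them by a pattern, then confirm that one of those names resolves back to the same address. This is a sketch of the technique only, not Anubis's implementation; `verify_fcrdns`, `system_reverse`, and `system_forward` are hypothetical names, and the resolvers are injectable so the logic can be exercised without network access.

```python
import re
import socket


def system_reverse(ip):
    """PTR lookup through the system resolver; returns a list of hostnames."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        return []
    return [name, *aliases]


def system_forward(hostname):
    """A/AAAA lookup through the system resolver; returns a set of IP strings."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def verify_fcrdns(ip, pattern=None, reverse=system_reverse, forward=system_forward):
    """Return True if a PTR name for ip (matching pattern, if given) resolves back to ip.

    Addresses are compared as strings, so this sketch assumes both lookups
    return the same textual form of the address.
    """
    for name in reverse(ip):
        host = name.rstrip(".")  # PTR answers may carry a trailing dot
        if pattern is not None and re.search(pattern, host) is None:
            continue
        if ip in forward(host):
            return True
    return False
```

The forward confirmation is what makes the check meaningful: whoever controls the reverse zone for an address can publish any PTR name, but cannot make that name resolve back to the address without controlling the forward zone as well.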

Helper Functions

# Check for missing headers
- name: old-chrome
  action: WEIGH
  weight:
    adjust: 10
  expression:
    all:
      - 'userAgent.matches("Chrome/[1-9][0-9]?\\.0\\.0\\.0")'
      - 'missingHeader(headers, "Sec-Ch-Ua")'

# Random behavior (use sparingly)
- name: deny-sometimes
  action: DENY
  expression: 'randInt(4) == 0'  # 25% chance

# Path segments
- name: deep-paths
  action: WEIGH
  weight:
    adjust: 5
  expression: 'size(segments(path)) > 5'

Rule Actions

ALLOW

Bypass all checks and forward to backend:
- name: health-check
  path_regex: "^/health$"
  action: ALLOW

DENY

Block with a deceptive success page:
- name: scrapers
  user_agent_regex: "(?i:scraper|crawler)"
  action: DENY

CHALLENGE

Present a proof-of-work challenge:
- name: browsers
  user_agent_regex: "Mozilla"
  action: CHALLENGE
  challenge:
    algorithm: fast
    difficulty: 2

WEIGH

Adjust request suspicion score:
# Remove suspicion
- name: session-cookie
  action: WEIGH
  expression: 'headers["Cookie"].contains("session=")'
  weight:
    adjust: -5

# Add suspicion
- name: high-load
  action: WEIGH
  expression: 'load_1m >= 10.0'
  weight:
    adjust: 20

Rule Evaluation Order

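Rules are evaluated in the order they appear in the policy file. The first matching ALLOW, DENY, or CHALLENGE rule determines the action; matching WEIGH rules only adjust the suspicion score (see Weight-Based Rules).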
bots:
  # Specific rules first
  - name: googlebot-verified
    user_agent_regex: "Googlebot"
    expression: 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'
    action: ALLOW
  
  # Generic rules last
  - name: all-bots
    user_agent_regex: "(?i:bot)"
    action: DENY
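
The effect of ordering can be illustrated with a small first-match loop in Python. The sketch models only terminal actions and replaces the DNS check with a hypothetical precomputed `fcrdns_ok` flag; `first_match` and the rule dictionaries are illustrative, not Anubis internals.

```python
import re


def first_match(rules, request):
    """Return (name, action) of the first rule whose predicate matches, else None."""
    for rule in rules:
        if rule["matches"](request):
            return rule["name"], rule["action"]
    return None


RULES = [
    {
        "name": "googlebot-verified",
        "action": "ALLOW",
        # 'fcrdns_ok' stands in for verifyFCrDNS(...); it is a hypothetical flag.
        "matches": lambda r: re.search("Googlebot", r["userAgent"]) is not None
        and r["fcrdns_ok"],
    },
    {
        "name": "all-bots",
        "action": "DENY",
        "matches": lambda r: re.search("(?i:bot)", r["userAgent"]) is not None,
    },
]
```

With the rules in this order, a verified Googlebot request is allowed and an unverified one falls through to `all-bots`. If the list is reversed, every Googlebot request is denied, because "Googlebot" also matches `(?i:bot)`.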

Weight-Based Rules

WEIGH rules accumulate: every matching WEIGH rule adjusts the request's weight, and the final weight is then compared against the thresholds:
- name: has-session
  action: WEIGH
  expression: 'headers["Cookie"].contains("session_id")'
  weight:
    adjust: -10

- name: high-load
  action: WEIGH  
  expression: 'load_1m >= 8.0'
  weight:
    adjust: 15

# Final weight determines threshold action
thresholds:
  - name: low-suspicion
    expression: 'weight < 5'
    action: ALLOW
  
  - name: moderate-suspicion
    expression:
      all:
        - weight >= 5
        - weight < 15
    action: CHALLENGE
    challenge:
      algorithm: metarefresh
      difficulty: 1
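
The arithmetic behind this example can be sketched in Python. The sketch assumes a starting weight of 0 and that the first threshold whose expression holds is the one applied; `total_weight` and `pick_threshold` are hypothetical helper names, and the predicates are Python stand-ins for the CEL expressions.

```python
def total_weight(weigh_rules, request, start=0):
    """Sum the adjustments of every matching WEIGH rule (assumed starting weight: 0)."""
    return start + sum(rule["adjust"] for rule in weigh_rules if rule["matches"](request))


def pick_threshold(thresholds, weight):
    """Return the action of the first threshold that holds for weight, else None."""
    for threshold in thresholds:
        if threshold["matches"](weight):
            return threshold["action"]
    return None


WEIGH_RULES = [
    {
        "name": "has-session",
        "adjust": -10,
        "matches": lambda r: "session_id" in r["headers"].get("Cookie", ""),
    },
    {"name": "high-load", "adjust": 15, "matches": lambda r: r["load_1m"] >= 8.0},
]

THRESHOLDS = [
    {"name": "low-suspicion", "action": "ALLOW", "matches": lambda w: w < 5},
    {"name": "moderate-suspicion", "action": "CHALLENGE", "matches": lambda w: 5 <= w < 15},
]
```

A request with a session cookie under high load scores -10 + 15 = 5 and lands in `moderate-suspicion`. A request without the cookie under high load scores 15, which neither threshold in this example covers, so a complete policy would presumably define a further threshold for that range.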

Regular Expression Syntax

Anubis uses Go’s regexp package (RE2 syntax):
# Case-insensitive
user_agent_regex: "(?i:bot|crawler|scraper)"

# Character classes
path_regex: "^/[a-z0-9]+$"

# Anchors
user_agent_regex: "^curl/"  # Starts with
path_regex: "\\.json$"       # Ends with

# Escaping special characters
user_agent_regex: "example\\.com"  # Literal dot

Test patterns at regex101.com (select the Golang flavor).
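
For a quick offline sanity check, the patterns above can also be tried in Python. Python's `re` module is a different engine from Go's RE2-based `regexp`, so this is only a rough check; it works here because these particular patterns use syntax both engines share. The sketch assumes patterns are applied as unanchored searches, and `matches` is a hypothetical helper.

```python
import re


def matches(pattern, text):
    """Unanchored search: True if pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None


# (pattern, sample input, expected result)
CASES = [
    ("(?i:bot|crawler|scraper)", "MyCrawler/1.0", True),  # case-insensitive group
    ("^/[a-z0-9]+$", "/abc123", True),                    # single lowercase segment
    ("^/[a-z0-9]+$", "/abc/123", False),                  # second slash is rejected
    ("^curl/", "curl/8.5.0", True),                       # anchored at the start
    ("^curl/", "libcurl/8.5.0", False),
    (r"\.json$", "/data/items.json", True),               # anchored at the end
    (r"example\.com", "examplexcom", False),              # escaped dot is literal
]

results = [matches(pattern, text) == expected for pattern, text, expected in CASES]
```

Note that the raw Python strings contain a single backslash, whereas the double-quoted YAML values above write `\\` to produce the same regex.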

Common Patterns

Allow Static Assets

- name: static-assets
  path_regex: "\\.(css|js|jpg|png|gif|svg|woff2?)$"
  action: ALLOW

Block Known Bad Actors

- name: scrapers
  user_agent_regex: "(?i:scraper|download|extract|harvest)"
  action: DENY

Protect POST Endpoints

- name: post-requests
  action: CHALLENGE
  expression:
    all:
      - 'method == "POST"'
      - '!verifyFCrDNS(remoteAddress)'
  challenge:
    algorithm: fast
    difficulty: 3

Dynamic Load Protection

- name: high-load-stricter
  action: WEIGH
  expression: 'load_1m >= 16.0'
  weight:
    adjust: 30

- name: low-load-lenient
  action: WEIGH
  expression: 'load_15m <= 2.0'
  weight:
    adjust: -15

Best Practices

  1. Order matters: Place specific ALLOW rules before generic DENY rules
  2. Test expressions: Use --debug-benchmark-js to test without blocking
  3. Use FCrDNS: Verify bot IP addresses with verifyFCrDNS()
  4. Prefer CHALLENGE over DENY: Legitimate users can solve challenges
  5. Monitor metrics: Track rule matches via Prometheus metrics
  6. Use weights: Build gradual suspicion instead of binary decisions

Generating Rules from robots.txt

Anubis includes the robots2policy tool to automatically convert robots.txt files into Anubis policy rules.

Usage

# Convert local robots.txt file
robots2policy -input robots.txt -output policy.yaml

# Convert from URL
robots2policy -input https://example.com/robots.txt -format json

# Read from stdin
curl https://example.com/robots.txt | robots2policy -input -

Options

| Flag | Default | Description |
|------|---------|-------------|
| -input | (required) | Path to robots.txt file, URL, or - for stdin |
| -output | stdout | Output file path or - for stdout |
| -format | yaml | Output format: yaml or json |
| -action | CHALLENGE | Default action for disallowed paths |
| -deny-user-agents | DENY | Action for blocked user agents |
| -name | robots-txt-policy | Name for the generated policy |
| -crawl-delay-weight | 0 | Weight adjustment based on crawl-delay |

Example Output

Input robots.txt:
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /api/private/

Generated policy:
bots:
  - name: robots-txt-gptbot
    action: DENY
    expression:
      all:
        - userAgent.contains("GPTBot")
  
  - name: robots-txt-admin
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/admin/")
  
  - name: robots-txt-api-private
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/api/private/")
