Access engine

For operators · running scans

Explanation

The access engine is the cluster of routing primitives that lets Adler reach sites a plain HTTP client can’t see. Together they let Adler turn an honest Uncertain(reason) into a real Found / NotFound on the hard subset of the registry: sites that are Cloudflare-walled, TLS-fingerprinted, geo-restricted, or login-walled.

Every probe walks the same decision tree. Pre-flight checks fire first (username regex, session resolution); then the router picks a primary transport based on the site’s protection tags; an Uncertain reason that a browser could resolve triggers automatic escalation; an operator-policy Uncertain (geo / session / robots / etc.) is kept as-is.

flowchart TD
    Start([Probe a site]) --> Regex{regex_check<br/>matches?}
    Regex -->|No| UnameU[Uncertain<br/>username_not_allowed]
    Regex -->|Yes| Session{access.session<br/>named?}
    Session -->|Yes, missing| SessU[Uncertain<br/>session_required]
    Session -->|None or supplied| Bot{tagged<br/>bot-protected?}
    Bot -->|Yes, browser ok| BrowserPath[Browser fetch]
    Bot -->|Yes, no browser| AlwaysU[Uncertain<br/>always]
    Bot -->|No| TLS{protection<br/>= tls-fingerprint?}
    TLS -->|Yes + impersonate built| ImpPath[Impersonate fetch<br/>wreq + Chrome 134]
    TLS -->|No| Egress{access policy<br/>satisfied by pool?}
    Egress -->|No matching egress| GeoU[Uncertain<br/>geo_unavailable]
    Egress -->|Yes| HttpPath[HTTP fetch<br/>through chosen egress]
    HttpPath --> Verdict{Verdict?}
    ImpPath --> Verdict
    BrowserPath --> Final([Verdict])
    Verdict -->|Found / NotFound| Final
    Verdict -->|Uncertain CF / 429| Esc{escalation<br/>budget left?}
    Verdict -->|Other Uncertain| Final
    Esc -->|No| Final
    Esc -->|Yes, browser ok| EscFetch[Escalated browser fetch]
    EscFetch --> Final
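The pre-verdict half of the tree can be read as a single routing function. The sketch below is illustrative only and is not Adler's actual code: the `Site` and `Caps` types, their field names, and the `route` function are hypothetical, the username check is assumed to be a full-string regex match, and a `tls-fingerprint` site scanned without the impersonate feature is assumed to fall through to the HTTP path (the flowchart does not show that case).

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Site:
    # Hypothetical registry entry; field names are illustrative.
    name: str
    regex_check: Optional[str] = None   # allowed-username pattern
    session: Optional[str] = None       # access.session name, if any
    bot_protected: bool = False         # tagged bot-protected
    protection: Set[str] = field(default_factory=set)
    country: Optional[str] = None       # simplified access policy


@dataclass
class Caps:
    # What this scan was launched with (hypothetical shape).
    browser: bool = False
    impersonate: bool = False
    sessions: Set[str] = field(default_factory=set)
    pool_countries: Set[str] = field(default_factory=set)


def route(site: Site, username: str, caps: Caps) -> Tuple[str, str]:
    """Return ("transport", name) or ("uncertain", reason)."""
    # Pre-flight: username regex, then session resolution.
    if site.regex_check and not re.fullmatch(site.regex_check, username):
        return ("uncertain", "username_not_allowed")
    if site.session and site.session not in caps.sessions:
        return ("uncertain", "session_required")
    # Pre-tagged sites go straight to the browser, or stay Uncertain.
    if site.bot_protected:
        if caps.browser:
            return ("transport", "browser")
        return ("uncertain", "always")  # flowchart label, not a reason name
    # TLS-fingerprint-only sites use the impersonating client when built in.
    if site.protection == {"tls-fingerprint"} and caps.impersonate:
        return ("transport", "impersonate")
    # Sites with an access policy need a matching egress in the pool.
    if site.country and site.country not in caps.pool_countries:
        return ("uncertain", "geo_unavailable")
    return ("transport", "http")


caps = Caps(browser=True, impersonate=True, sessions={"ig"}, pool_countries={"pl"})
print(route(Site("example", protection={"tls-fingerprint"}), "alice", caps))
```

Escalation after a cheap-path verdict is a separate step and is not covered by this function.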

The primitives the tree refers to are documented per-section below.

When the cheap transport returns an Uncertain reason a browser would resolve (Cloudflare interstitial, 429-style rate-limit), the router automatically escalates to the browser backend if one is configured — bounded by a separate --escalation-budget.

Every outcome carries transport and escalations telemetry so the operator can see which path produced each verdict.

A small subset of sites, currently Instagram and Twitter, serve a JavaScript login wall or a Cloudflare challenge to a plain HTTP request. (adler --list-tags shows the live count; the tag is kept narrow because every additional candidate either detects fine without a browser or is structurally unscrapable even with one.) These sites are tagged bot-protected, and on the raw HTTP path they always return Uncertain because the response looks identical for an existing account and a missing one.

With --browser-backend Adler routes those sites (and only those — everything else stays on the fast HTTP path) through a real headless Chrome that runs JS, accepts cookies, and returns the final post-render DOM. The same detection signals then apply, and a verdict becomes possible.

Two backends are supported, picked at the CLI:

| Flag | What it does | Cost | Requirements |
| --- | --- | --- | --- |
| --browser-backend local | Launches headless Chrome on your machine via chromiumoxide | Free | Chrome / Chromium installed locally |
| --browser-backend browserbase | Opens a remote session on Browserbase and connects over the CDP WebSocket | Pay per session-minute (≈ $0.05 / min) | ADLER_BROWSERBASE_API_KEY and ADLER_BROWSERBASE_PROJECT_ID env vars. Drives CDP through a small in-tree async client (adler-core/src/browser/cdp.rs); neither chromiumoxide nor headless_chrome could attach to Browserbase’s remote browser cleanly (issue #5), so we wrote our own. |

Both backends reuse a single browser instance across all routed fetches in a scan, so the setup cost and overhead are paid once.

Terminal window
# Local Chrome: pairs cleanly with --proxy (passed through as
# --proxy-server to the child process).
adler --browser-backend local --proxy socks5h://USER:PASS@HOST:PORT alice

# Cloud session with residential / mobile IP and anti-fingerprint baked in.
export ADLER_BROWSERBASE_API_KEY=bb_live_...
export ADLER_BROWSERBASE_PROJECT_ID=...
adler --browser-backend browserbase alice

# Cap browser-routed probes (default 50). Once exceeded, remaining
# bot-protected sites return Uncertain(browser_budget_exceeded).
adler --browser-backend browserbase --browser-budget 10 alice

# Disable for one run even if the env / a shell alias has it on.
adler --no-browser alice

  • Per-scan budget: --browser-budget N caps how many browser fetches a single scan may consume. Default is 50, ≈ 5× the bot-protected subset of the registry, so the cap only ever fires if a flag is misconfigured.
  • No surprise routing: only sites tagged bot-protected are sent through the browser. Everything else is unaffected. Use adler --list-tags to see what’s tagged.
  • Privacy: the browserbase backend sends the URLs you scan to a third-party US-based service. The local backend doesn’t leave your machine (modulo whatever proxy you’ve configured Chrome to use).

Browser fetches are inherently 5–10× slower than raw HTTP and (for browserbase) cost real money. They’re the only way to detect accounts on the bot-protected subset, but on the rest of the registry they’d add latency for no recall gain — which is why routing is opt-in and tag-driven, not blanket.

The pre-tag routing above handles sites the registry has already marked as bot-protected. It can’t help with the long tail: sites that look like a normal HTTP target until a Cloudflare edge or a 429 rate limit puts an interstitial page in front of them. Without help, those sites land in Uncertain(cloudflare_challenge | rate_limited) on every scan from the cheap path.

When a browser backend is configured, Adler watches for those escalation-worthy Uncertain reasons on the cheap path and automatically retries through the browser — flipping the verdict from Uncertain to Found / NotFound without the operator having to pre-tag the site. Each retry consumes one slot of a separate --escalation-budget (default 30), so a Cloudflare-walled long tail doesn’t quietly blow up your Browserbase bill.

Terminal window
adler --browser-backend local alice # escalation on, default budget 30
adler --browser-backend local --escalation-budget 50 alice
adler --browser-backend local --no-escalation alice # cheap-path verdicts only

Outcomes carry a transport field (http / impersonate / browser) and an escalations count (0 in the happy path, 1 when escalation fired) so downstream tools can tell which path produced each verdict. Sites that never escalate stay on the cheap, fast HTTP path; only the ones that hit a wall pay the browser-fetch cost.

Escalation only triggers on reasons a browser plausibly resolves — CloudflareChallenge and RateLimited. Operator-policy Uncertains (robots_disallowed, session_required, geo_unavailable, username_not_allowed, deadline / scheduler / captcha) are kept as-is so escalation doesn’t waste budget on hopeless cases.
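The escalation rule can be sketched as a small budget-guarded retry. This is a minimal illustration, not Adler's implementation: `EscalationBudget`, `finalize`, the dict-shaped outcome, and the `"uncertain"` kind value are all hypothetical, and the reason strings reuse the snake_case names that appear in the Uncertain(...) examples on this page.

```python
# Reasons a browser plausibly resolves; everything else is kept as-is.
ESCALATABLE = {"cloudflare_challenge", "rate_limited"}


class EscalationBudget:
    """Counts how many escalations a scan may still perform."""

    def __init__(self, slots=30):  # 30 mirrors the documented default
        self.remaining = slots

    def try_consume(self):
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def finalize(outcome, budget, browser_fetch):
    """Escalate a cheap-path outcome through the browser when allowed.

    `outcome` is a dict with "kind", "reason" and "transport" keys.
    `browser_fetch` is a zero-argument callable, or None when no browser
    backend is configured (or --no-escalation was passed).
    """
    escalatable = (
        outcome["kind"] == "uncertain"
        and outcome.get("reason") in ESCALATABLE
    )
    # The budget is only touched when escalation is actually possible.
    if not (escalatable and browser_fetch is not None and budget.try_consume()):
        return {**outcome, "escalations": 0}
    retried = browser_fetch()
    return {**retried, "transport": "browser", "escalations": 1}
```

Checking the reason before touching the budget means operator-policy Uncertains never consume a slot, which matches the intent described above.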

Some sites only answer from a particular country, or block datacenter IP ranges. A site can declare what egress it needs via its access policy in the registry (a country and/or an IP type); --proxy-pool supplies the proxies that satisfy those requirements.

--proxy still routes everything through one proxy (the default egress). --proxy-pool is additive and only kicks in for sites whose access policy requires a specific egress — everything else keeps using the default. If a site needs an egress the pool can’t provide, it’s reported Uncertain(geo_unavailable) rather than fetched from the wrong place — a location you can’t reach is not evidence the account is absent.

The pool is a TOML file of [[egress]] entries:

pool.toml
[[egress]]
name = "pl-residential" # optional; needed for per-scan subset selection in --web
url = "socks5://user:pass@pl.example.com:1080"
country = "pl" # ISO-3166-1 alpha-2 (lowercased)
kind = "residential" # datacenter (default) | residential | mobile | tor

[[egress]]
name = "de-datacenter"
url = "http://de.example.com:8080"
country = "de"
# kind omitted → datacenter

Terminal window
adler --proxy-pool pool.toml alice

Bring your own proxies — Adler ships the routing, not the egress. The browser backend keeps its own egress (e.g. Browserbase’s residential IPs); --proxy-pool routes the raw-HTTP path.
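As a rough illustration of the matching rule, the sketch below picks an egress for a site's access policy from an already-parsed pool. It is an assumption-laden sketch rather than Adler's code: `normalize_pool` and `pick_egress` are hypothetical names, the pool is shown as the list of dicts a TOML parser would produce for the file above, and first-match selection order is assumed.

```python
# The pool.toml example, as a TOML parser would return its [[egress]] array.
RAW_POOL = [
    {"name": "pl-residential", "url": "socks5://user:pass@pl.example.com:1080",
     "country": "pl", "kind": "residential"},
    {"name": "de-datacenter", "url": "http://de.example.com:8080",
     "country": "de"},  # kind omitted
]


def normalize_pool(entries):
    """Apply the documented default: a missing kind means datacenter."""
    return [{**e, "kind": e.get("kind", "datacenter")} for e in entries]


def pick_egress(pool, country=None, kind=None):
    """Return the first egress satisfying the policy, or None.

    None means the caller should report Uncertain(geo_unavailable)
    instead of fetching from the wrong place.
    """
    for egress in pool:
        if country is not None and egress.get("country") != country.lower():
            continue
        if kind is not None and egress["kind"] != kind:
            continue
        return egress
    return None


pool = normalize_pool(RAW_POOL)
print(pick_egress(pool, country="pl", kind="residential")["name"])
print(pick_egress(pool, country="us"))
```

A site with no access policy never reaches this function; it keeps using the default egress set by --proxy.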

Since v0.11, when adler --web is running, the SPA can restrict a single scan to a named subset of the pool without relaunching the server.

Sessions (reach login-walled sites) since v0.10

Some sites only show a profile to a logged-in user (Instagram, Threads, Reddit’s JSON). A site can declare access.session = "<name>" in the registry; --sessions <file> supplies that named session’s headers — your own (or a sock-puppet) account’s — applied to the site’s probe so it sees a real session instead of a login wall.

This is “use a real account”, not evasion: Adler doesn’t solve challenges or forge anything; you bring a session you’re entitled to. If a site names a session you didn’t supply, it’s reported Uncertain(session_required) rather than a login-wall false negative.

The file is TOML; each [name] table is a set of HTTP headers (copy them from your browser’s devtools):

sessions.toml
[ig]
Cookie = "sessionid=...; csrftoken=..."
X-IG-App-ID = "936619743392459"

[reddit]
Cookie = "reddit_session=..."

Terminal window
adler --sessions sessions.toml alice

Header values are secrets — redacted from logs, never written to scan output. Using a sock-puppet account may breach a site’s ToS; that’s an operator decision within your engagement’s scope.
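The session lookup and log redaction described above might look roughly like this. It is a hedged sketch, not Adler's implementation: `headers_for`, `redact`, and the `<redacted>` placeholder are hypothetical, and `SESSIONS` stands for the parsed contents of sessions.toml.

```python
# Parsed form of the sessions.toml example: one header table per name.
SESSIONS = {
    "ig": {
        "Cookie": "sessionid=...; csrftoken=...",
        "X-IG-App-ID": "936619743392459",
    },
    "reddit": {"Cookie": "reddit_session=..."},
}


def headers_for(session_name, sessions, base_headers):
    """Merge a named session's headers into the probe's headers.

    Returns None when the site names a session that was not supplied,
    so the caller can report Uncertain(session_required).
    """
    if session_name is None:
        return dict(base_headers)
    session_headers = sessions.get(session_name)
    if session_headers is None:
        return None
    return {**base_headers, **session_headers}


def redact(headers, session_headers):
    """Copy headers for logging with every session-supplied value hidden."""
    return {
        name: "<redacted>" if name in session_headers else value
        for name, value in headers.items()
    }


merged = headers_for("ig", SESSIONS, {"User-Agent": "adler"})
print(redact(merged, SESSIONS["ig"]))
```

Redacting by header name rather than by value pattern means a session header is hidden no matter what it contains.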

Some sites read the TLS handshake’s JA3 / JA4 fingerprint and serve a block page to anything that doesn’t look like a real browser; the default fingerprints of rustls and reqwest are well known and easy to filter. Sites that do this are tagged protection: tls-fingerprint in the registry.

Build Adler with the impersonate feature to enable an in-process wreq HTTP client emulating Chrome 134 (BoringSSL handshake matches real Chrome’s JA3 / JA4 / HTTP-2 fingerprint). Sites whose protection is only TLS fingerprint then route through it — much cheaper than spinning up a real browser:

Terminal window
cargo install adler-cli --features impersonate

The feature pulls in BoringSSL and needs cmake, a C++ compiler, and libclang at build time:

  • Fedora: sudo dnf install cmake gcc-c++ clang
  • Debian / Ubuntu: sudo apt install cmake clang libclang-dev

cargo binstall adler-cli ships impersonate-enabled binaries for x86_64-linux, both macOS targets, and Windows. The aarch64-unknown-linux-gnu binary is built without the feature (cross-compiled BoringSSL toolchain isn’t wired up), so on aarch64 Linux use cargo install adler-cli --features impersonate instead.

Sites with mixed protections (e.g. tls-fingerprint + cloudflare) stay on the browser-backend path — impersonate alone won’t get past Cloudflare’s JS challenge.

Telemetry: transport and escalations on outcomes

Every outcome stamps which transport actually produced its verdict, so downstream tools (the doctor, the bench harness, the web UI, your own JSON consumers) can tell the difference between a Found that came back from raw HTTP and a Found that required a browser fetch.

sequenceDiagram
    participant R as Router
    participant H as HTTP
    participant B as Browser backend
    participant O as CheckOutcome
    participant U as Consumer<br/>(SPA / JSON / bench)
    Note over R: Cheap transport first
    R->>H: fetch(site, headers)
    H-->>R: response
    alt Found / NotFound
        R->>O: stamp transport=http, escalations=0
    else Uncertain(cloudflare / 429)
        Note over R: Escalation budget ok?
        R->>B: retry via browser
        B-->>R: response
        R->>O: stamp transport=browser, escalations=1
    else Other Uncertain
        R->>O: stamp transport=http, reason
    end
    O-->>U: outcome with telemetry

Two concrete shapes:

{
  "site": "GitHub",
  "kind": "found",
  "transport": "http",
  "escalations": 0,
  "elapsed_ms": 124
}

{
  "site": "Patreon",
  "kind": "found",
  "transport": "browser",
  "escalations": 1,
  "reason": null,
  "elapsed_ms": 980
}

In the second example, the cheap path returned Uncertain(cloudflare_challenge) and the router escalated to the browser backend; one escalation budget slot was consumed. Tools surfacing the field include the web UI (a small transport chip on each ResultRow), adler --explain, and the JSON / NDJSON output formats.
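For a downstream JSON consumer, the telemetry fields are enough to see which path produced each verdict. The sketch below assumes NDJSON output with one outcome object per line, using only the field names shown in the two examples; the `summarize` helper is hypothetical.

```python
import json
from collections import Counter

# The two example outcomes, one JSON object per line (NDJSON).
NDJSON = """\
{"site": "GitHub", "kind": "found", "transport": "http", "escalations": 0, "elapsed_ms": 124}
{"site": "Patreon", "kind": "found", "transport": "browser", "escalations": 1, "reason": null, "elapsed_ms": 980}
"""


def summarize(lines):
    """Count outcomes per transport and list the sites that escalated."""
    by_transport = Counter()
    escalated = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        outcome = json.loads(line)
        by_transport[outcome["transport"]] += 1
        if outcome.get("escalations", 0) > 0:
            escalated.append(outcome["site"])
    return by_transport, escalated


by_transport, escalated = summarize(NDJSON.splitlines())
print(dict(by_transport))
print(escalated)
```

A summary like this makes it easy to spot a scan where many sites needed the browser, which is where both latency and Browserbase cost accumulate.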