Lesson 0006 · What your crawler misses

The JS-Rendering Gap

Your page has two versions: the HTML the server sends, and the DOM after JavaScript runs. Many bots only ever see the first one, and they decide whether you exist.

Recap from Lesson 0002: you ran crawl_audit.py and it said “crawlable + indexable.” But that tool fetches with urllib — it reads the raw response and never runs a line of JS. So does the page it blessed actually contain anything? This lesson finds the gap.

Google doesn’t index your page in one shot. It crawls the raw HTML first, then — separately, later, when its resources allow — a headless Chrome renders the page, runs the JavaScript, and re-indexes whatever the JS produced.^[1] Rendering is a deferred queue, not part of the crawl. If your content only exists after JS, it’s invisible until that second wave — and to many bots, invisible forever.

Your win: run render_gap.py on a page’s raw HTML vs its rendered DOM and get a per-signal diff — words, links, JSON-LD, headings — flagging exactly which ones exist only after JavaScript. That’s the content a no-JS bot never sees and Google sees late.

Two waves, not one

Straight from Google’s JavaScript-SEO doc: pages are processed in phases, and rendering sits in its own queue.^[1]

Google's pipeline · two waves

Crawl (wave 1): fetch raw HTML, index it. No JS yet.

→ Render (deferred queue): headless Chrome runs the JS when resources allow.

→ Re-index (wave 2): JS-produced content, links, and schema get indexed, if the render succeeded.

▲ Render is its own deferred queue — it can take seconds, or much longer.

Google does run your JavaScript “using a recent version of Chrome, similar to how your browser renders pages.”^[2] The catch is when, and who else doesn’t.

Your audit tool is a no-JS bot — and so are most AI crawlers

crawl_audit.py from 0002 is exactly the kind of fetcher that sees wave-1 only:

# crawl_audit.py: the fetch is plain HTTP. No browser, no JS.
import urllib.request

req = urllib.request.Request(url, headers={"User-Agent": "Googlebot"})
body = urllib.request.urlopen(req).read()   # <- raw HTML only

So if your <title>, content, and canonical are injected by React after load, crawl_audit.py can’t see them — and it might still print PASS off a near-empty shell. The same blind spot hits the AEO side hard: standalone AI crawlers commonly fetch raw HTML and don’t execute JavaScript, and there’s no settled standard for controlling them.^[3] Client-side content that Googlebot eventually renders may never reach an answer engine at all.
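To make that blind spot concrete, here is a minimal sketch (not crawl_audit.py's actual code) that pulls the <title> and canonical out of a raw response using only the standard library. The names HeadSignals and head_signals are hypothetical, made up for this example. If both come back empty on the raw HTML, a no-JS fetcher has nothing to read, whatever the rendered page shows.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the <title> text and the rel=canonical href from raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.canonical = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_signals(raw_html):
    """Hypothetical helper: what a no-JS fetcher can read from the <head>."""
    parser = HeadSignals()
    parser.feed(raw_html)
    parser.close()
    return {"title": parser.title.strip(), "canonical": parser.canonical}


# A client-rendered shell: the bundle would add title and canonical later.
shell = ('<html><head><script src="/bundle.js"></script></head>'
         '<body><div id="root"></div></body></html>')
# A server-rendered page: both signals are in the first response.
ssr = ('<html><head><title>Render gap</title>'
       '<link rel="canonical" href="https://example.com/blog/geo"></head>'
       '<body><h1>Render gap</h1></body></html>')

print(head_signals(shell))  # {'title': '', 'canonical': None}
print(head_signals(ssr))    # {'title': 'Render gap', 'canonical': 'https://example.com/blog/geo'}
```

A status-code or robots check alone would pass both pages; only reading the raw body tells them apart.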

Server-rendered (SSR / SSG) — content in the first byte

The HTML response already holds the text, links, and JSON-LD. JS only enhances.

Seen by wave-1 Googlebot, no-JS AI crawlers, your own audit — everyone, immediately.

✓ indexed now · citable now

Client-rendered (CSR) — empty shell + bundle

Raw HTML is <div id="root"></div>. All content arrives via JS.

Invisible to no-JS bots; Googlebot indexes it only on the deferred render wave — if it renders cleanly.

✗ delayed for Google · missing for AEO

The tool: diff the two versions

You give render_gap.py two things: the raw HTML (it can fetch this, like 0002’s tool) and the rendered DOM (you capture it: DevTools → Elements → right-click <html> → Copy outerHTML). It extracts the same signals from each and flags any signal where less than half of what the rendered DOM contains is already in the raw HTML:

# share of each signal present WITHOUT JS
# guard: a signal the rendered DOM lacks too (count 0) has no gap, so avoid dividing by zero
visible = raw[key] / rendered[key] if rendered[key] else 1.0   # 1.0 = fully server-rendered
js_dep  = rendered[key] > 0 and visible < 0.5   # needs JS to exist
# headline keys on visible TEXT: >=90% server-rendered, <=10% client-rendered
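For a feel of how the whole diff could fit together, here is a rough sketch under stated assumptions: it is not render_gap.py's source, and SignalCounter, count_signals, and render_gap are hypothetical names. It counts five signals with the standard-library HTML parser, then applies the same ratio and 0.5 threshold as the snippet above.

```python
from html.parser import HTMLParser


class SignalCounter(HTMLParser):
    """Counts words, <a href> links, JSON-LD blocks, and h1/h2 headings."""

    HIDDEN = {"script", "style", "title"}  # text in these is not page copy

    def __init__(self):
        super().__init__()
        self.counts = {"words": 0, "links": 0, "jsonld": 0, "h1": 0, "h2": 0}
        self._hidden = 0

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in self.HIDDEN:
            self._hidden += 1
            script_type = (attr.get("type") or "").strip().lower()
            if tag == "script" and script_type == "application/ld+json":
                self.counts["jsonld"] += 1
        elif tag == "a" and attr.get("href"):
            self.counts["links"] += 1
        elif tag in ("h1", "h2"):
            self.counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden:
            self.counts["words"] += len(data.split())


def count_signals(html):
    parser = SignalCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def render_gap(raw_html, rendered_html):
    """Per-signal share visible without JS, plus a JS-dependence flag."""
    raw, rendered = count_signals(raw_html), count_signals(rendered_html)
    report = {}
    for key in rendered:
        # A signal missing from both versions has no gap: treat it as visible.
        visible = raw[key] / rendered[key] if rendered[key] else 1.0
        report[key] = {
            "raw": raw[key],
            "rendered": rendered[key],
            "visible": visible,
            "js_dep": rendered[key] > 0 and visible < 0.5,
        }
    return report


raw_html = ('<html><head><title>App</title></head>'
            '<body><div id="root"></div></body></html>')
rendered_html = ('<html><head><title>App</title>'
                 '<script type="application/ld+json">{"@type": "Article"}</script></head>'
                 '<body><div id="root"><h1>Render gap</h1>'
                 '<p>Two waves, not one. <a href="/a">Read more</a></p></div></body></html>')

for key, row in render_gap(raw_html, rendered_html).items():
    flag = "JS-DEP" if row["js_dep"] else "ok"
    print(f'{key:7} no JS {row["raw"]} · with JS {row["rendered"]} · {flag}')
```

Counting text nodes rather than measuring layout is a deliberate simplification here, in line with the ceiling the tool itself states.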

Do this now:

1. Self-check (offline): python3 tools/render_gap.py --demo
2. Pick a page. Save its rendered DOM as rendered.html (DevTools → Copy outerHTML), then: python3 tools/render_gap.py https://yoursite.com/page rendered.html
3. Test a known SPA (a React/Vue app route) the same way and watch every signal flag JS-DEP. That’s what a no-JS crawler sees.

$ python3 tools/render_gap.py https://app.example.com/blog/geo rendered.html

Render gap — raw HTML (no JS)  vs  rendered DOM (after JS)
──────────────────────────────────────────────
[FAIL] visible words      no JS 0 · with JS 209 · visible 0% · JS-DEP
[FAIL] a links            no JS 0 · with JS 3   · visible 0% · JS-DEP
[FAIL] JSON-LD blocks     no JS 0 · with JS 1   · visible 0% · JS-DEP
[FAIL] h1 headings        no JS 0 · with JS 1   · visible 0% · JS-DEP
[FAIL] h2 headings        no JS 0 · with JS 1   · visible 0% · JS-DEP
──────────────────────────────────────────────
VERDICT: CLIENT-SIDE RENDERED — the raw HTML is an empty shell. No-JS AI crawlers see ~nothing; Googlebot indexes it only on the deferred render wave.

The JSON-LD trap: your perfect schema from 0003 can be invisible. If you inject structured data through Google Tag Manager or client-side JS, it lives only in the rendered DOM — so schema_tool.py validates it, Googlebot may pick it up on wave 2, but a no-JS AI crawler never sees a single property.^[3] Same story for client-rendered <title> and canonical: crawl_audit.py can wave a page through while the raw response is an empty shell. Passing the crawl/index gate ≠ the content is actually there. Put facts in the HTML the server sends.
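A quick way to check the trap on your own page is to look for JSON-LD in the raw response and in the rendered DOM separately. The sketch below is an illustration only: JsonLdExtractor and jsonld_types are hypothetical names, not part of schema_tool.py or render_gap.py, and it reads only top-level @type values (it does not walk @graph).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = None  # list of chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None


def jsonld_types(html):
    """Hypothetical helper: top-level @type of each parseable JSON-LD block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the check
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
    return types


raw_html = ('<html><head><script src="/bundle.js"></script></head>'
            '<body><div id="root"></div></body></html>')
rendered_html = ('<html><head><script src="/bundle.js"></script>'
                 '<script type="application/ld+json">'
                 '{"@context": "https://schema.org", "@type": "Article", '
                 '"headline": "The JS-Rendering Gap"}</script></head>'
                 '<body><div id="root"><h1>The JS-Rendering Gap</h1></div></body></html>')

print(jsonld_types(raw_html))       # []
print(jsonld_types(rendered_html))  # ['Article']
```

An empty list for the raw HTML next to a populated one for the rendered DOM is exactly the injected-schema case described above.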

Ceiling to know: render_gap.py doesn’t run a browser — you supply the rendered DOM, the same way 0005’s tracker takes engine output you captured. Its “visible words” counts text nodes, not layout; a fully JS-built page that Google renders fine will still flag here, and that’s the point — it shows the no-JS view, which is one real audience, not a verdict on Googlebot.

Retrieval practice · no peeking

Mind the gap

Answer from memory — that effort is what makes it stick. One try each; pick before you read the others.

Question 1 / 4

In Google's pipeline, when does your JavaScript actually run?

Question 2 / 4

Why can crawl_audit.py print PASS on a page with no visible content?

Question 3 / 4

Why is client-side rendering worse for AEO than for classic SEO?

Question 4 / 4

A signal jumps from 0 (raw) to many (rendered). What does render_gap.py call it, and what's the fix?

Primary source — read this next (≈10 min)

"Understand the JavaScript SEO basics" — Google Search Central

The three phases (crawl → render → index) and why render is a deferred queue, from the engine itself. Pair it with the rendering note in "How Search Works" (https://developers.google.com/search/docs/fundamentals/how-search-works), and the AI-crawler-control gap in RESOURCES.md (/resources/).
