Lesson 0006 · What your crawler misses
The JS-Rendering Gap
Your page has two versions: the HTML the server sends, and the DOM after JavaScript runs. Half the bots only ever see the first one — and they decide whether you exist.
Recap from Lesson 0002: you ran crawl_audit.py and it said “crawlable + indexable.” But that tool fetches with urllib — it reads the raw response and never runs a line of JS. So does the page it blessed actually contain anything? This lesson finds the gap.
Google doesn’t index your page in one shot. It crawls the raw HTML first, then — separately, later, when its resources allow — a headless Chrome renders the page, runs the JavaScript, and re-indexes whatever the JS produced.[1] Rendering is a deferred queue, not part of the crawl. If your content only exists after JS, it’s invisible until that second wave — and to many bots, invisible forever.
Run render_gap.py on a page’s raw HTML vs its rendered DOM and get a per-signal diff (words, links, JSON-LD, headings) that flags exactly which ones exist only after JavaScript. That’s the content a no-JS bot never sees and Google sees late.
Two waves, not one
Straight from Google’s JavaScript-SEO doc: pages are processed in phases, and rendering sits in its own queue.[1]
▲ Render is its own deferred queue — it can take seconds, or much longer.
Google does run your JavaScript “using a recent version of Chrome, similar to how your browser renders pages.”[2] The catch is when, and who else doesn’t.
Your audit tool is a no-JS bot — and so are most AI crawlers
crawl_audit.py from 0002 is exactly the kind of fetcher that sees wave-1 only:
# crawl_audit.py — the fetch is plain HTTP. No browser, no JS.
req = urllib.request.Request(url, headers={"User-Agent": "Googlebot"})
body = urllib.request.urlopen(req).read() # <- raw HTML only
So if your <title>, content, and canonical are injected by React after load, crawl_audit.py can’t see them — and it might still print PASS off a near-empty shell. The same blind spot hits the AEO side hard: standalone AI crawlers commonly fetch raw HTML and don’t execute JavaScript, and there’s no settled standard for controlling them.[3] Client-side content that Googlebot eventually renders may never reach an answer engine at all.
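To make that blind spot concrete, here is a minimal sketch of what a no-JS fetcher can actually read from a raw response. It is not taken from crawl_audit.py: the class RawSignals and the helper summarize_raw are hypothetical names, and the sketch assumes only Python's standard html.parser module. It works on an HTML string, so you can paste in any raw response body.

```python
from html.parser import HTMLParser


class RawSignals(HTMLParser):
    """Hypothetical helper: collect title, canonical, and visible word count."""

    SKIP = {"script", "style"}  # text inside these is never visible content

    def __init__(self):
        super().__init__()
        self.title = ""
        self.canonical = None
        self.words = 0
        self._skip = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.words += len(data.split())


def summarize_raw(html: str) -> dict:
    """Return the signals present in the HTML exactly as served (no JS run)."""
    parser = RawSignals()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "canonical": parser.canonical,
        "words": parser.words,
    }


# A client-rendered shell versus a server-rendered page (both made up).
shell = '<html><head></head><body><div id="root"></div><script src="/app.js"></script></body></html>'
ssr = (
    '<html><head><title>GEO guide</title>'
    '<link rel="canonical" href="https://example.com/geo"></head>'
    "<body><h1>GEO guide</h1><p>Facts in the HTML.</p></body></html>"
)
print(summarize_raw(shell))
print(summarize_raw(ssr))
```

The shell yields an empty title, no canonical, and zero words, even though the HTTP fetch itself succeeded. That is the situation in which a status-and-headers audit can pass a page that has nothing in it.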
The HTML response already holds the text, links, and JSON-LD. JS only enhances.
Seen by wave-1 Googlebot, no-JS AI crawlers, your own audit — everyone, immediately.
✓ indexed now · citable now
Raw HTML is <div id="root"></div>. All content arrives via JS.
Invisible to no-JS bots; Googlebot indexes it only on the deferred render wave, if it renders cleanly.
✗ delayed for Google · missing for AEO
The tool: diff the two versions
You give render_gap.py two things — the raw HTML (it can fetch this, like 0002’s tool) and the rendered DOM (you capture it: DevTools → Elements → right-click <html> → Copy outerHTML). It extracts the same signals from each and flags any that jump from ≈0 to a lot:
# share of each signal present WITHOUT JS
visible = raw[key] / rendered[key] # 1.0 = fully server-rendered
js_dep = rendered[key] > 0 and visible < 0.5 # needs JS to exist
# headline keys on visible TEXT: >=90% server-rendered, <=10% client-rendered
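The ratio logic above can be sketched end to end as follows. This is an illustrative reconstruction, not the source of render_gap.py: the names SignalCounter, count_signals, and render_gap are hypothetical, the signal set mirrors the one the lesson lists, and the division guard (treating a signal absent from both versions as fully visible) is an assumption added here so the ratio never divides by zero.

```python
from html.parser import HTMLParser


class SignalCounter(HTMLParser):
    """Hypothetical counter for the lesson's signals in one HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {"words": 0, "links": 0, "jsonld": 0, "h1": 0, "h2": 0}
        self._hidden = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._hidden += 1
            if (attrs.get("type") or "").lower() == "application/ld+json":
                self.counts["jsonld"] += 1
        elif tag == "a" and attrs.get("href"):
            self.counts["links"] += 1
        elif tag in ("h1", "h2"):
            self.counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden:
            self.counts["words"] += len(data.split())


def count_signals(html: str) -> dict:
    parser = SignalCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def render_gap(raw_html: str, rendered_html: str, threshold: float = 0.5) -> dict:
    """Compare each signal in the raw HTML against the rendered DOM."""
    raw, rendered = count_signals(raw_html), count_signals(rendered_html)
    report = {}
    for key in rendered:
        # Assumption: a signal missing from both versions counts as fully visible.
        visible = raw[key] / rendered[key] if rendered[key] else 1.0
        report[key] = {
            "raw": raw[key],
            "rendered": rendered[key],
            "visible": min(visible, 1.0),
            "js_dep": rendered[key] > 0 and visible < threshold,
        }
    return report


raw = '<div id="root"></div><script src="/app.js"></script>'
rendered = (
    '<div id="root"><h1>GEO</h1><p>Two waves, not one.</p>'
    '<a href="/next">Next</a></div>'
    '<script type="application/ld+json">{"@type": "Article"}</script>'
)
for key, row in render_gap(raw, rendered).items():
    print(key, row)
```

On this made-up pair, words, links, JSON-LD, and h1 all flag as JS-dependent, while h2 does not flag because it is absent from both versions.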
- Self-check (offline): python3 tools/render_gap.py --demo
- Pick a page. Save its rendered DOM as rendered.html (DevTools → Copy outerHTML), then: python3 tools/render_gap.py https://yoursite.com/page rendered.html
- Test a known SPA (a React/Vue app route) the same way and watch every signal flag JS-DEP. That’s what a no-JS crawler sees.
$ python3 tools/render_gap.py https://app.example.com/blog/geo rendered.html
Render gap — raw HTML (no JS) vs rendered DOM (after JS)
──────────────────────────────────────────────
[FAIL] visible words    no JS 0 · with JS 209 · visible 0% · JS-DEP
[FAIL] a links          no JS 0 · with JS 3 · visible 0% · JS-DEP
[FAIL] JSON-LD blocks   no JS 0 · with JS 1 · visible 0% · JS-DEP
[FAIL] h1 headings      no JS 0 · with JS 1 · visible 0% · JS-DEP
[FAIL] h2 headings      no JS 0 · with JS 1 · visible 0% · JS-DEP
──────────────────────────────────────────────
VERDICT: CLIENT-SIDE RENDERED — the raw HTML is an empty shell. No-JS AI crawlers see ~nothing; Googlebot indexes it only on the deferred render wave.
If your JSON-LD is injected by JavaScript, schema_tool.py validates it, Googlebot may pick it up on wave 2, but a no-JS AI crawler never sees a single property.[3] Same story for a client-rendered <title> and canonical: crawl_audit.py can wave a page through while the raw response is an empty shell. Passing the crawl/index gate does not mean the content is actually there. Put facts in the HTML the server sends.
Ceiling to know: render_gap.py doesn’t run a browser. You supply the rendered DOM, the same way 0005’s tracker takes engine output you captured. Its “visible words” counts text nodes, not layout; a fully JS-built page that Google renders fine will still flag here, and that’s the point: it shows the no-JS view, which is one real audience, not a verdict on Googlebot.
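For the JSON-LD case specifically, a quick way to check what a no-JS crawler receives is to parse the structured data out of the raw response and see whether anything comes back. The sketch below is an assumption-laden illustration, not part of schema_tool.py or render_gap.py: JsonLdExtractor and jsonld_in are hypothetical names, and it relies only on Python's standard json and html.parser modules.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Hypothetical helper: collect the text of each JSON-LD <script> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None


def jsonld_in(html: str) -> list:
    """Return the parsed JSON-LD objects present in the HTML as served."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    parsed = []
    for text in parser.blocks:
        try:
            parsed.append(json.loads(text))
        except json.JSONDecodeError:
            continue  # a malformed block is skipped rather than guessed at
    return parsed


shell = '<div id="root"></div><script src="/app.js"></script>'
served = (
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Article", "headline": "GEO"}'
    "</script><h1>GEO</h1>"
)
print(jsonld_in(shell))   # nothing for a no-JS crawler to read
print(jsonld_in(served))  # the properties are in the served HTML
```

Run it on the raw response body, not on the DOM copied from DevTools. An empty list there means the structured data exists only after JavaScript.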
Retrieval practice · no peeking
Mind the gap
Answer from memory — that effort is what makes it stick. One try each; pick before you read the others.
crawl_audit.py print PASS on a page with no visible content?
render_gap.py call it, and what's the fix?
The three phases (crawl → render → index) and why render is a deferred queue, from the engine itself. Pair it with the rendering note in "How Search Works" (https://developers.google.com/search/docs/fundamentals/how-search-works), and the AI-crawler-control gap in RESOURCES.md (/resources/).