Lesson 0007 · Telling engines you changed

Sitemaps + IndexNow

Publishing a page is half the job. The other half is the ping: one mechanism every engine pulls on its own slow schedule, one that pushes instantly — to everyone except Google.

Recap from Lesson 0006: we made sure your content is actually in the HTML the server sends. Now the page exists and renders. But an engine still has to find out it exists, or changed. That discovery step is its own pipeline stage — and it’s the most automatable one in the whole course.

Two notification mechanisms, and the asymmetry is the lesson. A sitemap is passive: a list of your URLs the engine pulls when it feels like it — slow, but universal, Google included. IndexNow is active: you POST changed URLs and participating engines fetch within minutes, then share the ping with each other.^[2] The catch: Google doesn’t participate in IndexNow — it relies on its own crawl scheduling plus your sitemap.^[3] So a real publishing pipeline fires both.

Your win: sitemap_ping.py generates a spec-valid sitemap.xml from a URL list, validates an existing one against the real sitemaps.org limits, and builds a correct IndexNow payload (key + keyLocation + urlList) — the exact two calls a publish hook should make.

Who listens to what

Mechanism	Google	Bing / Yandex / Naver / Seznam	Speed
`sitemap.xml` (passive pull)	yes	yes	slow — on the engine’s schedule
IndexNow (active push)	no — ignores it^[3]	yes — shared across all^[2]	minutes

Why a builder cares about the engines Google snubs: several of them feed AI answers (Bing’s index sits behind Copilot and others), so an instant IndexNow push is partly an AEO discovery move, not just classic SEO.

sitemap.xml — the passive list

One file holds at most 50,000 URLs and 50 MB (52,428,800 bytes) uncompressed. Every <url> entry requires a <loc>; <lastmod> (W3C Datetime, e.g. 2026-06-22) is optional.^[1]

All URLs must be on one host. Submit the sitemap once, via Search Console or a Sitemap: line in robots.txt; after that each engine re-fetches it on its own schedule.

✓ universal · ✗ slow, no "it changed!" signal
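The rules above fit in a few lines of standard-library Python. This is a minimal sketch, not the actual source of sitemap_ping.py: the function name build_sitemap and the (url, lastmod) tuple shape are assumptions made for illustration.

```python
from urllib.parse import urlsplit
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # sitemaps.org per-file limit


def build_sitemap(entries):
    """Hypothetical helper: entries is an iterable of (url, lastmod or None).

    Returns the sitemap as UTF-8 bytes, or raises ValueError if the
    URL-count or single-host rule is broken.
    """
    entries = list(entries)
    if len(entries) > MAX_URLS:
        raise ValueError("more than 50,000 URLs: split into several sitemaps")
    hosts = {urlsplit(url).netloc for url, _ in entries}
    if len(hosts) > 1:
        raise ValueError(f"URLs span several hosts: {sorted(hosts)}")

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url  # ElementTree escapes & for us
        if lastmod:
            ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

Calling build_sitemap([("https://yoursite.com/new-page", "2026-06-22")]) returns bytes you can write straight to sitemap.xml. The 50 MB check is left out here; it is a simple len() on the returned bytes.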

IndexNow — the active push

POST {host, key, keyLocation, urlList} to the endpoint, ≤ 10,000 URLs per call.^[2]

Key = 8–128 chars [a-zA-Z0-9-], hosted in a text file at your root so the engine can verify you own the site.

✓ instant + shared · ✗ Google not included
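The key itself is just a random string you publish. A minimal sketch of generating one and working out the file that has to sit at your root follows; the helper names new_indexnow_key and key_file are hypothetical, and the only rules assumed are the ones stated above (8 to 128 characters from [a-zA-Z0-9-], file body equal to the key).

```python
import re
import secrets

KEY_RE = re.compile(r"[A-Za-z0-9-]{8,128}")


def new_indexnow_key():
    """Return a random 32-character hex key, well inside the 8-128 limit."""
    return secrets.token_hex(16)


def key_file(key):
    """Return (filename, body) for the verification file served at the site root."""
    if not KEY_RE.fullmatch(key):
        raise ValueError("key must be 8-128 characters from [A-Za-z0-9-]")
    # The engine fetches https://<host>/<key>.txt and expects the key as its content.
    return f"{key}.txt", key
```

For the key a1b2c3d4e5f6a7b8 this gives a file named a1b2c3d4e5f6a7b8.txt whose only content is the key. Deploy it once; rotate it only if the key leaks.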

The tool: build, validate, push

Both halves are pure data transforms — perfect for a publish hook, and offline-testable. The IndexNow side validates everything before it would ever hit the network:

# IndexNow payload is built + checked offline; --send is the only network call
key matches  ^[A-Za-z0-9-]{8,128}$     # or it's rejected
all URLs share one host                # keyLocation must be on it too
<= 10,000 URLs per post
# sitemap side: well-formed, <=50k URLs, <=50MB, one host, valid lastmod
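Those IndexNow checks can be sketched as one pure function that either returns the payload or raises. This is an illustration under assumptions, not the tool's real code: build_indexnow_payload is a made-up name, and defaulting keyLocation to <scheme>://<host>/<key>.txt mirrors the root-hosted key file described above.

```python
import re
from urllib.parse import urlsplit

KEY_RE = re.compile(r"[A-Za-z0-9-]{8,128}")
MAX_URLS = 10_000  # per-POST limit


def build_indexnow_payload(urls, key, key_location=None):
    """Validate offline, then return the dict that would be POSTed as JSON."""
    urls = list(urls)
    if not KEY_RE.fullmatch(key):
        raise ValueError("key must be 8-128 characters from [A-Za-z0-9-]")
    if not urls:
        raise ValueError("no URLs to submit")
    if len(urls) > MAX_URLS:
        raise ValueError("more than 10,000 URLs in one post")

    hosts = {urlsplit(u).netloc for u in urls}
    if len(hosts) != 1:
        raise ValueError(f"URLs must share one host, got {sorted(hosts)}")
    host = hosts.pop()

    if key_location is None:
        # Assumption: the key file sits at the site root, named after the key.
        key_location = f"{urlsplit(urls[0]).scheme}://{host}/{key}.txt"
    if urlsplit(key_location).netloc != host:
        raise ValueError("keyLocation must be on the same host as the URLs")

    return {"host": host, "key": key, "keyLocation": key_location, "urlList": urls}
```

Because nothing here touches the network, a publish hook can run it in a unit test and fail the build before a bad payload is ever sent.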

That generated sitemap.xml is plain spec-valid XML — one <loc> per URL, optional <lastmod>, all on one host:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/new-page</loc>
    <lastmod>2026-06-22</lastmod>
  </url>
</urlset>

…and the IndexNow dry run prints the exact POST body before any --send:

{
  "host": "yoursite.com",
  "key": "a1b2c3d4e5f6a7b8",
  "keyLocation": "https://yoursite.com/a1b2c3d4e5f6a7b8.txt",
  "urlList": ["https://yoursite.com/new-page"]
}
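For a sense of what --send might do with that body, here is a hedged sketch using only urllib. It assumes the shared endpoint https://api.indexnow.org/indexnow and a JSON content type, as described in the IndexNow docs; make_request and send are illustrative names, not the tool's actual functions.

```python
import json
import urllib.request

ENDPOINT = "https://api.indexnow.org/indexnow"  # assumption: the shared IndexNow endpoint


def make_request(payload, endpoint=ENDPOINT):
    """Build the POST request without sending it (safe to call offline)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def send(payload):
    """The single network call. urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(make_request(payload), timeout=10) as resp:
        return resp.status
```

Keeping make_request separate from send preserves the dry-run property: everything up to the last line can be inspected and tested without a connection.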

Do this now:

1. Self-check (offline): python3 tools/sitemap_ping.py --demo
2. Make a urls.txt (one URL per line, optional tab + lastmod), then generate and validate: python3 tools/sitemap_ping.py gen urls.txt > sitemap.xml, then python3 tools/sitemap_ping.py check sitemap.xml
3. Build an IndexNow push (dry run: prints the key-file step and the exact POST body): python3 tools/sitemap_ping.py indexnow https://yoursite.com/new-page --key <your-key>

$ python3 tools/sitemap_ping.py check sitemap.xml

Sitemap check
──────────────────────────────────────────────
[PASS] well-formed XML
[PASS] root is <urlset>
[PASS] every <url> has a <loc>  (412 urls)
[PASS] <= 50,000 URLs  (412)
[PASS] <= 50 MB uncompressed
[FAIL] all URLs share one host  (site.com, cdn.site.com)
[FAIL] lastmod dates valid (W3C)  (2 bad: ['19/06/2026'])
──────────────────────────────────────────────
VERDICT: 2 problem(s) — engines may reject it.
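The lastmod failure above is the easiest one to reproduce yourself. The sketch below is an assumption about how such a check could work, not the tool's implementation, and it covers only the two most common W3C Datetime shapes: a plain date, and a date-time with a timezone designator.

```python
import re
from datetime import date, datetime

_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})")


def is_valid_lastmod(value):
    """True for YYYY-MM-DD or YYYY-MM-DDThh:mm[:ss] followed by Z or +hh:mm / -hh:mm."""
    try:
        if _DATE.fullmatch(value):
            date.fromisoformat(value)  # rejects impossible dates like 2026-13-01
            return True
        if _DATETIME.fullmatch(value):
            # Older Pythons do not parse a trailing "Z", so normalise it first.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return True
    except ValueError:
        return False
    return False
```

Here is_valid_lastmod("2026-06-22") passes and is_valid_lastmod("19/06/2026") fails, which is exactly the day-first format the check flagged.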

Three pipeline traps. (1) IndexNow ≠ Google. Pinging IndexNow and watching nothing happen in Google is expected — Google never took it.^[3] Google’s only push API is the Indexing API, limited to JobPosting and livestream pages; for everything else it’s sitemap + crawl schedule, with a manual “Request indexing” in Search Console. (2) The key file must actually be reachable at keyLocation and contain the key, or every push is rejected — same trust model as robots.txt. (3) A sitemap is discovery, not a ranking lever. Listing a URL gets it found; it does nothing for whether it ranks or gets cited. Don’t expect traffic from a sitemap alone.

Ceiling to know: sitemap_ping.py writes a single sitemap; past 50k URLs you need a sitemap index (a sitemap of sitemaps) — noted in the tool as the next upgrade. It defaults IndexNow to a dry run; --send is the one line that touches the network.
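If you hit that ceiling before the tool does, the upgrade is small. This is a rough sketch under assumptions: chunk and build_sitemap_index are hypothetical helpers, and the child sitemap URLs are whatever filenames you choose to publish.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap limit; the same cap applies to entries in an index


def chunk(urls, size=MAX_URLS):
    """Split a URL list into consecutive groups of at most `size` entries."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document (UTF-8 bytes) pointing at child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        node = ET.SubElement(root, "sitemap")
        ET.SubElement(node, "loc").text = url
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Write one sitemap file per chunk, list their URLs in the index, and submit the index instead of the individual files.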

Retrieval practice · no peeking

Ping check

Answer from memory — that effort is what makes it stick. One try each; pick before you read the others.

Question 1 / 4

Which engine ignores IndexNow and relies on sitemaps + its own crawl schedule?

Question 2 / 4

What's the core difference between a sitemap and IndexNow?

Question 3 / 4

Why must the IndexNow key live in a file at your site root?

Question 4 / 4

Your single sitemap just passed 50,000 URLs. What does the spec require next?

Primary source — read this next (≈12 min)

"The Sitemap protocol" — sitemaps.org

The authoritative spec: tags, the 50,000-URL / 50 MB limits, W3C date format, single-host rule. Pair with Google's "Build and submit a sitemap" (https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap) for submission, and the IndexNow protocol docs (https://www.indexnow.org/documentation) for the key and payload. Google's non-participation is tracked in RESOURCES (/resources/).
