What Google Sees: A Deep Dive into JSON-LD Metadata Harvesting

ysskrishna

You're optimizing for readers. Google isn't one of them.

Headlines, layout, type, and copy are what people see. Under that, there is another layer: machine-readable data in the HTML. Two pages can read the same to a human and still show up differently in search if one explains itself more clearly to crawlers.

That explanation often shows up as JSON-LD.

The Invisible Layer Behind Every Page

Visitors get layout, typography, images, and body text. Underneath, the document carries a second description of itself: structured data.

JSON-LD (JavaScript Object Notation for Linked Data) is the common way sites tell search engines what a page is, usually with the schema.org vocabulary declared in @context. It does not change the visual page. It changes what gets filed as fact instead of guesswork. Google’s overview is in Understand how structured data works (Search Central).

What Google Actually Reads

Roughly what a crawler pulls from a typical blog post:

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Move Windows Between Monitors on macOS Without Dragging",
  "author": {
    "@type": "Person",
    "name": "ysskrishna"
  },
  "datePublished": "2026-04-23T00:00:00.000Z"
}

That block says: blog post, author, publish time. Without it, the engine infers. With it, those fields are explicit.
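To make "explicit" concrete, here is a minimal sketch that loads the block above with Python's standard library and reads the three fields directly. The variable names are illustrative. The trailing Z is rewritten by hand on the assumption that the script may run on a Python older than 3.11, where datetime.fromisoformat does not accept it.

```python
import json
from datetime import datetime

# The same JSON-LD block shown above, as a string.
raw = """
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Move Windows Between Monitors on macOS Without Dragging",
  "author": {"@type": "Person", "name": "ysskrishna"},
  "datePublished": "2026-04-23T00:00:00.000Z"
}
"""

data = json.loads(raw)

page_type = data["@type"]
author_name = data["author"]["name"]
# Swap the trailing Z for an explicit UTC offset so older Pythons can parse it.
published = datetime.fromisoformat(data["datePublished"].replace("Z", "+00:00"))

print(page_type, author_name, published.date())
```

Nothing here is inferred from prose: each value is read from a named key, which is the difference between guessing and being told.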

Metadata Harvesting: What's Actually Happening

A crawl does more than read visible text. For JSON-LD and similar formats, the pipeline looks roughly like this:

  1. Download HTML
  2. Find structured data (including JSON-LD in script tags)
  3. Pull out entities and links between them
  4. Feed that into how results and rich features are built
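Step 2 above can be sketched with Python's built-in html.parser. This is only an illustration of the idea, not how any search engine implements it: the class name JsonLdCollector, the helper extract_jsonld, and the sample HTML are all hypothetical, and the sample string stands in for the download in step 1.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._parts = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._parts = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._parts))
            self._in_jsonld = False


def extract_jsonld(html):
    """Find JSON-LD blocks and parse each one, skipping blocks with invalid JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    parsed = []
    for block in collector.blocks:
        try:
            parsed.append(json.loads(block))
        except json.JSONDecodeError:
            continue
    return parsed


# Stand-in for step 1 (the downloaded HTML); a hypothetical page.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BlogPosting", "headline": "Hello"}
</script>
</head><body><p>Hello</p></body></html>
"""

for item in extract_jsonld(sample_html):
    print(item["@type"], "-", item.get("headline"))
```

A real parser, rather than a regular expression, copes with extra attributes on the script tag and with pages that carry more than one JSON-LD block.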

Rich snippets and article-style results lean on this. You are not stuck guessing: you can pull the same JSON your page serves and read it.

Try It Yourself (No Setup Needed)

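These inline scripts run straight from a shell and use only the standard library. Replace YOUR_URL with the page you want to inspect (the Node version expects an https URL). No clone, no install.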

Python

Standalone gist: Python JSON-LD fetch

python3 -c "
import urllib.request, json, re

url = 'YOUR_URL'

with urllib.request.urlopen(url) as res:
    raw = b''
    while True:
        chunk = res.read(4096)
        if not chunk:
            break
        raw += chunk
        # decode the full buffer so multibyte characters split across chunks survive
        buf = raw.decode('utf-8', errors='ignore')
        m = re.search(r'<script[^>]*type=\"application/ld\+json\"[^>]*>([\s\S]*?)</script>', buf)
        if m:
            print(json.dumps(json.loads(m.group(1)), indent=2))
            break  # stop reading once the first ld+json block is complete
"

Node.js

Standalone gist: Node JSON-LD fetch

node -e "
const https = require('https');
const url = 'YOUR_URL';

https.get(url, res => {
  res.setEncoding('utf8'); // decode safely across chunk boundaries
  let buffer = '';
  res.on('data', chunk => {
    buffer += chunk;

    const match = buffer.match(/<script[^>]*type=\"application\/ld\\+json\"[^>]*>([\\s\\S]*?)<\\/script>/);
    if (match) {
      console.log(JSON.stringify(JSON.parse(match[1]), null, 2));
      res.destroy(); // stop streaming once we have the first ld+json block
    }
  });
}).on('error', err => console.error(err.message));
"

What You'll See

Point either script at a post URL and you get output along these lines:

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Move Windows Between Monitors on macOS Without Dragging",
  "url": "https://ysskrishna.vercel.app/blog/move-windows-between-monitors-macos-raycast",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://ysskrishna.vercel.app/blog/move-windows-between-monitors-macos-raycast"
  },
  "publisher": {
    "@type": "Organization",
    "name": "ysskrishna",
    "url": "https://ysskrishna.vercel.app"
  },
  "description": "Move the focused window between monitors on macOS with Raycast...",
  "datePublished": "2026-04-23T00:00:00.000Z",
  "image": [
    "https://ysskrishna.vercel.app/media/blog/covers/move-windows-between-monitors-macos-raycast/coverImage.avif"
  ],
  "author": {
    "@type": "Person",
    "name": "ysskrishna"
  }
}

That is the structured view of the same page. It is the version search systems treat as first-class when they have it.
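The output above is not one flat record: it nests a WebPage, an Organization, and a Person inside the BlogPosting. As a rough sketch of what "entities and links between them" means, the hypothetical helper below walks a parsed JSON-LD object and lists every typed node with the path that leads to it. The function name, the path notation, and the choice of label fields are assumptions made for illustration.

```python
def walk_entities(node, path="$"):
    """Yield (path, @type, label) for every typed object in a JSON-LD tree."""
    if isinstance(node, dict):
        if "@type" in node:
            label = node.get("name") or node.get("headline") or node.get("@id")
            yield path, node["@type"], label
        for key, value in node.items():
            yield from walk_entities(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk_entities(value, f"{path}[{index}]")


# Trimmed copy of the output above.
post = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "Move Windows Between Monitors on macOS Without Dragging",
    "mainEntityOfPage": {
        "@type": "WebPage",
        "@id": "https://ysskrishna.vercel.app/blog/move-windows-between-monitors-macos-raycast",
    },
    "publisher": {
        "@type": "Organization",
        "name": "ysskrishna",
        "url": "https://ysskrishna.vercel.app",
    },
    "author": {"@type": "Person", "name": "ysskrishna"},
}

for path, entity_type, label in walk_entities(post):
    print(f"{path:22} {entity_type:14} {label}")
```

Each row is one entity; the path shows which property of the post links to it.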

Why This Changes How You Think About SEO

Keywords, titles, and writing quality still matter. The extra piece is how easy you make it for a machine to classify the page.

SEO is not only “write better copy.” It is also “make the page unambiguous to parsers.” JSON-LD does that. It nudges the record from “this looks like a blog post” to “this is a BlogPosting, by this person, on this date, at this URL.”

Two Layers, One Website

You can think of a site as two channels at once:

Layer     Who it is for          Examples
Human     Visitors               Layout, interaction, prose
Machine   Crawlers and indexes   JSON-LD, other structured data, entity links

Most effort goes to the human side. The machine side is where a lot of “what this page is” gets decided for search.

What to Do Next

Pick one URL you care about. Run a script, inspect the JSON, and check: does @type match the page? Is author filled in? Is there a description that matches what you want indexed?
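Those checks can be scripted. The sketch below is one possible shape for such a check, not an official validator: the function name audit_jsonld and the default list of required fields are assumptions chosen to mirror the questions above, and they are not Google's own requirements.

```python
def audit_jsonld(data, expected_type,
                 required=("headline", "author", "description", "datePublished")):
    """Return a list of readable problems; an empty list means the checks passed."""
    problems = []
    found = data.get("@type")
    # @type may be a single string or a list of strings.
    types = found if isinstance(found, list) else [found]
    if expected_type not in types:
        problems.append(f"@type is {found!r}, expected {expected_type!r}")
    for field in required:
        if data.get(field) in (None, "", [], {}):
            problems.append(f"{field} is missing or empty")
    author = data.get("author")
    if isinstance(author, dict) and not author.get("name"):
        problems.append("author has no name")
    return problems


# Hypothetical page data with the description left out.
page = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "Move Windows Between Monitors on macOS Without Dragging",
    "author": {"@type": "Person", "name": "ysskrishna"},
    "datePublished": "2026-04-23T00:00:00.000Z",
}

for problem in audit_jsonld(page, expected_type="BlogPosting"):
    print("FIX:", problem)
```

Feed it the parsed JSON from either script above and adjust the required tuple to whatever fields matter for your pages.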

If something is wrong, fix that field and redeploy. One page is enough to start; get it right in the index, then repeat.

Structured data is not a gimmick. It is a straight way to state what you published so systems do not have to infer it.
