
How to structure a B2B page so AI quotes it

AI engines retrieve 100-300 word passages, not whole pages. Here's how to structure each section so it survives being lifted out of context and cited, with a real before-and-after from duo.ca.

By Justin DeMarchi · June 8, 2026 · 8 min read

In March 2026, a team of researchers changed nothing about a body of content except how it was formatted, and AI citation rates went up 17.3% across six engines (Yu et al., "Structural Feature Engineering for Generative Engine Optimization," arXiv:2603.29979). Not a word of the writing changed. They reorganized chunks, fixed headings, surfaced answers. The machines started quoting it more.

That number is the whole argument for this piece. If you want AI engines to quote your B2B pages, the lever isn't writing more; it's structuring what you've already written so a retriever can lift a clean passage out of it. Most pages fail this not because the content is bad, but because the good part is buried three paragraphs into a section that never says what it's about.

AI retrieves passages, not pages

When ChatGPT or Perplexity answers a question, it doesn't read your whole article. It pulls a passage. A retrieval system breaks your page into chunks of roughly 100 to 300 words, scores each chunk against the query, and lifts the best one into the answer. The unit that gets cited is the section, not the page.
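To make that mechanism concrete, here is a minimal Python sketch under loud simplifications. The chunker groups blank-line-separated paragraphs into chunks of at most `max_words` words, and the scorer only measures what fraction of the query's terms a chunk contains. Production retrievers typically score with embedding similarity instead, and the names here (`chunk_words`, `score`, `best_chunk`) are hypothetical, not any engine's actual API.

```python
import re

# Lowercase word tokens; keeps hyphenated terms like "founder-led" intact.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of <= max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    chunks: list[str] = []
    current: list[str] = []
    for para in re.split(r"\n\s*\n", text.strip()):
        words = para.split()
        if not words:
            continue
        # Close the current chunk if adding this paragraph would overflow it.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        # A single oversized paragraph is cut into fixed-size windows.
        while len(words) > max_words:
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def tokens(text: str) -> set[str]:
    return set(TOKEN.findall(text.lower()))


def score(query: str, chunk: str) -> float:
    """Toy relevance: fraction of distinct query terms present in the chunk."""
    q = tokens(query)
    return len(q & tokens(chunk)) / len(q) if q else 0.0


def best_chunk(query: str, text: str, max_words: int = 300) -> str:
    """Return the single highest-scoring chunk (empty string if there are none)."""
    chunks = chunk_words(text, max_words)
    return max(chunks, key=lambda ch: score(query, ch), default="")
```

Swap the toy `score` for embedding similarity and the shape of the pipeline stays the same: individual chunks compete against each other, and the page as a whole never gets scored.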

That changes the job. You're not optimizing a page for citation. You're optimizing every section to survive being torn out of the page. A passage that only makes sense with the three paragraphs above it is invisible to a retriever, because the retriever never sees the three paragraphs above it.

This is why a strong article can sit uncited while a thinner one gets quoted. The thinner one chunked cleanly.

What a citable chunk actually looks like

A citable chunk does three things:

  • Names its own subject. The chunk can't say "the first version" or "this approach" and assume the reader scrolled past the part that defined it. The retriever didn't. Every chunk has to carry its own noun.
  • Answers up front. Engines reward the answer that arrives early and skip the one saved for a dramatic payoff. Journalists call this bottom-line-up-front. LLMs were trained on a lot of it.
  • Stands alone. Read the chunk with everything above and below it deleted. If it still answers the question, it's citable. If it needs context you removed, it isn't.
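Parts of that checklist can be roughed out in code. This is a hypothetical lint, not a retriever: it looks only at a chunk's first sentence, flags it if the subject phrase you pass in never appears there, and flags openers that lean on earlier text. The `ORPHAN_OPENERS` tuple is an assumption you would tune for your own writing, and whether the chunk actually answers the question still needs a human read.

```python
import re

# Hypothetical openers that usually point back at earlier text; tune for your own style.
ORPHAN_OPENERS = (
    "this ", "that ", "these ", "those ", "it ", "they ",
    "the first ", "the second ", "the other ", "as mentioned",
)


def first_sentence(chunk: str) -> str:
    """Naive sentence split: text up to the first . ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", chunk.strip(), maxsplit=1)[0]


def citability_issues(chunk: str, subject: str) -> list[str]:
    """Return reasons the chunk may not stand alone; an empty list means it passes."""
    opener = first_sentence(chunk).lower()
    issues = []
    if subject.lower() not in opener:
        issues.append(f"first sentence never names {subject!r}")
    if opener.startswith(ORPHAN_OPENERS):
        issues.append("first sentence opens with a back-reference")
    return issues
```

Run against the duo.ca example discussed below, the original "The first version is where almost every B2B company starts." trips both flags for the subject "founder-led growth", while the rewritten "Founder-led growth is a founder using their own credibility..." passes.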

On the "answer up front" point, the data is blunt. Kevin Indig's analysis of 18,012 verified ChatGPT citations found that 44.2% of citations came from the first 30% of the content on a page, and the pattern held across randomized batches with a reported p-value of 0.0, meaning too small to register at the precision shown (Kevin Indig, "The science of how AI picks its sources," Growth Memo).
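If you keep your own citation logs, a similar position check takes a few lines. This sketch is not Indig's methodology; it assumes you have recorded, for each citation, the character offset where the quoted passage starts and the total length of the page, and it reports what share of citations landed in the first 30%.

```python
def share_in_top(positions, cutoff=0.30):
    """Share of citations whose relative position on the page falls before `cutoff`.

    positions: iterable of (char_offset, page_length) pairs, one per citation.
    Pairs with a non-positive page length are skipped.
    """
    relative = [offset / length for offset, length in positions if length > 0]
    if not relative:
        return 0.0
    return sum(r < cutoff for r in relative) / len(relative)
```

Character offsets are a stand-in for "position on the page"; word or chunk indices would work the same way as long as you use one unit consistently.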

One real duo.ca section, rewritten to chunk cleanly

Here's a live one. The duo.ca article on founder-led growth opens like this:

Every founder has heard the term by now.

Founder-led growth. It comes up in startup circles, investor conversations, go-to-market strategy decks. Sometimes it means the founder is doing the selling. Sometimes it means the founder is the face of the brand.

It reads fine for a human scrolling top to bottom. To a retriever, it's close to useless. The chunk never states what founder-led growth is. It gestures at the term, lists where the term shows up, and saves the actual definition for later. Lift this passage into an AI answer and it says nothing quotable.

The first section header makes it worse:

What are the two versions most founders don't distinguish?

The first version is where almost every B2B company starts. You work your personal network...

"The first version" of what? The chunk doesn't say. Pulled out of context, it's an orphan.

Here's the same content restructured to chunk cleanly, without changing the argument:

Founder-led growth has two distinct versions

Founder-led growth is a founder using their own credibility to drive demand, and it runs in two distinct modes. Version one is closing through your existing network: investor intros, events, one-to-one conversations. Version two is building a consistent public presence so your perspective reaches buyers you've never met, in the research phase before they're ready to talk. Most B2B founders are running version one. Very few have built version two.

Same claim. Same opinion. But now the chunk names its subject in the first four words, defines the term in the first sentence, and delivers the two-versions answer without needing anything above it. Lift it into a ChatGPT answer and it stands.

Why each change matters to a retriever

Three specific edits, three specific reasons.

  • The subject moved into the first sentence. A retriever matching "what is founder-led growth" needs the phrase and its definition in the same chunk it's scoring. Burying the definition two paragraphs down splits the match across chunks and weakens both.
  • The header became a claim, not a question with a delayed answer. "Founder-led growth has two distinct versions" is itself a quotable line. A retriever can cite the header. "What are the two versions most founders don't distinguish?" forces the engine to find and trust the answer somewhere below it.
  • The definition stopped depending on context. "The first version" became "Version one is closing through your existing network." The pronoun reference is gone, so the chunk no longer needs its neighbors to make sense.
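The first change is the easiest to check mechanically. The hypothetical helper below, `defines_term`, looks for a definitional pattern such as "<term> is" inside a single chunk. It is a crude regex proxy, not how engines actually score passages, but it separates the two duo.ca openings cleanly.

```python
import re


def defines_term(chunk: str, term: str) -> bool:
    """True if the chunk contains '<term> is', '<term> means', or '<term> refers to'.

    A crude proxy for 'the phrase and its definition sit in the same chunk'.
    """
    pattern = rf"\b{re.escape(term)}\s+(?:is|means|refers to)\b"
    return re.search(pattern, chunk, flags=re.IGNORECASE) is not None
```

On the original opening it returns False, because "Founder-led growth." is followed by a period rather than a definition; on the rewrite, "Founder-led growth is a founder using their own credibility..." returns True.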

None of this is dumbing the writing down. The human-facing version can keep its warmer opening if you want it. The point is that the answer exists somewhere as a clean, self-contained block a machine can grab.

The structure that helps, and the structure that's cargo-cult

Some formatting genuinely lifts citations. Some is theatre. Here's the line.

Lists and tables earn citations when the content is actually list-shaped. Evertune analyzed roughly 25,000 of the most-cited URLs across six engines and found that 50% of them were listicles; separately, across nearly 400 million total citations, 63% pointed to listicles (Evertune, via Search Engine Land). The two numbers measure different things (unique URLs versus raw citation volume), but both point the same way. Tables work for a similar mechanical reason: a retriever can lift a row or cell without parsing the sentence around it. If you're comparing three roles or four tools, a table is the right call, and it's more likely to get quoted.

The cargo-cult version is forcing a table around things that aren't parallel, or chopping a coherent argument into a bulleted list to look "AI-friendly." A retriever can tell the difference between a real comparison and three unrelated points wearing a table costume. The format helps because it matches the content's actual shape. Faked, it just makes the prose worse.

Schema sits in the same bucket. FAQPage and Article JSON-LD help an engine parse a page you've already structured well. They do not rescue a buried definition. Schema makes good structure machine-readable; it can't manufacture structure that isn't there. Add it after the prose chunks cleanly, not as a substitute for doing that work.
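For the "add it after" step, here is a minimal Python sketch that turns question-and-answer pairs you have already written into FAQPage JSON-LD. The `@type` values (`FAQPage`, `Question`, `Answer`) and the `mainEntity` and `acceptedAnswer` properties come from schema.org; the helper names are hypothetical, and you should validate the output with your own tooling before shipping it.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) string pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def script_tag(data):
    """Serialize to a JSON-LD <script> block.

    "</" is escaped as "<\\/" (still valid JSON) so answer text can't close the tag early.
    """
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

The design mirrors the point above: the function only serializes answers that already exist in the prose. It can't make a buried or vague answer any better.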

One more honest caveat: structure is necessary, not sufficient. The 17.3% lift was measured on content that was already worth citing. Clean chunking gets your good answer in front of the retriever. It doesn't make a generic answer good. If the underlying take is consensus mush, no amount of formatting earns the quote, because the engine has a thousand other pages saying the same thing.

The Upshot

Write every section as if it'll be lifted out and quoted with nothing around it, because that's exactly what happens. The whole page is a courtesy to human readers. The chunk is what the machine sees.

The practical test takes ten seconds per section: delete everything above and below it, read what's left, and ask whether it answers the question and names its own subject. If it does, it's citable. If it leans on context you just deleted, rewrite the first sentence until it stands on its own. Do that across a page and you've done most of what the 17.3% study did, by hand, for free.

This is the per-article layer of a larger practice. If you want the system that produces these mechanics across a whole corpus, the AI content systems guide covers it. For the channel-specific versions, see how to show up in ChatGPT for B2B and GEO vs AEO vs SEO for B2B.

Frequently asked questions

  • How do AI engines decide what to quote from a page?

    They retrieve passages, not whole pages. A retriever breaks your page into roughly 100-300 word chunks and scores each chunk against the query, so the unit that gets cited is the section, not the article. Citability is a passage-level property. Each section has to make sense lifted out of context, which means it opens with a self-contained answer that names its own subject. Kevin Indig's analysis of 18,012 verified ChatGPT citations found 44.2% came from the first 30% of a page, so the answer has to be near the top of the chunk, not saved for a payoff.

  • Does adding a table or a list actually increase AI citations?

    Yes, with caveats. Evertune's analysis of roughly 25,000 most-cited URLs across six engines found 50% of the most-cited URLs were listicles, and across nearly 400 million citations 63% pointed to listicles. Tables get cited more often than prose for a mechanical reason: a retriever can lift a cell without parsing the sentence around it. But the format only helps if the content is genuinely list-shaped or comparison-shaped. A forced table around three things that aren't parallel reads as cargo-cult and does nothing.

  • Where should the answer go in a section?

    In the first one or two sentences, and it has to restate the subject. 'The first version is where most companies start' is not citable on its own because the chunk doesn't say the first version of what. 'Founder-led growth has two versions: closing through your own network, and building a public presence that reaches buyers you've never met' survives the lift because it carries its own subject. Put the answer up front, name the noun, then defend it underneath.

  • Does FAQ schema help with AI citations?

    Schema helps an engine parse your page, but it doesn't manufacture citability that the prose lacks. FAQPage and Article JSON-LD make your question-and-answer pairs machine-readable and are worth adding. What they can't do is rescue a buried definition or a section that never names its subject. Structure the prose to be citable first, then add schema so the engine can read the structure you already built. Schema on top of weak chunks is the cargo-cult version.

Written by Justin DeMarchi

B2B Content Operator and founder of DUO. Eight-plus years running marketing and content systems for brands in tech, SaaS, and AI.
