The Fan-Out Query: How AI Search Actually Works (and What to Optimize)

May 27, 2026

WRITTEN BY

Trevor Gage is Director of Marketing at Webserv, specializing in digital marketing for behavioral healthcare. Since 2019, he has developed deep expertise in technical SEO and content quality optimization to drive measurable results for addiction treatment and mental health providers. Trevor holds a BA in English from the University of San Francisco and an MA in Integrated Marketing Communication from Emerson College.

The single biggest misunderstanding about AI search optimization is that it works the way Google did in 2015. Operators think there is one good answer per query, and the optimization job is to rank for that answer. AI search engines do not work that way.

They decompose the user’s question into 8 to 12 sub-queries, run them in parallel, and synthesize the results into a single response that cites multiple sources.

A real AEO program for a behavioral health operator is built around that decomposition pattern, not around page-level keyword targeting. The optimization unit is the passage, not the page. The asset that gets cited is the well-scoped paragraph that answers one sub-query cleanly.

This piece on behavioral health marketing covers what query fan-out actually does, why the synthesis stage rewards different content patterns than traditional SEO, and the specific structural moves operators should make on their treatment center pages to earn citations across ChatGPT, Perplexity, Claude, and Google AI Overviews.

Key Takeaways

AI search engines do not retrieve a single best answer. They decompose the query into 8 to 12 sub-queries (sometimes hundreds in Deep Search), retrieve in parallel, and synthesize the results.
The synthesis stage uses Reciprocal Rank Fusion. Passages that show up across multiple sub-queries get boosted. Passages that only answer one narrow sub-query well still get cited cleanly.
Long-form ultimate guides get small fragments cited; atomic well-scoped passages get cited as full units. The optimization unit is the passage, not the page.
Schema, semantic triples, and topic-cluster coverage do the work that page-level keyword targeting used to do. Topical coverage beats individual page perfection.
Sites with 80% or higher topical coverage retain 85% of AI visibility through fan-out instability. Single-page optimization without cluster coverage is the losing strategy.

A treatment center’s marketing director sent us a 4,200-word ultimate guide last fall. It was beautifully written, exhaustively researched, and structured around the head term they wanted to own in AI Overviews. Six months later, they pulled the citation logs.

Two paragraphs of the piece had been cited inside AI Mode answers, neither of which was the section they thought would win. The other 3,800 words had not been surfaced at all.

That outcome is the most predictable thing in AI search right now, and almost no one drafting content for behavioral health understands why. The model that ranks for AI Overviews is not picking your best 4,000-word answer.

It is breaking the user’s question into eight to twelve smaller questions, running them in parallel, and synthesizing the results.

Your job is not to write the perfect answer to the user’s prompt. Your job is to write the perfect answer to one of the sub-queries the model fans out to.

This is the architecture Google calls “query fan-out,” and it powers AI Mode, AI Overviews, ChatGPT Search, and Perplexity. Our AEO program is built around it.

Below is what fan-out actually does, why it changes what good content looks like, and the specific shifts behavioral health operators should make to start earning citations instead of just publishing word count.

What query fan-out actually does

The mechanic has four stages, each of which matters for content design.

Stage 1: query decomposition. When a user submits a complex prompt, the model does not search for that exact prompt. A custom Gemini 2.5 model analyzes the prompt for intent, complexity, and information type, then generates a list of sub-queries that each address one facet of the original question.

The architecture was rolled out in parallel with Google’s broader mobile-first indexing shift, which is part of why mobile rendering quality now influences AI citation eligibility too.

For a prompt like “what’s the best inpatient rehab in southern california that takes my insurance,” the decomposition might produce sub-queries about: levels of care offered, insurance acceptance for specific payers, geographic coverage, treatment specialties, accreditation status, average length of stay, family involvement, post-discharge support, and so on.

The decomposition is intent-driven, not keyword-driven.

Stage 2: parallel retrieval. Each sub-query gets run independently against Google’s full index plus specialized databases (Knowledge Graph, Shopping Graph, Maps, and others). Each retrieval pulls a ranked list of passages, not pages.

The retrieval system is looking for specific passages within pages that answer the specific sub-query, not for the most authoritative page on the broader topic.

Stage 3: Reciprocal Rank Fusion. The model collects the retrieved passages and ranks them using a technique called Reciprocal Rank Fusion (RRF). RRF combines the ranked lists from each sub-query into a single unified ranking.

A passage that ranks in the top 3 results for one sub-query gets a high score. A passage that ranks in the top 5 across two or three sub-queries gets a higher score, because the cross-sub-query relevance compounds.

Stage 4: synthesis. The model takes the top-ranked passages and writes the answer the user sees, citing the source pages that produced the highest-RRF passages. Passages that aren’t cited verbatim still influence the synthesis. The retrieval-augmented generation layer is reading from the passages even when it doesn’t quote them.

The whole sequence runs in roughly the same time as a traditional search because the sub-queries fire in parallel, not in series. From the user’s perspective, it is one question and one answer. From the content optimization perspective, it is eight to twelve separate ranking competitions stitched together.

Why this changes what good content looks like

In the pre-fan-out world, the question was “which page deserves to rank for query X?” In the fan-out world, the question is “which passage deserves to be cited for sub-query Y inside the answer to question X?”

The shift in optimization unit changes almost everything about how content should be structured.

Google’s Search Central documentation describes the same decomposition mechanic. AI features in Search “break the information need into parts” before retrieving across multiple sources and synthesizing a response (Google Search Central, AI features in Search).

The implication for treatment center content is that page-level ranking and passage-level citation are now two separate optimization problems with different inputs.

Industry analysis of query fan-out makes the design implication clear: content has to be modular, semantically rich, and structured so that each passage can stand on its own as the answer to a specific question.

Pages that bury their best answers in 3,000 words of context do not get cited cleanly because the retrieval layer cannot extract them as standalone passages.

Three specific content patterns lose under fan-out and three patterns win.

Patterns that lose

Long pillar pages that try to answer everything. The model retrieves a fragment, the fragment is taken out of context, and the citation does not feel earned to the user.

Dense paragraphs with multiple concepts woven together. The model cannot extract a clean answer from “On the one hand X, on the other hand Y, but also Z, depending on…” because the passage does not answer a sub-query directly.

Generic agency-evaluation framing. “Find the right partner for your needs” reads as filler to the retrieval layer and gets skipped in favor of pages with operator-specific, claim-specific answers. This pattern is also what gives templated city pages the doorway-page signature that suppresses them during core updates.

Patterns that win

Atomic answer paragraphs that lead with the answer in 40 to 60 words, then elaborate. This is the same structure that wins featured snippets, and it is now the structure that wins fan-out citations because the leading paragraph is the passage the retrieval layer extracts.

Question-format H2s and H3s with direct paragraph answers underneath. The H2 signals to the retrieval layer what sub-query this section answers. The paragraph beneath delivers the answer. This pattern is what powers AI Overview and ChatGPT citations on the sites that earn them consistently.

Semantic triples in plain prose. A semantic triple is the subject-predicate-object pattern that knowledge graphs use to encode facts. Writing in patterns like “Webserv runs paid search for treatment centers” gives the retrieval layer a clean, extractable fact.

We covered the full pattern in our semantic triples for AEO piece; the short version is that the model can parse a triple-shaped sentence faster than it can parse a sentence buried in qualifiers.

How to predict your own fan-out

The most useful exercise an operator can do is predict the fan-out for a query they want to rank for, then audit which of their pages can answer which sub-query.

Pick a representative complex query for your facility. Something like: “what’s the difference between PHP and IOP for someone leaving a 30-day residential program, and how do I find one in [your city] that takes my insurance?”

Write out the sub-queries you would expect the model to fan out to. A realistic list for that prompt is:

What is PHP?
What is IOP?
How is PHP different from IOP in terms of structure?
How is PHP different from IOP in terms of clinical intensity?
When does someone step down from residential to PHP versus IOP?
What does the typical PHP schedule look like?
What does the typical IOP schedule look like?
How long does each typically last?
What insurance accepts PHP coverage?
What insurance accepts IOP coverage?
How do I find PHP in [city]?
How do I find IOP in [city]?

That is twelve sub-queries from one prompt, which is the high end of the standard fan-out range. Now open your site and audit which page on your site contains the best paragraph-level answer to each one.

If you have one strong service page per level of care plus one strong location page per city you serve, the audit will find six to ten clean matches.

If you have one 4,000-word ultimate guide that tries to cover the entire topic in a single page, the audit will find one or two matches at best, with the rest of the sub-queries unanswered or only obliquely addressed.

The gap between those two outcomes is the gap between earning AI citations and being invisible in AI search. The fix is not more words. The fix is more passages.

What to optimize at the passage level

Passage-level optimization is a different skill from page-level SEO, and most content teams have not retrained for it. Five practices do most of the work.

Google’s guidance on helpful, reliable, people-first content reinforces the same principle from a different angle.

The document directs creators to demonstrate first-hand expertise and depth of knowledge and to answer questions directly rather than padding around them (Google Search Central, Creating helpful content). The five passage-level practices below are the operational application of that direction.

Lead every section with a direct answer. The first paragraph under any H2 or H3 should answer the question that section poses in 40 to 60 words, complete and standalone.

Save the elaboration for paragraph two and beyond. The leading paragraph is what the retrieval layer extracts; the elaboration is what the human reader keeps reading.

Use question-shaped H2s and H3s where appropriate. Not every section needs to be a question. But sections that map to predictable sub-queries should use the sub-query language.

“What is verification of benefits and how long does it take?” is a more retrievable section header than “VOB Process Overview.” Question headers also pair cleanly with FAQPage schema.

Write in semantic triples. Concrete subject-predicate-object sentences (“Webserv built the SEO program at [facility name],” “PHP runs five to six hours per day for five days per week,” “Cigna covers PHP for 14 to 21 days at most levels of care”) extract more cleanly than qualifier-heavy prose.

Layer FAQ-style content where it earns its keep. A real Rank Math FAQ block at the bottom of a page gives the retrieval layer four to six additional discrete passages with explicit question framing.

Plus FAQPage schema that tells Google those Q&A pairs are intended as standalone answers. This is the cheapest passage volume any page can carry.

Cover the cluster, not just the page.

The 85% topical coverage stat is real: sites that own the full Subservice cluster (Macros + Micros + key listicles + service pages) hold AI visibility through fan-out instability, while sites that optimized one page perfectly without cluster coverage lose their visibility on each model refresh.

This is the same logic that powers the healthcare content creation work that earns external links and AI citations together.

OON - Inpatient substance use

How SoCal Sunrise generated 85 admissions and 2,297% ROI from SEO in 6 months

A ground-up SEO rebuild using the Pathfinder Parents Methodology turned an invisible online presence into a top-ranking admissions engine.

Read the case study →

6 month result

2,297% SEO return on investment

85 admits and 3,152 leads attributed to organic

The compounding effect of cluster coverage

Topical coverage matters more under fan-out than under traditional search because the model is testing your site across many sub-queries instead of one query. Every additional well-built page in the cluster increases the surface area for fan-out hits.

Cite-side data from the 2026 AI search landscape backs this up. Sites with 80% or higher topical coverage retain 85% of their AI visibility through the 73% of fan-out instances where the sub-query distribution shifts between model refreshes.

Sites with one well-optimized page and no cluster around it lose visibility on each refresh because the model’s sub-query distribution moved past the single page’s coverage.

This is the same logic behind the service-pages-first ordering we recommend for behavioral health SEO. Build the commercial layer first so that fan-out has clean, modular, intent-specific pages to retrieve from.

Add the authority and blog layer on top so the cluster’s depth signals to the model that your site can answer the educational sub-queries as well. The two together compound. One without the other underperforms.

The implication for content teams is that you should plan for a cluster of 8 to 12 modular pages per Subservice, each carrying 3 to 5 well-scoped passages, instead of one 6,000-word pillar with the same content compressed in.

The total word count can be the same. The retrieval performance is not.

The model is not asking which one of your pages is best. It is asking which of your passages answers each of twelve different questions. Pages that try to answer all twelve at once answer none of them well enough to cite.
Trevor Gage, Director of Earned & Owned Media, Webserv

The pull quote is uncomfortable because it inverts what most SEO teams were trained on. Long pillar pages were the right answer in 2020. They are the wrong answer in 2026.

What to stop doing

Three habits are actively expensive in the fan-out era.

Stop merging short articles into long ones. The “content consolidation” pass that used to lift rankings under Helpful Content Updates often hurts AI visibility because it buries discrete passages inside long pages.

Consolidate when the topical fit is genuinely the same. Do not consolidate when you have two pages answering two different sub-queries cleanly.

Stop writing introductions that delay the answer. “In recent years, the addiction treatment landscape has evolved…” is a four-line throat-clear before the actual answer. The retrieval layer will skip it and extract the answer that follows, or worse, extract the throat-clear and synthesize a vague response. Lead with the answer.

Stop ignoring schema. FAQPage schema, HowTo schema, and structured data on service pages give the retrieval layer explicit identity cues for each passage.

The schema does not rank a passage on its own, but it tells the model “this paragraph is the answer to this question,” which is the exact signal the fan-out architecture is built to use.

The technical SEO foundation work that powers schema across a site is the single highest-ROI investment for AEO right now.

How this connects to the rest of the SEO program

Fan-out is not a separate game from traditional SEO. It is the new top layer of the same game. The commercial pages, location pages, and technical foundation underneath still have to perform for traditional search. The blog and AEO layer earns the AI citations on top.

This is the same four-layer build order we use for every behavioral health SEO program. Service pages, location pages, technical foundation, and authority content.

The reason that order works for AI search is the same reason it works for traditional search: commercial layers earn the conversions, authority layers earn the citations, and the cluster compounds across both surfaces.

Sites that try to optimize for AI search without the underlying commercial layer end up with citations that drive zero admissions because the user lands on a thin service page.

Sites that have a strong commercial layer plus a deep authority layer earn citations that convert because the cluster delivers both the AI visibility and the conversion path.

Frequently asked questions about query fan-out and AI search optimization

How is query fan-out different from traditional Google search?

Traditional Google search retrieves pages that match a user’s query, ranks them by relevance and authority, and shows the top ten results. The user picks one to click. Query fan-out decomposes the user’s query into sub-queries, retrieves passages for each sub-query in parallel, ranks the passages using Reciprocal Rank Fusion, and synthesizes them into one direct answer with citations.

The optimization implication is that traditional SEO competes for the page click, while AI search competes for the passage citation. Both still matter because Google still serves blue links underneath AI Overviews and inside AI Mode answers. But the high-value real estate in AI surfaces is the citation, not the click, and the citation rewards different content patterns.

A site that wins both layers does it by building strong commercial pages (for blue-link clicks) on top of a deep authority layer (for AI citations). The two layers complement each other instead of competing for the same content.

How many sub-queries does AI Mode actually generate?

The standard range is 8 to 12 sub-queries per user prompt. Deep Search, which is AI Mode’s intensive research mode for complex queries, can issue hundreds of sub-queries. The number depends on the prompt complexity, the user intent signals, and whether the model identifies the prompt as a comparison, a how-to, a fact lookup, or a decision-support task.

For behavioral health prompts, the standard fan-out tends to sit at the high end (10 to 12) because the queries usually combine clinical, financial, geographic, and logistical sub-questions. A prompt like “should I send my son to inpatient or outpatient rehab” generates sub-queries about clinical criteria, family considerations, financial implications, insurance coverage, geographic options, and outcomes data. That is at least six to eight discrete sub-queries before the model starts synthesizing.

The practical takeaway is to assume your content is being fanned out across at least 8 sub-queries for any non-trivial query, and to audit which sub-queries your pages actually answer.

Does this only apply to Google AI Mode, or does ChatGPT do the same thing?

The fan-out pattern is the dominant retrieval architecture across all major AI search engines in 2026. ChatGPT Search uses fan-out with similar sub-query counts. Perplexity uses fan-out with an explicit “related questions” surface that exposes some of the sub-queries to the user. Claude’s web search uses a similar retrieval pattern. Google AI Mode is the most documented because of the published patents, but the architectural pattern is consistent across the category.

The optimization implications are the same regardless of which surface you are optimizing for. Modular passages, clean H2/H3 structure, semantic triples, schema, and cluster coverage all work across the category. Optimizing for Google’s fan-out and optimizing for ChatGPT’s retrieval are functionally the same work.

The one exception is platform-specific surfaces (ChatGPT GPT directories, Perplexity Spaces, etc.), which carry their own visibility levers on top of the underlying retrieval. Those are smaller optimization plays compared to the cluster-level work, but they exist.

What is the single highest-impact change to make right now?

Run the fan-out audit on one representative complex query for your facility. Write out the 8 to 12 sub-queries you expect the model to generate. Open your site and find the best paragraph-level answer to each sub-query. The gaps in that audit are your optimization roadmap.

For most behavioral health operators, the gaps cluster in three areas. Insurance and payer sub-queries: service pages do not name specific payers or coverage details. Clinical sub-queries: pages describe the program but not the criteria for stepping up or down. Outcomes sub-queries: pages talk about quality but do not name specific outcomes data.

Filling those three gaps closes the gap between “we publish content” and “we earn citations.” The audit takes an hour. The fix takes a quarter. The compounding effect runs for years.

Is there a way to monitor how often we are being cited inside AI Mode?

The visibility tracking layer is improving fast in 2026 but is still imperfect. Tools like the AI optimization agencies and platforms we evaluated for the BH category are starting to expose citation tracking, prompt monitoring, and sub-query coverage analysis. Search Console added an “AI features” section in early 2026 that shows partial impression data, though the breakdown into AI Mode versus AI Overviews versus traditional blue links is still inconsistent.

The directional answer is to monitor citation appearance for a tracked set of prompts (10 to 30 representative queries for your facility) on a monthly basis. Use a tool or a manual check; either works. Track citation rate, the specific pages being cited, and the specific passages being extracted.

Over six months, you will see whether your passage-level optimization work is moving the citation rate or not. The instrument is not perfect, but the trend it shows is more reliable than ranking trackers were in 2018. Pay attention to the trend, not the absolute number. If you want a second opinion on how your site stacks up across the fan-out audit and citation tracking, reach out for an AEO assessment and we can walk through your representative prompts together.

About Webserv

The perspective in this article comes from 9 years working exclusively inside behavioral health.

We are a team built by people in recovery who understand that behind every admission is someone asking for help. If that resonates, get to know us.

Get to know us →

Trevor Gage is the Director of Earned & Owned Media at Webserv, a digital marketing agency for treatment centers.

ABOUT THE AUTHOR

Trevor Gage