White Papers · June 21, 2026

Who actually shapes LLM answers: top-10 domains from 710,000 RankCaster AI measurements

The RankCaster AI research team

Analysis based on 710,000 successful LLM measurements, 5.08M citations and 26,238 unique domains

In short

42% of all top-10 citations come from Wikipedia. If your brand has no Wikipedia entry, you are missing from the very foundation of LLM answers.

Reddit is the leading UGC source for LLMs: 111,220 citations and 2.06 citations per answer. Models pull multiple Reddit threads into a single response.

YouTube is cited 1× per answer (41,291 / 41,261 ≈ 1.001). Video acts as "one link, one answer" — you have to win that single slot.

Facebook and Instagram are practically invisible to LLMs (130 citations combined). "AI SEO" budget on these platforms is largely wasted.

arXiv ranks #4 by answer count: the scientific preprint server has become a default source even for general queries.

What we measured

RankCaster AI runs prompts across major LLMs and tracks which sources models cite. As of this analysis:

— 710,000 successful LLM measurements

— 5,080,000 citations in sourced answers

— 26,238 unique cited domains

That volume is enough to speak confidently about domain patterns within the collected sample. Sampling limitations are covered in the Methodology section at the end.

The top-10 domains

Ranked by total citations. For each domain: total citations, number of answers containing at least one citation, average citations per answer, and share of all 5.08M citations.

1. en.wikipedia.org — 131,386 citations · 70,900 answers · 1.85 cit./answer · 2.59% of all citations

2. reddit.com — 111,220 · 54,021 · 2.06 · 2.19%

3. youtube.com — 41,291 · 41,261 · 1.00 · 0.81%

4. arxiv.org — 12,866 · 8,623 · 1.49 · 0.25%

5. linkedin.com — 7,042 · 6,916 · 1.02 · 0.14%

6. medium.com — 4,432 · 4,425 · 1.00 · 0.09%

7. researchgate.net — 2,736 · 2,734 · 1.00 · 0.05%

8. techradar.com — 824 · 472 · 1.75 · 0.02%

9. instagram.com — 82 · 33 · 2.48 · 0.002%

10. facebook.com — 48 · 34 · 1.41 · 0.001%

Top-10 = 311,927 citations ≈ 6.14% of the 5.08M total. The remaining ~94% are distributed across 26,228 domains — the long tail.
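The derived columns and the top-10 total can be recomputed from the raw counts. The sketch below is only an arithmetic check, not RankCaster's pipeline: the names `TOP10`, `TOTAL_CITATIONS` and `domain_metrics` are illustrative, and the 5,080,000 total is the rounded figure quoted in this article.

```python
# Figures copied from the top-10 list above: domain -> (citations, answers).
TOP10 = {
    "en.wikipedia.org": (131_386, 70_900),
    "reddit.com": (111_220, 54_021),
    "youtube.com": (41_291, 41_261),
    "arxiv.org": (12_866, 8_623),
    "linkedin.com": (7_042, 6_916),
    "medium.com": (4_432, 4_425),
    "researchgate.net": (2_736, 2_734),
    "techradar.com": (824, 472),
    "instagram.com": (82, 33),
    "facebook.com": (48, 34),
}
TOTAL_CITATIONS = 5_080_000  # rounded total quoted in the article


def domain_metrics(citations: int, answers: int, total: int = TOTAL_CITATIONS) -> dict:
    """Return citations per answer and percentage share of all citations."""
    return {
        "cit_per_answer": citations / answers,
        "share_pct": 100 * citations / total,
    }


top10_total = sum(citations for citations, _ in TOP10.values())
top10_share_pct = 100 * top10_total / TOTAL_CITATIONS
long_tail_domains = 26_238 - len(TOP10)  # unique domains outside the top-10

for domain, (citations, answers) in TOP10.items():
    m = domain_metrics(citations, answers)
    print(f"{domain:18} {m['cit_per_answer']:.2f} cit./answer  {m['share_pct']:.3f}%")
print(top10_total, f"{top10_share_pct:.2f}%", long_tail_domains)
```

Because the total is rounded to 5.08M, the shares are accurate only to about the precision shown in the list.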

What these numbers actually mean

1. The headline insight: LLMs have two citation modes

Looking at "citations per answer" with a noise threshold of n ≥ 1,000 answers, the seven top-10 domains that clear the threshold split cleanly into two clusters.

"Corpus mode" (cit./answer > 1.4):

— Wikipedia — 1.85

— Reddit — 2.06

— arXiv — 1.49

The model references the source multiple times in a single answer. The domain is treated as a knowledge corpus from which many facts can be pulled.

"Document mode" (cit./answer ≈ 1.00):

— YouTube — 1.001

— LinkedIn — 1.02

— Medium — 1.00

— ResearchGate — 1.00

One link, one answer. The source is used as a standalone document or voice, not as a corpus.

This split drives strategy: growing presence in "document mode" requires many standalone publications; growing in "corpus mode" requires depth and density of facts on one domain.
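The two-mode split can be expressed as a small rule. This is a minimal sketch under stated assumptions: the 1.4 cut-off and the n ≥ 1,000 threshold come from this article, while the function name `citation_mode` and the "low-sample" label are illustrative. No top-10 domain falls between 1.02 and 1.49, so how to treat intermediate values is an open choice; here they default to "document".

```python
MIN_ANSWERS = 1_000      # noise threshold used in the article
CORPUS_THRESHOLD = 1.4   # cit./answer cut-off for "corpus mode"


def citation_mode(citations: int, answers: int) -> str:
    """Label a domain as 'corpus', 'document', or 'low-sample'."""
    if answers < MIN_ANSWERS:
        return "low-sample"  # too few answers to interpret the ratio
    ratio = citations / answers
    # Assumption: anything at or below the cut-off counts as document mode.
    return "corpus" if ratio > CORPUS_THRESHOLD else "document"


# A few rows from the top-10 list: domain -> (citations, answers).
SAMPLE = {
    "en.wikipedia.org": (131_386, 70_900),
    "reddit.com": (111_220, 54_021),
    "arxiv.org": (12_866, 8_623),
    "youtube.com": (41_291, 41_261),
    "linkedin.com": (7_042, 6_916),
    "techradar.com": (824, 472),
}

modes = {domain: citation_mode(c, a) for domain, (c, a) in SAMPLE.items()}
for domain, mode in modes.items():
    print(f"{domain:18} {mode}")
```

With real data, the cut-off would be worth re-deriving from the distribution of ratios rather than fixing it at 1.4.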

2. Wikipedia — the dataset's most cited domain

Wikipedia produces 131,386 citations across 70,900 answers: the largest absolute count in the top-10 and 2.59% of all 5.08M citations. When a topic touches Wikipedia, the model tends to cite the domain more than once per answer (1.85 on average).

3. Reddit — the #1 UGC source

Reddit is #2 by absolute citations (111,220) and #1 by cit./answer among top-10 domains with n ≥ 1,000 answers (2.06, ahead of Wikipedia's 1.85). The model bundles multiple threads into one answer. Hypothesis: threads serve as both factual references and as a "community voice"; the exact proportion needs a separate analysis.

4. YouTube — the only video platform in the top

No TikTok, no Vimeo, no Twitch. 41,291 YouTube citations vs 0 from competitors in this slice. For video-led brands the implication is direct: YouTube is the primary video channel for LLM answers in the current sample.

5. Social media — barely registers

Facebook (48 citations) and Instagram (82) combined produce 0.04% of the top-10 and 0.003% of all 5.08M. The reasons are predictable:

— content is closed to indexing;

— short format yields little extractable knowledge;

— authority cannot be verified.

Disclaimer: if the sample is light on prompts about local business, events and products, social media may perform better there. Worth a dedicated check.

6. arXiv — the scientific shift

12,866 citations across 8,623 answers (cit./answer = 1.49). arXiv is #4 by answer count and outperforms most tech media on citation density. Signal: LLMs pull scientific sources even for general queries — particularly relevant for B2B, deeptech, healthcare and finance.

Classification: which source types win

Encyclopedias (en.wikipedia.org) — 131,386 citations · 42.1% of top-10

Communities & UGC (reddit.com, medium.com) — 115,652 · 37.1%

Video (youtube.com) — 41,291 · 13.2%

Scientific archives (arxiv.org, researchgate.net) — 15,602 · 5.0%

Professional networks (linkedin.com) — 7,042 · 2.3%

Tech media (techradar.com) — 824 · 0.3%

Social networks (facebook.com, instagram.com) — 130 · 0.04%

Three source types deliver 92% of top-10 weight: encyclopedias, communities and video.
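The category shares follow from grouping the top-10 counts by source type. The sketch below reproduces that arithmetic; the grouping mirrors the classification above, and the variable names are illustrative.

```python
# Source type -> {domain: citations}, using the counts from the top-10 list.
SOURCE_TYPES = {
    "Encyclopedias": {"en.wikipedia.org": 131_386},
    "Communities & UGC": {"reddit.com": 111_220, "medium.com": 4_432},
    "Video": {"youtube.com": 41_291},
    "Scientific archives": {"arxiv.org": 12_866, "researchgate.net": 2_736},
    "Professional networks": {"linkedin.com": 7_042},
    "Tech media": {"techradar.com": 824},
    "Social networks": {"facebook.com": 48, "instagram.com": 82},
}

# Total citations per source type, then each type's share of the top-10.
type_totals = {name: sum(domains.values()) for name, domains in SOURCE_TYPES.items()}
top10_total = sum(type_totals.values())
shares = {name: 100 * total / top10_total for name, total in type_totals.items()}

# Combined weight of the three largest source types.
top3_share = sum(sorted(shares.values(), reverse=True)[:3])

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22} {type_totals[name]:>8,}  {share:.1f}%")
print(f"Top three types: {top3_share:.1f}% of top-10 citations")
```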

What this means for AI SEO / GEO strategy

High priority

1. Wikipedia — the foundation. It is the single most cited domain in the sample, accounting for 2.59% of all citations and 42% of top-10 weight; a quality article with verified sources puts your brand on that domain.

2. Reddit presence is a channel, not a campaign. Useful threads in relevant subreddits (real value, not astroturfing) are a long-term AI-authority investment.

3. YouTube — the primary video channel. Titles, descriptions and transcripts should be optimized for knowledge extraction, not just for clicks.

Medium priority

4. arXiv / ResearchGate — for technical, B2B, healthcare and deeptech products: publishing research or white papers there may offer a good chance of being cited, with relatively low competitive pressure.

5. Medium / LinkedIn — cit./answer ≈ 1.0 and shares of 0.09–0.14% of all citations. Useful as part of a thought-leadership stack, but framing them as a standalone AI-citation channel isn't supported by these numbers.

Low priority (for this goal)

6. Facebook, Instagram — practically don't function as LLM citation sources in the current sample. Still valuable for performance marketing.

7. Tier-2 tech media like techradar.com generate few absolute citations. Their 1.75 cit./answer hints that LLMs cite these outlets repeatedly once reached, but the base is only 472 answers, below the n ≥ 1,000 threshold. PR into niche outlets is a targeted, not mass, tactic.

Methodology and limitations

Source: 710,000 successful LLM measurements on the RankCaster AI platform.

Metrics:

Answers — number of LLM answers citing the domain at least once.

Citations — total reference count to the domain.

Cit./answer — citations divided by answers: the average number of times the LLM cites the domain in an answer that cites it at least once.

— When interpreting cit./answer, an n ≥ 1,000 answers threshold is applied; smaller-base domains (techradar.com, facebook.com, instagram.com) are flagged separately.
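The three metrics defined above can be sketched as a single aggregation pass. This is an assumption-heavy illustration, not the platform's implementation: the input format (one list of cited URLs per answer), the function names `domain_of` and `aggregate`, the stripping of a leading "www.", and the example URLs are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Extract the host name, dropping a leading 'www.' (an assumption)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def aggregate(answers: list[list[str]]) -> dict[str, dict]:
    """Compute citations, answers and cit./answer per domain."""
    citations = Counter()      # total references to the domain
    answer_counts = Counter()  # answers citing the domain at least once
    for cited_urls in answers:
        domains = [d for d in map(domain_of, cited_urls) if d]
        citations.update(domains)
        answer_counts.update(set(domains))  # count each answer once per domain
    return {
        d: {
            "citations": citations[d],
            "answers": answer_counts[d],
            "cit_per_answer": citations[d] / answer_counts[d],
        }
        for d in citations
    }


# Made-up example: three answers with their cited URLs.
example_answers = [
    ["https://www.reddit.com/r/a/1", "https://www.reddit.com/r/b/2",
     "https://en.wikipedia.org/wiki/X"],
    ["https://www.youtube.com/watch?v=abc"],
    ["https://en.wikipedia.org/wiki/Y", "https://en.wikipedia.org/wiki/Z"],
]
metrics = aggregate(example_answers)
for domain, m in metrics.items():
    print(domain, m)
```

Counting answers through a set is what keeps "answers" from exceeding the number of responses when a domain is cited several times in one of them.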

Sampling limitations to keep in mind:

— The prompt mix, language and geography distribution, collection period, and set of LLMs in the sample all shape the resulting picture. If the sample skews toward English-language general-information queries, the leadership of en.wikipedia.org and reddit.com partly reflects that skew.

— The top-10 covers 6.14% of all citations; the long tail of 26,228 domains carries important niche signal and is not analyzed here.

— Conclusions are valid within the collected sample; extending them to "all LLM answers in general" requires broader coverage (more LLMs, languages, prompt types).

What's next

Coming in the next RankCaster AI publications:

— The long tail: which niche domains win specific verticals.

— Model comparison: which LLMs lean Reddit, which lean Wikipedia, which lean scientific archives.

— Cuts by query category: how the domain top shifts in e-commerce, B2B SaaS, healthcare, finance.

Want to see how your brand looks across these 710,000 measurements? Get in touch.
