What is llms.txt? The file that lets AI engines read your site

Updated · July 5, 2026 — Joffrey Bonifay

If you care about your site's visibility in ChatGPT, Perplexity, Gemini or Claude, you've run into this filename: llms.txt. Here is what it is, what it actually does, and what it doesn't — no jargon, no overselling.

What is an llms.txt file?

llms.txt is a plain-text file, written in Markdown, placed at the root of a website, that gives AI systems a clean, curated summary of what the site is about: who is behind it, what it offers, and where its most important pages are. It is an open standard proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI; the specification is published at llmstxt.org.

The problem it solves is simple: language models have limited context windows, and a typical web page buries the actual content under navigation, scripts and cookie banners. llms.txt hands the model the essential version directly. Where robots.txt talks to crawlers about permissions, llms.txt talks to language models about meaning — for an AI answer engine, it is the shortest path to understanding a site correctly.

Where does the file go, and what does it contain?

The file lives at the root of your domain, at /llms.txt (for example, citeable.eu/llms.txt). Its format is deliberately simple Markdown: a single H1 (the site's name), a blockquote summarizing what the site does, then H2 sections listing links to key pages, each with a one-line description. An optional section, conventionally named “Optional”, holds secondary links a model can skip when its context is tight. A minimal file looks like this:

# Acme Bakery

> Family bakery in Lyon. Sourdough bread and pastries baked
> daily. Click & collect orders before 11am.

## Pages

- [Our breads](https://acme.fr/breads): the daily range and prices
- [Ordering](https://acme.fr/order): click & collect, cut-off times

## Optional

- [Our story](https://acme.fr/story): the family behind the ovens

A companion convention, llms-full.txt, goes further and inlines the full text of the important pages into one file. The essential rule: llms.txt is written for machines that read like fast, literal humans — clear naming, a real summary, useful links.
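Because the format is that regular, a few lines of Python are enough to read it back and spot structural gaps. The sketch below is an illustration, not a reference parser: the names `parse_llms_txt`, `find_problems` and `LlmsTxt` are hypothetical, and it only handles the subset of the format described above (one H1, a blockquote, H2 sections of `- [title](url): description` links).

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://example.com/page): one-line description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)

SAMPLE = """\
# Acme Bakery

> Family bakery in Lyon. Sourdough bread and pastries baked
> daily. Click & collect orders before 11am.

## Pages

- [Our breads](https://acme.fr/breads): the daily range and prices
- [Ordering](https://acme.fr/order): click & collect, cut-off times

## Optional

- [Our story](https://acme.fr/story): the family behind the ovens
"""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section heading -> list of (title, url, description)
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1 / blockquote / H2-link-list subset of llms.txt."""
    doc = LlmsTxt()
    summary_lines = []
    current = None  # heading of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    doc.summary = " ".join(summary_lines)
    return doc


def find_problems(doc: LlmsTxt) -> list:
    """Return human-readable structural issues (empty list if none)."""
    issues = []
    if not doc.title:
        issues.append("missing H1 with the site's name")
    if not doc.summary:
        issues.append("missing blockquote summary")
    if not doc.sections:
        issues.append("no H2 sections with links")
    for heading, links in doc.sections.items():
        for title, _url, desc in links:
            if not desc:
                issues.append(f"link '{title}' in '{heading}' has no description")
    return issues
```

Running `find_problems(parse_llms_txt(SAMPLE))` on the bakery example returns an empty list; removing the blockquote or a link description makes the corresponding message appear.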

Do AI engines actually read llms.txt?

Partly, and honesty matters here. llms.txt is a proposed standard: neither OpenAI nor Google has officially committed to using it for ranking or citations. What is verifiable today: AI crawlers (GPTBot, ClaudeBot, PerplexityBot…) do request /llms.txt on sites that publish one, and a growing number of companies publish the file, including Anthropic, Zapier, Cloudflare, and documentation platforms such as Mintlify.

It is also immediately useful for live browsing: when ChatGPT, Perplexity or Claude fetch your site at answer time, a clean llms.txt is the shortest path to being understood correctly rather than paraphrased from a noisy HTML page. The cost-benefit is lopsided in your favor: one static text file, no downside, and a head start if adoption keeps growing.

What's the difference between llms.txt, robots.txt and sitemap.xml?

They answer three different questions. robots.txt handles permission: which bots may crawl which parts of your site — it says nothing about content. sitemap.xml handles inventory: the list of URLs you want indexed — useful for coverage, silent about meaning. llms.txt handles comprehension: what your site is about, in a form a language model can load in one pass.

The three are complementary, not competing. A site that wants to be visible to AI engines should have all three: a robots.txt that explicitly allows AI user agents (GPTBot, ClaudeBot, PerplexityBot, Google-Extended…), a sitemap for classic indexing, and an llms.txt for meaning. Removing any one of them weakens a different link in the chain: access, discovery, or understanding.
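The "access" link is easy to test locally. The sketch below uses Python's standard `urllib.robotparser` to report which of the four user agents named above would be refused a given URL. The function name `blocked_agents` and the two sample robots.txt files are made up for illustration, and real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this guide; extend the tuple as needed.
AI_USER_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")

# Hypothetical robots.txt that lets AI agents in and hides only /admin/.
PERMISSIVE = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /admin/
"""

# Hypothetical robots.txt that shuts GPTBot out of the whole site.
RESTRICTIVE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, url: str, agents=AI_USER_AGENTS) -> list:
    """Return the agents that this robots.txt does not allow to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

With these samples, `blocked_agents(PERMISSIVE, "https://example.com/llms.txt")` returns an empty list, while the same call with `RESTRICTIVE` returns `["GPTBot"]`.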

Is llms.txt enough to get cited by AI engines?

No — and that is the most common misconception. llms.txt makes your site readable; it does not, by itself, make it quotable. AI answer engines cite passages that answer a specific question directly, in a self-contained way. That is a content-shape problem: pages structured as questions with direct answers, plus structured data — Q&A markup in schema.org format (JSON-LD) — that exposes those question-and-answer pairs in a machine-readable form on your pages.

That combination is what turns “the AI can read me” into “the AI can quote me word for word”. It is also exactly the pair Citeable generates: a clean llms.txt built from your real content, plus the Q&A schema.org markup to place on your pages. (This very page uses that markup — view its source.)
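To make "Q&A markup in schema.org format (JSON-LD)" concrete, here is a minimal sketch that turns question-and-answer pairs into a JSON-LD block. It assumes the schema.org `FAQPage` type with `Question` and `Answer` entities; that choice, and the helper names `faq_jsonld` and `script_tag`, are assumptions made for illustration and do not describe what Citeable actually emits.

```python
import json


def faq_jsonld(pairs) -> str:
    """Serialize (question, answer) pairs as a schema.org FAQPage document."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the <script> element that goes into the page's HTML."""
    # Escape "</" so an answer containing "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape, so the payload still parses to the same data.
    safe = jsonld.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"


PAIRS = [
    (
        "Is llms.txt enough to get cited by AI engines?",
        "No. llms.txt makes your site readable; it does not, by itself, "
        "make it quotable.",
    ),
]
```

Calling `script_tag(faq_jsonld(PAIRS))` produces a block you can paste into the page. Each answer should match the visible text on the page, since the markup is meant to expose the existing pairs, not add new ones.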

How do you create an llms.txt file?

Two ways. Manually: read the specification at llmstxt.org, write the Markdown yourself and keep it updated when your site changes. For a small site, budget an hour or two; the hard part is editorial: an llms.txt that merely mirrors your sitemap adds nothing — the value is in genuine summaries of real content.
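If you go the manual route, the layout can still be scripted while the summaries stay handwritten. The sketch below renders an llms.txt from a plain Python structure. The function name `build_llms_txt` and the input shape are hypothetical; it refuses links without a description, on the assumption that an undescribed link is exactly the sitemap mirror warned against above.

```python
def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render llms.txt from a site name, a summary and {heading: [(title, url, description)]}."""
    # Collapse whitespace so the blockquote stays on a single line.
    one_line_summary = " ".join(summary.split())
    if not one_line_summary:
        raise ValueError("a real summary is required")

    lines = [f"# {name}", "", f"> {one_line_summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            if not description.strip():
                raise ValueError(f"link '{title}' needs a one-line description")
            lines.append(f"- [{title}]({url}): {description.strip()}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


ACME_SECTIONS = {
    "Pages": [
        ("Our breads", "https://acme.fr/breads", "the daily range and prices"),
        ("Ordering", "https://acme.fr/order", "click & collect, cut-off times"),
    ],
    "Optional": [
        ("Our story", "https://acme.fr/story", "the family behind the ovens"),
    ],
}
```

Calling `build_llms_txt("Acme Bakery", "Family bakery in Lyon.", ACME_SECTIONS)` returns text in the same layout as the bakery example earlier in this guide, ready to save as llms.txt.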

Automatically: Citeable does it from a URL — it crawls your public pages, extracts the actual content, and generates both a structured llms.txt and the Q&A schema.org markup, for a one-time payment. Either way: publish the file at /llms.txt, check that it loads, and reference your best pages — not all of them.
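"Check that it loads" can be a small script rather than a browser tab. The sketch below resolves /llms.txt against any URL on the site, fetches it with the standard library, and flags two common failures: a non-200 response, and an HTML page (often a styled 404) served in place of the Markdown file. The names `check_llms_txt`, `fetch` and the User-Agent string are placeholders chosen for this example.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def llms_txt_url(site: str) -> str:
    """Resolve /llms.txt against the root of whatever site URL is given."""
    return urljoin(site, "/llms.txt")


def fetch(url: str, timeout: float = 10.0):
    """Default fetcher: return (HTTP status, decoded body)."""
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


def check_llms_txt(site: str, fetcher=fetch) -> list:
    """Return a list of issues found when loading the site's llms.txt."""
    url = llms_txt_url(site)
    try:
        status, body = fetcher(url)
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return [f"{url}: request failed ({exc})"]

    issues = []
    if status != 200:
        issues.append(f"{url}: unexpected HTTP status {status}")
    first_line = next((l.strip() for l in body.splitlines() if l.strip()), "")
    if first_line.startswith("<"):
        issues.append(f"{url}: looks like HTML, not Markdown")
    elif not first_line.startswith("# "):
        issues.append(f"{url}: does not start with an H1 line")
    return issues
```

The `fetcher` parameter exists so the check can be exercised without network access, for example `check_llms_txt("https://acme.fr", fetcher=lambda url: (200, "# Acme Bakery\n"))`, which returns an empty list.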