Created October 28, 2025 22:17
Basic Robots.txt (with anti AI stuff)
# As a condition of accessing this website, you agree to abide by the following
# content signals:
# (a) If a content-signal = yes, you may collect content for the corresponding
# use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.
# The content signals and their meanings are:
# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search does not
# include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of content for
# generative AI search answers).
# ai-train: training or fine-tuning AI models.
# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

User-agent: AddSearchBot
User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
User-agent: aiHitBot
User-agent: amazon-kendra-
User-agent: Amazonbot
User-agent: Andibot
User-agent: Anomura
User-agent: anthropic-ai
User-agent: Applebot
User-agent: Applebot-Extended
User-agent: Awario
User-agent: bedrockbot
User-agent: bigsur.ai
User-agent: Bravebot
User-agent: Brightbot 1.0
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT Agent
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: Cloudflare-AutoRAG
User-agent: CloudVertexBot
User-agent: cohere-ai
User-agent: cohere-training-data-crawler
User-agent: Cotoyogi
User-agent: Crawlspace
User-agent: Datenbank Crawler
User-agent: DeepSeekBot
User-agent: Devin
User-agent: Diffbot
User-agent: DuckAssistBot
User-agent: Echobot Bot
User-agent: EchoboxBot
User-agent: FacebookBot
User-agent: facebookexternalhit
User-agent: Factset_spyderbot
User-agent: FirecrawlAgent
User-agent: FriendlyCrawler
User-agent: Gemini-Deep-Research
User-agent: Google-CloudVertexBot
User-agent: Google-Extended
User-agent: Google-Firebase
User-agent: Google-NotebookLM
User-agent: GoogleAgent-Mariner
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: GPTBot
User-agent: iaskspider/2.0
User-agent: IbouBot
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: ISSCyberRiskCrawler
User-agent: Kangaroo Bot
User-agent: LinerBot
User-agent: Linguee Bot
User-agent: meta-externalagent
User-agent: Meta-ExternalAgent
User-agent: meta-externalfetcher
User-agent: Meta-ExternalFetcher
User-agent: meta-webindexer
User-agent: MistralAI-User
User-agent: MistralAI-User/1.0
User-agent: MyCentralAIScraperBot
User-agent: netEstate Imprint Crawler
User-agent: NovaAct
User-agent: OAI-SearchBot
User-agent: omgili
User-agent: omgilibot
User-agent: OpenAI
User-agent: Operator
User-agent: PanguBot
User-agent: Panscient
User-agent: panscient.com
User-agent: Perplexity-User
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: PhindBot
User-agent: Poseidon Research Crawler
User-agent: QualifiedBot
User-agent: QuillBot
User-agent: quillbot.com
User-agent: SBIntuitionsBot
User-agent: Scrapy
User-agent: SemrushBot-OCOB
User-agent: SemrushBot-SWA
User-agent: ShapBot
User-agent: Sidetrade indexer bot
User-agent: TerraCotta
User-agent: Thinkbot
User-agent: TikTokSpider
User-agent: Timpibot
User-agent: VelenPublicWebCrawler
User-agent: WARDBot
User-agent: Webzio-Extended
User-agent: wpbot
User-agent: YaK
User-agent: YandexAdditional
User-agent: YandexAdditionalBot
User-agent: YouBot
Content-signal: search=no,ai-train=no,ai-input=no
Disallow: /

User-Agent: *
Content-signal: search=yes,ai-train=no,ai-input=no
Allow: /
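
The file above combines two mechanisms: ordinary `Disallow`/`Allow` rules and per-group `Content-signal` lines. A rough way to sanity-check both is sketched below in Python. Python's standard `urllib.robotparser` skips fields it does not recognise, so the `Content-signal` parsing here is hand-rolled; `parse_content_signal` and `content_signals_by_group` are hypothetical helpers written for this illustration, the `key=yes|no` comma-separated syntax is inferred from the file itself, and the crawler list is shortened to two entries.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the file above: two of the listed crawlers plus the
# catch-all group. The shortened list is an assumption made for brevity.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Content-signal: search=no,ai-train=no,ai-input=no
Disallow: /

User-Agent: *
Content-signal: search=yes,ai-train=no,ai-input=no
Allow: /
"""


def parse_content_signal(value: str) -> dict[str, bool]:
    """Turn 'search=yes,ai-train=no' into {'search': True, 'ai-train': False}."""
    signals = {}
    for pair in value.split(","):
        key, _, flag = pair.partition("=")
        signals[key.strip().lower()] = flag.strip().lower() == "yes"
    return signals


def content_signals_by_group(text: str) -> list[tuple[list[str], dict[str, bool]]]:
    """Return (user agents, content signals) for each group, in file order."""
    groups = []
    agents, signals, in_rules = [], {}, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A rule line was already seen, so this starts a new group.
                if agents:
                    groups.append((agents, signals))
                agents, signals, in_rules = [], {}, False
            agents.append(value)
        else:
            in_rules = True
            if field == "content-signal":
                signals = parse_content_signal(value)
    if agents:
        groups.append((agents, signals))
    return groups


# Standard-library check of the Disallow/Allow rules only.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    print(parser.can_fetch("GPTBot", "/any/page"))
    print(parser.can_fetch("ExampleBrowser", "/any/page"))
    for group_agents, group_signals in content_signals_by_group(ROBOTS_TXT):
        print(group_agents, group_signals)
```

Note that both checks only describe what the file asks for. Whether a given crawler honours `Disallow` or `Content-signal` is up to that crawler.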