Merge pull request #205 from fiskhandlarn/fix/editorconfig

Fix/editorconfig
Glyn Normington, 2025-11-29 10:02:08 +00:00, committed by GitHub
commit 56010ef913
3 changed files with 8 additions and 8 deletions

`.editorconfig`

@@ -4,3 +4,6 @@ root = true
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[{Caddyfile,haproxy-block-ai-bots.txt,nginx-block-ai-bots.conf}]
insert_final_newline = false
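
For context, the resulting `.editorconfig` plausibly looks like this (a sketch assuming the standard `[*]` section header above the properties shown in the hunk):

```
root = true

[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

# Exempt these generated blocklists from the final-newline rule (added in this PR).
[{Caddyfile,haproxy-block-ai-bots.txt,nginx-block-ai-bots.conf}]
insert_final_newline = false
```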

`README.md`

@@ -4,7 +4,7 @@
This list contains AI-related crawlers of all types, regardless of purpose. We encourage you to contribute to and implement this list on your own site. See [information about the listed crawlers](./table-of-bot-metrics.md) and the [FAQ](https://github.com/ai-robots-txt/ai.robots.txt/blob/main/FAQ.md).
A number of these crawlers have been sourced from [Dark Visitors](https://darkvisitors.com) and we appreciate the ongoing effort they put in to track these crawlers.
If you'd like to add information about a crawler to the list, please make a pull request with the bot name added to `robots.txt`, `ai.txt`, and any relevant details in `table-of-bot-metrics.md` to help people understand what's crawling.
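
For illustration, an entry in `robots.txt` follows the usual user-agent/disallow pattern (the bot name below is hypothetical):

```
User-agent: ExampleBot
Disallow: /
```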
@@ -86,8 +86,8 @@ Alternatively, you can also subscribe to new releases with your GitHub account b
## License content with RSL
It is also possible to license your content to AI companies in `robots.txt` using
the [Really Simple Licensing](https://rslstandard.org) standard, with an option of
collective bargaining. A [plugin](https://github.com/Jameswlepage/rsl-wp) currently
implements RSL as well as payment processing for WordPress sites.
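
A minimal sketch of what that can look like, assuming RSL's `License` directive for `robots.txt` as documented at rslstandard.org (the URL is a placeholder):

```
# Hypothetical example: point crawlers at a machine-readable RSL license.
License: https://example.com/license.xml
```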
@@ -103,5 +103,3 @@ But even if you don't use Cloudflare's hard block, their list of [verified bots]
- [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman
- [Blocking AI web crawlers](https://underlap.org/blocking-ai-web-crawlers) by Glyn Normington
- [Block AI Bots from Crawling Websites Using Robots.txt](https://originality.ai/ai-bot-blocking) by Jonathan Gillham, Originality.AI


@@ -1,7 +1,7 @@
# Intro
If you're using Traefik as your reverse proxy in your Docker setup, you might also want to use it to centrally serve the `/robots.txt` for all your Traefik-fronted services.
This can be achieved by configuring a single lightweight service to serve static files and defining a high-priority Traefik HTTP router rule.
# Setup
Define a single service to serve the one robots.txt to rule them all. I'm using a lean nginx:alpine Docker image in this example:
@@ -31,7 +31,6 @@ networks:
external: true
```
The Traefik HTTP router rule deliberately does not contain a hostname. Traefik will print a warning about this for the TLS setup, but it will work. The high priority of 3000 should ensure this rule is evaluated first for incoming requests.
Place your `robots.txt` in the local `./static/` directory, and NGINX will serve it for all services behind your Traefik proxy.
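
For reference, a minimal sketch of such a compose service, assuming Traefik's Docker provider with label-based configuration (service, router, and network names are illustrative):

```
services:
  robots:
    image: nginx:alpine
    volumes:
      # robots.txt lives in ./static/ on the host
      - ./static:/usr/share/nginx/html:ro
    networks:
      - traefik
    labels:
      - "traefik.enable=true"
      # No Host() matcher: match /robots.txt on every hostname
      - "traefik.http.routers.robots.rule=Path(`/robots.txt`)"
      # High priority so this router wins over per-service routers
      - "traefik.http.routers.robots.priority=3000"
      - "traefik.http.services.robots.loadbalancer.server.port=80"

networks:
  traefik:
    external: true
```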