diff --git a/.editorconfig b/.editorconfig
index 589f816..ef3e1a7 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -4,3 +4,6 @@ root = true
 end_of_line = lf
 insert_final_newline = true
 trim_trailing_whitespace = true
+
+[{Caddyfile,haproxy-block-ai-bots.txt,nginx-block-ai-bots.conf}]
+insert_final_newline = false
diff --git a/README.md b/README.md
index 191104b..586e9f7 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 This list contains AI-related crawlers of all types, regardless of purpose. We encourage you to contribute to and implement this list on your own site. See [information about the listed crawlers](./table-of-bot-metrics.md) and the [FAQ](https://github.com/ai-robots-txt/ai.robots.txt/blob/main/FAQ.md).
 
-A number of these crawlers have been sourced from [Dark Visitors](https://darkvisitors.com) and we appreciate the ongoing effort they put in to track these crawlers. 
+A number of these crawlers have been sourced from [Dark Visitors](https://darkvisitors.com) and we appreciate the ongoing effort they put in to track these crawlers.
 
 If you'd like to add information about a crawler to the list, please make a pull request with the bot name added to `robots.txt`, `ai.txt`, and any relevant details in `table-of-bot-metrics.md` to help people understand what's crawling.
@@ -86,8 +86,8 @@ Alternatively, you can also subscribe to new releases with your GitHub account b
 
 ## License content with RSL
 
-It is also possible to license your content to AI companies in `robots.txt` using 
-the [Really Simple Licensing](https://rslstandard.org) standard, with an option of 
+It is also possible to license your content to AI companies in `robots.txt` using
+the [Really Simple Licensing](https://rslstandard.org) standard, with an option of
 collective bargaining. A [plugin](https://github.com/Jameswlepage/rsl-wp) currently
 implements RSL as well as payment processing for WordPress sites.
@@ -103,5 +103,3 @@ But even if you don't use Cloudflare's hard block, their list of [verified bots]
 - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman
 - [Blocking AI web crawlers](https://underlap.org/blocking-ai-web-crawlers) by Glyn Normington
 - [Block AI Bots from Crawling Websites Using Robots.txt](https://originality.ai/ai-bot-blocking) by Jonathan Gillham, Originality.AI
-
-
diff --git a/docs/traefik-manual-setup.md b/docs/traefik-manual-setup.md
index 2bb8d33..0760016 100644
--- a/docs/traefik-manual-setup.md
+++ b/docs/traefik-manual-setup.md
@@ -1,7 +1,7 @@
 # Intro
 If you're using Traefik as your reverse proxy in your docker setup, you might want to use it as well to centrally serve the ```/robots.txt``` for all your Traefik fronted services.
-This can be achieved by configuring a single lightweight service to service static files and defining a high priority Traefik HTTP Router rule. 
+This can be achieved by configuring a single lightweight service to serve static files and defining a high priority Traefik HTTP Router rule.
 
 # Setup
 Define a single service to serve the one robots.txt to rule them all. I'm using a lean nginx:alpine docker image in this example:
@@ -31,7 +31,6 @@ networks:
     external: true
 ```
 
-
-The Traefik HTTP Routers rule explicitly does not contain a Hostname. Traefik will print a warning about this for the TLS setup but it will work. The high priority of 3000 should ensure this rule is evaluated first for incoming requests. 
+The Traefik HTTP Router rule explicitly does not contain a Hostname. Traefik will print a warning about this for the TLS setup but it will work. The high priority of 3000 should ensure this rule is evaluated first for incoming requests.
 
 Place your robots.txt in the local `./static/` directory and NGINX will serve it for all services behind your Traefik proxy.
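For context on the `docs/traefik-manual-setup.md` hunks above, here is a minimal sketch of the kind of compose service that document describes. Only the `Path` rule for `/robots.txt`, the priority of 3000, the absence of a `Host()` matcher, the `nginx:alpine` image and the local `./static/` directory come from the diff; the service and router names, the `websecure` entrypoint, the container mount path and the external `proxy` network are assumptions and should be adapted to the labels actually used in that file.

```yaml
# Hypothetical sketch only -- names, entrypoint and network are assumptions,
# not the configuration shipped in docs/traefik-manual-setup.md.
services:
  robots:
    image: nginx:alpine
    volumes:
      - ./static:/usr/share/nginx/html:ro   # place robots.txt in ./static/
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      # No Host() matcher, so the rule matches /robots.txt on every hostname
      # Traefik fronts (Traefik warns about this for TLS, but it works).
      - "traefik.http.routers.robots.rule=Path(`/robots.txt`)"
      # High priority so this router is evaluated before the per-service routers.
      - "traefik.http.routers.robots.priority=3000"
      - "traefik.http.routers.robots.entrypoints=websecure"
      - "traefik.http.routers.robots.tls=true"
      - "traefik.http.services.robots.loadbalancer.server.port=80"

networks:
  proxy:
    external: true
```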
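The README hunk above asks contributors to add new bot names to `robots.txt`, `ai.txt`, and `table-of-bot-metrics.md`. As a purely illustrative reminder of what a blocking entry looks like (the user-agent string below is a placeholder, not one of the crawlers in the list):

```
# Placeholder only -- "ExampleAIBot" is not a real crawler from the list.
User-agent: ExampleAIBot
Disallow: /
```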