Allow text substitutions when generating slugs

Pelican's `slugify()` function is generally very good at producing slugs that are both readable and URL-safe. However, there are a few specific cases where it causes conflicts. One that I've run into is using the strings `C++` and `C` as tags, both of which transform to the slug `c`. This commit adds an optional `SLUG_SUBSTITUTIONS` setting: a list of 2-tuples of substitutions, applied case-insensitively just before non-alphanumeric characters are stripped. This allows a tag like `C++` to be replaced with `CPP` or similar (yielding the slug `cpp`, since the result is lowercased). It can also improve the readability of slugs.
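To illustrate the collision and the fix, here is a minimal standalone sketch of the patched logic. It is an approximation rather than the exact Pelican code: it assumes Python 3 `str` input, drops the `six` bytes handling, and decodes the result back to `str`. The value shown for `SLUG_SUBSTITUTIONS` is only an example, and passing the setting directly as the second argument is an assumption about how callers wire it in.

```python
import re
import unicodedata


def slugify(value, substitutions=()):
    """Sketch of the patched slugify(), simplified for Python 3 str input."""
    # Normalize and lowercase first so substitutions match case-insensitively.
    value = unicodedata.normalize('NFKD', value).lower()
    # Apply substitutions before punctuation is stripped, so 'c++' is still intact.
    for src, dst in substitutions:
        value = value.replace(src.lower(), dst.lower())
    # Remove anything that is not a word character, whitespace, or hyphen.
    value = re.sub(r'[^\w\s-]', '', value).strip()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    value = re.sub(r'[-\s]+', '-', value)
    # Keep only ASCII characters.
    return value.encode('ascii', 'ignore').decode('ascii')


# Example setting value (illustrative): a list of (source, replacement) tuples.
SLUG_SUBSTITUTIONS = [('C++', 'CPP')]

print(slugify('C'))                        # c
print(slugify('C++'))                      # c   (collides with the slug for 'C')
print(slugify('C++', SLUG_SUBSTITUTIONS))  # cpp
```

Because both the source and the replacement are lowercased, the case used in the setting does not affect the resulting slug.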
2025-10-15 20:28:56 +02:00 · 2013-06-14 15:54:06 +01:00 · 39518e15ef
commit 39518e15ef
parent 7ec4d5faa2
6 changed files with 28 additions and 8 deletions
--- a/pelican/utils.py
+++ b/pelican/utils.py
@@ -231,7 +231,7 @@ class pelican_open(object):
        pass


-def slugify(value):
+def slugify(value, substitutions=()):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters,
    and converts spaces to hyphens.
@@ -249,8 +249,10 @@ def slugify(value):
    if isinstance(value, six.binary_type):
        value = value.decode('ascii')
    # still unicode
-    value = unicodedata.normalize('NFKD', value)
-    value = re.sub('[^\w\s-]', '', value).strip().lower()
+    value = unicodedata.normalize('NFKD', value).lower()
+    for src, dst in substitutions:
+        value = value.replace(src.lower(), dst.lower())
+    value = re.sub('[^\w\s-]', '', value).strip()
    value = re.sub('[-\s]+', '-', value)
    # we want only ASCII chars
    value = value.encode('ascii', 'ignore')