Allow text substitutions when generating slugs

The `slugify()` function used by Pelican is in general very good at
coming up with something both readable and URL-safe. However, there are
a few specific cases where it causes conflicts. One that I've run into
is using the strings `C++` and `C` as tags, both of which transform to
the slug `c`. This commit adds an optional `SLUG_SUBSTITUTIONS` setting
which is a list of 2-tuples of substitutions to be carried out
case-insensitively just prior to stripping out non-alphanumeric
characters. This allows cases like `C++` to be transformed to `CPP` or
similar. This can also improve the readability of slugs.
This commit is contained in:
Andy Pearce 2013-06-14 15:54:06 +01:00
commit 39518e15ef
6 changed files with 28 additions and 8 deletions

View file

@ -231,7 +231,7 @@ class pelican_open(object):
pass
def slugify(value):
def slugify(value, substitutions=()):
"""
Normalizes string, converts to lowercase, removes non-alpha characters,
and converts spaces to hyphens.
@ -249,8 +249,10 @@ def slugify(value):
if isinstance(value, six.binary_type):
value = value.decode('ascii')
# still unicode
value = unicodedata.normalize('NFKD', value)
value = re.sub('[^\w\s-]', '', value).strip().lower()
value = unicodedata.normalize('NFKD', value).lower()
for src, dst in substitutions:
value = value.replace(src.lower(), dst.lower())
value = re.sub('[^\w\s-]', '', value).strip()
value = re.sub('[-\s]+', '-', value)
# we want only ASCII chars
value = value.encode('ascii', 'ignore')