We want to hint at the location of our default set of substitutions.
This should allow easier reuse for plugin authors who need to access
this utility as well.
Closes#2845
Improve _HTMLWordTruncator by using more than one unicode block in
_word_regex, making word count function behave properly with CJK,
Cyrillic, and more Latin characters when generating summary.
Combined file and folder watchers under a class and refactored
common watcher related code from __init__.py to the class.
This simplifies the main and autoreload functions in __init__
as well as fix the problem with crashes related to multiprocessing
on systems where default spawn mode is "spawn" instead of "fork".
Adds a use_unicode kwarg to slugify to keep unicode
characters as is (no ASCII-fying) and add tests for
it. Also reworks how slugification logic.
slugify started with the Django method for slugiying:
- Normalize to compatibility decomposed from (NFKD)
- Encode and decode with 'ascii'
This works fine if the decomposed form contains ASCII
characters (i.e. ç can be changed in to c+CEDILLA and
ASCII would keep c only), but fails when decomposition
doesn't result in ASCII characters (i.e. Chinese). To
solve that 'unidecode' was added, which works fine for
both cases. However, old method is now redundant but
was kept. This commit removes the old method and
adjusts logic slightly.
Now slugify will normalize all text with composition
mode (NFKC) to unify format for regex substitutions.
And then if use_unicode is False, uses unidecode to
convert it to ASCII.
Adds a `preserve_case` parameter to the `slugify()` function and uses it
to preserve capital letters in category names when using the Pelican
importer.
This commit removes Six as a dependency for Pelican, replacing the
relevant aliases with the proper Python 3 imports. It also removes
references to Python 2 logic that did not require Six.
Invalid references like those missing semicolons (e.g. `&mdash`) or
those causing overflows (e.g. `�`) are now gracefully
handled and no exception is thrown.
This commit also adds tests and comments where needed.
Starting with python 3.6 warnings are issued for invalid escape
sequences in regular expressions. This commit corrects all
DeprecationWarning's via properly declaring the offending
regular expressions as raw strings.
Resolves#2095.
Python's shutil.copy2 fails on Android when copying a file's meta data (perm bits, access times) onto certain filesystems. This is documented as python issue28141 https://bugs.python.org/issue28141
These commits workaround that bug by
+ creating a new function copy_file_metadata in utils.py
+ wrapping calls to copy2 in a try/except clause that logs any errors that occur but keep execution going
+ changing the calls to shutil.copy2 to calls to the new function
Also update HTML output by running (after making sure to have the fr_FR.utf8
locale installed):
```sh
LC_ALL=en_US.utf8 pelican -o pelican/tests/output/custom/ -s samples/pelican.conf.py samples/content/
LC_ALL=fr_FR.utf8 pelican -o pelican/tests/output/custom_locale/ -s samples/pelican.conf_FR.py samples/content/
LC_ALL=en_US.utf8 pelican -o pelican/tests/output/basic/ samples/content/
```
as described at
http://docs.getpelican.com/en/3.6.3/contribute.html#running-the-test-suite
* Wrap HTML attributes in quotes according to their content. If it contains a double quote use single quotes, otherwise escape with double quotes.
* Add escape_html utility to ensure quote entities are converted identically across Python versions.
Fixes#1260