Previously, only the page's "save as" name, or part of it, could be used
for generating pagination links. That's not sufficient when page URLs
differ a lot from the actual filenames. With this patch it's possible to
use the `{url}` placeholder in the PAGINATION_PATTERNS setting. For
example, the paginated archives would be saved to:
blog/index.html
blog/2/index.html
blog/3/index.html
while the actual URLs would be like this (with the help of Apache's
mod_rewrite):
http://blog.my.site/
http://blog.my.site/2/
http://blog.my.site/3/
The configuration that corresponds to this is roughly the following:
ARCHIVES_SAVE_AS = 'blog/index.html'
ARCHIVES_URL = 'http://blog.my.site/'
PAGINATION_PATTERNS = [
    (1, '/{url}', '{base_name}/index.html'),
    (2, '/{url}{number}/', '{base_name}/{number}/index.html')
]
Also added YEAR_ARCHIVE_URL, MONTH_ARCHIVE_URL and DAY_ARCHIVE_URL
settings, as they were missing and now they make sense.
Fix intrasite links for non-'summary' metadata
Metadata like `MyArticleBanner: ` would be properly parsed (as defined in `FORMATTED_FIELDS`), but the intrasite links would not be processed.
Only the summary gets its intrasite links processed, as its value is either generated from the content (calling `self._update_content` at some point) or `self._update_content` is explicitly called if the summary is passed as metadata.
This PR expands the paths as early as possible (in `Content.__init__`) for metadata defined in `FORMATTED_FIELDS`.
With this patch, there are new FEED_*_URL configuration options that
allow specifying custom URLs for feeds, which is helpful when the
feed filename and the actual URL differ a lot -- for example if a feed
is saved to
blog/feeds/all.atom.xml
but the actual URL from the user's point of view is
http://blog.your.site/feeds/all.atom.xml
This setting currently affects only the generated feed XML. This change
is also fully backwards compatible, so if the FEED_*_URL setting is not
present, the value of FEED_* is used for both file location and URL.
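The fallback described above can be sketched roughly as follows. This is only an illustration: `feed_location_and_url` is a hypothetical helper rather than Pelican's own code, and the sample setting values (including whether the URL is given relative to the site root) are assumptions.

```python
def feed_location_and_url(settings, name):
    """Return (save_as, url) for a feed setting such as 'FEED_ALL_ATOM'.

    Hypothetical helper: when no '<name>_URL' setting is present, the
    plain '<name>' value is used for both the file location and the URL.
    """
    save_as = settings.get(name)
    url = settings.get(name + '_URL') or save_as
    return save_as, url


# Illustrative values only.
settings = {
    'FEED_ALL_ATOM': 'blog/feeds/all.atom.xml',
    'FEED_ALL_ATOM_URL': 'feeds/all.atom.xml',
}
legacy_settings = {'FEED_ALL_ATOM': 'feeds/all.atom.xml'}
```

With `legacy_settings` the helper returns the same value twice, which is the backwards-compatible behaviour.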
Allow for overriding individual templates from the theme by configuring
the Jinja2 `Environment` loader to search for templates in the
`THEME_TEMPLATES_OVERRIDES` path before the theme's `templates/`
directory.
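The lookup order can be illustrated with a small standard-library sketch. It does not use Jinja2 itself; `template_search_paths` and `find_template` are hypothetical helpers that only mimic "first matching directory wins", and the directory names are made up for the demo.

```python
import os
import tempfile


def template_search_paths(theme_dir, overrides):
    """Override directories come first, the theme's templates/ last."""
    return list(overrides) + [os.path.join(theme_dir, 'templates')]


def find_template(name, search_paths):
    """Return the first existing file called `name`, or None."""
    for directory in search_paths:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo: the override directory shadows the theme's article.html only.
root = tempfile.mkdtemp()
theme = os.path.join(root, 'theme')
override = os.path.join(root, 'override')
os.makedirs(os.path.join(theme, 'templates'))
os.makedirs(override)
for path in (os.path.join(theme, 'templates', 'article.html'),
             os.path.join(theme, 'templates', 'index.html'),
             os.path.join(override, 'article.html')):
    open(path, 'w').close()

paths = template_search_paths(theme, [override])
article = find_template('article.html', paths)  # found in override/
index = find_template('index.html', paths)      # falls back to the theme
```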
* Consolidate validation of content
Previously we validated content outside of the content class via
calls to `is_valid_content` and some additional checks in page /
article generators (valid status).
This commit moves those checks all into content.valid() resulting
in a cleaner code structure.
This allows us to restructure how generators interact with content,
removing several old bugs in pelican (#1748, #1356, #2098).
- move verification function into content class
- move generator verifying content to contents class
- remove unused quote class
- remove draft class (no more rereading drafts)
- move auto draft status setter into Article.__init__
- add now parsing draft to basic test output
- remove problematic DEFAULT_STATUS setting
- add setter/getter for content.status
removes need for lower() calls when verifying status
* expand c4b184fa32
Mostly implement feedback by @iKevinY.
* rename content.valid to content.is_valid
* rename valid_* functions to has_valid_*
* update tests and function calls in code accordingly
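A minimal sketch of the status setter/getter idea, assuming a simplified stand-in class. Only the names `is_valid` and `has_valid_*` come from the commit message; `allowed_statuses` and the class body are hypothetical and not Pelican's actual implementation.

```python
class Content(object):
    """Simplified stand-in for a content class."""

    # Hypothetical attribute listing the accepted status values.
    allowed_statuses = ('published', 'draft')

    def __init__(self, status='published'):
        self.status = status  # goes through the setter below

    @property
    def status(self):
        return self._status

    @status.setter
    def status(self, value):
        # Normalise once here, so later checks need no lower() calls.
        self._status = value.lower()

    def has_valid_status(self):
        return self.status in self.allowed_statuses

    def is_valid(self):
        # A real implementation would combine several has_valid_* checks.
        return self.has_valid_status()
```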
STATIC_CREATE_LINKS = False
Create links instead of copying files. If the content and output
directories are on the same device, then create hard links. Falls
back to symbolic links if the output directory is on a different
filesystem. If symlinks are created, don’t forget to add the -L or
--copy-links option to rsync when uploading your site.
STATIC_CHECK_IF_MODIFIED = False
If set to True, and STATIC_CREATE_LINKS is False, compare mtimes of
content and output files, and only copy content files that are newer
than existing output files.
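The behaviour of the two settings can be sketched as below. `place_static_file` is a hypothetical helper written for illustration, not the code in Pelican; the demo exercises only the copy path, and the timestamps are arbitrary.

```python
import os
import shutil
import tempfile


def place_static_file(source, destination, create_links=False,
                      check_if_modified=False):
    """Return 'hardlink', 'symlink', 'copied' or 'skipped' (sketch only)."""
    if create_links:
        if os.path.lexists(destination):
            os.remove(destination)
        dest_dir = os.path.dirname(destination) or '.'
        if os.stat(source).st_dev == os.stat(dest_dir).st_dev:
            os.link(source, destination)      # same device: hard link
            return 'hardlink'
        os.symlink(os.path.abspath(source), destination)
        return 'symlink'
    if (check_if_modified and os.path.exists(destination)
            and os.path.getmtime(source) <= os.path.getmtime(destination)):
        return 'skipped'                      # output is already up to date
    shutil.copy2(source, destination)
    return 'copied'


# Demo of the copy path only, in a throwaway directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'logo.png')
dst = os.path.join(workdir, 'output-logo.png')
with open(src, 'wb') as handle:
    handle.write(b'data')
os.utime(src, (1000000000, 1000000000))       # pretend the source is old

first = place_static_file(src, dst, check_if_modified=True)
second = place_static_file(src, dst, check_if_modified=True)
os.utime(src, (2000000000, 2000000000))       # source is now newer
third = place_static_file(src, dst, check_if_modified=True)
```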
fix flake8 warnings
Set jinja environment defaults within settings
updating docs to remove JINJA_EXTENSIONS
update logger warning and defaults documentation
better way to grab jinja environment
updating settings after refactor
Python's shutil.copy2 fails on Android when copying a file's metadata (permission bits, access times) onto certain filesystems. This is documented as Python issue 28141: https://bugs.python.org/issue28141
These commits work around that bug by
+ creating a new function copy_file_metadata in utils.py
+ wrapping calls to copy2 in a try/except clause that logs any errors that occur but keeps execution going
+ changing the calls to shutil.copy2 to calls to the new function
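One possible shape for such a workaround is sketched here. The name `copy_file_metadata` comes from the commit message, but its body, the `copy_file` wrapper and the log wording are assumptions rather than the actual code in utils.py.

```python
import logging
import os
import shutil
import tempfile

logger = logging.getLogger(__name__)


def copy_file_metadata(path_from, path_to):
    """Copy permission bits and timestamps; log failures instead of raising."""
    try:
        shutil.copystat(path_from, path_to)
    except OSError as err:
        logger.warning("Could not copy metadata from %s to %s: %s",
                       path_from, path_to, err)


def copy_file(path_from, path_to):
    """Hypothetical replacement for a direct shutil.copy2 call."""
    shutil.copyfile(path_from, path_to)     # file contents only
    copy_file_metadata(path_from, path_to)  # best effort


# Demo in a throwaway directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'a.txt')
dst = os.path.join(workdir, 'b.txt')
with open(src, 'w') as handle:
    handle.write('hello')
copy_file(src, dst)
with open(dst) as handle:
    copied_text = handle.read()
```

Calling `copy_file_metadata` with a source that cannot be read only logs a warning, so the copy of the remaining files continues.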
The `PageGenerator` was building hidden pages, but was not making them
available in the context. This makes it difficult for other plugins to
operate on hidden pages.
This patch updates `PageGenerator` to export the hidden pages it finds
in the context as `hidden_pages`.
It also updates the article generator to export `drafts`.
ARTICLE_ORDER_BY wasn't doing anything because the ArticlesGenerator
was sorting articles after ARTICLE_ORDER_BY was applied. This fixes
that, adds the ability to reverse the order by prefixing the metadata
name with 'reversed-', and changes the default value to
'reversed-date'.
Relevant documentation is also updated and moved into a more appropriate
place ('Ordering Content' instead of 'URL settings').
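The 'reversed-' prefix can be illustrated with a short sketch. `order_content` and the `Article` tuple are hypothetical stand-ins, and integer dates are used only to keep the example small.

```python
from collections import namedtuple

# Stand-in for real article objects.
Article = namedtuple('Article', 'title date')


def order_content(items, order_by):
    """Sort by a metadata attribute; a 'reversed-' prefix flips the order."""
    prefix = 'reversed-'
    reverse = order_by.startswith(prefix)
    key = order_by[len(prefix):] if reverse else order_by
    return sorted(items, key=lambda item: getattr(item, key), reverse=reverse)


articles = [Article('b', 2), Article('a', 3), Article('c', 1)]
newest_first = [a.title for a in order_content(articles, 'reversed-date')]
by_title = [a.title for a in order_content(articles, 'title')]
```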
* break out cache into cache.py
* break out cache-tests into test_cache.py
* fix broken cache tests
* replace nonexistent assert calls with self.assertEqual
* fix path for page caching test (was invalid)
* cleanup test code
* restructure generate_context in Article and Path Generator
* distinguish between valid/invalid files correctly and cache accordingly
* use cPickle if available for increased performance
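The cPickle fallback is usually written as a guarded import. The sketch below assumes nothing about the layout of cache.py; `save_cache` and `load_cache` are hypothetical helper names.

```python
import io

try:
    import cPickle as pickle  # Python 2: faster C implementation
except ImportError:
    import pickle  # Python 3: no separate cPickle module


def save_cache(data, stream):
    """Serialise the cache with the most compact protocol available."""
    pickle.dump(data, stream, pickle.HIGHEST_PROTOCOL)


def load_cache(stream):
    return pickle.load(stream)


# Round trip through an in-memory buffer.
buffer = io.BytesIO()
save_cache({'content/post.md': 1400000000.0}, buffer)
buffer.seek(0)
restored = load_cache(buffer)
```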
Some metadata values cause problems when empty. For example, a markdown file
containing a Slug: line with no additional text causes Pelican to produce a
file named ".html" instead of generating a proper file name. Others, like
those created by a PATH_METADATA regex, must be preserved even if empty,
so things like PAGE_URL="filename{customvalue}.html" will always work.
Essentially, we want to discard empty metadata that we know will be useless
or problematic. This is better than raising an exception because (a) it
allows users to deliberately keep empty metadata in their source files for
filling in later, and (b) users shouldn't be forced to fix empty metadata
created by blog migration tools (see #1398).
The metadata processors are the ideal place to do this, because they know
the type of data they are handling and whether an empty value is wanted.
Unfortunately, they can't discard items, and neither can process_metadata(),
because their return values are always saved by calling code. We can't
safely change the calling code, because some of it lives in custom reader
classes out in the field, and we don't want to break those working systems.
Discarding empty values at the time of use isn't good enough, because that
still allows useless empty values in a source file to override configured
defaults.
My solution:
- When processing a list of values, a metadata processor will omit any
unwanted empty ones from the list it returns.
- When processing an entirely unwanted value, it will return something easily
identifiable that will pass through the reader code.
- When collecting the processed metadata, read_file() will filter out items
identified as unwanted.
These metadata are affected by this change:
author, authors, category, slug, status, tags.
I also removed a bit of now-superfluous code from generators.py that was
discarding empty authors at the time of use.
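The three steps of the solution can be sketched with a sentinel object. Everything here is illustrative: `_DISCARD`, the processor functions and `collect_metadata` are assumed names, and treating an all-empty tag list as unwanted is an assumption of this sketch.

```python
# Sentinel that passes through reader code untouched and is easy to spot.
_DISCARD = object()


def process_slug(value):
    """An empty slug is never useful, so mark it for removal."""
    value = value.strip()
    return value if value else _DISCARD


def process_tags(value):
    """Drop empty entries from a comma-separated list."""
    tags = [tag.strip() for tag in value.split(',') if tag.strip()]
    # Assumption: an all-empty list counts as an unwanted value.
    return tags if tags else _DISCARD


PROCESSORS = {'slug': process_slug, 'tags': process_tags}


def collect_metadata(raw):
    """Run the processors, then filter out items marked as unwanted."""
    processed = {}
    for name, value in raw.items():
        processor = PROCESSORS.get(name, lambda v: v)
        result = processor(value)
        if result is not _DISCARD:
            processed[name] = result
    return processed


metadata = collect_metadata({'slug': '  ', 'tags': 'a, ,b', 'customvalue': ''})
```

The empty `customvalue` survives because no processor claims it, which mirrors the PATH_METADATA case described above.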
The old code was naively comparing the strings in PAGE_EXCLUDES to the
subdirectory names produced by os.walk(). (Same with ARTICLE_EXCLUDES.)
This had two surprising effects:
- Setting PAGE_EXCLUDES=['foo'] would exclude all directories named foo,
regardless of whether they were in the top-level content directory or
nested deep within a directory whose contents should not be excluded.
- Setting PAGE_EXCLUDES=['subdir/foo'] would never exclude any directories.
In other words, there was no way to exclude a subdirectory without risking
the accidental exclusion of other directories with the same name elsewhere
in the file system.
This change fixes the problem, so 'subdir/foo' and 'foo' will be distinct
and both work as expected. If anyone out there is depending on the old
behavior, they will have to update their settings. I don't expect it to
affect most users yet, since Pelican doesn't yet make nested directory
structures very useful. When it does, this fix will become important to
more people.
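The fixed behaviour amounts to comparing paths relative to the content root rather than bare directory names. A minimal sketch, with `list_dirs` as a hypothetical helper and a made-up directory layout:

```python
import os
import tempfile


def list_dirs(root, excludes):
    """Return root-relative directories, pruning the excluded ones."""
    excluded = set(os.path.normpath(e) for e in excludes)
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        kept = []
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if os.path.normpath(rel) not in excluded:
                kept.append(name)
                found.append(rel.replace(os.sep, '/'))
        dirnames[:] = kept  # prune the walk in place
    return sorted(found)


# Demo layout: <root>/foo and <root>/subdir/foo
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'foo'))
os.makedirs(os.path.join(root, 'subdir', 'foo'))

without_top_foo = list_dirs(root, ['foo'])
without_nested_foo = list_dirs(root, ['subdir/foo'])
```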
This change partially addresses issue #1019, by teaching Pelican to distinguish
between static files and content source files. A user can now safely add the
same directory to both STATIC_PATHS and PAGE_PATHS (or ARTICLE_PATHS). Pelican
will then process the content source files in that directory normally, and
treat the remaining files as static, without copying the raw content source
files to the output directory. (The OUTPUT_SOURCES setting still works.)
In other words, images and markdown/reST files can now safely live together.
To keep those files together in the generated site, STATIC_SAVE_AS and
PAGE_SAVE_AS (or ARTICLE_SAVE_AS) should point to the same output directory.
There are two new configuration settings:
STATIC_EXCLUDES=[] # This works just like PAGE_EXCLUDES and ARTICLE_EXCLUDES.
STATIC_EXCLUDE_SOURCES=True # Set this to False to get the old behavior.
Two small but noteworthy internal changes:
- StaticGenerator now runs after all the other generators. This allows it to see
which files are meant to be processed by other generators, and avoid them.
- Generators now include files that they fail to process (e.g. those with missing
mandatory metadata) along with all the other paths in context['filenames'].
This allows such files to be excluded from StaticGenerator's file list, so they
won't end up accidentally published. Since these files have no Content object,
their value in context['filenames'] is None. The code that uses that dict has
been updated accordingly.
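How code might consume such a dict can be sketched as follows. The helpers and the sample paths are hypothetical; only the convention that failed files map to None in context['filenames'] comes from the description above.

```python
def static_files(candidates, filenames):
    """Keep only the paths that no other generator has claimed."""
    return sorted(path for path in candidates if path not in filenames)


def content_objects(filenames):
    """Skip entries whose value is None (files that failed to process)."""
    return [obj for obj in filenames.values() if obj is not None]


# Stand-in for context['filenames']: path -> Content object, or None.
filenames = {
    'posts/good.md': 'Content(good)',
    'posts/broken.md': None,   # e.g. missing mandatory metadata
}
candidates = ['posts/good.md', 'posts/broken.md', 'posts/photo.jpg']
```

Here `static_files(candidates, filenames)` leaves only the image, so the broken source file is not published by accident.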
The old system was using manual string formatting for log messages.
This caused issues with common operations like exception logging
because often they need to be handled differently for Py2/Py3
compatibility. In order to unify the effort:
- All logging is changed to `logging.level(msg, arg1, arg2)` style.
- A `SafeLogger` is implemented to auto-decode exceptions properly
in the args (ref #1403).
- Custom formatters were overriding useful logging functionality
like traceback output (ref #1402). They are refactored to be
more transparent. Traceback information is provided in `--debug`
mode for `read_file` errors in generators.
- Formatters will now auto-format multiline log messages in order
to make them look related. Similarly, traceback will be formatted in
the same fashion.
- `pelican.log.LimitFilter` was (ab)using the logging message, which
would result in awkward syntax for the argument-based logging style.
This functionality is moved to the `extra` keyword argument.
- Levels for errors that would result in skipping a file (`read_file`)
changed from `warning` to `error` in order to make them stand out
among other logs.
- Small consistency changes to log messages (e.g. changing all
to start with an uppercase letter) and quality-of-life improvements
(some log messages were dumping raw object information).
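The argument-based style and the use of `extra` can be illustrated with the standard library alone. `LimitOnceFilter` and its `limit_key` attribute are hypothetical simplifications for this sketch and do not reproduce `pelican.log.LimitFilter`; the messages are made up.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the result can be inspected."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class LimitOnceFilter(logging.Filter):
    """Lets only the first record per 'limit_key' through (sketch only)."""

    def __init__(self):
        logging.Filter.__init__(self)
        self._seen = set()

    def filter(self, record):
        key = getattr(record, 'limit_key', None)
        if key is None:
            return True
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


logger = logging.getLogger('sketch.logging_style')
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)
logger.addFilter(LimitOnceFilter())

# Arguments are passed separately; formatting happens inside logging.
logger.error("Could not process %s: %s", 'post.md', ValueError('bad date'))
for path in ('a.md', 'b.md'):
    logger.warning("Empty alt attribute in %s", path,
                   extra={'limit_key': 'empty-alt'})
```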
reverts getpelican/pelican@ddcccfeaa9
If one used a locale that made use of Unicode characters (like fr_FR.UTF-8),
the files on disk would be in the correct locale while links would use the C locale.
Uses a SafeDatetime class that works with Unicode format strings
by using a custom strftime to prevent ASCII decoding errors with Python 2.
Also added Unicode decoding for the calendar module to fix period
archives.
A list of paths can now be given instead of a single path. This was a
popular request and should help people who do not want to use Pelican
for blogging. Backward compatibility is maintained.
Thanks to @ingwinlu for pointing out the change in StaticGenerator.