Check for 0 dates.
For my own blog this means it doesn't break during import, but I don't know pelican well enough yet to say if this is correct or not.
Error that I was getting before applying this fix
```
Traceback (most recent call last):
File "/mnt/data/home/stu/.virtualenvs/blog/bin/pelican-import", line 11, in <module>
sys.exit(main())
File "/mnt/data/home/stu/.virtualenvs/blog/local/lib/python2.7/site-packages/pelican/tools/pelican_import.py", line 896, in main
attachments=attachments or None)
File "/mnt/data/home/stu/.virtualenvs/blog/local/lib/python2.7/site-packages/pelican/tools/pelican_import.py", line 684, in fields2pelican
kind, in_markup) in fields:
File "/mnt/data/home/stu/.virtualenvs/blog/local/lib/python2.7/site-packages/pelican/tools/pelican_import.py", line 163, in wp2fields
date_object = time.strptime(raw_date, '%Y-%m-%d %H:%M:%S')
File "/usr/lib/python2.7/_strptime.py", line 478, in _strptime_time
return _strptime(data_string, format)[0]
File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data u'0000-00-00 00:00:00' does not match format u'%Y-%m-%d %H:%M:%S'
```
* Consolidate validation of content
Previously we validated content outside of the content class via
calls to `is_valid_content` and some additional checks in page /
article generators (valid status).
This commit moves those checks all into content.valid() resulting
in a cleaner code structure.
This allows us to restructure how generators interact with content,
removing several old bugs in pelican (#1748, #1356, #2098).
- move verification function into content class
- move generator verifying content to contents class
- remove unused quote class
- remove draft class (no more rereading drafts)
- move auto draft status setter into Article.__init__
- add now parsing draft to basic test output
- remove problematic DEFAULT_STATUS setting
- add setter/getter for content.status
removes need for lower() calls when verifying status
* expand c4b184fa32
Mostly implement feedback by @iKevinY.
* rename content.valid to content.is_valid
* rename valid_* functions to has_valid_*
* update tests and function calls in code accordingly
The RstReader class can now use user-specified writer/translator classes
instead of the hardcoded ones from docutils. This allows for far easier
overriding of the default HTML output -- in the past one would need to
override the internal _parse_metadata() and _get_publisher() functions.
With hypothetical Html5Writer and Html5FieldBodyTranslator classes,
based for example on docutils.writers.html5_polyglot.Writer and
docutils.writers.html5_polyglot.HTMLTranslator, a plugin that overrides
the default behavior would now look just like this:
# (definition of Writer / Translator classes omitted)
class Html5RstReader(RstReader):
writer_class = Html5Writer
field_body_translator_class = Html5FieldBodyTranslator
def add_reader(readers):
readers.reader_classes['rst'] = Html5RstReader
def register():
pelican.signals.readers_init.connect(add_reader)
a html tag always starts with <[a-z], < [a-z] is invalid
a space can be found after the = in href='bleh'
This function is taking 10% of the compilation time, with caching enabled,
maybe it's worth optimising the regexp a bit more, I don't know.
Starting with python 3.6 warnings are issued for invalid escape
sequences in regular expressions. This commit corrects all
DeprecationWarning's via properly declaring the offending
regular expressions as raw strings.
Resolves#2095.
This is relevant when using optional items in the expression. E.g. if an
optional captured group is not matched, the result of
`match.groupdict()` contains the captured group with value `None`.