Specific options passed to pandoc2 in order to get similar results than
with pandoc1:
- Disable smart quotes from the markdown output.
- Enable raw parsing from html.
Check for 0 dates.
For my own blog this means it doesn't break during import, but I don't know pelican well enough yet to say if this is correct or not.
Error that I was getting before applying this fix
```
Traceback (most recent call last):
File "/mnt/data/home/stu/.virtualenvs/blog/bin/pelican-import", line 11, in <module>
sys.exit(main())
File "/mnt/data/home/stu/.virtualenvs/blog/local/lib/python2.7/site-packages/pelican/tools/pelican_import.py", line 896, in main
attachments=attachments or None)
File "/mnt/data/home/stu/.virtualenvs/blog/local/lib/python2.7/site-packages/pelican/tools/pelican_import.py", line 684, in fields2pelican
kind, in_markup) in fields:
File "/mnt/data/home/stu/.virtualenvs/blog/local/lib/python2.7/site-packages/pelican/tools/pelican_import.py", line 163, in wp2fields
date_object = time.strptime(raw_date, '%Y-%m-%d %H:%M:%S')
File "/usr/lib/python2.7/_strptime.py", line 478, in _strptime_time
return _strptime(data_string, format)[0]
File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data u'0000-00-00 00:00:00' does not match format u'%Y-%m-%d %H:%M:%S'
```
Upon import of a dotclear backup, `pelican-import` returned this stacktrace:
```
File "/usr/bin/pelican-import", line 11, in <module>
sys.exit(main())
File "/usr/lib/python3.4/site-packages/pelican/tools/pelican_import.py", line 809, in main
attachments = attachments or None)
File "/usr/lib/python3.4/site-packages/pelican/tools/pelican_import.py", line 621, in fields2pelican
kind, in_markup) in fields:
File "/usr/lib/python3.4/site-packages/pelican/tools/pelican_import.py", line 262, in dc2fields
if int(tag[:1]) == 1:
ValueError: invalid literal for int() with base 10: 'a'
```
* Fix {filename} links on Windows.
Otherwise '{filename}/foo/bar.jpg' doesn't work
* Clean up relative Posix path handling in contents.
* Use Posix paths in readers
* Environment for Popen must be strs, not unicodes.
* Ignore Git CRLF warnings.
* Replace CRLFs with LFs in inputs on Windows.
* Fix importer tests
* Fix test_contents
* Fix one last backslash in paginated output
* Skip the remaining failing locale tests on Windows.
* Document the use of forward slashes on Windows.
* Add some Fabric and ghp-import notes
Old system was using manual string formatting for log messages.
This caused issues with common operations like exception logging
because often they need to be handled differently for Py2/Py3
compatibility. In order to unify the effort:
- All logging is changed to `logging.level(msg, arg1, arg2)` style.
- A `SafeLogger` is implemented to auto-decode exceptions properly
in the args (ref #1403).
- Custom formatters were overriding useful logging functionality
like traceback outputing (ref #1402). They are refactored to be
more transparent. Traceback information is provided in `--debug`
mode for `read_file` errors in generators.
- Formatters will now auto-format multiline log messages in order
to make them look related. Similarly, traceback will be formatted in
the same fashion.
- `pelican.log.LimitFilter` was (ab)using logging message which
would result in awkward syntax for argumented logging style. This
functionality is moved to `extra` keyword argument.
- Levels for errors that would result skipping a file (`read_file`)
changed from `warning` to `error` in order to make them stand out
among other logs.
- Small consistency changes to log messages (i.e. changing all
to start with an uppercase letter) and quality-of-life improvements
(some log messages were dumping raw object information).
reverts getpelican/pelican@ddcccfeaa9
If one used a locale that made use of unicode characters (like fr_FR.UTF-8)
the files on disk would be in correct locale while links would be to C.
Uses a SafeDatetime class that works with unicode format strigns
by using custom strftime to prevent ascii decoding errors with Python2.
Also added unicode decoding for the calendar module to fix period
archives.
The download_attachments error is triggered in the unit tests by a japanese
error message (接続を拒否されました) (connexion denied), that
python is not able to serialize the into a byte string.
This error weirdly does not appear every time the unit tests are run.
It might be related to the order in which the tests are run.
This error was found and fixed during the PyconUS 2014 pelican
sprint. It was discovered on a Linux Fedora20 computer running
Python2.7 in virtualenv
When importing from Wordpress, the --dir-page directive (disabled by
default) automatically adds files to the pages/ when they are recognised
as pages, as opposed to posts.
Turn invalid characters into underscores, remove leading dots and enforce
a maximum length. Should be fine on main file systems used by Windows, Mac OS
and Linux.
Thanks to @Avaris for helping to clean my code.
When a WP XML file is imported, items with missing title are generated with a
title which is probably not the good one (instead of being dropped), and a
warning is displayed to the user.
Quick fix for this traceback:
$ pelican-import --wpfile ~/Downloads/mysite.wordpress.2013-02-24.xml
Traceback (most recent call last):
File "/Users/me/.virtualenvs/pelican/bin/pelican-import", line 8, in <module>
load_entry_point('pelican==3.2', 'console_scripts', 'pelican-import')()
File "/Users/me/.virtualenvs/pelican/src/pelican/pelican/tools/pelican_import.py", line 363, in main
disable_slugs=args.disable_slugs or False)
File "/Users/me/.virtualenvs/pelican/src/pelican/pelican/tools/pelican_import.py", line 238, in fields2pelican
for title, content, filename, date, author, categories, tags, in_markup in fields:
File "/Users/me/.virtualenvs/pelican/src/pelican/pelican/tools/pelican_import.py", line 37, in wp2fields
if item.fetch('wp:status')[0].contents[0] == "publish":
TypeError: 'NoneType' object is not callable
I'm a BeautifulSoup novice but these changes allowed me to import two of my wordpress.xml files.