When importing from Wordpress, the --dir-page directive (disabled by
default) automatically adds files to the pages/ when they are recognised
as pages, as opposed to posts.
Turn invalid characters into underscores, remove leading dots and enforce
a maximum length. Should be fine on main file systems used by Windows, Mac OS
and Linux.
Thanks to @Avaris for helping to clean my code.
When a WP XML file is imported, items with missing title are generated with a
title which is probably not the good one (instead of being dropped), and a
warning is displayed to the user.
Quick fix for this traceback:
$ pelican-import --wpfile ~/Downloads/mysite.wordpress.2013-02-24.xml
Traceback (most recent call last):
File "/Users/me/.virtualenvs/pelican/bin/pelican-import", line 8, in <module>
load_entry_point('pelican==3.2', 'console_scripts', 'pelican-import')()
File "/Users/me/.virtualenvs/pelican/src/pelican/pelican/tools/pelican_import.py", line 363, in main
disable_slugs=args.disable_slugs or False)
File "/Users/me/.virtualenvs/pelican/src/pelican/pelican/tools/pelican_import.py", line 238, in fields2pelican
for title, content, filename, date, author, categories, tags, in_markup in fields:
File "/Users/me/.virtualenvs/pelican/src/pelican/pelican/tools/pelican_import.py", line 37, in wp2fields
if item.fetch('wp:status')[0].contents[0] == "publish":
TypeError: 'NoneType' object is not callable
I'm a BeautifulSoup novice but these changes allowed me to import two of my wordpress.xml files.
Argument index is included in .format() method format string in order to be friendly with various Python versions and consistent with the rest of the code.