.. _import:

Importing an existing site
##########################

Description
===========

``pelican-import`` is a command-line tool for converting articles from other
software to reStructuredText or Markdown. The supported import formats are:

- Blogger XML export
- Dotclear export
- Posterous API
- Tumblr API
- WordPress XML export
- RSS/Atom feed

The conversion from HTML to reStructuredText or Markdown relies on `Pandoc`_.
For Dotclear, posts written in Markdown are kept as-is rather than converted,
since Pelican also supports Markdown.

.. note::

   Unlike Pelican, WordPress supports multiple categories per article. These
   are imported as a comma-separated string, which you have to resolve either
   manually or with a plugin that enables multiple categories per article
   (such as `more_categories`_).

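Because extra categories are imported as a comma-separated string, a quick
search of the output directory shows which files need manual attention. This
is a minimal sketch that assumes the importer writes categories on a
``:category:`` field line (reStructuredText) or a ``Category:`` line
(Markdown); ``find_multi_category_files`` is a hypothetical helper name, not
part of Pelican.

.. code-block:: shell

    # Sketch: list imported files whose category metadata holds more than one
    # comma-separated value. Assumes ":category:" (rst) or "Category:"
    # (Markdown) metadata lines; find_multi_category_files is hypothetical.
    find_multi_category_files() {
        dir="${1:-content}"   # pelican-import's default output path
        if [ ! -d "$dir" ]; then
            echo "no such directory: $dir" >&2
            return 2
        fi
        # -r: recurse, -l: print matching file names only, -E: extended regex
        grep -rlE '^(:category:|Category:) .*,' "$dir"
    }

For example, ``find_multi_category_files ~/output`` prints one path per file
whose categories still need to be resolved.
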
Dependencies
============

``pelican-import`` has some dependencies not required by the rest of Pelican:

- *BeautifulSoup4* and *lxml*, for WordPress and Dotclear import. Both can be
  installed like any other Python package (``pip install BeautifulSoup4 lxml``).
- *Feedparser*, for feed import (``pip install feedparser``).
- *Pandoc*, for converting HTML to reStructuredText or Markdown; see the
  `Pandoc site`_ for installation instructions for your operating system.

.. _Pandoc: http://johnmacfarlane.net/pandoc/
.. _Pandoc site: http://johnmacfarlane.net/pandoc/installing.html

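To check that these dependencies are in place before running an import, a
small shell function along the following lines can help. It is a sketch that
assumes ``python3`` is the interpreter Pelican runs under;
``check_import_deps`` is a hypothetical name, not part of Pelican.

.. code-block:: shell

    # Sketch: report which optional pelican-import dependencies are missing.
    # Assumes python3 is the interpreter Pelican is installed under.
    # check_import_deps is a hypothetical helper, not part of Pelican.
    check_import_deps() {
        missing=0
        # BeautifulSoup4 installs the "bs4" module; lxml and feedparser keep their names
        for mod in bs4 lxml feedparser; do
            if ! python3 -c "import $mod" 2>/dev/null; then
                echo "missing Python module: $mod"
                missing=1
            fi
        done
        # Pandoc is an external program, so look it up on PATH
        if ! command -v pandoc >/dev/null 2>&1; then
            echo "missing program: pandoc"
            missing=1
        fi
        return "$missing"
    }

    if check_import_deps; then
        echo "all pelican-import dependencies found"
    fi

If Pelican is installed in a virtual environment, activate it first so that
``python3`` resolves to the same interpreter.
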
Usage
=====

::

    pelican-import [-h] [--blogger] [--dotclear] [--posterous] [--tumblr] [--wpfile] [--feed]
                   [-o OUTPUT] [-m MARKUP] [--dir-cat] [--dir-page] [--strip-raw] [--wp-custpost]
                   [--wp-attach] [--disable-slugs] [-e EMAIL] [-p PASSWORD] [-b BLOGNAME]
                   input|api_token|api_key

Positional arguments
--------------------

============= ============================================================================
``input``     The input file to read
``api_token`` (Posterous only) api_token can be obtained from http://posterous.com/api/
``api_key``   (Tumblr only) api_key can be obtained from http://www.tumblr.com/oauth/apps
============= ============================================================================

Optional arguments
------------------

-h, --help            Show this help message and exit
--blogger             Blogger XML export (default: False)
--dotclear            Dotclear export (default: False)
--posterous           Posterous API (default: False)
--tumblr              Tumblr API (default: False)
--wpfile              WordPress XML export (default: False)
--feed                Feed to parse (default: False)
-o OUTPUT, --output OUTPUT
                      Output path (default: content)
-m MARKUP, --markup MARKUP
                      Output markup format (supports rst & markdown)
                      (default: rst)
--dir-cat             Put files in directories named after their
                      categories (default: False)
--dir-page            Put files recognised as pages in a "pages/"
                      subdirectory (Blogger and WordPress import only)
                      (default: False)
--filter-author       Import only posts from the specified author
--strip-raw           Strip raw HTML code that can't be converted to
                      markup, such as Flash embeds or iframes (WordPress
                      import only) (default: False)
--wp-custpost         Put WordPress custom post types in directories. If
                      used with the --dir-cat option, directories will be
                      created as "/post_type/category/" (WordPress import
                      only)
--wp-attach           Download files uploaded to WordPress as attachments.
                      Files will be added to posts as a list in the post
                      header, and links to the files within the post will
                      be updated. All files will be downloaded, even if
                      they aren't associated with a post. Files will be
                      downloaded with their original path inside the
                      output directory, e.g.
                      "output/wp-uploads/date/postname/file.jpg".
                      (WordPress import only) (requires an internet
                      connection)
--disable-slugs       Do not store slugs from imported posts in the
                      output. Without stored slugs, your Pelican URLs may
                      not match those of your original posts.
                      (default: False)
-e EMAIL, --email=EMAIL
                      Email used to authenticate with the Posterous API
-p PASSWORD, --password=PASSWORD
                      Password used to authenticate with the Posterous API
-b BLOGNAME, --blogname=BLOGNAME
                      Blog name used with the Tumblr API

Examples
========

For Blogger::

    $ pelican-import --blogger -o ~/output ~/posts.xml

For Dotclear::

    $ pelican-import --dotclear -o ~/output ~/backup.txt

For Posterous::

    $ pelican-import --posterous -o ~/output --email=<email_address> --password=<password> <api_token>

For Tumblr::

    $ pelican-import --tumblr -o ~/output --blogname=<blogname> <api_key>

For WordPress::

    $ pelican-import --wpfile -o ~/output ~/posts.xml

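Several of the options listed above can be combined in a single run. The
following is a sketch rather than an official recipe: it imports a WordPress
export to Markdown, puts posts in per-category directories and pages under
``pages/``, and downloads attachments. ``INPUT`` and ``OUTPUT`` are
placeholder paths, and the guard only skips the import when
``pelican-import`` or the input file is missing.

.. code-block:: shell

    # Sketch: WordPress export -> Markdown, one directory per category,
    # pages in pages/, attachments downloaded (needs an internet connection).
    # INPUT and OUTPUT are placeholder paths.
    INPUT="$HOME/posts.xml"
    OUTPUT="$HOME/output"

    if command -v pelican-import >/dev/null 2>&1 && [ -f "$INPUT" ]; then
        pelican-import --wpfile -m markdown --dir-cat --dir-page --wp-attach \
            -o "$OUTPUT" "$INPUT"
    else
        echo "pelican-import or $INPUT not found; nothing imported"
    fi

Because ``--wp-attach`` downloads every uploaded file, this run requires an
internet connection.
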
Tests
=====

To test the module, you can use the following sample files:

- for WordPress: http://www.wpbeginner.com/wp-themes/how-to-add-dummy-content-for-theme-development-in-wordpress/
- for Dotclear: http://media.dotaddict.org/tda/downloads/lorem-backup.txt

.. _more_categories: http://github.com/getpelican/pelican-plugins/tree/master/more_categories
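
For a quick end-to-end check, the Dotclear sample listed above can be
downloaded and imported into a temporary directory. This is a sketch that
assumes ``curl`` is installed and that the sample URL is still online; when
``pelican-import`` is missing or the download fails, it only prints a message.

.. code-block:: shell

    # Sketch: import the Dotclear sample backup into a throwaway directory.
    # Assumes curl is installed and the sample URL is still reachable.
    SAMPLE_URL="http://media.dotaddict.org/tda/downloads/lorem-backup.txt"
    WORKDIR="$(mktemp -d)"
    mkdir -p "$WORKDIR/output"

    if ! command -v pelican-import >/dev/null 2>&1; then
        echo "pelican-import not found"
    elif curl -fsSL -o "$WORKDIR/lorem-backup.txt" "$SAMPLE_URL"; then
        # Convert the backup; generated files land in $WORKDIR/output
        pelican-import --dotclear -o "$WORKDIR/output" "$WORKDIR/lorem-backup.txt"
        ls "$WORKDIR/output"
    else
        echo "could not download $SAMPLE_URL"
    fi
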