.. _import:

Importing an existing site
##########################

Description
===========


``pelican-import`` is a command-line tool for converting articles from other
software to reStructuredText or Markdown. The supported import formats are:

- Blogger XML export
- Dotclear export
- Posterous API
- Tumblr API
- WordPress XML export
- RSS/Atom feed


The conversion from HTML to reStructuredText or Markdown relies on `Pandoc`_.
For Dotclear, if the source posts are written with Markdown syntax, they will
not be converted (as Pelican also supports Markdown).


.. note::

   Unlike Pelican, WordPress supports multiple categories per article. These
   are imported as a comma-separated string. You have to resolve these
   manually, or use a plugin such as `More Categories`_ that enables multiple
   categories per article.

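
As a rough illustration of resolving such a value by hand, the following Python
sketch splits a comma-separated category string and keeps the first entry. The
helper names ``split_categories`` and ``primary_category`` and the ``misc``
fallback are hypothetical and not part of Pelican or ``pelican-import``; which
category to keep is an assumption you would adapt to your own site.

```python
def split_categories(raw):
    """Return the non-empty category names found in a comma-separated string."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def primary_category(raw, default="misc"):
    """Pick the first listed category, or a fallback when none is given."""
    names = split_categories(raw)
    return names[0] if names else default


if __name__ == "__main__":
    # Hypothetical metadata value as it might appear after an import.
    print(primary_category("News, Python, Tutorials"))
```

Keeping the first entry is only one possible policy; a site with a fixed set
of categories might instead map each imported name onto that set.
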

Dependencies
============

``pelican-import`` has some dependencies not required by the rest of Pelican:

- *BeautifulSoup4* and *lxml*, for WordPress and Dotclear import. Can be
  installed like any other Python package
  (``pip install BeautifulSoup4 lxml``).
- *Feedparser*, for feed import (``pip install feedparser``).
- *Pandoc*, see the `Pandoc site`_ for installation instructions on your
  operating system.

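
Before running an import, it can help to confirm that the dependencies listed
above are present. The following is a minimal sketch, not something shipped
with Pelican: the function name ``missing_dependencies`` is hypothetical, and
it assumes the Pandoc executable is named ``pandoc`` and is expected on
``PATH``.

```python
import importlib.util
import shutil

# Import names differ from the pip package names: BeautifulSoup4 installs "bs4".
PYTHON_MODULES = {
    "BeautifulSoup4": "bs4",
    "lxml": "lxml",
    "feedparser": "feedparser",
}


def missing_dependencies():
    """Return the names of importer dependencies that cannot be found."""
    missing = [
        package
        for package, module in PYTHON_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]
    # Pandoc is an external program, so look for it on PATH instead.
    if shutil.which("pandoc") is None:
        missing.append("pandoc")
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found")
```

Not every import format needs every dependency, so a missing entry matters
only if the format you are importing relies on it.
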

.. _Pandoc: https://pandoc.org/
.. _Pandoc site: https://pandoc.org/installing.html

Usage
=====

::

    pelican-import [-h] [--blogger] [--dotclear] [--posterous] [--tumblr] [--wpfile] [--feed]
                   [-o OUTPUT] [-m MARKUP] [--dir-cat] [--dir-page] [--strip-raw] [--wp-custpost]
                   [--wp-attach] [--disable-slugs] [-e EMAIL] [-p PASSWORD] [-b BLOGNAME]
                   input|api_token|api_key


Positional arguments
--------------------

============= ============================================================================
``input``     The input file to read
``api_token`` (Posterous only) api_token can be obtained from http://posterous.com/api/
``api_key``   (Tumblr only) api_key can be obtained from https://www.tumblr.com/oauth/apps
============= ============================================================================


Optional arguments
------------------

  -h, --help            Show this help message and exit
  --blogger             Blogger XML export (default: False)
  --dotclear            Dotclear export (default: False)
  --posterous           Posterous API (default: False)
  --tumblr              Tumblr API (default: False)
  --wpfile              WordPress XML export (default: False)
  --feed                Feed to parse (default: False)
  -o OUTPUT, --output OUTPUT
                        Output path (default: content)
  -m MARKUP, --markup MARKUP
                        Output markup format: ``rst``, ``markdown``, or ``asciidoc``
                        (default: ``rst``)
  --dir-cat             Put files in directories named after their categories
                        (default: False)
  --dir-page            Put files recognised as pages in the "pages/"
                        sub-directory (Blogger and WordPress import only)
                        (default: False)
  --filter-author       Import only posts from the specified author
  --strip-raw           Strip raw HTML code that can't be converted to markup,
                        such as Flash embeds or iframes (WordPress import
                        only) (default: False)
  --wp-custpost         Put WordPress custom post types in directories. If
                        used with the ``--dir-cat`` option, directories will
                        be created as "/post_type/category/" (WordPress import
                        only)
  --wp-attach           Download files uploaded to WordPress as attachments.
                        Files will be added to posts as a list in the post
                        header, and links to the files within the post will be
                        updated. All files will be downloaded, even if they
                        aren't associated with a post. Files will be downloaded
                        with their original path inside the output directory,
                        e.g. "output/wp-uploads/date/postname/file.jpg".
                        (WordPress import only) (requires an internet
                        connection)
  --disable-slugs       Disable storing slugs from imported posts within
                        output. With this disabled, your Pelican URLs may not
                        be consistent with your original posts. (default:
                        False)
  -e EMAIL, --email=EMAIL
                        Email used to authenticate with the Posterous API
  -p PASSWORD, --password=PASSWORD
                        Password used to authenticate with the Posterous API
  -b BLOGNAME, --blogname=BLOGNAME
                        Blog name used in the Tumblr API


Examples
========

For Blogger::

    $ pelican-import --blogger -o ~/output ~/posts.xml

For Dotclear::

    $ pelican-import --dotclear -o ~/output ~/backup.txt

For Posterous::

    $ pelican-import --posterous -o ~/output --email=<email_address> --password=<password> <api_token>

For Tumblr::

    $ pelican-import --tumblr -o ~/output --blogname=<blogname> <api_key>

For WordPress::

    $ pelican-import --wpfile -o ~/output ~/posts.xml

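
When several exports have to be converted in one go, invocations like the ones
above can be assembled from a script. The sketch below is only an illustration
built from the options documented on this page: the function
``build_import_command`` is hypothetical, it covers only the file-based and
feed sources, and it prints the command rather than running it.

```python
import shlex


def build_import_command(source, input_path, output="content", markup="rst",
                         dir_cat=False):
    """Assemble a pelican-import argument list from documented options."""
    sources = {"blogger", "dotclear", "wpfile", "feed"}
    if source not in sources:
        raise ValueError(f"unsupported source: {source!r}")
    command = ["pelican-import", f"--{source}", "-o", output, "-m", markup]
    if dir_cat:
        command.append("--dir-cat")
    # The positional input argument goes last, as in the usage synopsis.
    command.append(input_path)
    return command


if __name__ == "__main__":
    command = build_import_command("wpfile", "posts.xml", output="output")
    # Print the command; pass the list to subprocess.run(command, check=True)
    # to execute it once pelican-import is installed.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues, but it also means
that ``~`` is not expanded, so paths should be given in full.
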

Tests
=====

To test the module, one can use sample files:

- for WordPress: https://www.wpbeginner.com/wp-themes/how-to-add-dummy-content-for-theme-development-in-wordpress/
- for Dotclear: http://media.dotaddict.org/tda/downloads/lorem-backup.txt

.. _More Categories: https://github.com/pelican-plugins/more-categories