.. _import: Importing an existing site ########################## Description =========== ``pelican-import`` is a command-line tool for converting articles from other software to reStructuredText or Markdown. The supported import formats are: - Blogger XML export - Dotclear export - Posterous API - Tumblr API - WordPress XML export - RSS/Atom feed The conversion from HTML to reStructuredText or Markdown relies on `Pandoc`_. For Dotclear, if the source posts are written with Markdown syntax, they will not be converted (as Pelican also supports Markdown). .. note:: Unlike Pelican, Wordpress supports multiple categories per article. These are imported as a comma-separated string. You have to resolve these manually, or use a plugin that enables multiple categories per article (like `more_categories`_). Dependencies ============ ``pelican-import`` has some dependencies not required by the rest of Pelican: - *BeautifulSoup4* and *lxml*, for WordPress and Dotclear import. Can be installed like any other Python package (``pip install BeautifulSoup4 lxml``). - *Feedparser*, for feed import (``pip install feedparser``). - *Pandoc*, see the `Pandoc site`_ for installation instructions on your operating system. .. _Pandoc: https://pandoc.org/ .. _Pandoc site: https://pandoc.org/installing.html Usage ===== :: pelican-import [-h] [--blogger] [--dotclear] [--posterous] [--tumblr] [--wpfile] [--feed] [-o OUTPUT] [-m MARKUP] [--dir-cat] [--dir-page] [--strip-raw] [--wp-custpost] [--wp-attach] [--disable-slugs] [-e EMAIL] [-p PASSWORD] [-b BLOGNAME] input|api_token|api_key Positional arguments -------------------- ============= ============================================================================ ``input`` The input file to read ``api_token`` (Posterous only) api_token can be obtained from http://posterous.com/api/ ``api_key`` (Tumblr only) api_key can be obtained from https://www.tumblr.com/oauth/apps ============= ============================================================================ Optional arguments ------------------ -h, --help Show this help message and exit --blogger Blogger XML export (default: False) --dotclear Dotclear export (default: False) --posterous Posterous API (default: False) --tumblr Tumblr API (default: False) --wpfile WordPress XML export (default: False) --feed Feed to parse (default: False) -o OUTPUT, --output OUTPUT Output path (default: content) -m MARKUP, --markup MARKUP Output markup format: ``rst``, ``markdown``, or ``asciidoc`` (default: ``rst``) --dir-cat Put files in directories with categories name (default: False) --dir-page Put files recognised as pages in "pages/" sub- directory (blogger and wordpress import only) (default: False) --filter-author Import only post from the specified author --strip-raw Strip raw HTML code that can't be converted to markup such as flash embeds or iframes (wordpress import only) (default: False) --wp-custpost Put wordpress custom post types in directories. If used with --dir-cat option directories will be created as "/post_type/category/" (wordpress import only) --wp-attach Download files uploaded to wordpress as attachments. Files will be added to posts as a list in the post header and links to the files within the post will be updated. All files will be downloaded, even if they aren't associated with a post. Files will be downloaded with their original path inside the output directory, e.g. "output/wp-uploads/date/postname/file.jpg". (wordpress import only) (requires an internet connection) --disable-slugs Disable storing slugs from imported posts within output. With this disabled, your Pelican URLs may not be consistent with your original posts. (default: False) -e EMAIL, --email=EMAIL Email used to authenticate Posterous API -p PASSWORD, --password=PASSWORD Password used to authenticate Posterous API -b BLOGNAME, --blogname=BLOGNAME Blog name used in Tumblr API Examples ======== For Blogger:: $ pelican-import --blogger -o ~/output ~/posts.xml For Dotclear:: $ pelican-import --dotclear -o ~/output ~/backup.txt for Posterous:: $ pelican-import --posterous -o ~/output --email= --password= For Tumblr:: $ pelican-import --tumblr -o ~/output --blogname= For WordPress:: $ pelican-import --wpfile -o ~/output ~/posts.xml Tests ===== To test the module, one can use sample files: - for WordPress: https://www.wpbeginner.com/wp-themes/how-to-add-dummy-content-for-theme-development-in-wordpress/ - for Dotclear: http://media.dotaddict.org/tda/downloads/lorem-backup.txt .. _more_categories: https://github.com/getpelican/pelican-plugins/tree/master/more_categories