[IMP] pelican_import with gmf instead of markdown

The markdown import of pandoc is their own flavour of markdown. It for
instance uses fenced divs[1] which are not supported by
python-markdown.  When importing content from Wordpress, there is
several issues as explained in discussion 3113[2]
This change follows a discussion with pandoc developer[3]

[1] https://pandoc.org/MANUAL.html#divs-and-spans
[2] https://github.com/getpelican/pelican/discussions/3113
[3] https://fosstodon.org/@pandoc/110105559949588768

Take the following Wordpress blog post sample:
```html
<p><!-- wp:paragraph --></p>
<p>Paragraph content</p>
<p><!-- /wp:paragraph --></p>
<p><!-- wp:image {"align":"center","id":3747,"sizeSlug":"full"} --></p>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img src="https://test.com/test.jpg" alt="" class="wp-image-3747" title="Some title"/><br />
<figcaption><em>Some caption</em></figcaption>
</figure>
</div>
<p><!-- /wp:image --></p>
```
Before this commit:
was imported as

```md
`<!-- wp:paragraph -->`{=html}

Paragraph content

`<!-- /wp:paragraph -->`{=html}

`<!-- wp:image {"align":"center","id":3747,"sizeSlug":"full"} -->`{=html}

::: wp-block-image
<figure class="aligncenter size-full">
<img src="https://test.com/test.jpg" title="Some title"
class="wp-image-3747" /><br />

<figcaption><em>Some caption</em></figcaption>
</figure>
:::

`<!-- /wp:image -->`{=html}
```

After this change:
```md
<!-- wp:paragraph -->

Paragraph content

<!-- /wp:paragraph -->

<!-- wp:image {"align":"center","id":3747,"sizeSlug":"full"} -->

<div class="wp-block-image">

<figure class="aligncenter size-full">
<img src="https://test.com/test.jpg" title="Some title"
class="wp-image-3747" /><br />

<figcaption><em>Some caption</em></figcaption>
</figure>

</div>

<!-- /wp:image -->
```

Fixes #3113
This commit is contained in:
Martin (mart-e) 2023-03-29 13:39:27 +02:00
commit 219c01afb0
2 changed files with 6 additions and 3 deletions

View file

@ -317,7 +317,7 @@ class TestWordpressXmlImporter(unittest.TestCase):
self.posts)
with temporary_folder() as temp:
md = [r(f) for f in silent_f2p(test_post, 'markdown', temp)][0]
sample_line = re.search(r'- This is a code sample', md).group(0)
sample_line = re.search(r'- This is a code sample', md).group(0)
code_line = re.search(r'\s+a = \[1, 2, 3\]', md).group(0)
self.assertTrue(sample_line.rindex('This') < code_line.rindex('a'))