Turndown not recognizing <figure> #293

Closed
valnub opened this issue Sep 27, 2019 · 5 comments · Fixed by #326

valnub commented Sep 27, 2019

I'm trying to convert the following html to markdown:

<p>My friend Alex asked me to teach her how to code and sure thing – we sat down and started hacking! :-)</p>

<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<span class="embed-youtube" style="text-align:center; display: block;"><iframe class='youtube-player' type='text/html' width='634' height='357' src='https://www.youtube.com/embed/oSga19FU5bA?version=3&#038;rel=1&#038;fs=1&#038;autohide=2&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' allowfullscreen='true' style='border:0;'></iframe></span>
</div></figure>

I want to keep the YouTube iframe in my markdown. With default settings, Turndown returns this:

My friend Alex asked me to teach her how to code and sure thing – we sat down and started hacking! :-)

So, I added the keep option:

turndownService.keep(['figure', 'iframe'])

The result is still the same:

My friend Alex asked me to teach her how to code and sure thing – we sat down and started hacking! :-)

Then I started playing with filters:

  turndownService.addRule('figure-with-youtube', {
    filter: function(node, options) {
      console.log(node.nodeName)
      const classAttr = node.getAttribute('class')
      const isYoutube = classAttr && classAttr.indexOf('youtube-player') !== -1
      return node.nodeName === 'IFRAME' && isYoutube
    },
    replacement: function(content, node, options) {
      const src = node.getAttribute('src')
      return `<iframe src=${src}></iframe>`
    }
  })

The console.log(node.nodeName) statement prints this:

P
IFRAME

As you can see, the iframe is found but the <figure> element is not. Also, the replacement function above doesn't work. I suspect that's because the iframe is wrapped inside the figure element. So I thought I'd just replace the whole figure, but that's not possible because Turndown doesn't even find it. Why is that?

@Dashing-Nelson

Did you find a solution for it?

valnub commented Feb 20, 2020

Nope :-(

bambax commented Feb 20, 2020

The reason <figure> is ignored is that it's an empty block element (no text content). You would need to add a blank replacement option when initializing, to preserve figure elements even when empty; something like:

const turndownService = new TurndownService({blankReplacement: (content, node) => node.matches("figure") ? node.outerHTML : (node.isBlock ? "\n\n" : "")})

(The blank replacement rule has precedence on everything else, that's why keep and new rules don't seem to work).
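To make the precedence point concrete, here is a minimal sketch of the lookup order described above. It is a simplified model, not Turndown's actual source: `isBlank`, `selectRule`, and the plain-object nodes are hypothetical stand-ins for illustration only.

```javascript
// Simplified model of rule selection; NOT Turndown's real implementation.
// Nodes are plain objects here instead of DOM nodes.
function isBlank(node) {
  // Treat a node as blank when it contains no non-whitespace text.
  return /^\s*$/.test(node.textContent)
}

function selectRule(node, rules) {
  // 1. The blank rule is consulted first and short-circuits everything else.
  if (isBlank(node)) return 'blank'
  // 2. Rules registered with addRule.
  for (const rule of rules.added) {
    if (rule.filter(node)) return rule.name
  }
  // 3. Elements registered with keep().
  if (rules.keep.includes(node.nodeName.toLowerCase())) return 'keep'
  return 'default'
}

const rules = {
  added: [{ name: 'figure-with-youtube', filter: (n) => n.nodeName === 'IFRAME' }],
  keep: ['figure', 'iframe']
}

// The <figure> from the issue has markup inside it but no text content.
const figure = { nodeName: 'FIGURE', textContent: '\n\n' }
const paragraph = { nodeName: 'P', textContent: 'My friend Alex asked me...' }

console.log(selectRule(figure, rules))    // blank
console.log(selectRule(paragraph, rules)) // default
```

In this model the figure never reaches the added rules or the keep list, which matches the behaviour reported in the issue.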

@domchristie
Collaborator

Yes, I think @bambax is correct. I wonder if the blank rule should have lower precedence than added rules and keep/remove rules? I think the blank rule was given top priority for performance reasons, but perhaps that's not a big issue. What do you think?

bambax commented Feb 22, 2020

Yes, it seems at least 'keep' rules should have priority over blank replacement, because otherwise it's hard to understand what's happening.
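As a rough illustration of that suggestion, the sketch below checks the keep list before the blank test. It is purely hypothetical: `selectRuleKeepFirst` and the plain-object nodes are invented for this example, and the thread does not say whether the linked fix takes this approach.

```javascript
// Hypothetical ordering where keep() entries win over the blank rule.
// A simplified model only; not taken from Turndown's source.
function selectRuleKeepFirst(node, keepList) {
  // Check the keep list first, so kept elements survive even when empty.
  if (keepList.includes(node.nodeName.toLowerCase())) return 'keep'
  // Only then fall back to the blank rule for whitespace-only nodes.
  if (/^\s*$/.test(node.textContent)) return 'blank'
  return 'default'
}

const keepList = ['figure', 'iframe']
const emptyFigure = { nodeName: 'FIGURE', textContent: '\n' }
const emptyDiv = { nodeName: 'DIV', textContent: '' }

console.log(selectRuleKeepFirst(emptyFigure, keepList)) // keep
console.log(selectRuleKeepFirst(emptyDiv, keepList))    // blank
```

The trade-off mentioned earlier in the thread still applies: the keep check now runs for every node, including blank ones that previously returned early.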
