Home  >  Q&A  >  body text

Rewrite title as: Extract HTML and WordPress shortcodes as array of objects

I'm developing a headless WordPress website using Nuxt as the frontend.

The website has thousands of articles with shortcodes. I get all the page data via graphql and render the content using v-html and everything is fine, but the shortcode apparently only renders as plain text.

Most of them are very simple shortcodes, so I will create Vue components to replace them

<component :is="someshortcode">

What I need to do is split my html into an array of objects that I can use to render parts of the page into html or into components, depending on what it is.

I think the best way to do this is to use regular expressions, and this is where I'm confused.

Suppose I have the following html and some shortcodes

<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

[someshortcode attr1="value1" attr2="value2"]

<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

[someshortcode attr1="value1" attr2="value2"]

<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

What I want to do is return an array of objects as shown below

[
    {
        type: 'html',
        content: `<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`
    },
    {
        type: 'shortcode',
        content: `[someshortcode attr1="value1" attr2="value2"]`
    },
    {
        type: 'html',
        content: `<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`
    },
    {
        type: 'shortcode',
        content: `[someshortcode attr1="value1" attr2="value2"]`
    },
    {
        type: 'html',
        content: `<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`
    },
]

This is the base of what I need and then I will be able to break down the shortcode further by getting properties etc.

What is the best way to solve this problem? Are regular expressions the best approach?

P粉504920992P粉504920992407 days ago460

reply all(1)I'll reply

  • P粉714844743

    P粉7148447432023-09-09 09:21:27

    You can use a DOM parser and iterate over the top elements of the DOM. If such an element is a text node and has shortcode format, create a separate object for it in the output array, otherwise get iterate over the HTML of the element and accumulate it if it is not a shortcode, and finally output it as an object:

    const html = `<h1>Lorem ipsum dolor sit amet</h1>
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
    
    [someshortcode attr1="value1" attr2="value2"]
    
    <h2>Lorem ipsum dolor sit amet</h2>
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
    
    [someshortcode attr1="value1" attr2="value2"]
    
    <h2>Lorem ipsum dolor sit amet</h2>
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`;
    
    const {body} = new DOMParser().parseFromString(html, 'text/html');
    let content = "";
    const arr = [];
    for (const child of [...body.childNodes]) {
        if (child.nodeType === 3 && child.textContent.trim()[0] == "[") {
            if (content) arr.push({ type: "html", content });
            content = "";
            arr.push({ type: "shortcode", content: child.textContent.trim() });
        } else {
            content += (child.outerHTML ?? child.textContent);
        }
    }
    if (content) arr.push({ type: "html", content });
    console.log(arr);

    reply
    0
  • Cancelreply