How to extract text separated by different HTML tags in Cheerio

Question

I'm trying to extract the following specific text strings as separate outputs, for example (grabbing them from the HTML below):

let text = "This is the first text I need";
let text2 = "This is the second text I need";
let text3 = "This is the third text I need";

I really don't know how to get text separated by different HTML tags.

Count:31Something:This is the first text S I need

P粉198670603 · Answer

Try something like this and see if it works:

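// Note: DOMParser and XPathResult are browser APIs, not part of Cheerio.
const html = `your sample html above`;

// Parse the markup, then select every text node that has no <span> ancestor.
const domdoc = new DOMParser().parseFromString(html, "text/html");
const result = domdoc.evaluate('//text()[not(ancestor::span)]', domdoc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

for (let i = 0; i < result.snapshotLength; i++) {
  const target = result.snapshotItem(i).textContent.trim();
  // Skip whitespace-only text nodes.
  if (target.length > 0) {
    console.log(target);
  }
}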

Using your example HTML, the output should be:

"That's the first text I need"
"The second text I need"
"The third text I need"

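If XPath is not available (for example outside a browser), what the expression //text()[not(ancestor::span)] selects can be sketched as a plain recursive walk. This is only an illustration: textOutsideSpans, the el/text helpers and the sample tree are made-up names and data, not the asker's markup, and the function assumes DOM-like objects exposing nodeType, nodeName, childNodes and textContent.

```javascript
// Node-type constants as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Collect trimmed, non-empty text from every text node that has no <span> ancestor.
// `node` is assumed to be DOM-like: nodeType, nodeName, childNodes, textContent.
function textOutsideSpans(node, out = []) {
  for (const child of node.childNodes ?? []) {
    if (child.nodeType === TEXT_NODE) {
      const text = child.textContent.trim();
      if (text) out.push(text);
    } else if (
      child.nodeType === ELEMENT_NODE &&
      child.nodeName.toLowerCase() !== "span"
    ) {
      // Descend into any element except <span>, so nothing below a span is collected.
      textOutsideSpans(child, out);
    }
  }
  return out;
}

// Hypothetical helpers that build a tiny stand-in tree for demonstration.
const text = (s) => ({ nodeType: TEXT_NODE, textContent: s });
const el = (name, ...kids) => ({
  nodeType: ELEMENT_NODE,
  nodeName: name.toUpperCase(),
  childNodes: kids,
});

// Roughly: <div><p><span>Label:</span> first <b>bold</b><span><i>hidden</i></span> last </p></div>
const sample = el(
  "div",
  el(
    "p",
    el("span", text("Label:")),
    text(" first "),
    el("b", text("bold")),
    el("span", el("i", text("hidden"))),
    text(" last ")
  )
);

console.log(textOutsideSpans(sample)); // [ 'first', 'bold', 'last' ]
```

In a browser the same function should also accept a real Document or Element, since those expose the same properties.
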
P粉386318086 · Answer

You can iterate over the child nodes of the <p> element and keep the ones with nodeType === Node.TEXT_NODE that have any non-empty content:

for (const e of document.querySelector("p").childNodes) {
  if (e.nodeType === Node.TEXT_NODE && e.textContent.trim()) {
    console.log(e.textContent.trim());
  }
}

// Or build an array:
const result = [...document.querySelector("p").childNodes]
  .filter(e =>
    e.nodeType === Node.TEXT_NODE && e.textContent.trim()
  )
  .map(e => e.textContent.trim());
console.log(result);
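
Since the question is about Cheerio, here is a rough sketch of how the same child-node filter might carry over. It assumes the htmlparser2/domhandler node shape that Cheerio exposes, where text nodes have type === "text" and keep their string in data. The helper name directTexts and the sample array are made up for illustration, and the Cheerio call in the trailing comment is untested.

```javascript
// Keep only text nodes and return their trimmed, non-empty contents.
// Assumption: nodes follow the domhandler shape ({ type: "text", data: "..." }).
function directTexts(nodes) {
  return nodes
    .filter((node) => node.type === "text")
    .map((node) => node.data.trim())
    .filter((text) => text.length > 0);
}

// Hypothetical stand-in for what $("p").contents().toArray() might return.
const children = [
  { type: "tag", name: "span" },
  { type: "text", data: " first text\n" },
  { type: "tag", name: "br" },
  { type: "text", data: "   " },
  { type: "text", data: " second text " },
];

console.log(directTexts(children)); // [ 'first text', 'second text' ]

// With Cheerio installed, the call would look roughly like this:
//   const cheerio = require("cheerio");
//   const $ = cheerio.load(html);
//   const [text, text2, text3] = directTexts($("p").contents().toArray());
```

Like the loop above, this only looks at direct children, so text nested inside a span is skipped.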

Count: 31
Something: That's the first text I need Something2: The second text I need
Something3: The third text I need
