
Word counting in HTML using regular expressions

This is the same problem as this one. But since I'm not using JavaScript, "innerText" is not a solution for me, and I want to know whether a regex can combine /(<.*?>)/g and /\S+/g to get the actual word count without doing a bunch of string manipulation.

The language I'm using here is Dart; if a solution I haven't found already exists in it, that would also serve as an answer. Thanks!

Edit: Someone edited the tags? This question is not Dart-specific but about regular expressions, so I put them back as they were.

Edit 2: The question was closed for not being "focused", but I don't know how to make "whether a regex can combine /(<.*?>)/g and /\S+/g" any more focused.

P粉153503989  199 days ago  473

All replies (1)

  • P粉399090746

    P粉399090746  2024-04-02 22:10:34

    Assuming all text is contained within HTML elements, you can use (?<=>|\s)[^<\s>='"]+?(?=<|\s+).

    With the string <p>One</p><p>Two three, four. Five</p><p>Six</p>, there are six matches.

    Note:

    1. It uses a lookbehind group, which not all browsers support.
    2. Punctuation at the end of a word is matched along with it, as in "three,", so keep this in mind if you plan to use the actual words rather than just count them.
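
As a rough illustration of the answer above, here is a minimal TypeScript sketch. TypeScript is used because the pattern is written in JavaScript-style syntax; Dart's RegExp is documented as sharing JavaScript's regex syntax and semantics, so the same pattern should carry over, though that has not been checked here. The pattern is written with `+` quantifiers, and the helper names `matchWords` and `stripTrailingPunctuation` are hypothetical, not part of the original answer.

```typescript
// Pattern from the answer, built with the RegExp constructor.
// As a literal it would read: /(?<=>|\s)[^<\s>='"]+?(?=<|\s+)/g
// Assumption: every word sits inside an HTML element.
const WORD_PATTERN = new RegExp("(?<=>|\\s)[^<\\s>='\"]+?(?=<|\\s+)", "g");

// Hypothetical helper: returns the raw matches, trailing punctuation included.
function matchWords(html: string): string[] {
  const found = html.match(WORD_PATTERN);
  return found === null ? [] : found;
}

// Hypothetical helper: drops common punctuation from the end of a word,
// for callers that need the words themselves rather than just a count.
function stripTrailingPunctuation(word: string): string {
  return word.replace(/[.,;:!?]+$/, "");
}

const sample = "<p>One</p><p>Two three, four. Five</p><p>Six</p>";
const rawWords = matchWords(sample);
const cleanWords = rawWords.map(stripTrailingPunctuation);

console.log(rawWords.length); // 6
console.log(rawWords);        // includes "three," and "four." with punctuation
console.log(cleanWords);      // same words with trailing punctuation removed
```

If only the count is needed, `matchWords(html).length` is enough; the punctuation step matters only when the words themselves are used.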
