node.js - How can Node.js work out which part of each crawled page is the article title and main content? Are there any related resources?

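How can I use Node.js to work out which parts of the different pages I crawl are the article title and content, rather than the other elements on the page? Are there any related resources?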

大家讲道理  2785 days ago  631

Replies (3)

  • PHPz

    PHPz  2017-04-17 11:32:04

    If you are targeting one specific website, you can write matching rules based on the structure of its pages.
    Supporting every website is much harder. Identifying content by tag name alone is not reliable; a general solution would probably need something like machine learning (neural networks, for example).
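
To illustrate the single-site case, here is a minimal TypeScript sketch of a per-site rule table. Everything in it is an assumption made for illustration: the host name `blog.example.com`, the `post-title` and `post-body` class names, and the helper names `SITE_RULES` and `extractWithSiteRule` are all hypothetical. It uses regular expressions only to stay dependency-free; a real HTML parser would be more robust.

```typescript
// Hypothetical per-site rules. The host name and class names are illustrative only.
interface SiteRule {
  title: RegExp; // capture group 1 = title markup
  body: RegExp;  // capture group 1 = body markup
}

const SITE_RULES: Record<string, SiteRule> = {
  "blog.example.com": {
    title: /<h1 class="post-title"[^>]*>([\s\S]*?)<\/h1>/i,
    // Non-greedy match: stops at the first </div>, so nested divs are not handled.
    body: /<div class="post-body"[^>]*>([\s\S]*?)<\/div>/i,
  },
};

// Replace tags with spaces and collapse whitespace.
function stripTags(fragment: string): string {
  return fragment.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Returns null when the host has no rule or the rule does not match the page.
function extractWithSiteRule(
  hostname: string,
  html: string
): { title: string; body: string } | null {
  const rule = SITE_RULES[hostname];
  if (!rule) return null;
  const title = rule.title.exec(html);
  const body = rule.body.exec(html);
  if (!title || !body) return null;
  return { title: stripTags(title[1]), body: stripTags(body[1]) };
}
```

Each new site needs its own entry, which is exactly why this approach does not scale to arbitrary pages.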

  • 伊谢尔伦

    伊谢尔伦  2017-04-17 11:32:04

    The cheerio module makes this more convenient.
    Example: http://www.focalhot.com/blog/62.html

  • 巴扎黑

    巴扎黑  2017-04-17 11:32:04

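    For the main content, you can try a line-block density approach: measure how much text each block of adjacent lines contains and take the densest region.
    For the title, about all you can do is look for tags such as h1 to h3.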
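
Both suggestions can be sketched in a few lines of dependency-free TypeScript. This is a simplified variant of the line-block idea, not a reference implementation: `BLOCK_WIDTH` and `MIN_BLOCK_CHARS` are assumed tuning values, the function names are hypothetical, and HTML entities are left undecoded. Short lines at the edges of the article (a one-word closing paragraph, say) may be dropped.

```typescript
// Assumed tuning values, not taken from the thread.
const BLOCK_WIDTH = 3;      // lines per block
const MIN_BLOCK_CHARS = 40; // a block with at least this much text counts as dense

// Turn HTML into text lines: drop scripts, styles and comments, break after
// block-level closing tags, strip the remaining tags, collapse whitespace.
function htmlToLines(html: string): string[] {
  return html
    .replace(/<(script|style)\b[\s\S]*?<\/\1\s*>/gi, "")
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<\/(p|div|li|ul|ol|h[1-6]|tr|table|section|article)\s*>|<br\s*\/?>/gi, "\n")
    .replace(/<[^>]*>/g, "")
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, " ").trim());
}

// Title heuristic: text of the first h1, else the first h2, else the first h3.
function extractTitle(html: string): string | null {
  for (const level of [1, 2, 3]) {
    const pattern = new RegExp(`<h${level}\\b[^>]*>([\\s\\S]*?)</h${level}\\s*>`, "i");
    const match = pattern.exec(html);
    if (match) {
      const text = match[1].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
      if (text) return text;
    }
  }
  return null;
}

// Body heuristic: find runs of consecutive lines whose blocks are all dense,
// then return the run that holds the most text.
function extractMainText(html: string): string {
  const lines = htmlToLines(html);
  // blockSize[i] = characters in lines i .. i + BLOCK_WIDTH - 1
  const blockSize = lines.map((_, i) =>
    lines.slice(i, i + BLOCK_WIDTH).reduce((sum, line) => sum + line.length, 0)
  );

  let best: string[] = [];
  let bestChars = 0;
  let run: string[] = [];
  let runChars = 0;
  // Iterate one step past the end so the final run is also compared.
  for (let i = 0; i <= lines.length; i++) {
    if (i < lines.length && blockSize[i] >= MIN_BLOCK_CHARS) {
      run.push(lines[i]);
      runChars += lines[i].length;
    } else {
      if (runChars > bestChars) {
        best = run;
        bestChars = runChars;
      }
      run = [];
      runChars = 0;
    }
  }
  return best.filter((line) => line.length > 0).join("\n");
}
```

Calling `extractTitle(html)` and `extractMainText(html)` on a fetched page gives a first guess at the title and body. Navigation and footer text tends to fall below the threshold because it arrives as short, isolated fragments, but the thresholds will need tuning per corpus.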
