python - How to tell whether an RSS feed has been updated

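I've recently been writing a Python program that needs to keep fetching articles from a number of RSS feeds.

But I don't know how to tell whether a feed has been updated, so that I only fetch the newly published articles.

My current idea is to store the timestamp of the newest article for each feed, fetch every article newer than that on the next run, and then update the stored timestamp.


One more question: the RSS feed seems to contain fewer articles than the website. The feed appears to get new items only every few days, while the website has new ones every day. What could be the reason?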

大家讲道理  2807 days ago  1459

Replies (3)

  • ringa_lee

    ringa_lee  2017-04-17 14:49:55

    In principle, the server hosting an RSS or Atom feed should send a Last-Modified or ETag header in the HTTP response, and you can use these to tell whether the feed has changed.

    With Python's feedparser, you can use them like this:

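    import feedparser

    rss_url = "https://example.com/feed.xml"  # placeholder; use your feed's URL
    d = feedparser.parse(rss_url)
    # Send back the validators from the first response; .get() covers servers that omit them
    d2 = feedparser.parse(rss_url, etag=d.get("etag"), modified=d.get("modified"))
    d2.status   # 304 when the feed is unchanged
    d2.feed     # {}
    d2.entries  # []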
    

    If the feed has not changed, the second request returns status 304 and no entries.
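
Since the program runs repeatedly, the ETag and Last-Modified values need to survive between runs. The following is only a rough sketch of one way to do that, not a definitive implementation: the function names, the JSON state file, and the optional `parse` parameter (there so the logic can be tried without network access) are all assumptions made for illustration. It also assumes the server actually honours conditional requests, which not every feed does.

```python
import json
from pathlib import Path


def load_state(state_path):
    """Read the saved validators (hypothetical JSON layout: {url: {"etag", "modified"}})."""
    path = Path(state_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def fetch_if_changed(url, state_path, parse=None):
    """Return the feed's entries, or [] when the server answers 304 Not Modified."""
    if parse is None:
        import feedparser  # imported lazily so the helper can be tested with a stub
        parse = feedparser.parse

    state = load_state(state_path)
    saved = state.get(url, {})

    # Send back whatever validators the server gave us last time (None on the first run).
    result = parse(url, etag=saved.get("etag"), modified=saved.get("modified"))

    if result.get("status") == 304:
        return []

    # Remember the new validators for the next run; either may be missing.
    state[url] = {"etag": result.get("etag"), "modified": result.get("modified")}
    Path(state_path).write_text(json.dumps(state), encoding="utf-8")
    return result.get("entries", [])
```

Calling `fetch_if_changed(rss_url, "feed_state.json")` on each run would then return an empty list whenever the server reports that nothing changed. If the server ignores the validators, every call returns the full entry list, so some per-item check is still needed.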

  • 迷茫

    迷茫  2017-04-17 14:49:55

    Doesn't each RSS item have a GUID? Save the GUID of the latest item and compare against it the next time you crawl. As for how often the feed itself gets updated, that is decided by the publisher's server, and you can't control it.
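
A GUID check along these lines could look like the sketch below. It is a variation rather than exactly what is described above: it keeps a set of every GUID seen so far instead of only the latest one, on the assumption that item order in a feed is not guaranteed. The function names are made up for illustration. feedparser exposes an item's RSS `<guid>` (or Atom `<id>`) as `id`; the fallback to `link` is an assumption for feeds that omit GUIDs.

```python
def entry_key(entry):
    """Pick an identifier for an entry: its GUID if present, otherwise its link."""
    return entry.get("id") or entry.get("link")


def pick_new_entries(entries, seen_ids):
    """Return entries whose key is not in seen_ids, and record their keys in seen_ids."""
    fresh = []
    for entry in entries:
        key = entry_key(entry)
        # Entries with no usable key cannot be tracked, so this sketch skips them.
        if key is None or key in seen_ids:
            continue
        seen_ids.add(key)
        fresh.append(entry)
    return fresh
```

With feedparser this would be called as `pick_new_entries(feedparser.parse(rss_url).entries, seen_ids)`, with `seen_ids` loaded from and saved to wherever the program keeps its state.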

  • 黄舟

    黄舟  2017-04-17 14:49:55

    OP, could you share the code for this program? My final project is on exactly this topic, and I'd like to ask you for help. I have no programming background; how can I finish this project quickly? Thanks.
