python3.x - How can Python elegantly handle a large number of statements that may raise exceptions?

I need to use bs4 to parse an HTML page, which takes many extraction statements, probably several dozen, in the following format:

twitter_url = summary_soup.find('a','twitter_url').get('href')
facebook_url = summary_soup.find('a','facebook_url').get('href')
linkedin_url = summary_soup.find('a','linkedin_url').get('href') 
name = summary_soup.find('p', class_='name').find('a').string

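However, every one of these statements may raise an exception, and wrapping each one in its own try/except would be too tedious. Is there a good way to handle each statement so that a failure assigns None instead of interrupting the program?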

高洛峰高洛峰2888 days ago428

All replies (5)

  • ringa_lee

    ringa_lee 2017-04-18 09:05:01

    I asked a small question in a comment on the question; if you can answer it, everyone will understand your needs more easily.

    If you don't want to think too hard and only want to avoid the errors that can occur when calling get, there is a sneakier way. As long as there aren't too many odd cases to handle, you could try:

    twitter_url = (summary_soup.find('a','twitter_url') or {}).get('href')

    If bs's find does not find anything, it returns None. We use or as a trick so that get never fails: when find returns None, the expression falls back to an empty dict. Because a dict's get behaves much like a bs tag's get, no exception is raised and the variable is assigned None.

    If you want something more robust, @prolifes' suggestions are well worth consulting.

    Someone asked how to pull off the same shortcut when the chained call is find rather than get. Here is how I would do it; the trick to this kind of shortcut is dummy data:

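    from bs4 import BeautifulSoup
    
    html = '<p class="name"><a href="www.hello.com">hello world</a></p>'
    
    # Dummy soup holding an empty <a>, used as a fallback when find returns None
    # (the 'xml' parser requires lxml to be installed)
    emptysoup = BeautifulSoup('<a></a>', 'xml')
    soup = BeautifulSoup(html, 'xml')
    
    # The <p> is found, so the chain reaches the real <a>
    name = (soup.find('p', class_='name') or emptysoup).find('a').string
    print(name)
    # No match for 'nam', so the chain falls back to emptysoup, whose <a> has no string
    name = (soup.find('p', class_='nam') or emptysoup).find('a').string
    print(name)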

    Result:

    hello world
    None

    The shortcut works!

    Questions I've answered: Python-QA

  • 大家讲道理

    大家讲道理 2017-04-18 09:05:01

    I don't think this is a problem of too many exceptions; it is a problem of how the code is written. Let me hazard a guess, taking this statement as an example:

    twitter_url = summary_soup.find('a','twitter_url').get('href')

    I think the likely cause of the error is that summary_soup.find('a','twitter_url') finds no element and returns None; you then call get('href') on that None, which is bound to raise an error.

    If that is the cause, it is easy to handle. Split the statement into two:

    twitter_url_a = summary_soup.find('a','twitter_url')
    twitter_url = twitter_url_a.get('href') if twitter_url_a else None
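With dozens of such statements, the two-step pattern above could be folded into a small helper. The following is only a sketch: `find_attr` is a hypothetical name, the sample HTML is made up, and it assumes bs4 is installed (the built-in 'html.parser' is used so lxml is not needed).

```python
from bs4 import BeautifulSoup


def find_attr(soup, name, class_name, attr):
    """Return the attribute of the first matching tag, or None if no tag matches."""
    tag = soup.find(name, class_name)
    return tag.get(attr) if tag is not None else None


# Made-up sample page: only the twitter link is present.
html = ('<p class="summary">'
        '<a class="twitter_url" href="https://twitter.com/example">t</a>'
        '</p>')
summary_soup = BeautifulSoup(html, 'html.parser')

# One dictionary comprehension replaces one statement per link.
urls = {
    key: find_attr(summary_soup, 'a', key, 'href')
    for key in ('twitter_url', 'facebook_url', 'linkedin_url')
}
print(urls)
```

Missing links simply come back as None, so the program keeps running without any try/except.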

  • PHP中文网

    PHP中文网 2017-04-18 09:05:01

    bs4's chained calls are very convenient, so I wrapped the soup object:

    
    from bs4 import BeautifulSoup

    class MY_SOUP():
        '''
        Wrapper class: methods return None instead of raising when the element is missing
        '''
        def __init__(self,soup):
            self.soup = soup
            if soup:
                if soup.string:
                    self.string = soup.string.strip()
                else:
                    self.string = None
            else:
                self.string = None
    
        def find(self, *args, **kw):
            # A wrapped None stays a wrapped None, so chained find() calls never raise
            if self.soup is None:
                return MY_SOUP(None)
            # Wrap the result (a tag or None) so the next call in the chain is safe
            return MY_SOUP(self.soup.find(*args, **kw))
    
        def find_all(self,*args, **kw):
            # Note: the returned tags are plain bs4 tags, not MY_SOUP wrappers
            if self.soup is None:
                return []
            return self.soup.find_all(*args, **kw)
    
        def get_text(self):
            if self.soup:
                return self.soup.get_text().strip()
            return None
    
        def get(self,*args, **kw):
            if self.soup:
                return self.soup.get(*args, **kw)
            return None
    
    # html is assumed to hold the page source as a string
    soup = BeautifulSoup(html,'lxml')
    summary_soup = soup.find('p', class_='summary')
    
    # Wrap summary_soup in my soup class
    summary_soup = MY_SOUP(summary_soup)
    
    # No more exceptions caused by None
    twitter_url = summary_soup.find('a','twitter_url').get('href')
    facebook_url = summary_soup.find('a','facebook_url').get('href')
    linkedin_url = summary_soup.find('a','linkedin_url').get('href')
    name = summary_soup.find('p', class_='name').find('a').string
    ...
    

    See also @prolifes' answer.

  • ringa_lee

    ringa_lee 2017-04-18 09:05:01

    Write a helper function for the calls that may raise, and put the try inside that function.
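One way to read this suggestion, as a minimal sketch: pass each extraction as a zero-argument lambda to a helper that contains the only try/except. The names `safe` and `FakeTag` are hypothetical; `FakeTag` is just a stand-in that mimics bs4's find returning None, so the block runs without bs4 installed.

```python
def safe(extract, default=None):
    """Call extract() and return its result, or default if a lookup fails."""
    try:
        return extract()
    except (AttributeError, TypeError, KeyError, IndexError):
        return default


class FakeTag:
    """Stand-in for a bs4 tag, used only to keep this sketch self-contained."""

    def __init__(self, links):
        self.links = links

    def find(self, name, class_name):
        # Like bs4's find(), return None when nothing matches.
        href = self.links.get(class_name)
        return {'href': href} if href is not None else None


summary_soup = FakeTag({'twitter_url': 'https://twitter.com/example'})

# Each statement stays on one line; a failed lookup yields None.
twitter_url = safe(lambda: summary_soup.find('a', 'twitter_url').get('href'))
facebook_url = safe(lambda: summary_soup.find('a', 'facebook_url').get('href'))
print(twitter_url)
print(facebook_url)
```

Catching only the lookup-related exception types, rather than a bare except, keeps unrelated bugs visible.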

  • PHPz

    PHPz 2017-04-18 09:05:01

    If every statement may raise an exception, the problem lies in how you analyzed the HTML when writing the code. When analyzing the HTML, cover as many cases as you can; then wrap all the parsing statements in a single try/except, catch the errors, and write them to a log. Only when more and more pages are crawled and no new errors appear can you say the parsing statements are well written.
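A rough sketch of that workflow, under stated assumptions: `parse_page` and `crawl` are hypothetical names, and plain dicts stand in for parsed pages so that the block runs on the standard library alone.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('parser')


def parse_page(page):
    """Hypothetical parser: all extraction statements for one page live here."""
    return {
        'name': page['name'].strip(),
        'twitter_url': page['links']['twitter'],
    }


def crawl(pages):
    """Parse every page; log failures with a traceback instead of stopping."""
    results, failures = [], 0
    for url, page in pages.items():
        try:
            results.append(parse_page(page))
        except Exception:
            failures += 1
            logger.exception('failed to parse %s', url)
    return results, failures


# Made-up input: the second page lacks the 'links' entry.
pages = {
    'page-1': {'name': ' Alice ', 'links': {'twitter': 'https://twitter.com/alice'}},
    'page-2': {'name': 'Bob'},
}
results, failures = crawl(pages)
print(len(results), failures)
```

The logged tracebacks show which assumptions about the HTML were wrong, so the parsing statements can be refined until the failure count stays at zero.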
