python3.x - How can Python elegantly handle a large number of exception-prone statements?

Question

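I need to use bs4 to parse an HTML page and have to write a lot of extraction statements, several dozen of them, in the following form {code...} But every one of these statements may raise an exception, and wrapping each one in try except would be too tedious. Is there a good way to handle each statement so that it assigns None when an exception occurs...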

ringa_lee · Answer

I asked a small question in the comments on the question. If you can answer it, it will be easier for everyone to understand what you need.

If you don't want to think too hard and only want to avoid the errors that can occur when calling get, there is a sneakier way. If there are not too many odd cases to deal with, you could try:

twitter_url = (summary_soup.find('a','twitter_url') or {}).get('href')

If bs's find does not find anything, it returns None. Here we use or as a trick so that get never fails. Because a dictionary's get behaves much like a bs tag's get, the exception is avoided and the variable is assigned None.
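Why the trick works can be seen without bs4 at all. The sketch below is only an illustration: `fake_find` is a hypothetical stand-in that returns a plain dict or None, and it is not part of bs4.

```python
# Minimal sketch of the `or {}` trick using a plain dict in place of a bs tag.
# `fake_find` is a hypothetical stand-in for soup.find, not a bs4 function.

def fake_find(found):
    """Return a dict of attributes if `found` is true, else None."""
    return {"href": "https://example.com/profile"} if found else None

# Found: the non-empty dict is truthy, so `or` keeps the left operand.
hit = (fake_find(True) or {}).get("href")

# Not found: None is falsy, so `or` falls back to {} and {}.get returns None.
miss = (fake_find(False) or {}).get("href")

print(hit)
print(miss)
```

The fallback only needs to offer the one method that is called next, which is why an empty dict is enough when the next call is get.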

If you want something more robust, @prolifes's suggestion is well worth a look.

Someone asked how to pull off the same trick when the chained call is find rather than get. Here is how I would cheat; as you know, the secret to cheating is fake data:

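from bs4 import BeautifulSoup

# Sample markup (assumed): a <p class="name"> wrapping an <a>
html = '<p class="name"><a>hello world</a></p>'

# Fake data: a stand-in document whose <a> has no text, so .string is None
emptysoup = BeautifulSoup('<a></a>', 'xml')
soup = BeautifulSoup(html, 'xml')

# Match: the real <p> is used
name = (soup.find('p', class_='name') or emptysoup).find('a').string
print(name)
# No match (class 'nam'): fall back to the fake data
name = (soup.find('p', class_='nam') or emptysoup).find('a').string
print(name)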

Result:

hello world
None

The trick works!

Questions I have answered: Python-QA

大家讲道理 · Answer

I don't think this is a problem of having many exceptions; it is a problem with how the code is written. Let me make a bold guess, taking this statement as an example:

twitter_url = summary_soup.find('a','twitter_url').get('href')

I think the likely cause of the error is that summary_soup.find('a','twitter_url') found no element and returned None; calling get('href') on that None then necessarily raises an error.

If that is the cause, it is easy to handle. Split it into two statements:

twitter_url_a = summary_soup.find('a','twitter_url')
twitter_url = twitter_url_a.get('href') if twitter_url_a else None

PHP中文网 · Answer

bs4's chained calls are very convenient, so I wrapped the soup:


from bs4 import BeautifulSoup

class MY_SOUP():
    '''
    Wrapper class: a missing node (None) answers every call with None
    '''
    def __init__(self, soup):
        self.soup = soup
        # Cache the stripped text; None if the node or its string is missing
        if soup is not None and soup.string:
            self.string = soup.string.strip()
        else:
            self.string = None

    def find(self, *args, **kw):
        # Always return a wrapper so that calls can keep chaining
        if self.soup is None:
            return MY_SOUP(None)
        return MY_SOUP(self.soup.find(*args, **kw))

    def find_all(self, *args, **kw):
        if self.soup is None:
            return []
        return self.soup.find_all(*args, **kw)

    def get_text(self):
        if self.soup is not None:
            return self.soup.get_text().strip()
        return None

    def get(self, *args, **kw):
        if self.soup is not None:
            return self.soup.get(*args, **kw)
        return None

soup = BeautifulSoup(html, 'lxml')
summary_soup = soup.find('p', class_='summary')

# Wrap summary_soup in my soup class
summary_soup = MY_SOUP(summary_soup)

# No more exceptions caused by None
twitter_url = summary_soup.find('a','twitter_url').get('href')
facebook_url = summary_soup.find('a','facebook_url').get('href')
linkedin_url = summary_soup.find('a','linkedin_url').get('href')
name = summary_soup.find('p', class_='name').find('a').string
...

Based on @prolifes's suggestion.

ringa_lee · Answer

Write your own function around the code that may raise, and put the try inside that function.
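One way to read this suggestion is sketched below. The helper name `safe` and its parameters are made up for illustration, and plain Python objects stand in for parsed nodes so the sketch runs without bs4.

```python
# Hypothetical helper: run one extraction and swallow the expected errors.
def safe(extract, default=None,
         exceptions=(AttributeError, KeyError, IndexError, TypeError)):
    """Call extract() and return its result, or `default` if it raises."""
    try:
        return extract()
    except exceptions:
        return default

# Plain objects standing in for lookup results.
node = None                 # what a failed lookup typically returns
record = {"href": "/about"}

missing = safe(lambda: node.get("href"))           # AttributeError is caught
present = safe(lambda: record["href"])             # nothing is raised
fallback = safe(lambda: record["title"], "n/a")    # KeyError is caught

print(missing, present, fallback)
```

Wrapping each statement in a lambda delays its evaluation until it is inside the try, so the try except is written once instead of dozens of times. Listing the exception types explicitly keeps unrelated bugs from being silenced.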

PHPz · Answer

If every statement can raise an exception, the problem lies in how the HTML parsing is written. When analyzing the HTML, cover as many cases as you can, then wrap all the parsing statements in a single try except, catch the errors, and write them to a log. Only when you have crawled more and more pages and the errors stop appearing can you say the parsing statements are well written.
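A rough sketch of that approach follows, assuming a hypothetical `parse_page` function and a plain dict standing in for a parsed page; the field names are invented for illustration.

```python
import logging

logger = logging.getLogger("parser")

def parse_page(page_id, page):
    """Run every extraction inside one try block; log and return None on failure.

    `page` is a hypothetical dict standing in for a parsed document.
    """
    try:
        return {
            "name": page["name"].strip(),
            "twitter_url": page["links"]["twitter"],
        }
    except (KeyError, AttributeError, TypeError):
        # logger.exception records the message together with the traceback
        logger.exception("could not parse page %s", page_id)
        return None

good = parse_page(1, {"name": " Ann ", "links": {"twitter": "https://twitter.com/ann"}})
bad = parse_page(2, {"name": None, "links": {}})

print(good)
print(bad)
```

Unlike the per-statement approaches, one failure here discards the whole page, but the log shows exactly which page layouts the parsing statements do not yet cover.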
