Python Basic Tutorial Project 4: News Gathering

不言 | Original | 2018-04-03 09:17:34 | 1761 views

This article covers news gathering, the fourth project in the book "Python Basic Tutorial". It is a kind of application that is rarely seen today (I, at least, have never used one) and it is built around Usenet. The program collects information from specified sources (here, a Usenet newsgroup and a web page) and delivers it to specified destinations (here, two formats are used: plain text and an HTML file). In use it is somewhat similar to today's blog subscription tools or RSS readers.

Let's start with the code and then go through it piece by piece.

# Python 2 listing (print statements, urllib.urlopen).
from nntplib import NNTP
from time import strftime, time, localtime
from email import message_from_string
from urllib import urlopen
import textwrap
import re

day = 24 * 60 * 60  # number of seconds in one day

def wrap(string, max=70):
    '''
    Wrap a string to a maximum line width.
    '''
    return '\n'.join(textwrap.wrap(string, max)) + '\n'

class NewsAgent:
    '''
    Distributes news items from news sources to news destinations.
    '''
    def __init__(self):
        self.sources = []
        self.destinations = []

    def addSource(self, source):
        self.sources.append(source)

    def addDestination(self, dest):
        self.destinations.append(dest)

    def distribute(self):
        # Collect every item from every source, then hand the full list
        # to each destination.
        items = []
        for source in self.sources:
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)

class NewsItem:
    def __init__(self, title, body):
        self.title = title
        self.body = body

class NNTPSource:
    def __init__(self, servername, group, window):
        self.servername = servername
        self.group = group
        self.window = window  # how many days back to look

    def getItems(self):
        start = localtime(time() - self.window * day)
        date = strftime('%y%m%d', start)
        hour = strftime('%H%M%S', start)
        server = NNTP(self.servername)
        ids = server.newnews(self.group, date, hour)[1]
        for id in ids:
            lines = server.article(id)[3]
            message = message_from_string('\n'.join(lines))
            title = message['subject']
            body = message.get_payload()
            if message.is_multipart():
                body = body[0]
            yield NewsItem(title, body)
        server.quit()

class SimpleWebSource:
    def __init__(self, url, titlePattern, bodyPattern):
        self.url = url
        self.titlePattern = re.compile(titlePattern)
        self.bodyPattern = re.compile(bodyPattern)

    def getItems(self):
        text = urlopen(self.url).read()
        titles = self.titlePattern.findall(text)
        bodies = self.bodyPattern.findall(text)
        for title, body in zip(titles, bodies):
            yield NewsItem(title, wrap(body))

class PlainDestination:
    def receiveItems(self, items):
        for item in items:
            print item.title
            print '-' * len(item.title)
            print item.body

class HTMLDestination:
    def __init__(self, filename):
        self.filename = filename

    def receiveItems(self, items):
        out = open(self.filename, 'w')
        print >> out, '''
        <html>
        <head>
         <title>Today's News</title>
        </head>
        <body>
        <h1>Today's News</h1>
        '''
        # First pass: an index of links to the items.
        print >> out, '<ul>'
        id = 0
        for item in items:
            id += 1
            print >> out, '<li><a href="#%i">%s</a></li>' % (id, item.title)
        print >> out, '</ul>'
        # Second pass: the items themselves, each under a named anchor.
        id = 0
        for item in items:
            id += 1
            print >> out, '<h2><a name="%i">%s</a></h2>' % (id, item.title)
            print >> out, '<pre>%s</pre>' % item.body
        print >> out, '''
        </body>
        </html>
        '''
        out.close()

def runDefaultSetup():
    agent = NewsAgent()

    # Patterns for the BBC text-only page; the page layout may have changed since.
    bbc_url = 'http://news.bbc.co.uk/text_only.stm'
    bbc_title = r'(?s)a href="[^"]*">\s*<b>\s*(.*?)\s*</b>'
    bbc_body = r'(?s)</a>\s*<br />\s*(.*?)\s*<'
    bbc = SimpleWebSource(bbc_url, bbc_title, bbc_body)
    agent.addSource(bbc)

    clpa_server = 'news2.neva.ru'
    clpa_group = 'alt.sex.telephone'
    clpa_window = 1
    clpa = NNTPSource(clpa_server, clpa_group, clpa_window)
    agent.addSource(clpa)

    agent.addDestination(PlainDestination())
    agent.addDestination(HTMLDestination('news.html'))
    agent.distribute()

if __name__ == '__main__':
    runDefaultSetup()

The core of this program is NewsAgent. It stores the news sources and the destinations, then calls the source classes (NNTPSource and SimpleWebSource) and the classes that write the news out (PlainDestination and HTMLDestination). NNTPSource fetches articles from a news server, and SimpleWebSource fetches data from a URL. The roles of PlainDestination and HTMLDestination are straightforward: the former prints the retrieved content to the terminal, and the latter writes it to an HTML file.
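To make the SimpleWebSource step concrete, here is a minimal Python 3 sketch of the same idea: find all titles and all bodies with two regular expressions, then pair them up in order. The function name `extract_items`, the sample page, and both patterns are made up for illustration, and the page is an in-memory string rather than a fetched URL.

```python
import re
import textwrap
from collections import namedtuple

# Hypothetical stand-in for the article's NewsItem class.
NewsItem = namedtuple("NewsItem", ["title", "body"])


def extract_items(text, title_pattern, body_pattern):
    """Pair the n-th title match with the n-th body match."""
    titles = re.findall(title_pattern, text)
    bodies = re.findall(body_pattern, text)
    # zip stops at the shorter list, so an unpaired title or body is dropped.
    for title, body in zip(titles, bodies):
        yield NewsItem(title, textwrap.fill(body, width=70) + "\n")


# Made-up page fragment and patterns, for illustration only.
SAMPLE_PAGE = """
<h3>First headline</h3><p>Body of the first story.</p>
<h3>Second headline</h3><p>Body of the second story.</p>
"""

items = list(extract_items(SAMPLE_PAGE, r"<h3>(.*?)</h3>", r"<p>(.*?)</p>"))
for item in items:
    print(item.title)
    print("-" * len(item.title))
    print(item.body)
```

Because titles and bodies are matched independently, this approach relies on the page listing them in the same order and in equal numbers; a pattern that misses one body would shift every later pairing.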

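With that analysis in hand, look at the main program: it simply adds the information sources and the output destinations to a NewsAgent and then asks the agent to distribute the news.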
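The wiring can be tried without a news server or a network connection. The following Python 3 sketch keeps the article's method names (`addSource`, `addDestination`, `distribute`, `getItems`, `receiveItems`) but swaps in two hypothetical classes, `ListSource` and `CollectingDestination`, that work purely in memory.

```python
class NewsItem:
    def __init__(self, title, body):
        self.title = title
        self.body = body


class NewsAgent:
    def __init__(self):
        self.sources = []
        self.destinations = []

    def addSource(self, source):
        self.sources.append(source)

    def addDestination(self, dest):
        self.destinations.append(dest)

    def distribute(self):
        items = []
        for source in self.sources:
            # extend() drains each generator into one list, so every
            # destination sees the complete set of items.
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)


class ListSource:
    """Hypothetical source that serves items from an in-memory list."""

    def __init__(self, pairs):
        self.pairs = pairs

    def getItems(self):
        for title, body in self.pairs:
            yield NewsItem(title, body)


class CollectingDestination:
    """Hypothetical destination that records the titles it receives."""

    def __init__(self):
        self.titles = []

    def receiveItems(self, items):
        self.titles.extend(item.title for item in items)


agent = NewsAgent()
agent.addSource(ListSource([("Release 1.0", "First release."),
                            ("Release 1.1", "Bug fixes.")]))
agent.addSource(ListSource([("Meetup", "Thursday evening.")]))
first, second = CollectingDestination(), CollectingDestination()
agent.addDestination(first)
agent.addDestination(second)
agent.distribute()
print(first.titles)
```

The agent never checks what type a source or destination is; anything with a `getItems` or `receiveItems` method will do.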

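It is a very simple program, but it is layered: the agent knows nothing about where the news comes from or how it is written out, so sources and destinations can be changed independently.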
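One practical consequence of the layering is that a single destination can be rewritten without touching anything else. As an illustration, here is a Python 3 sketch of an HTML destination that builds the page as a string and escapes titles and bodies. The names `render_html` and `HTMLFileDestination` are hypothetical, and the escaping is an addition that the original listing does not have.

```python
from html import escape
from types import SimpleNamespace


def render_html(items, heading="Today's News"):
    """Build the page as a string: a linked index, then one section per item."""
    items = list(items)  # walked twice below, so materialize any generator
    parts = [
        "<html>",
        "<head>",
        "<title>%s</title>" % escape(heading),
        "</head>",
        "<body>",
        "<h1>%s</h1>" % escape(heading),
        "<ul>",
    ]
    for number, item in enumerate(items, start=1):
        parts.append('<li><a href="#%d">%s</a></li>' % (number, escape(item.title)))
    parts.append("</ul>")
    for number, item in enumerate(items, start=1):
        parts.append('<h2><a name="%d">%s</a></h2>' % (number, escape(item.title)))
        parts.append("<pre>%s</pre>" % escape(item.body))
    parts.extend(["</body>", "</html>"])
    return "\n".join(parts)


class HTMLFileDestination:
    """Hypothetical destination with the same receiveItems interface."""

    def __init__(self, filename):
        self.filename = filename

    def receiveItems(self, items):
        # The with-block closes the file even if rendering fails.
        with open(self.filename, "w", encoding="utf-8") as out:
            out.write(render_html(items))


# Any object with title and body attributes works as an item.
sample = [SimpleNamespace(title="Tom & Jerry", body="1 < 2")]
page = render_html(sample)
print(page)
```

Keeping the rendering in a plain function makes it possible to check the generated markup without writing a file.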
