A Detailed Introduction to a Simple Crawler in Python 3.4

巴扎黑 | Original | 2017-09-16 10:16:36 | 1679 views

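This article introduces a simple web crawler written in Python 3.4. It covers two techniques: fetching a web page and parsing it with regular expressions. Readers who need either technique can use it as a reference.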

The example below is a simple crawler implemented in Python 3.4, shared here for reference. The details are as follows:

import http.cookiejar
import re
import time
import urllib.request

def getHtml(url):
  # Cookie-aware opener, so cookies set by the server persist across requests
  cj = http.cookiejar.CookieJar()
  opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
  # Browser-like User-Agent. The Cookie value is as given in the original;
  # replace it with your own if the site requires one.
  opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'), ('Cookie', '4564564564564564565646540')]
  urllib.request.install_opener(opener)
  page = urllib.request.urlopen(url)
  html = page.read()  # raw bytes; decoded later in getimg
  return html
#print ( html)
#html = getHtml("http://weibo.com/")

def getimg(html):
  # Despite the name, this extracts "screen_name" values, not image URLs
  html = html.decode('utf-8')
  reg = '"screen_name":"(.*?)"'
  imgre = re.compile(reg)
  src = imgre.findall(html)
  return src
#print ("",getimg(html))

# Weibo user IDs to crawl
uid = ['2808675432', '3888405676', '2628551531', '2808587400']
for a in uid:
  print(getimg(getHtml("http://weibo.com/" + a)))
  time.sleep(1)  # pause one second between requests
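
The two steps in the listing above, fetching and regular-expression parsing, can also be kept separate so that the parsing step can be tried without any network access. The following is only a minimal sketch under stated assumptions: the names `make_opener`, `fetch_text`, and `extract_screen_names`, the 10-second timeout, and the sample string are all hypothetical and do not come from the original article. It reuses the same `"screen_name"` pattern as `getimg`.

```python
import http.cookiejar
import re
import urllib.request

# Same pattern as in getimg, compiled once at import time.
SCREEN_NAME_RE = re.compile(r'"screen_name":"(.*?)"')

def make_opener(user_agent):
    """Return a cookie-aware opener without installing it globally (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-Agent', user_agent)]
    return opener

def fetch_text(opener, url, timeout=10):
    """Fetch url and return decoded text, or None if the request fails.

    The timeout value is an assumption, not taken from the original.
    """
    try:
        with opener.open(url, timeout=timeout) as page:
            # errors='replace' avoids an exception on bytes that are not valid UTF-8
            return page.read().decode('utf-8', errors='replace')
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print('failed to fetch', url, exc)
        return None

def extract_screen_names(text):
    """Return every captured screen_name value found in text."""
    return SCREEN_NAME_RE.findall(text)

# Hypothetical sample input; real page content will differ.
sample = '{"screen_name":"alice"},{"screen_name":"bob"}'
print(extract_screen_names(sample))  # prints ['alice', 'bob']
```

Returning the opener instead of calling `install_opener` keeps the custom headers local to this crawler, and catching `OSError` lets a loop over several user IDs continue when one request fails.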

