Recommended Zhonggu Education Python Videos (Courseware, Source Code)

黄舟 | Original | 2017-12-04 11:30:18 | 2406 views

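The Zhonggu Education Python Video Tutorial is an introductory course on Python development. It covers the features of the Python language and where it applies, Python's basic data types, conditionals and loops, functions, and Python-specific features such as slicing and list comprehensions. The aim of this tutorial is to get you started quickly and writing simple Python programs.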

Watch the course: http://www.php.cn/course/501.html

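The instructor's teaching style:

The instructor's lectures are vivid, witty, and full of memorable lines. A well-chosen metaphor works like the finishing touch on a painting and opens the door to understanding for students; a well-placed bit of humor draws a knowing smile, like a glass of mellow wine that leaves a lingering aftertaste; philosophers' maxims and cultural sayings are woven into the lectures from time to time, prompting reflection.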

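The more difficult part of this video course is the web crawler:

1. A simple crawler for a single web page

The crawler below downloads all the images on a given page of Baidu Tieba. The code consists of two main functions: getHtml() fetches the HTML content for a page url, and getImage() parses that HTML to find the image addresses and downloads the images.

The code is as follows: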

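import re
import urllib.request  # Python 3: urlopen and urlretrieve live in urllib.request

def getHtml(url):
    """Fetch the HTML content of the page at the given url."""
    page = urllib.request.urlopen(url)  # open the page
    content = page.read()  # read the page content (bytes)
    # decode to str so the regular expression below can be applied
    return content.decode('utf-8', errors='ignore')

def getImage(html):
    """Parse the html for image addresses and download each image."""
    regx = r'src="(.+?\.jpg)" pic_ext'  # regular expression that captures image urls
    imgreg = re.compile(regx)
    imglist = re.findall(imgreg, html)
    x = 0
    for imgurl in imglist:
        filepath = 'F:\\Downloads\\' + str(x) + '.jpg'  # this folder must already exist
        urllib.request.urlretrieve(imgurl, filepath)  # download the image to local disk
        x += 1
    print('completed!')
    return imglist  # return the urls so the caller's assignment is meaningful

html = getHtml('http://tieba.baidu.com/p/2505265675')
imglist = getImage(html)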
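
To see what the regular expression in getImage() actually captures without touching the network, here is a minimal sketch run against a made-up HTML fragment. The sample markup, the example.com addresses, and the stricter pattern named strict_reg are assumptions for illustration only; they are not taken from the course or from a real Tieba page.

```python
import re

# Hypothetical HTML fragment (one tag per line) imitating the markup
# that the pattern in getImage() expects; not a real Tieba page.
sample_html = "\n".join([
    '<img class="BDE_Image" src="http://example.com/img/a.jpg" pic_ext="jpeg">',
    '<img src="http://example.com/img/logo.png">',
    '<img class="BDE_Image" src="http://example.com/img/b.jpg" pic_ext="jpeg">',
])

# Same pattern as in getImage(): capture a .jpg address that is
# immediately followed by a pic_ext attribute.
imgreg = re.compile(r'src="(.+?\.jpg)" pic_ext')
image_urls = imgreg.findall(sample_html)

# A stricter variant (an assumption, not from the course): [^"]+ cannot
# run past the closing quote, so a match never spans two tags.
strict_reg = re.compile(r'src="([^"]+\.jpg)" pic_ext')
strict_urls = strict_reg.findall(sample_html)

print(image_urls)
print(strict_urls)
```

Both patterns skip the logo.png tag because it has no pic_ext attribute after a .jpg address. The original pattern relies on `.` not matching a newline; if several tags sit on one line, the stricter character class is the safer choice.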

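2. A framework for crawling multiple pages

Only the basic idea is covered here. First, choose a starting page; the home page of a website works well. Second, analyze all the links on that starting page and crawl the content behind each of them. Third, repeat the process recursively: analyze the internal links of every sub-page the crawler reaches, and keep crawling any page that has not been crawled yet.

A piece of code borrowed from Xie Ke on Zhihu illustrates this. Set the initial page initial_page, where the crawler starts fetching pages; url_queue holds the queue of pages still to be crawled, and seen holds the pages that have already been crawled or queued.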

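import queue  # Python 3 name of the Queue module

initial_page = "http://www.renminribao.com"
url_queue = queue.Queue()
seen = set()
seen.add(initial_page)
url_queue.put(initial_page)
# store() and extract_urls() are not defined here; they stand for the
# page-saving and link-extraction steps that the reader must supply.
while True:
    if url_queue.qsize() > 0:
        current_url = url_queue.get()  # take the first url out of the queue
        store(current_url)  # save the page this url points to
        for next_url in extract_urls(current_url):  # extract the urls this page links to
            if next_url not in seen:
                seen.add(next_url)
                url_queue.put(next_url)
    else:
        break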
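
The loop above can be tried end to end with stand-in helpers. The following is a minimal sketch, assuming a tiny in-memory "web" (the PAGES dictionary and its example.com addresses are invented) in place of real HTTP requests; store() and extract_urls() here are hypothetical stubs, not the implementations used in the course.

```python
import queue
import re

# Hypothetical in-memory "web": url -> html. Stands in for real fetches.
PAGES = {
    "http://example.com/": '<a href="http://example.com/a">A</a> '
                           '<a href="http://example.com/b">B</a>',
    "http://example.com/a": '<a href="http://example.com/b">B</a> '
                            '<a href="http://example.com/">home</a>',
    "http://example.com/b": '<a href="http://example.com/c">C</a>',
    "http://example.com/c": '',
}

stored = []  # order in which pages were "stored"

def store(url):
    """Hypothetical stub: record the url instead of saving the page."""
    stored.append(url)

def extract_urls(url):
    """Hypothetical stub: return the href targets found in the page."""
    return re.findall(r'href="([^"]+)"', PAGES.get(url, ""))

def crawl(initial_page):
    """Breadth-first crawl that visits every reachable page exactly once."""
    url_queue = queue.Queue()
    seen = {initial_page}
    url_queue.put(initial_page)
    while not url_queue.empty():
        current_url = url_queue.get()
        store(current_url)
        for next_url in extract_urls(current_url):
            if next_url not in seen:  # skip pages already queued or crawled
                seen.add(next_url)
                url_queue.put(next_url)
    return seen

seen = crawl("http://example.com/")
print(stored)
```

Because a url is added to seen at the moment it is queued, the back-link from page a to the home page does not cause a second visit, and the loop ends once the queue is empty.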

The source code resources are also recommended for download: http://www.php.cn/xiazai/learn/1944

The courseware for the videos is shared there as well.
