Super simple Python crawler for downloading NetEase Cloud Music

不言

Aug 29, 2018 am 11:57 AM

python, selenium, NetEase Cloud Music

This article shows how to download music and lyrics from NetEase Cloud Music with a very simple Python crawler. It may be a useful reference for anyone who needs to do something similar.

Goal

By chance, I heard "Clouds and Smoke Turn into Rain" by Landlady's Cat. I was instantly captivated by the lazy voice and the student-like lyrics, and I kept listening to the band's songs on a loop. Then I went to watch the anime "I am Jiang Xiaobai", and I am really looking forward to the second season...

I want to see you again, even if we leave after just a quick glance...

Okay, enough chatter. The goal this time is to take a singer's ID on NetEase Cloud Music, download the lyrics and audio of that singer's popular songs, and save them to a local folder.

Configuration basics

Python
Selenium (for configuration methods, refer to: Selenium configuration)
Chrome browser (other browsers also work, but the code needs to be modified accordingly)

Analysis

Anyone who has crawled NetEase Cloud's website will know that it has an anti-crawling mechanism: when sending a POST request, you have to reproduce the encryption applied to some of the request parameters. To keep things simple enough for beginners to follow, this article uses Selenium directly to simulate a login, and then calls the download interfaces directly to fetch the music and lyrics.

Experimental steps:

Get the singer's hot song list (song names and links) based on the singer ID, and save it to a csv file;
Read the csv file, extract the song ID from each song link, and then use the corresponding interface to download the music and lyrics;
Save the music and lyrics locally.
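The csv round trip and the ID extraction in these steps can be sketched as follows. This is only a minimal illustration: the helper names (extract_song_id, save_songs, load_songs) and the csv column names are made up for this example, and the link shapes handled here (an id query parameter, optionally placed after a # in the URL) are an assumption about what the song links look like.

```python
import csv
from urllib.parse import urlparse, parse_qs


def extract_song_id(href):
    """Return the value of the `id` query parameter, or None if absent."""
    parsed = urlparse(href)
    query = parsed.query
    # Assumption: some links carry the route after a '#', e.g. /#/song?id=...
    # In that case the query string ends up inside the URL fragment.
    if not query and "?" in parsed.fragment:
        query = parsed.fragment.split("?", 1)[1]
    ids = parse_qs(query).get("id")
    return ids[0] if ids else None


def save_songs(path, song_info):
    """Write (title, href) pairs to a csv file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "href"])
        writer.writerows(song_info)


def load_songs(path):
    """Read the csv file back as a list of (title, song_id) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["title"], extract_song_id(row["href"]))
                for row in csv.DictReader(f)]
```

Parsing the link with urllib.parse rather than slicing the string keeps the extraction working if extra query parameters are appended to the link.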

Python implementation

This part will introduce several key functions...

Get singer information

With Selenium, we don't need to analyze the requests the web page makes; we can extract the information we want directly from the page source. Looking at the source of the singer page, we find that the information we need sits inside an iframe, so we first need to switch to that iframe:

browser.switch_to.frame('contentFrame')
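For context, a minimal sketch of how the page load and the frame switch might be wrapped together. The function name open_artist_frame and the artist page URL pattern are assumptions for illustration and should be checked against the live site; only the 'contentFrame' name comes from the article.

```python
def open_artist_frame(browser, artist_id):
    """Load an artist page and switch into the frame that holds the song list."""
    # Assumed URL pattern for an artist page; verify it against the live site.
    url = "https://music.163.com/#/artist?id={}".format(artist_id)
    browser.get(url)
    # 'contentFrame' is the iframe name used in the article.
    browser.switch_to.frame("contentFrame")
    return url


# Typical use (requires Selenium and a matching browser driver):
#   from selenium import webdriver
#   browser = webdriver.Chrome()
#   open_artist_frame(browser, 12345)  # 12345 is a placeholder ID
```

Taking the browser as a parameter keeps the function independent of which driver (Chrome or another browser) is used.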

Reading further, we find that the song names and links we need are inside the tag with id="hotsong-list", where each song corresponds to one tr tag. So we first get all the tr elements and then iterate over them one by one.

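from selenium.webdriver.common.by import By  # the find_element_by_* helpers were removed in Selenium 4.3
data = browser.find_element(By.ID, "hotsong-list").find_elements(By.TAG_NAME, "tr")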

Note: the first call is find_element and the second is find_elements; the latter returns a list.

The next step is to parse each tr tag and obtain the song name and link. Both are inside the tag with class="txt": the link is the href attribute of the a tag and the name is the title attribute of the b tag, and both can be read directly with the get_attribute() function.

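from selenium.webdriver.common.by import By

song_info = []  # collects (title, href) pairs
for row in data:
    # Each row keeps the song link and title inside its class="txt" cell
    content = row.find_element(By.CLASS_NAME, "txt")
    href = content.find_element(By.TAG_NAME, "a").get_attribute("href")
    title = content.find_element(By.TAG_NAME, "b").get_attribute("title")
    song_info.append((title, href))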

Download lyrics

NetEase Cloud has an interface for obtaining lyrics. The link is: http://music.163.com/api/song...

The number in the link is the song ID, so once we have the song ID we can download the lyrics directly from the link. The response is in JSON format, so we need the json package.

In the lyrics returned by the interface, every line carries a timestamp, which needs to be removed with a regular expression. The complete code is as follows:

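import json
import re

import requests

def get_lyric(self):
    url = 'http://music.163.com/api/song/lyric?' + 'id=' + str(self.song_id) + '&lv=1&kv=1&tv=-1'
    r = requests.get(url)
    json_obj = r.text
    j = json.loads(json_obj)
    lyric = j['lrc']['lyric']
    # Remove the timestamps with a regular expression; the non-greedy
    # pattern only strips the bracketed tags, not the text between them
    regex = re.compile(r'\[.*?\]')
    final_lyric = re.sub(regex, '', lyric)
    return final_lyric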
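To see the timestamp removal in isolation, here is a small offline sketch that needs no network access. The sample JSON is hand-written to mimic the lrc/lyric structure read by the code above, and the parse_lyric helper, including its fallback for a missing lrc entry, is an assumption for illustration rather than part of the original crawler.

```python
import json
import re

# Non-greedy so that each [..] tag is removed on its own
TIMESTAMP = re.compile(r"\[.*?\]")


def parse_lyric(json_text):
    """Extract the plain lyric text from a lyric-interface style JSON string."""
    data = json.loads(json_text)
    # Assumption: a response may lack the 'lrc' entry; return '' in that case
    raw = (data.get("lrc") or {}).get("lyric") or ""
    lines = [TIMESTAMP.sub("", line).strip() for line in raw.splitlines()]
    # Drop lines that are empty once the timestamp is gone
    return "\n".join(line for line in lines if line)


# Hand-written sample, not a real response
sample = json.dumps({"lrc": {"lyric": "[00:01.00]first line\n[00:05.50]second line\n"}})
print(parse_lyric(sample))
# prints:
# first line
# second line
```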

Download Audio

NetEase Cloud also provides an interface for audio files, the link is: http://music.163.com/song/med...

The number in the link is the song ID, so the audio file can be downloaded directly based on the song ID. The complete code is as follows:

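import urllib.request

def get_mp3(self):
    # The song ID goes into the query string, followed by the .mp3 suffix
    url = 'http://music.163.com/song/media/outer/url?id=' + str(self.song_id) + '.mp3'
    try:
        print("Downloading: {0}".format(self.song_name))
        urllib.request.urlretrieve(url, '{0}/{1}.mp3'.format(self.path, self.song_name))
        print("Finish...")
    except Exception as e:
        # Report the reason instead of silently swallowing every exception
        print("Fail... ({0})".format(e))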