Home >Backend Development >Python Tutorial >Super simple Python crawler for downloading NetEase Cloud Music

Super simple Python crawler for downloading NetEase Cloud Music

不言
不言Original
2018-08-29 11:57:054603browse

The content of this article is about the download of NetEase Cloud Music using a super simple Python crawler. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.

Goal

By chance, I heard "Clouds and Smoke Turn into Rain" by Landlady's Cat. I was instantly fascinated by the lazy voice and student-like lyrics, and then I kept listening to them on a loop. s song. Then I went to watch the anime "I am Jiang Xiaobai", and I am really looking forward to the second season...

I want to see you again, even if we leave after just a quick glance...

Okay, no nonsense . The goal this time is to download the lyrics and audio of the singer's popular music based on the singer's ID in NetEase Cloud and save it to a local folder.

Configuration basics

  • Python

  • Selenium (for configuration methods, refer to: Selenium configuration)

  • Chrome browser (others are also available, and need to be modified accordingly)

Analysis

If the friends who have crawled NetEase Cloud’s website have You should know that NetEase Cloud has an anti-crawling mechanism. When POSTing, you need to simulate the encryption function of some information parameters. But here for the sake of simplicity, novices can understand it. Selenium is used directly to simulate login, and then the interface is used to directly download music and lyrics.

Experimental steps:

  1. Get the singer’s hot song list, song names and links based on the singer ID, and save them to a csv file;

  2. Read the csv file, extract the song ID according to the song link, and then use the corresponding interface to download the music and lyrics;

  3. Put the music and save the lyrics locally.

Super simple Python crawler for downloading NetEase Cloud Music

Python implementation

This part will introduce several key functions...

Get singer information

Using Selenium, we don’t need to read the request for the web page. We can directly extract the corresponding information from the web page source code. Looking at the source code of the singer page, we can find that the information we need is within the iframe, so we first need to switch to the iframe:

browser.switch_to.frame('contentFrame')

Continue reading and find that the song name and link we need are in id ="hotsong-list" tag, then each line corresponds to a tr tag. So first get all the tr contents, and then iterate over the single tr.

data = browser.find_element_by_id("hotsong-list").find_elements_by_tag_name("tr")

Note: The former one is find_element, the latter one is find_elements, and the latter returns a list.

The next step is to parse the content of a single tr tag and obtain the song name and link. You can find that both are in the class="txt" tag, and the link is href attribute, the name is title attribute, which can be obtained directly through the get_attribute() function.

Super simple Python crawler for downloading NetEase Cloud Music

for i in range(len(data)):
    content = data[i].find_element_by_class_name("txt")
    href = content.find_element_by_tag_name("a").get_attribute("href")
    title = content.find_element_by_tag_name("b").get_attribute("title")
    song_info.append((title, href))

Download lyrics

NetEase Cloud has an interface for obtaining lyrics, the link is: http://music.163. com/api/song...

The number in the link is the song id, so after we have the song id, we can download the lyrics directly from the link. The lyrics file is in json format. So we need to use the json package.

Super simple Python crawler for downloading NetEase Cloud Music

And among the lyrics obtained directly, each line has a timeline, which needs to be eliminated using regular expressions. The complete code is as follows:

def get_lyric(self):
    url = 'http://music.163.com/api/song/lyric?' + 'id=' + str(self.song_id) + '&lv=1&kv=1&tv=-1'
    r = requests.get(url)
    json_obj = r.text
    j = json.loads(json_obj)
    lyric = j['lrc']['lyric']
    # 利用正则表达式去除时间轴
    regex = re.compile(r'\[.*\]')
    final_lyric = re.sub(regex, '', lyric)
    return final_lyric

Download Audio

NetEase Cloud also provides an interface for audio files, the link is: http://music.163.com/song/med...

in the link The number is the id of the song, and the audio file can be downloaded directly based on the id of the song. The complete code is as follows:

def get_mp3(self):
    url = 'http://music.163.com/song/media/outer/url?id=' + str(self.song_id)+'.mp3'
    try:
        print("正在下载:{0}".format(self.song_name))
        urllib.request.urlretrieve(url, '{0}/{1}.mp3'.format(self.path, self.song_name))
        print("Finish...")
    except:
        print("Fail...")

Related recommendations:

How to use Python to crawl popular comments on NetEase Cloud Music

##Python crawl Example of the process of getting qq music

The above is the detailed content of Super simple Python crawler for downloading NetEase Cloud Music. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn