Home > Article > Backend Development > Super simple Python crawler for downloading NetEase Cloud Music
The content of this article is about the download of NetEase Cloud Music using a super simple Python crawler. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.
By chance, I heard "Clouds and Smoke Turn into Rain" by Landlady's Cat. I was instantly fascinated by the lazy voice and student-like lyrics, and then I kept listening to them on a loop. s song. Then I went to watch the anime "I am Jiang Xiaobai", and I am really looking forward to the second season...
I want to see you again, even if we leave after just a quick glance...Okay, no nonsense . The goal this time is to download the lyrics and audio of the singer's popular music based on the singer's ID in NetEase Cloud and save it to a local folder.
Python
Selenium (for configuration methods, refer to: Selenium configuration)
Chrome browser (others are also available, and need to be modified accordingly)
If the friends who have crawled NetEase Cloud’s website have You should know that NetEase Cloud has an anti-crawling mechanism. When POSTing, you need to simulate the encryption function of some information parameters. But here for the sake of simplicity, novices can understand it. Selenium is used directly to simulate login, and then the interface is used to directly download music and lyrics.
Experimental steps:
Get the singer’s hot song list, song names and links based on the singer ID, and save them to a csv file;
Read the csv file, extract the song ID according to the song link, and then use the corresponding interface to download the music and lyrics;
Put the music and save the lyrics locally.
This part will introduce several key functions...
Using Selenium, we don’t need to read the request for the web page. We can directly extract the corresponding information from the web page source code. Looking at the source code of the singer page, we can find that the information we need is within the iframe, so we first need to switch to the iframe:
browser.switch_to.frame('contentFrame')
Continue reading and find that the song name and link we need are in id ="hotsong-list"
tag, then each line corresponds to a tr
tag. So first get all the tr
contents, and then iterate over the single tr
.
data = browser.find_element_by_id("hotsong-list").find_elements_by_tag_name("tr")
Note: The former one is find_element
, the latter one is find_elements
, and the latter returns a list.
The next step is to parse the content of a single tr
tag and obtain the song name and link. You can find that both are in the class="txt"
tag, and the link is href
attribute, the name is title
attribute, which can be obtained directly through the get_attribute()
function.
for i in range(len(data)): content = data[i].find_element_by_class_name("txt") href = content.find_element_by_tag_name("a").get_attribute("href") title = content.find_element_by_tag_name("b").get_attribute("title") song_info.append((title, href))
NetEase Cloud has an interface for obtaining lyrics, the link is: http://music.163. com/api/song...
The number in the link is the song id, so after we have the song id, we can download the lyrics directly from the link. The lyrics file is in json
format. So we need to use the json
package.
And among the lyrics obtained directly, each line has a timeline, which needs to be eliminated using regular expressions. The complete code is as follows:
def get_lyric(self): url = 'http://music.163.com/api/song/lyric?' + 'id=' + str(self.song_id) + '&lv=1&kv=1&tv=-1' r = requests.get(url) json_obj = r.text j = json.loads(json_obj) lyric = j['lrc']['lyric'] # 利用正则表达式去除时间轴 regex = re.compile(r'\[.*\]') final_lyric = re.sub(regex, '', lyric) return final_lyric
NetEase Cloud also provides an interface for audio files, the link is: http://music.163.com/song/med...
in the link The number is the id of the song, and the audio file can be downloaded directly based on the id of the song. The complete code is as follows:
def get_mp3(self): url = 'http://music.163.com/song/media/outer/url?id=' + str(self.song_id)+'.mp3' try: print("正在下载:{0}".format(self.song_name)) urllib.request.urlretrieve(url, '{0}/{1}.mp3'.format(self.path, self.song_name)) print("Finish...") except: print("Fail...")
Related recommendations:
How to use Python to crawl popular comments on NetEase Cloud Music
##Python crawl Example of the process of getting qq music
The above is the detailed content of Super simple Python crawler for downloading NetEase Cloud Music. For more information, please follow other related articles on the PHP Chinese website!