Current location - Music Encyclopedia - QQ Music - How to use Python to crawl popular comments on NetEase Cloud Music
How to use Python to crawl popular comments on NetEase Cloud Music

This article will share with you how to use Python to crawl popular comments on NetEase Cloud Music. It has a certain reference value. Friends in need can refer to it

Preface

p>

Recently, I have been studying content related to text mining. It is said that a clever woman cannot make a meal without straw. If you want to conduct text analysis, you must first have text. There are many ways to obtain text, such as downloading ready-made text documents from the Internet, or obtaining data through APIs provided by third parties. But sometimes the data we want cannot be obtained directly because there is no direct download channel or API for us to obtain the data. So what should we do at this time? A better way is to use a web crawler, which is to write a computer program to pretend to be a user to obtain the desired data. Taking advantage of the efficiency of computers, we can obtain data easily and quickly.

About crawlers

So how to write a crawler? There are many languages ????that can be used to write crawlers, such as Java, php, python, etc. I personally prefer to use python. Because Python not only has a built-in powerful network library, but also has many excellent third-party libraries. Others have directly built the wheel, and we can just use it. This brings great convenience to writing crawlers. It is no exaggeration to say that you can actually write a small crawler using less than 10 lines of Python code, while using other languages ??requires writing a lot more code. Being concise and easy to understand is a huge advantage of Python.

Okay, without further ado, let’s get to the main topic today. NetEase Cloud Music has become very popular in recent years. I am a user of NetEase Cloud Music and have been using it for several years. I used to use QQ Music and Kugou. Through my own personal experience, I think the best features of NetEase Cloud Music are its accurate song recommendations and unique user reviews (for the record!!! This is not a soft article) , non-advertising! Just my personal opinion, please don’t comment! Often there will be some comments under a song that have received many likes. Coupled with the fact that NetEase Cloud Music put selected user reviews on the subway a few days ago, NetEase Cloud Music's reviews have become popular again. So I want to analyze NetEase Cloud’s comments and discover the patterns, especially the unique characteristics of some hot comments. With this purpose, I started crawling NetEase Cloud comments.

Network library

Python has two built-in network libraries, urllib and urllib2, but these two libraries are not particularly convenient to use, so here we use a well-received third party Library requests. Using requests, you can achieve more complex crawler work such as setting up agents and simulating logins with just a few lines of code. If pip is already installed, just use pip install requests to install it.

The Chinese document address is here=(organic)|utmcmd=organic; playerid=81568911; __utmb=94650624.23.10.1490672820",

'Connection':"keep-alive", < /p>

'Referer':'/' }

# Set proxy server

proxies= {

'ments(url):

p>

hot_comments_list = []

hot_comments_list.append(u"User ID User Nickname User Avatar Address Comment Time Total Likes Comment Content")

params = get_params(1 ) # First page

encSecKey = get_encSecKey()

json_text = get_json(url,params,encSecKey)

json_dict = json.loads(json_text) < /p>

hot_comments = json_dict['hotComments'] # Hot comments

print("***There are %d hot comments!" % len(hot_comments))

for item in hot_comments:

comment = item['content'] # Comment content

likedCount = item['likedCount'] # Total number of likes

comment_time = item['time'] # Comment time (timestamp)

userID = item['user']['userID'] # Commenter id

nickname = item[ 'user']['nickname'] # Nickname

avatarUrl = item['user']['avatarUrl'] # Avatar address

comment_info = userID + " " + nickname + " " + avatarUrl + " " + comment_time + " " + likedCount + " " + comment + u""

hot_comments_list.append(comment_info)

return hot_comments_list

< p> # Capture all comments of a certain song

def get_all_comments(url):

all_comments_list = [] # Store all comments

all_comments_list.append (u"User ID User Nickname User Avatar Address Comment Time Total Likes Comment Content") # Header information

params = get_params(1)

encSecKey = get_encSecKey()

p>

json_text = get_json(url,params,encSecKey)

json_dict = json.loads(json_text)

comments_num = int(json_dict['total'])

if(comments_num % 20 == 0):