How do I crawl dynamic data, i.e. data loaded by an AJAX request?
For example, given this page source:
<html>
<head>
<title>开课课程信息</title>
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
</head>
<frameset border="false" frameborder="0" rows="30,*">
<frame name="header" scrolling="no" noresize target="frmCourMain" src="akcjj.asp" marginwidth="0"
marginheight="0">
<frame name="frmCourMain" src="akechengdw.asp" scrolling="auto" target="frmCourMain">
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>
From the source you can see that the data comes from the frame's akechengdw.asp, but how do I crawl data like this?
巴扎黑2017-04-17 15:40:59
For data requested via AJAX, there are generally two approaches.
1. Simulate a browser to load the page. You can search Google for keywords such as "browser-simulating crawler", but you will still need to try it out yourself.
2. Find the underlying interface (the URL the page actually requests) and crawl it directly, taking care to send the appropriate request headers.
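For the frameset in the question, the second approach might look roughly like the sketch below: collect each frame's `src`, resolve it against the page URL, and prepare a request for it with browser-like headers. This is only an illustration using the Python standard library. The page URL `http://example.edu/course/index.asp`, the User-Agent string, and the helper names `frame_urls` and `build_request` are assumptions, not taken from the original site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(page_html, page_url):
    """Return absolute URLs for all frames found in page_html."""
    parser = FrameSrcParser()
    parser.feed(page_html)
    return [urljoin(page_url, src) for src in parser.sources]


def build_request(url, referer):
    """Prepare (but do not send) a GET request with browser-like headers."""
    headers = {
        # Assumed header values; copy the real ones from the F12 network panel.
        "User-Agent": "Mozilla/5.0 (compatible; example-crawler)",
        "Referer": referer,
    }
    return Request(url, headers=headers)


# Trimmed copy of the frameset from the question.
FRAMESET = """<frameset border="false" frameborder="0" rows="30,*">
<frame name="header" scrolling="no" noresize src="akcjj.asp">
<frame name="frmCourMain" src="akechengdw.asp" scrolling="auto">
</frameset>"""

PAGE_URL = "http://example.edu/course/index.asp"  # hypothetical address

urls = frame_urls(FRAMESET, PAGE_URL)
requests_to_send = [build_request(u, PAGE_URL) for u in urls]
for req in requests_to_send:
    print(req.full_url)
```

Passing each prepared request to `urllib.request.urlopen` would then fetch the frame's HTML, which can be parsed like any static page.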
巴扎黑2017-04-17 15:40:59
Press F12 and inspect the AJAX request in the browser's developer tools, then make your own request look the same by setting headers such as User-Agent and Referer.
If the data requires a login, add the cookie that identifies the user. You can try these one at a time.
If there is a CSRF defense mechanism, find the hidden CSRF token in the page and attach it to the request.
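The three points above can be sketched together as follows: read the hidden form fields (where a CSRF token usually lives), then build a POST request carrying the token, the session cookie, and browser-like headers. This is a minimal sketch with the Python standard library. The field name `csrf_token`, the cookie value, the URLs, and the `X-Requested-With` header are all assumptions for illustration; the real names have to be read from the F12 network panel.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(page_html):
    """Return a dict of all hidden form fields in page_html."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    return parser.fields


def build_ajax_request(url, referer, cookie, form):
    """Prepare (but do not send) a POST request that mimics the browser's call."""
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; example-crawler)",  # assumed value
        "Referer": referer,
        "Cookie": cookie,
        # Many sites check this header on AJAX endpoints; an assumption here.
        "X-Requested-With": "XMLHttpRequest",
    }
    body = urlencode(form).encode("ascii")
    return Request(url, data=body, headers=headers)


# Hypothetical page containing a hidden CSRF token.
SAMPLE_PAGE = (
    '<form><input type="hidden" name="csrf_token" value="abc123">'
    '<input type="text" name="q"></form>'
)

form = hidden_fields(SAMPLE_PAGE)
form["page"] = "1"  # hypothetical extra parameter of the endpoint
req = build_ajax_request(
    "http://example.edu/api/list",    # hypothetical endpoint
    "http://example.edu/index.html",  # hypothetical referring page
    "SESSIONID=xyz",                  # hypothetical login cookie
    form,
)
print(req.get_method(), req.full_url, req.data)
```

Sending `req` with `urllib.request.urlopen` would perform the actual call. If the server still rejects it, compare the remaining headers with what the browser sends and add them one at a time.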
阿神2017-04-17 15:40:59
To add to the two approaches above:
To simulate a browser, you can generally use a headless browser. For Node, there are several packages, such as https://github.com/amir20/pha...
PHP中文网2017-04-17 15:40:59
At least post a URL. I suggest you search Baidu for "The Art of Questioning" first. A long description without specifics is just empty talk; when you ask a question, you have to make it understandable to others.