Home  >  Article  >  Backend Development  >  请教怎么抓取用JS分页的网页内容

请教怎么抓取用JS分页的网页内容

WBOY
WBOYOriginal
2016-06-13 10:46:27904browse

请问如何抓取用JS分页的网页内容
我要抓取一个网站的内容,这个网站分页机制是用js的。具体如下:

[size=10px]

HTML code
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/--><a href="javascript:gogage(pageno+1)" class="navigation">下一页</a>


JScript code
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->    var pageno=1;    function gogage(pno){      tbl.firstPage();      pageno=1;      for(var i=1; (i 


HTML code
<!--Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->

请高手赐教!

------解决方案--------------------
帮忙顶!
------解决方案--------------------
把html页取下来,数据在"#xmldso"指定的位置,也取下来,就是全部内容了。和分页关系不大.
------解决方案--------------------
帮顶!
------解决方案--------------------
利用htmlparser抓取网页内容(一) 
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.TableTag;
import org.htmlparser.util.NodeList;

/**
 *

 * 标题:

 * 功能概要:

 * 版权: cityyouth.cn (c) 2005

 * 公司:上海城市青年网

 * 创建时间:2005-12-21

 * 修改时间:

 * 修改原因:
 * 
 * @author 张伟
 * @version 1.0
 */
public class TestYahoo {
public static void testHtml() {
try {
String sCurrentLine;
String sTotalString;
sCurrentLine = "";
sTotalString = "";
java.io.InputStream l_urlStream;
java.net.URL l_url = new java.net.URL(
"http://sports.sina.com.cn/iframe/nba/live/");
java.net.HttpURLConnection l_connection = (java.net.HttpURLConnection) l_url
.openConnection();
l_connection.connect();
l_urlStream = l_connection.getInputStream();
java.io.BufferedReader l_reader = new java.io.BufferedReader(
new java.io.InputStreamReader(l_urlStream));
while ((sCurrentLine = l_reader.readLine()) != null) {
sTotalString += sCurrentLine;
}
System.out.println(sTotalString);

System.out.println("====================");
String testText = extractText(sTotalString);
System.out.println(testText);
} catch (Exception e) {
e.printStackTrace();
}

}

/**
* 抽取纯文本信息

* @param inputHtml
* @return
*/
public static String extractText(String inputHtml) throws Exception {
StringBuffer text = new StringBuffer();

Parser parser = Parser.createParser(new String(inputHtml.getBytes(),
"8859_1"), "8859-1");
// 遍历所有的节点
NodeList nodes = parser.extractAllNodesThatMatch(new NodeFilter() {
public boolean accept(Node node) {
return true;
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn