Simple usage example of Jsoup

大家讲道理 (Original)
2017-05-28 11:29:20

Test web page



<!doctype html><!-- http://jwc.yangtzeu.edu.cn/ --><html class="outlin colo"><head>
    <meta charset="utf-8">
    <title>长江大学</title>
    <link type="text/css" rel="stylesheet" href="./css/reset.css">
    <link type="text/css" rel="stylesheet" href="./css/layout.css">
    <link type="text/css" rel="stylesheet" href="./css/yangtze.css">
    <script src="base.js"></script>
    <script src="./js/nodeObject.js"></script>
    <script src="./js/yangtze.js"></script></head><body>

    <p id="content">

        <!-- 顶部图片p -->
        <p id="header-imagep"></p>

        <!-- 顶部菜单p -->
        <p id="header-menup">

            <p id="header-menu-table">

                <p class="header-menu-cell"><a href="#" title="首页">首页</a></p>
                <p class="header-menu-cell"><a href="#" title="机构设置">机构设置</a></p>
                <p class="header-menu-cell"><a href="#" title="规章制度">规章制度</a></p>
                <p class="header-menu-cell"><a href="#" title="教学建设">教学建设</a></p>
                <p class="header-menu-cell"><a href="#" title="教务管理">教务管理</a></p>
                <p class="header-menu-cell"><a href="#" title="考务管理">考务管理</a></p>
                <p class="header-menu-cell"><a href="#" title="实践创新">实践创新</a></p>
                <p class="header-menu-cell"><a href="#" title="质量评估">质量评估</a></p>
                <p class="header-menu-cell"><a href="#" title="学务管理">学务管理</a></p>
                <p class="header-menu-cell"><a href="#" title="服务指南">服务指南</a></p>
                <p class="header-menu-cell"><a href="#" title="下载中心">下载中心</a></p>

            </p>

        </p>
        <p class="space"></p>

        <!-- 顶部时间p -->
        <p id="header-datep"></p>
        <p class="space"></p>

        <!-- 中间的tablep -->
        <p id="table">

            <!-- 左侧table-cell -->
            <p id="table-left">

                <p id="table-left-imagep"></p>
                <p class="space"></p>

                <h2 class="h2-style">高教信息<a href="#">+MORE</a></h2>
                <ul class="ul-type-1">
                    <li style="color : red;"><img src="./images/li_bg.jpg"> <a href="#">教育部高等教育司2016年工作要点</a></li>
                    <li><img src="./images/li_bg.jpg"> <a href="#">湖北省教育厅高等教育处2016年工作要点</a></li>
                    <li><img src="./images/li_bg.jpg"> <a href="#">湖北省教育厅高等教育处2015年工作要点</a></li>
                    <li><img src="./images/li_bg.jpg"> <a href="#">省委高校工委 省教育厅关于印发201</a></li>
                    <li><img src="./images/li_bg.jpg"> <a href="#">教育部2015年工作要点</a></li>
                    <li><img src="./images/li_bg.jpg"> <a href="#">近两年就业率较低的本科专业名单</a></li>
                </ul>
                <p class="space"></p>

                <h2 class="h2-style">友情链接</h2>
                <p id="select-type"></p>


            </p>
            <p class="space"></p>

            <!-- 中间table-cell -->
            <p id="table-center">

                <p id="table-center-topLinep"><span id="notice" onmouseover="switchTab(this)">教务通知</span><span id="thisweek" onmouseover="switchTab(this)">本周事务</span></p>

                <ul id="notice-ul">
                    <li>关于组织2017年(第十二届)长江大学大学生化学实验<span>2017-03-30</span></li>
                    <li>关于核查文科相关学院2013级毕业班学生成绩的通知<span>2017-03-30</span></li>
                    <li>关于组织申报第二批校级双语教学示范课程的通知<span>2017-03-30</span></li>
                    <li>查看更多...</li>
                </ul>

                <ul id="thisweek-ul">
                    <li>2016~2017学年第二学期6~7月份主要教学工作安排<span>2017-03-30</span></li>
                    <li>2016~2017学年第二学期5月份主要教学工作安排<span>2017-03-30</span></li>
                    <li>2016~2017学年第二学期4月份主要教学工作安排<span>2017-03-30</span></li>
                    <li>2016~2017学年第二学期3月份主要教学工作安排<span>2017-03-30</span></li>
                    <li>查看更多...</li>
                </ul>

                <p class="chooseTab"><span>教务通知</span><span>本周事务</span></p>

            </p>

            <!-- 右侧table-cell -->
            <p id="table-right">
            </p>

        </p>
        <p class="space"></p>

        <!-- 底部的menup -->
        <p id="bottom-menup"></p>
        <p class="space"></p>

    </p>

    <!-- 最底部的p -->
    <p id="footer"></p>

    <script>
        setup();
        switchTab(elementById("notice"));
        addEventss();    </script></body></html>

Java code


import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class App {

    public static void main(String[] args) {
        try {
            File input = new File("/Users/YouXianMing/Documents/Project/HTML Project/yangtze/yangtze.html");
            Document doc = Jsoup.parse(input, "UTF-8", "http://yangtze.com/");

            // Get an element by its id
            {
                Element content = doc.getElementById("content");
                System.out.println(content);
            }

            // Get elements by CSS class name
            {
                Elements list = doc.getElementsByClass("space");
                for (Element element : list) {
                    System.out.println(element + "\n");
                }
            }

            // Get elements by tag name
            {
                Elements list = doc.getElementsByTag("p");
                for (Element element : list) {
                    System.out.println(element + "\n");
                }
            }

            // Get elements that have a given attribute
            {
                Elements list = doc.getElementsByAttribute("href");
                for (Element element : list) {
                    System.out.println(element + "\n");
                }
            }

            // Navigate from an element to its parent, children, and siblings
            {
                Element content = doc.getElementById("header-menu-table");

                // Parent of the element
                System.out.println(content.parent());

                // All child elements of the element
                System.out.println(content.children());

                // First sibling at the same level as the element
                System.out.println(content.child(0).firstElementSibling());

                // Last sibling at the same level as the element
                System.out.println(content.child(0).lastElementSibling());

                // Previous sibling of the element
                System.out.println(content.child(1).previousElementSibling());

                // Next sibling of the element
                System.out.println(content.child(0).nextElementSibling());
            }

            // Data held by a single element
            {
                Element content = doc.getElementsByClass("ul-type-1").first().child(0);

                // Text content
                System.out.println(content.text());

                // Tag name
                System.out.println(content.tagName());

                // Tag object
                System.out.println(content.tag());

                // Attribute collection
                System.out.println(content.attributes());

                // Inner HTML
                System.out.println(content.html());

                // Outer HTML
                System.out.println(content.outerHtml());

                // Value of the style attribute
                System.out.println(content.attr("style"));
            }

            // Find elements with selector syntax
            {
                Elements elements;

                // By tag
                elements = doc.select("a");
                System.out.println(elements);

                // By id
                elements = doc.select("#content");
                System.out.println(elements);

                // By class
                elements = doc.select(".ul-type-1");
                System.out.println(elements);

                // By attribute
                elements = doc.select("[href]");
                System.out.println(elements);

                // By attribute name prefix
                elements = doc.select("[^hr]");
                System.out.println(elements);

                // By attribute value
                elements = doc.select("[id=notice]");
                System.out.println(elements);

                // Attribute value starts with
                elements = doc.select("[onmouseover^=swit]");
                System.out.println(elements);

                // Attribute value ends with
                elements = doc.select("[onmouseover$=(this)]");
                System.out.println(elements);

                // Attribute value contains
                elements = doc.select("[onmouseover*=Tab]");
                System.out.println(elements);

                // Attribute value matches a regular expression
                elements = doc.select("ul[id~=^notice]");
                System.out.println(elements);
            }

        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

Note

Replace the file path in the code with your own. The example loads the HTML from a local file.
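
If no local copy of the page is at hand, the same kind of Document can also be built from a string. The following is only a minimal sketch under stated assumptions: the class name `BaseUriDemo`, the `load` helper, and the trimmed-down HTML are hypothetical stand-ins, not part of the original example. It also illustrates what the base URI argument (`http://yangtze.com/` in the original call) is used for, namely resolving relative links through `absUrl`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriDemo {

    // Hypothetical, trimmed-down stand-in for the test page.
    static final String HTML =
            "<html><head><title>Demo</title>"
            + "<link type=\"text/css\" rel=\"stylesheet\" href=\"./css/reset.css\">"
            + "</head><body><p id=\"footer\"></p></body></html>";

    // Parse from a string instead of a file; the second argument is the base URI.
    static Document load() {
        return Jsoup.parse(HTML, "http://yangtze.com/");
    }

    public static void main(String[] args) {
        Document doc = load();
        Element link = doc.select("link[href]").first();

        // The attribute exactly as written in the markup
        System.out.println(link.attr("href"));

        // The same attribute resolved against the base URI
        System.out.println(link.absUrl("href"));
    }
}
```

Without a base URI, `absUrl` has nothing to resolve a relative link against, so passing one matters whenever links need to be followed later.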

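The code covers several ways of obtaining elements: by id, by class name, by tag name, by attribute, by navigating from a known element to its parent, children, and siblings, and by selector syntax.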
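
These lookups can be combined to pull structured data out of a page. As a hedged illustration, the sketch below reads the title and date of each entry in a list shaped like the `notice-ul` list of the test page. The class name `NoticeExtractor`, the `extract` helper, the output format, and the sample entries are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NoticeExtractor {

    // Returns one "date | title" string per list item that carries a date span.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> rows = new ArrayList<>();

        // :has(span) skips items without a date, such as a trailing "more" entry
        for (Element li : doc.select("#notice-ul li:has(span)")) {
            String title = li.ownText();             // text of the li itself, excluding the span
            String date = li.select("span").text();  // text of the nested span
            rows.add(date + " | " + title);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data with the same structure as the test page
        String html = "<ul id=\"notice-ul\">"
                + "<li>Exam schedule<span>2017-03-30</span></li>"
                + "<li>Course selection<span>2017-03-30</span></li>"
                + "<li>More...</li>"
                + "</ul>";

        for (String row : extract(html)) {
            System.out.println(row);
        }
    }
}
```

Using `ownText()` rather than `text()` keeps the date out of the title, since `text()` would also include the text of the nested span.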
