
As more information is published on the web, it is increasingly common to need an HTML page as a Word document for editing, typesetting, or printing. This article shows how to use the Apache POI library, together with the jsoup HTML parser, to convert an HTML page into a Word document, and provides practical code examples.

1. Introduction to POI

The name POI originally stood for "Poor Obfuscation Implementation". It is an open-source project of the Apache Software Foundation that provides a set of Java APIs for Microsoft Office file formats (Word, Excel, PowerPoint, etc.). POI has become one of the standard libraries for creating, reading, and writing Microsoft Office documents in Java, and many Java programs use it to work with Office documents.

2. The basic process of creating a Word document with POI

Before using POI to create a Word document, we need to first understand the basic process of creating a Word document.

  1. Create an empty Word document

Create an empty Word document by using the XWPFDocument class provided by POI.

XWPFDocument doc = new XWPFDocument();

  2. Work with the document content

Document content is manipulated through POI's XWPFParagraph and XWPFRun classes. A paragraph holds one or more runs, and each run holds a piece of text:

(1) Create a paragraph

XWPFParagraph para = doc.createParagraph();

(2) Create text

XWPFRun run = para.createRun();
run.setText("Hello World!");
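
Formatting is applied at two levels: alignment belongs to the paragraph, while bold, font size, and similar character properties belong to individual runs, so a paragraph with mixed styles needs several runs. The following is a minimal sketch, assuming Apache POI 4.x or 5.x on the classpath; the class and method names are illustrative only:

```java
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class RunFormattingDemo {
    // Builds a document with one centered paragraph made of two runs:
    // a bold 14 pt "Hello " followed by a plain "World!".
    public static XWPFDocument buildFormattedDocument() {
        XWPFDocument doc = new XWPFDocument();
        XWPFParagraph para = doc.createParagraph();
        para.setAlignment(ParagraphAlignment.CENTER); // paragraph-level property

        XWPFRun first = para.createRun();
        first.setText("Hello ");
        first.setBold(true);   // run-level properties affect only this run
        first.setFontSize(14); // size in points

        XWPFRun second = para.createRun();
        second.setText("World!"); // no properties set: default formatting
        return doc;
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = buildFormattedDocument()) {
            // getText() concatenates the text of all runs in the paragraph
            System.out.println(doc.getParagraphs().get(0).getText()); // Hello World!
        }
    }
}
```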

  3. Write the Word document to a file

Use the write method of the XWPFDocument class to save the document to a file. Try-with-resources closes the stream even if writing fails:

try (FileOutputStream out = new FileOutputStream("output.docx")) {
    doc.write(out);
}
doc.close(); // release the document's resources once it has been saved
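
Because write accepts any OutputStream, the same call can fill an in-memory buffer (for example, to return the document from a web endpoint), and XWPFDocument's InputStream constructor can read it back. The sketch below, assuming Apache POI 4.x or 5.x and using illustrative class and method names, relies on that round trip to check what was written:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocxRoundTrip {
    // Serializes a document into a byte array instead of a file.
    public static byte[] toBytes(XWPFDocument doc) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        doc.write(out);
        return out.toByteArray();
    }

    // Re-opens .docx bytes and returns the text of the first paragraph.
    public static String firstParagraphText(byte[] docx) throws IOException {
        try (XWPFDocument reloaded = new XWPFDocument(new ByteArrayInputStream(docx))) {
            return reloaded.getParagraphs().get(0).getText();
        }
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Hello World!");
            byte[] bytes = toBytes(doc);
            System.out.println(firstParagraphText(bytes)); // Hello World!
        }
    }
}
```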

3. Convert HTML to Word document

The previous section covered the basic steps of creating a Word document with POI. This section shows how to use POI to convert an HTML page into a Word document.

  1. Get the content of the HTML page

We can use the URLConnection class provided by Java to get the content of the HTML page, as shown below:

String urlStr = "http://www.baidu.com";
URL url = new URL(urlStr);
URLConnection conn = url.openConnection();
StringBuilder sb = new StringBuilder();
// Decode as UTF-8 (assumed page encoding); the reader is closed automatically
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line).append('\n'); // keep the line break so words on adjacent lines do not merge
    }
}
String html = sb.toString();
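
The decoding charset matters: InputStreamReader uses whatever charset it is given (or the platform default if none is given), so a page served in another encoding, such as GBK, can come out garbled. One option is to honor the charset parameter of the HTTP Content-Type header, which URLConnection exposes through getContentType(). The helper below is hypothetical, a sketch that falls back to UTF-8 when no usable charset is declared:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: reads the charset parameter of a Content-Type value such as
    // "text/html; charset=GBK" and falls back to UTF-8 when it is missing or unsupported.
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = param.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // IllegalCharsetNameException and UnsupportedCharsetException both extend
                        // IllegalArgumentException; treat either as "no usable charset"
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1")); // ISO-8859-1
        System.out.println(charsetOf("text/html"));                    // UTF-8
    }
}
```

It would be used as `new InputStreamReader(conn.getInputStream(), ContentTypeCharset.charsetOf(conn.getContentType()))`. Pages that declare their encoding only in a `<meta>` tag are not covered by this header check; jsoup's `Jsoup.parse(InputStream, String charsetName, String baseUri)` with a null charset name attempts to detect the encoding instead.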

  2. Parse the HTML page

Parse the downloaded HTML with the jsoup library, as shown below:

Document docHtml = Jsoup.parse(html);
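
Element.text() returns only the visible text of an element: tags are dropped and runs of whitespace (including line breaks in the source) collapse to single spaces. A small self-contained sketch, assuming jsoup is on the classpath (the class name is illustrative), shows what the conversion loop will receive for each `<p>`:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTextDemo {
    // Returns the normalized plain text of every <p> element, in document order.
    public static List<String> paragraphTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element p : doc.getElementsByTag("p")) {
            texts.add(p.text()); // markup removed, whitespace collapsed
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p>Hello   <b>World</b></p><div>not a paragraph</div><p>two\nlines</p>";
        System.out.println(paragraphTexts(html)); // [Hello World, two lines]
    }
}
```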

  3. Create the Word document content

(1) Create a blank Word document with POI's XWPFDocument class

XWPFDocument docx = new XWPFDocument();

(2) Get all paragraphs in the HTML page

Elements parags = docHtml.getElementsByTag("p");
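
Collecting only `<p>` elements means headings are dropped from the output. A jsoup CSS query with a comma-separated selector list can pick up several element types at once. The sketch below assumes jsoup and POI 4.x/5.x; the class name and the heading font sizes are arbitrary illustrative choices, not a standard mapping. If one selected element is nested inside another, its text would appear twice.

```java
import java.util.Locale;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HeadingsAndParagraphs {
    // Hypothetical mapping: h1-h3 become bold runs with a larger font, <p> stays plain.
    public static XWPFDocument convert(String html) {
        Elements blocks = Jsoup.parse(html).select("h1, h2, h3, p");
        XWPFDocument docx = new XWPFDocument();
        for (Element el : blocks) {
            XWPFRun run = docx.createParagraph().createRun();
            run.setText(el.text());
            switch (el.tagName().toLowerCase(Locale.ROOT)) {
                case "h1": run.setBold(true); run.setFontSize(20); break;
                case "h2": run.setBold(true); run.setFontSize(16); break;
                case "h3": run.setBold(true); run.setFontSize(13); break;
                default: break; // <p>: default formatting
            }
        }
        return docx;
    }

    public static void main(String[] args) throws Exception {
        String html = "<h1>Title</h1><p>First paragraph.</p><h2>Section</h2><p>Second paragraph.</p>";
        try (XWPFDocument docx = convert(html)) {
            docx.getParagraphs().forEach(p -> System.out.println(p.getText()));
        }
    }
}
```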

(3) Convert paragraphs of HTML page to paragraphs of Word document

for (Element p : parags) {
    XWPFParagraph paragraph = docx.createParagraph(); // create a new paragraph
    XWPFRun run = paragraph.createRun(); // create a text run (XWPFRun) inside the paragraph
    run.setText(p.text()); // set the run's text content
}
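
The loop above copies only `p.text()`, so inline formatting inside a paragraph is lost. One possible extension, sketched here under the assumption of jsoup and POI 4.x/5.x (class and method names are hypothetical), walks the paragraph's direct child nodes and emits one run per node, marking `<b>`/`<strong>` as bold and `<i>`/`<em>` as italic. Formatting nested deeper than one level is flattened, and `Element.text()` trims the text of each formatted element.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class InlineFormatting {
    // Copies one HTML <p> into a new Word paragraph, one run per direct child node.
    // Only the outermost tag of each child is inspected (e.g. <b><i>x</i></b> becomes bold only).
    public static void appendParagraph(XWPFDocument docx, Element p) {
        XWPFParagraph paragraph = docx.createParagraph();
        for (Node child : p.childNodes()) {
            if (child instanceof TextNode) {
                // TextNode.text() collapses whitespace but keeps a single leading/trailing space
                addRun(paragraph, ((TextNode) child).text(), false, false);
            } else if (child instanceof Element) {
                Element el = (Element) child;
                String tag = el.tagName();
                boolean bold = tag.equals("b") || tag.equals("strong");
                boolean italic = tag.equals("i") || tag.equals("em");
                addRun(paragraph, el.text(), bold, italic);
            }
            // other node types (comments, etc.) are ignored
        }
    }

    private static void addRun(XWPFParagraph paragraph, String text, boolean bold, boolean italic) {
        if (text.isEmpty()) {
            return; // e.g. <br> has no text; skip instead of creating an empty run
        }
        XWPFRun run = paragraph.createRun();
        run.setText(text);
        if (bold) {
            run.setBold(true);
        }
        if (italic) {
            run.setItalic(true);
        }
    }

    public static void main(String[] args) throws Exception {
        Element p = Jsoup.parse("<p>Plain <b>bold</b> and <i>italic</i></p>").select("p").first();
        try (XWPFDocument docx = new XWPFDocument()) {
            appendParagraph(docx, p);
            System.out.println(docx.getParagraphs().get(0).getText()); // Plain bold and italic
        }
    }
}
```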

  4. Write the Word document to disk

Finally, write the document to disk for later use:

try (OutputStream os = new FileOutputStream("output.docx")) {
    docx.write(os); // the stream is closed automatically, even on failure
}

4. Complete code example

The following is a complete code example for converting an HTML page into a Word document:

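import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.jsoup.Jsoup;
// Explicit imports: the wildcard org.apache.poi.xwpf.usermodel.* also contains a type named
// Document, which would make "Document" ambiguous and stop the class from compiling
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Html2Word {
    public static void main(String[] args) throws Exception {
        String urlStr = "http://www.baidu.com"; // URL of the HTML page to convert
        URL url = new URL(urlStr);
        URLConnection conn = url.openConnection();
        StringBuilder sb = new StringBuilder();
        // Decode as UTF-8 (assumed page encoding) and keep line breaks between lines
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        String html = sb.toString();
        Document docHtml = Jsoup.parse(html);
        Elements parags = docHtml.getElementsByTag("p"); // all paragraphs in the HTML page
        try (XWPFDocument docx = new XWPFDocument(); // blank Word document created with POI's XWPFDocument
             OutputStream os = new FileOutputStream("output.docx")) {
            for (Element p : parags) {
                XWPFParagraph paragraph = docx.createParagraph(); // create a new paragraph
                XWPFRun run = paragraph.createRun(); // create a text run (XWPFRun) inside the paragraph
                run.setText(p.text()); // set the run's text content
            }
            docx.write(os);
        }
    }
}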
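
Because the example mixes network access and conversion in main, it can only be exercised against a live page. One possible restructuring, a sketch assuming the same jsoup and POI dependencies (class and method names are illustrative), moves the conversion into a method that takes an HTML string and any OutputStream, so it can be reused for local files, in-memory buffers, or an HTTP response:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class HtmlToDocxConverter {
    // Converts the <p> elements of an HTML string into Word paragraphs and writes the .docx to out.
    // The caller is responsible for closing out.
    public static void convert(String html, OutputStream out) throws IOException {
        try (XWPFDocument docx = new XWPFDocument()) {
            for (Element p : Jsoup.parse(html).getElementsByTag("p")) {
                docx.createParagraph().createRun().setText(p.text());
            }
            docx.write(out);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        convert("<p>First</p><p>Second</p>", buffer);
        System.out.println(buffer.size() + " bytes of .docx written");
    }
}
```

Keeping fetching separate from conversion also means the charset handling and the HTML-to-Word mapping can each be changed without touching the other.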

5. Summary

From the above, we can see that POI, combined with jsoup, makes it straightforward to turn an HTML page into a Word document. Note that the example copies only the plain text of `<p>` elements: headings, tables, images, and inline formatting such as bold text or links are not carried over and would need additional handling. More generally, POI wraps the Office file formats in Java APIs, which makes it much easier to create and process Word, Excel, and other documents from Java code.
