HTML to Word in Java
HTML and Word documents are two of the formats we work with most often. HTML is a markup language used to create web pages and other web documents, and Microsoft Word is a word processor used to create and edit documents. There are many situations in which HTML needs to be converted to Word: during website maintenance you may want a Word copy of an HTML page for offline viewing, or you may need to turn an online report into a document that can be uploaded elsewhere. This article shows how to convert an HTML file to a Word (.docx) document using Java code.
Add the Apache POI and jsoup dependencies to the project's pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.17</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.17</version>
    </dependency>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.1</version>
    </dependency>
</dependencies>
Create a sample HTML file, for example D:\sample.html, with the following content:
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>HTML to Word Conversion</title>
</head>
<body>
    <h1>This is a sample HTML file</h1>
    <p>Here is some text that we will convert to Word format.</p>
    <ul>
        <li>List item 1</li>
        <li>List item 2</li>
        <li>List item 3</li>
    </ul>
    <br />
    <ol>
        <li>Numbered item 1</li>
        <li>Numbered item 2</li>
        <li>Numbered item 3</li>
    </ol>
</body>
</html>
The following class parses the HTML with jsoup and builds the Word document with Apache POI:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Explicit imports: both org.jsoup.nodes and org.apache.poi.xwpf.usermodel
// define a type named Document, so wildcard imports of both would be ambiguous.
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HtmlToWordConverter {

    public static void main(String[] args) {
        // Backslashes must be escaped inside Java string literals.
        String inputFilePath = "D:\\sample.html";
        String outputFilePath = "D:\\sample.docx";
        convertHtmlToWord(inputFilePath, outputFilePath);
    }

    public static void convertHtmlToWord(String inputFilePath, String outputFilePath) {
        // try-with-resources closes the document and the stream even on failure.
        try (XWPFDocument doc = new XWPFDocument()) {
            Document document = Jsoup.parse(readFile(inputFilePath));
            // Only top-level h1, p, ul and ol elements are handled; other tags are skipped.
            for (Element element : document.body().children()) {
                String tag = element.tagName();
                if (tag.equals("h1")) {
                    addParagraph(doc, element.text()).setBold(true);
                } else if (tag.equals("p")) {
                    addParagraph(doc, element.text());
                } else if (tag.equals("ul")) {
                    // Unordered list: one paragraph per item, prefixed with a bullet.
                    for (Element listItem : element.children()) {
                        addParagraph(doc, "\u2022 " + listItem.text());
                    }
                } else if (tag.equals("ol")) {
                    // Ordered list: one paragraph per item, prefixed with its number.
                    int i = 1;
                    for (Element listItem : element.children()) {
                        addParagraph(doc, i + ". " + listItem.text());
                        i++;
                    }
                }
            }
            try (FileOutputStream out = new FileOutputStream(outputFilePath)) {
                doc.write(out);
            }
        } catch (IOException ex) {
            System.err.println("Conversion failed: " + ex.getMessage());
        }
    }

    // Appends a new paragraph containing a single run with the given text.
    private static XWPFRun addParagraph(XWPFDocument doc, String text) {
        XWPFRun run = doc.createParagraph().createRun();
        run.setText(text);
        return run;
    }

    public static String readFile(String filePath) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        // Read as UTF-8 to match the charset declared in the sample HTML file.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Keep line breaks so words on adjacent lines do not run together.
                stringBuilder.append(line).append('\n');
            }
        }
        return stringBuilder.toString();
    }
}
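The usual convention when flattening HTML lists to plain paragraphs is a bullet prefix for `<ul>` items and a running number for `<ol>` items. The sketch below isolates that labelling step so it can be checked without Apache POI or jsoup on the classpath. The class name `ListLabels` and the method `label` are hypothetical names introduced only for this illustration; they are not part of either library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Apache POI or jsoup.
final class ListLabels {

    // Returns a copy of items with a bullet (unordered) or "1. ", "2. ", ... (ordered) prefix.
    static List<String> label(List<String> items, boolean ordered) {
        List<String> labelled = new ArrayList<>();
        int number = 1;
        for (String item : items) {
            String prefix = ordered ? (number++) + ". " : "\u2022 ";
            labelled.add(prefix + item);
        }
        return labelled;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("List item 1", "List item 2");
        System.out.println(label(items, false)); // bullet-prefixed items
        System.out.println(label(items, true));  // number-prefixed items
    }
}
```

In a converter like the one above, each labelled string would typically become the text of its own paragraph. Real Word list formatting (with automatic numbering) goes through POI's numbering support instead, which is more involved and not shown here.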
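Compile the class and run it from the command line with the dependency JARs on the classpath:
javac -cp ".;path-to-all-dependency-jars/*" HtmlToWordConverter.java
java -cp ".;path-to-all-dependency-jars/*" HtmlToWordConverter
Note that you need to replace path-to-all-dependency-jars with the path to the directory that contains all the downloaded JARs, including the libraries that poi-ooxml itself depends on (such as poi-ooxml-schemas and xmlbeans). On Windows, classpath entries are separated by semicolons; on Linux and macOS, use colons instead.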
After running the code, a Word document named sample.docx is created at the specified output path. Open it and check the content: the heading, the paragraph, and the list items from the HTML file should appear. Note that this code only handles top-level h1, p, ul, and ol elements. Other content, such as images or tables, is skipped unless you add handling for it.
Conclusion:
In this article, we showed how to convert HTML files to Word documents using Java code. We used jsoup to parse the HTML and Apache POI to write the Word document. For simple HTML files, this approach can be used as is. For more complex HTML, you will need to extend the element handling (for example for nested tags, inline formatting, tables, or images) to match the target format you want.
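As one example of such an adjustment, the converter could treat all heading levels rather than only h1. A minimal sketch, assuming a simple mapping from heading tag to font size: the class `HeadingStyles`, the method `fontSizeFor`, and the point sizes themselves are hypothetical choices made for illustration, not values defined by Word, Apache POI, or jsoup.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; the sizes are assumed, not standard.
final class HeadingStyles {

    // Font sizes in points for h1..h6 (assumed values; adjust to taste).
    private static final int[] SIZES = {24, 20, 16, 14, 12, 11};

    // Returns the font size for h1..h6, or -1 if tagName is not a heading tag.
    static int fontSizeFor(String tagName) {
        String tag = tagName.toLowerCase(Locale.ROOT);
        if (tag.length() == 2 && tag.charAt(0) == 'h') {
            int level = tag.charAt(1) - '0';
            if (level >= 1 && level <= 6) {
                return SIZES[level - 1];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"h1", "h3", "p"}) {
            System.out.println(tag + " -> " + fontSizeFor(tag));
        }
    }
}
```

In the converter, a non-negative result could be applied to the run with `XWPFRun.setFontSize(int)` together with `setBold(true)`, while a result of -1 would fall through to the other tag checks.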