How to Convert HTML to PDF in Java

PHPz (original)
2023-04-21 11:27:50

As more and more work becomes digital, the demand for electronic documents keeps growing, and converting HTML files to PDF is a common task in practice. This article introduces three ways to convert HTML to PDF in Java:

1. Use iText to convert HTML to PDF

iText is a popular Java PDF library that can, among other things, convert HTML files to PDF. Its HTMLWorker class (iText 5) parses simple HTML and turns each element into the corresponding iText objects, such as paragraphs, lists and tables, which are then laid out on PDF pages. The following is the key code for using iText to convert HTML to PDF:

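import com.itextpdf.text.Document;
import com.itextpdf.text.html.simpleparser.HTMLWorker; // iText 5.x
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.StringReader;

Document document = new Document(); // A4 page by default
PdfWriter.getInstance(document, new FileOutputStream("output.pdf")); // write to output.pdf
document.open();
// HTMLWorker turns simple HTML into iText elements and adds them to the document.
HTMLWorker htmlWorker = new HTMLWorker(document);
String html = "<html><head></head><body><p>Hello World</p></body></html>";
htmlWorker.parse(new StringReader(html));
document.close(); // finishes the PDF and closes the file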

The above code creates a Document object that represents the PDF, binds it to output.pdf with PdfWriter.getInstance(), and opens it. HTMLWorker then parses the HTML string and adds the resulting elements to the document. Finally, closing the Document flushes the content and completes the PDF file. Note that HTMLWorker supports only basic HTML and very little CSS; it was deprecated in later iText 5 releases in favor of XML Worker, and iText 7 replaces both with the pdfHTML add-on.
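In practice the HTML handed to htmlWorker.parse() is often assembled from application data rather than written as a literal. Any text inserted that way must have its markup characters escaped, or the parser will read them as tags and entities. The following is a minimal sketch of such a helper, assuming the same minimal document skeleton as the example above; the class and method names (HtmlEscape, escape, toHtmlDocument) are hypothetical and not part of iText.

```java
public class HtmlEscape {
    /** Escapes the characters that are significant in HTML/XHTML text and attribute values. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break; // numeric form also works in old HTML
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Hypothetical helper: wraps escaped text in a minimal one-paragraph document. */
    static String toHtmlDocument(String paragraph) {
        return "<html><head></head><body><p>" + escape(paragraph) + "</p></body></html>";
    }

    public static void main(String[] args) {
        // Prints: <html><head></head><body><p>Fish &amp; Chips &lt;cheap&gt;</p></body></html>
        System.out.println(toHtmlDocument("Fish & Chips <cheap>"));
    }
}
```

The resulting string can be passed to new StringReader(...) exactly like the literal in the iText example. Escaping the apostrophe as &#39; rather than &apos; keeps the output valid for both HTML and XHTML parsers.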

2. Use Flying Saucer to convert HTML to PDF

Another Java tool that can be used to convert HTML to PDF is Flying Saucer. It is a free, open-source renderer for XHTML and CSS that can output PDF documents (using iText underneath). It expects well-formed XHTML, so the input must also be valid XML. The following sample code uses Flying Saucer to convert HTML to PDF:

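import java.io.FileOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;
import org.xml.sax.InputSource;

// Flying Saucer only accepts well-formed XHTML, so the markup is parsed as XML.
String htmlContent = "<html><head></head><body><p>Hello World</p></body></html>";
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(htmlContent)));
ITextRenderer iTextRenderer = new ITextRenderer();
iTextRenderer.setDocument(document, null); // null base URL: no relative resources
iTextRenderer.layout();
try (FileOutputStream outputStream = new FileOutputStream("output.pdf")) {
    iTextRenderer.createPDF(outputStream);
}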

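The above code first parses the HTML string into a DOM Document with the JDK's standard XML parser. It then passes the document to ITextRenderer, calls layout() to lay out the pages, and finally calls createPDF() to write the PDF to outputStream. The second argument of setDocument() is the base URL used to resolve relative links such as images and stylesheets; null means there are none to resolve.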
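Because builder.parse() accepts only well-formed XML, ordinary HTML such as an unclosed <br> or the &nbsp; entity (which XML does not predefine) makes it throw a SAXParseException. A small pre-check like the following sketch can report that before the renderer is invoked. It uses only the JDK's XML API; the class name XhtmlCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlCheck {
    /** Returns true if the markup parses as well-formed XML, as Flying Saucer requires. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed (unclosed tag, undeclared entity, ...)
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body><p>Hello<br/></p></body></html>")); // true
        System.out.println(isWellFormed("<html><body><p>Hello<br></p></body></html>"));  // false
        System.out.println(isWellFormed("<p>a&nbsp;b</p>"));                             // false
    }
}
```

When the check fails, the usual fixes are to self-close void elements (<br/>, <img .../>) and to write non-breaking spaces as the numeric reference &#160;.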

3. Use PDFBox to convert HTML to PDF

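PDFBox is a popular open-source Java PDF library from the Apache Software Foundation that provides many tools for creating and processing PDF files. Unlike iText's HTMLWorker or Flying Saucer, however, it does not include an HTML parser or renderer: it works at the level of pages, fonts and drawing operators, so HTML has to be reduced to content that can be drawn directly.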
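Since PDFBox draws text rather than interpreting markup, one simple way to prepare HTML for it is to extract its plain text. The JDK ships a basic HTML parser in javax.swing.text.html.parser (it follows the old HTML 3.2 DTD, so it is not a full HTML5 parser). The sketch below uses it to collect the text runs of an HTML string, assuming that losing all formatting is acceptable; the class name HtmlText is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlText {
    /** Returns the text content of an HTML string, with text runs joined by single spaces. */
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags.
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
        };
        try {
            // true: ignore a charset declared in a <meta> tag, since the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<html><head></head><body><p>Hello World!</p></body></html>"));
    }
}
```

The returned string is what a PDFBox content stream would then draw with showText().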

The following sample code uses PDFBox to produce a PDF from a simple HTML string:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;

// Targets PDFBox 2.x. PDFBox has no HTML parser or layout engine, so the HTML
// is reduced to plain text, which is then drawn on the page.
try (PDDocument document = new PDDocument()) {
    PDPage page = new PDPage(); // US Letter by default
    document.addPage(page);

    // Start 72 pt (1 inch) in from the top-left corner of the page.
    PDRectangle mediaBox = page.getMediaBox();
    float margin = 72;
    float startX = mediaBox.getLowerLeftX() + margin;
    float startY = mediaBox.getUpperRightY() - margin;

    // Crude tag stripping for illustration only; it ignores entities, scripts, etc.
    String html = "<html><head></head><body><p>Hello World!</p></body></html>";
    String text = html.replaceAll("<[^>]+>", " ").trim().replaceAll("\\s+", " ");

    // Draw the text with the standard Courier font.
    try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
        cs.beginText();
        cs.setFont(PDType1Font.COURIER, 14);
        cs.newLineAtOffset(startX, startY);
        cs.showText(text);
        cs.endText();
    }

    // Bookmarks (outline): a "PDFBox" entry with a "Hello World 2" child, both linking to the page.
    PDDocumentOutline outline = new PDDocumentOutline();
    document.getDocumentCatalog().setDocumentOutline(outline);
    PDOutlineItem item = new PDOutlineItem();
    item.setTitle("PDFBox");
    item.setDestination(page);
    PDOutlineItem childItem = new PDOutlineItem();
    childItem.setTitle("Hello World 2");
    childItem.setDestination(page);
    item.addLast(childItem);
    outline.addLast(item);

    document.save("output.pdf");
}

The above code first creates a PDDocument with one page and computes a starting point one inch (72 points) from the top-left corner. Because PDFBox cannot interpret HTML, the markup is stripped down to plain text, which a PDPageContentStream then draws with the built-in Courier font. The code also adds a small bookmark outline, and finally save() writes the PDF file. Any formatting, CSS or layout in the HTML is lost; to keep it, use an HTML renderer such as Flying Saucer instead of PDFBox alone.
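A PDF content stream does not wrap text: each showText() call draws its string on a single line from the current position, so longer text has to be split into lines that fit the printable width. For the standard Courier font this is simple, because every glyph is 600/1000 of the font size wide; at 14 pt on a US Letter page (612 points wide) with 72-point margins, the 468-point text width holds 55 characters. The following greedy word wrap is a minimal sketch under that monospace assumption; the class name CourierWrap is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CourierWrap {
    // Courier is monospaced: every glyph is 600/1000 of the font size wide.
    static final double COURIER_EM_FRACTION = 0.6;

    /** Greedy word wrap: packs words into lines no wider than maxWidth points. */
    static List<String> wrap(String text, double fontSize, double maxWidth) {
        int maxChars = Math.max(1, (int) (maxWidth / (COURIER_EM_FRACTION * fontSize)));
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            int needed = line.length() == 0 ? word.length() : line.length() + 1 + word.length();
            if (needed > maxChars && line.length() > 0) {
                lines.add(line.toString()); // current line is full; start a new one
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word); // a word longer than maxChars overflows on its own line
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Converting HTML to PDF in Java usually means choosing between a full"
                + " HTML renderer and drawing the content yourself with a low-level library.";
        for (String line : wrap(text, 14, 468)) {
            System.out.println(line);
        }
    }
}
```

For proportional fonts the same loop can measure each candidate line with PDFBox's PDFont.getStringWidth(text) / 1000 * fontSize instead of counting characters. Each resulting line would then be drawn with showText() followed by newLineAtOffset(0, -leading) to move down one line.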

Summary

Converting HTML to PDF is needed in many real-world applications, and Java offers several libraries for it. This article introduced three: iText, Flying Saucer and PDFBox. iText and Flying Saucer interpret the HTML themselves, while PDFBox is a lower-level library on which you draw the content yourself. Choosing the tool that best fits your project's needs, for example how much HTML and CSS it must support, makes development faster and easier.

