Java development skills revealed: Implementing PDF document processing functions

WBOY
2023-11-20 13:45:34

PDF (Portable Document Format) is a widely used electronic document format that renders consistently across platforms, preserves layout, and supports security features such as encryption. Processing PDF documents is a common requirement in Java development. This article introduces some Java techniques for creating, reading, modifying, and extracting content from PDF documents.

1. Import a PDF document processing library

In Java development, we can use third-party libraries such as iText and Apache PDFBox to process PDF documents. These libraries provide rich APIs for creating, reading, modifying, and extracting content from PDF documents.

To use these libraries, we need to add the corresponding JAR files to the project. You can download the JAR files from the official websites and add them to the project's dependencies. Note that the examples in this article use the iText 5 API (package com.itextpdf.text) and the PDFBox 2.x API (PDDocument.load). Newer major versions of both libraries changed these APIs, so choose a matching version rather than simply the latest one.
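Once the JAR files are in place, a quick way to confirm that they are actually visible to the application is to try loading one well-known class from each library. The following is a minimal sketch using only the JDK; the class PdfLibraryCheck and its method isOnClasspath are hypothetical helper names, while the two probed class names are the ones imported by the examples in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PdfLibraryCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, PdfLibraryCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Library label -> a class that the examples in this article import
        Map<String, String> libraries = new LinkedHashMap<>();
        libraries.put("iText 5", "com.itextpdf.text.Document");
        libraries.put("PDFBox 2.x", "org.apache.pdfbox.pdmodel.PDDocument");
        for (Map.Entry<String, String> entry : libraries.entrySet()) {
            String status = isOnClasspath(entry.getValue()) ? "found" : "MISSING";
            System.out.println(entry.getKey() + ": " + status);
        }
    }
}
```

If a library is reported as missing, the JAR is probably not on the runtime classpath, even if the project compiles in the IDE.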

2. Create PDF documents

The iText library makes it easy to create PDF documents. Here is a simple example:

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;

public class CreatePDF {
    public static void main(String[] args) {
        // A Document created with default settings uses A4 pages
        Document document = new Document();
        try {
            // Bind the document to an output file before opening it
            PdfWriter.getInstance(document, new FileOutputStream("sample.pdf"));
            document.open();
            document.add(new Paragraph("Hello World!"));
            // Closing the document flushes the content and also closes the writer
            document.close();
            System.out.println("PDF created successfully!");
        } catch (DocumentException | FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

The above code creates a PDF document named "sample.pdf" and adds a paragraph to it.
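A quick sanity check on a generated file is to confirm that it starts with the PDF signature "%PDF-", which every PDF file carries in its header. The sketch below uses only the JDK and does not validate the rest of the file; the class PdfSignatureCheck and the method looksLikePdf are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

final class PdfSignatureCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file starts with the "%PDF-" signature. */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        }
        return Arrays.equals(head, PDF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file produced by the CreatePDF example
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.pdf");
        System.out.println(file + " looks like a PDF: " + looksLikePdf(file));
    }
}
```

This only inspects the first five bytes, so it catches empty or obviously wrong files but says nothing about whether the document is otherwise well formed.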

3. Read PDF documents

The PDFBox library makes it easy to read the text content of PDF documents. Here is a simple example:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class ReadPDF {
    public static void main(String[] args) {
        // try-with-resources closes the document even if text extraction fails
        try (PDDocument document = PDDocument.load(new File("sample.pdf"))) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Extracts the text of all pages as a single string
            String content = stripper.getText(document);
            System.out.println("PDF content: " + content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The above code reads the contents of the "sample.pdf" document and prints it to the console.
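The string returned by getText often contains blank lines and uneven spacing, so some cleanup is usually needed before further processing. The following JDK-only sketch splits such a string into trimmed, non-empty lines; the class ExtractedTextCleaner, the method toCleanLines, and the sample input are assumptions for illustration, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtractedTextCleaner {

    /** Splits extracted text into trimmed, non-empty lines with inner whitespace collapsed. */
    static List<String> toCleanLines(String rawText) {
        List<String> lines = new ArrayList<>();
        for (String line : rawText.split("\\R")) { // \R matches any line break sequence
            String cleaned = line.trim().replaceAll("\\s+", " ");
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by PDFTextStripper.getText(document)
        String raw = "Hello   World!\r\n\r\n  Second line \n";
        System.out.println(toCleanLines(raw)); // prints [Hello World!, Second line]
    }
}
```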

4. Modify PDF documents

The iText library can also modify existing PDF documents. Here is a simple example:

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

import java.io.FileOutputStream;
import java.io.IOException;

public class ModifyPDF {
    public static void main(String[] args) {
        try {
            PdfReader reader = new PdfReader("sample.pdf");
            PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("modified.pdf"));
            // Canvas that is drawn on top of the existing content of page 1
            PdfContentByte canvas = stamper.getOverContent(1);
            // Place the text at x=36, y=750 (in points, origin at the bottom-left corner)
            ColumnText.showTextAligned(canvas, Element.ALIGN_LEFT,
                    new Phrase("Modified content"), 36, 750, 0);
            stamper.close();
            reader.close();
            System.out.println("PDF modified successfully!");
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        }
    }
}

The above code opens the "sample.pdf" document, draws a line of text on top of the first page, and saves the modified document as "modified.pdf". The original file is left unchanged.
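If the goal is to replace the original document rather than keep both files, a common pattern is to write the modified version to a separate file first and then move it over the original. The sketch below uses only java.nio.file; the class ReplaceOriginal and the method replace are hypothetical names, and the empty-file guard is an assumption about what a sensible check might look like.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class ReplaceOriginal {

    /** Moves the freshly written file over the original, replacing it. */
    static void replace(Path modified, Path original) throws IOException {
        // Guard against overwriting a good file with an empty result
        if (Files.size(modified) == 0) {
            throw new IOException("Refusing to replace " + original + " with an empty file");
        }
        Files.move(modified, original, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // File names follow the ModifyPDF example
        replace(Paths.get("modified.pdf"), Paths.get("sample.pdf"));
    }
}
```

Writing to a second file first means a failure during modification leaves the original document intact.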

5. Extract PDF document content

The PDFBox library can also extract the text inside a specific area of a page. Here is a simple example:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;

public class ExtractContent {
    public static void main(String[] args) {
        try (PDDocument document = PDDocument.load(new File("sample.pdf"))) {
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            stripper.setSortByPosition(true);
            // Region in points, measured from the top-left corner of the page:
            // the top 200 points of an A4 page (595 points wide)
            Rectangle2D region = new Rectangle2D.Float(0, 0, 595, 200);
            stripper.addRegion("top", region);
            // Pages are zero-indexed, so 0 is the first page
            PDPage firstPage = document.getPage(0);
            stripper.extractRegions(firstPage);
            System.out.println("Text in region: " + stripper.getTextForRegion("top"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The above code defines a rectangular region covering the top of the first page of the "sample.pdf" document, extracts the text inside that region, and prints it to the console.
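PDFTextStripperByArea selects text by rectangular regions, and those rectangles are expressed in PDF points (by default 1 point = 1/72 inch), which is awkward when a layout is specified in millimeters. The following JDK-only sketch converts millimeter measurements into a rectangle in points; the class RegionUnits and its methods are hypothetical helpers, not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;

final class RegionUnits {

    private static final float POINTS_PER_INCH = 72f;
    private static final float MM_PER_INCH = 25.4f;

    /** Converts a length in millimeters to PDF points. */
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Builds a region in points from a position and size given in millimeters. */
    static Rectangle2D.Float regionFromMm(float xMm, float yMm, float widthMm, float heightMm) {
        return new Rectangle2D.Float(
                mmToPoints(xMm), mmToPoints(yMm), mmToPoints(widthMm), mmToPoints(heightMm));
    }

    public static void main(String[] args) {
        // Top 50 mm strip of an A4 page (210 mm wide)
        Rectangle2D.Float header = regionFromMm(0, 0, 210, 50);
        System.out.println(header);
    }
}
```

The resulting rectangle can then be passed to the stripper's addRegion method in place of hard-coded point values.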

Summary:

This article introduced some Java techniques for processing PDF documents: importing a PDF processing library, then creating, reading, modifying, and extracting content from PDF documents. With these building blocks, we can process PDF documents flexibly to meet a variety of needs. Hope this article helps you!
