Java development skills revealed: Implementing PDF document processing functions
PDF (Portable Document Format) is a widely used electronic document format that renders consistently across platforms, preserves layout, and supports security features. Processing PDF documents is a common requirement in Java development. This article introduces some Java techniques for implementing PDF document processing.
1. Import PDF document processing library
In Java development, we can use third-party libraries such as iText and Apache PDFBox to process PDF documents. These libraries provide rich APIs for creating, reading, and modifying PDF documents and for extracting their content.
To use these libraries, add the corresponding JAR files to the project. You can download them from each library's official website and add them to the project's dependencies, or declare them in a build tool such as Maven or Gradle. The samples in this article use the iText 5 API (the com.itextpdf.text packages) and the PDFBox 2.x API (PDDocument.load), so choose versions that match.
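Before running the samples, it can help to confirm that the JARs are actually visible to the application. The following is a minimal sketch using only the JDK; the class name PdfLibraryCheck and the helper isOnClasspath are hypothetical names chosen for illustration, and the two probed class names are the ones imported by the samples in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Reports which PDF libraries are visible on the classpath (illustrative helper). */
class PdfLibraryCheck {

    /** Returns true if the named class can be located; the class is not initialized. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PdfLibraryCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One representative class per library, taken from the imports used in this article
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("iText 5", "com.itextpdf.text.Document");
        probes.put("PDFBox", "org.apache.pdfbox.pdmodel.PDDocument");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getValue()) ? "found" : "missing";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

A "missing" result usually means the JAR was not added to the project's dependencies or that a different major version, with different package names, is installed.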
2. Create PDF documents
Use the iText library to easily create PDF documents. Here is a simple sample code:
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;

public class CreatePDF {
    public static void main(String[] args) {
        Document document = new Document();
        try {
            // Bind the document to an output file before opening it
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("sample.pdf"));
            document.open();
            document.add(new Paragraph("Hello World!"));
            // Closing the document finishes the PDF
            document.close();
            writer.close();
            System.out.println("PDF created successfully!");
        } catch (DocumentException | FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
The above code creates a PDF document named "sample.pdf" and adds a paragraph to it.
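A quick sanity check on a generated file is to look at its first bytes, since a PDF file begins with the marker "%PDF-". The sketch below uses only the JDK; PdfHeaderCheck and startsWithPdfHeader are hypothetical names, and the check confirms only the header, not that the rest of the file is valid.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Checks whether a file begins with the PDF header marker (illustrative helper). */
class PdfHeaderCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the first bytes of the file are "%PDF-". */
    static boolean startsWithPdfHeader(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int total = 0;
            // A single read may return fewer bytes than requested, so loop until full
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    return false; // file is shorter than the marker
                }
                total += n;
            }
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.pdf");
        if (!Files.exists(file)) {
            System.out.println(file + " does not exist");
            return;
        }
        System.out.println(file + " starts with a PDF header: " + startsWithPdfHeader(file));
    }
}
```

Run it after CreatePDF, with no arguments, to check "sample.pdf" in the working directory.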
3. Read PDF documents
The PDFBox library makes it easy to read the content of PDF documents. The following is a simple sample code:
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class ReadPDF {
    public static void main(String[] args) {
        // PDDocument.load is the PDFBox 2.x API; try-with-resources closes
        // the document even if text extraction fails
        try (PDDocument document = PDDocument.load(new File("sample.pdf"))) {
            PDFTextStripper stripper = new PDFTextStripper();
            String content = stripper.getText(document);
            System.out.println("PDF content: " + content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The above code reads the contents of the "sample.pdf" document and prints it to the console.
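Text extracted from a PDF often contains blank lines and irregular spacing, so a small clean-up step is a common follow-up before further processing. The sketch below is one possible approach using only the JDK; ExtractedTextCleaner and toCleanLines are hypothetical names, and the sample string merely stands in for the value a text stripper might return.

```java
import java.util.ArrayList;
import java.util.List;

/** Normalizes raw extracted text into trimmed, non-empty lines (illustrative helper). */
class ExtractedTextCleaner {

    /** Splits on any line break, collapses runs of whitespace, and drops empty lines. */
    static List<String> toCleanLines(String raw) {
        List<String> lines = new ArrayList<>();
        for (String line : raw.split("\\R")) {
            // Treat the non-breaking space (U+00A0) as ordinary whitespace
            String cleaned = line.replaceAll("[\\s\\u00A0]+", " ").trim();
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for extracted PDF text
        String raw = "Hello   World!\r\n\r\n  second\tline \n";
        for (String line : toCleanLines(raw)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Whether collapsing whitespace is appropriate depends on the document: for tabular content, the original spacing may carry meaning and should be kept.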
4. Modify PDF documents
The iText library can also modify existing PDF documents. Here is a simple sample code:
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

import java.io.FileOutputStream;
import java.io.IOException;

public class ModifyPDF {
    public static void main(String[] args) {
        try {
            PdfReader reader = new PdfReader("sample.pdf");
            PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("modified.pdf"));
            // Layer drawn on top of the existing content of page 1
            PdfContentByte canvas = stamper.getOverContent(1);
            // A content layer cannot accept a Paragraph directly, so the text is
            // placed at an explicit position (example values, in points from the
            // bottom-left corner of the page)
            ColumnText.showTextAligned(canvas, Element.ALIGN_LEFT,
                    new Phrase("Modified content"), 36, 700, 0);
            stamper.close();
            reader.close();
            System.out.println("PDF modified successfully!");
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        }
    }
}
The above code opens the "sample.pdf" document, draws a line of text on top of the first page's existing content, and saves the modified document as "modified.pdf". The position (36, 700) is only an example value.
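Placing content on an existing page generally means giving coordinates in PDF user-space units, which by default are points of 1/72 inch. Layouts are often specified in millimeters, so a conversion helper can be handy. The sketch below uses only the JDK; PdfUnits and its methods are hypothetical names for illustration.

```java
import java.util.Locale;

/** Converts between millimeters and PDF points, 1 point = 1/72 inch (illustrative helper). */
class PdfUnits {

    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    /** Millimeters to points. */
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Points to millimeters. */
    static float pointsToMm(float points) {
        return points / POINTS_PER_INCH * MM_PER_INCH;
    }

    public static void main(String[] args) {
        // An A4 sheet is 210 mm x 297 mm
        float width = mmToPoints(210f);
        float height = mmToPoints(297f);
        System.out.println(String.format(Locale.ROOT, "A4 in points: %.1f x %.1f", width, height));
    }
}
```

With a helper like this, a position such as "20 mm from the left edge" can be passed to a drawing call as mmToPoints(20f).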
5. Extract PDF document content
The PDFBox library can also extract content from a specific area of a page. Here is a simple sample code:
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import org.apache.pdfbox.text.TextPosition;

import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class ExtractContent {
    public static void main(String[] args) {
        try (PDDocument document = PDDocument.load(new File("sample.pdf"))) {
            PDFTextStripperByArea stripper = new PDFTextStripperByArea() {
                @Override
                protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
                    // Report the bounding box of each character in the region
                    for (TextPosition text : textPositions) {
                        Rectangle2D.Float boundingBox = new Rectangle2D.Float(
                                text.getX(), text.getY(), text.getWidth(), text.getHeight());
                        System.out.println(text.getUnicode() + " -> " + boundingBox);
                    }
                    // Let the stripper record the text for the region as usual
                    super.writeString(string, textPositions);
                }
            };
            PDPage firstPage = document.getPage(0);
            PDRectangle box = firstPage.getMediaBox();
            // A region must be registered before extraction; this one covers the whole page
            stripper.addRegion("page", new Rectangle2D.Float(0, 0, box.getWidth(), box.getHeight()));
            stripper.extractRegions(firstPage);
            System.out.println("Region text: " + stripper.getTextForRegion("page"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The above code extracts the text from the first page of the "sample.pdf" document as a single region and prints the bounding box of each character along with the extracted text.
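When text is extracted by area, the region rectangle has to be expressed in the coordinate system the stripper expects. PDF user space puts the origin at the bottom-left corner of the page, while PDFTextStripperByArea regions are, as far as its examples suggest, measured from the top-left corner; treat that as an assumption to verify against the version in use. The sketch below shows the conversion with only the JDK; RegionCoordinates and toTopLeftOrigin are hypothetical names.

```java
import java.awt.geom.Rectangle2D;

/** Converts rectangles between bottom-left and top-left page origins (illustrative helper). */
class RegionCoordinates {

    /**
     * Takes a rectangle whose (x, y) is its lower-left corner in bottom-left-origin
     * coordinates and returns the same area with (x, y) as its upper-left corner
     * in top-left-origin coordinates.
     */
    static Rectangle2D.Float toTopLeftOrigin(Rectangle2D.Float rect, float pageHeight) {
        // Distance from the top of the page to the rectangle's upper edge
        float top = pageHeight - (rect.y + rect.height);
        return new Rectangle2D.Float(rect.x, top, rect.width, rect.height);
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // US Letter height in points, used only as an example
        Rectangle2D.Float pdfRect = new Rectangle2D.Float(72f, 692f, 200f, 50f);
        Rectangle2D.Float region = toTopLeftOrigin(pdfRect, pageHeight);
        System.out.println("Region for extraction: " + region);
    }
}
```

The converted rectangle could then be registered as an extraction region. The page height should come from the actual page rather than a constant.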
Summary:
This article introduced some Java techniques for processing PDF documents. After importing a PDF processing library, we can create, read, and modify PDF documents and extract their content, which covers a wide range of needs. Hope this article helps you!