This article shows, through example code, how to read PDF and Word documents in Java using Apache PDFBox and Apache POI. Readers who need this can use it as a reference.
Maven dependency for reading PDF files
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.13</version>
</dependency>
Maven dependencies for reading Word files
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.16-beta1</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.16-beta1</version>
</dependency>
Method for reading a Word file
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.extractor.WordExtractor;

/**
 * Reads the text content of a Word document.
 *
 * @param filePath path to the Word file
 * @return the text read from the document, or null if reading fails
 */
public static String getTextFromWord(String filePath) {
    String result = null;
    // try-with-resources closes the stream even if extraction fails
    try (FileInputStream fis = new FileInputStream(filePath)) {
        @SuppressWarnings("resource")
        WordExtractor wordExtractor = new WordExtractor(fis);
        result = wordExtractor.getText();
    } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
    }
    return result;
}
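WordExtractor from poi-scratchpad reads the legacy binary Word format (.doc); a .docx file is a ZIP-based format that it does not handle. Before calling the method above, it can therefore help to check what kind of file you actually have. The following is only a minimal sketch using the standard library: the class name FormatSniffer and its return values are hypothetical and not part of POI or PDFBox. It compares the first bytes of a file against three well-known signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of POI or PDFBox): guesses a file's format from its leading bytes. */
final class FormatSniffer {

    // Well-known file signatures
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};

    private FormatSniffer() {
    }

    /** Returns "ole2", "zip", "pdf", or "unknown" for the given leading bytes. */
    static String detect(byte[] head) {
        if (startsWith(head, OLE2)) {
            return "ole2"; // container used by legacy .doc files
        }
        if (startsWith(head, ZIP)) {
            return "zip"; // .docx files are ZIP archives
        }
        if (startsWith(head, PDF)) {
            return "pdf";
        }
        return "unknown";
    }

    /** Reads up to 8 bytes from the start of the file and classifies them. */
    static String detect(String filePath) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            byte[] head = new byte[8];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // file is shorter than 8 bytes
                }
                n += r;
            }
            return detect(Arrays.copyOf(head, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A result of "ole2" only says the file uses the older Office container, and "zip" only says it is some ZIP archive, so treat the result as a hint about which reader to try rather than proof of the document type.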
Method for reading a PDF file
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

/**
 * Reads the text content of a PDF file.
 *
 * @param filePath path to the PDF file
 * @return the text read from the PDF, or null if reading fails
 */
public static String getTextFromPdf(String filePath) {
    String result = null;
    PDDocument document = null;
    // try-with-resources closes the stream even if parsing fails
    try (FileInputStream is = new FileInputStream(filePath)) {
        PDFParser parser = new PDFParser(is);
        parser.parse();
        document = parser.getPDDocument();
        PDFTextStripper stripper = new PDFTextStripper();
        result = stripper.getText(document);
    } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
    } finally {
        // Release the resources held by the parsed document
        if (document != null) {
            try {
                document.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return result;
}
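If both kinds of file need to be handled from one place, the two methods can be put behind a single entry point that chooses between them by file extension. The following is a minimal sketch under stated assumptions, not a definitive design: the class DocumentTextReader and its methods are hypothetical, and the extractors are passed in as functions so that the sketch compiles without POI or PDFBox on the classpath.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical dispatcher: picks a text-extraction function by file extension. */
final class DocumentTextReader {

    // Maps a lower-case extension (no leading dot) to the function that reads that format
    private final Map<String, Function<String, String>> extractors = new HashMap<>();

    /** Registers an extractor for an extension such as "pdf" or "doc". */
    DocumentTextReader register(String extension, Function<String, String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
        return this;
    }

    /** Returns the extracted text, or null if no extractor is registered for the extension. */
    String read(String filePath) {
        Function<String, String> extractor = extractors.get(extensionOf(filePath));
        return extractor == null ? null : extractor.apply(filePath);
    }

    /** Returns the lower-case extension of the file name, or "" if it has none. */
    static String extensionOf(String filePath) {
        int dot = filePath.lastIndexOf('.');
        int sep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        // A dot that appears before the last path separator belongs to a directory name
        return dot > sep ? filePath.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }
}
```

Assuming getTextFromPdf and getTextFromWord live in a class called, say, DocUtils (a made-up name), they could be registered as new DocumentTextReader().register("pdf", DocUtils::getTextFromPdf).register("doc", DocUtils::getTextFromWord), after which read(path) forwards each file to the matching method.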