How to Read DOCX Files in PHP without Extraneous Characters?

Susan Sarandon
2024-10-25 18:06:03

How to Read DOC Files in PHP

When you read DOC or DOCX files in PHP, you may find extraneous characters at the end of the extracted text. This typically happens when the reading code treats the file's bytes as plain text instead of parsing the document format.

PHP has no built-in parser for the DOC format, so the approach here targets DOCX instead. A DOCX file is a ZIP archive of XML parts, which PHP can open with its Zip extension.

Updated Code for Reading DOCX Files:

<code class="php">function read_file_docx($filename) {
    // Reject empty paths and missing files early.
    if (!$filename || !file_exists($filename)) return false;

    // zip_open() returns an integer error code on failure.
    $zip = zip_open($filename);

    if (!$zip || is_numeric($zip)) return false;

    $content = '';

    while ($zip_entry = zip_read($zip)) {

        // Only the main document part holds the body text; skip every other entry before opening it.
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;

        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }// end while

    zip_close($zip);

    // Separate table cells with a space and paragraphs with a line break, then drop the XML tags.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);

    return strip_tags($content);
}

$filename = "filepath";// or /var/www/html/file.docx

$content = read_file_docx($filename);
if($content !== false) {

    echo nl2br($content);
}
else {
    echo 'Couldn\'t read the file. Please check the file path.';
}</code>

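This code uses PHP's procedural Zip functions (zip_open(), zip_read(), and the zip_entry_* calls) to open the DOCX file as a ZIP archive. It reads the "word/document.xml" entry, which contains the document's body text, converts paragraph endings to line breaks, and strips the remaining XML tags. Note that these procedural functions are deprecated as of PHP 8.0 in favor of the ZipArchive class.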
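
The same extraction can be sketched with the ZipArchive class, which avoids the procedural zip_* functions. The function name read_docx_text is hypothetical, and the tag handling simply mirrors the function above; treat this as a minimal sketch, assuming the Zip extension is enabled, rather than a complete DOCX parser.

```php
<?php
// Hypothetical helper: reads the body text of a DOCX file via ZipArchive.
// Returns the extracted text, or false if the file cannot be read.
function read_docx_text(string $filename)
{
    if ($filename === '' || !is_file($filename)) {
        return false;
    }

    $zip = new ZipArchive();

    // open() returns true on success or an integer error code on failure.
    if ($zip->open($filename) !== true) {
        return false;
    }

    // getFromName() returns the entry contents, or false if the entry is missing.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    if ($xml === false) {
        return false;
    }

    // Same tag handling as the function above: table cells become spaces,
    // paragraph endings become line breaks, and all other tags are removed.
    $xml = str_replace('</w:r></w:p></w:tc><w:tc>', ' ', $xml);
    $xml = str_replace('</w:r></w:p>', "\r\n", $xml);

    return strip_tags($xml);
}
```

Because getFromName() fetches a single entry directly, no loop over the archive is needed. The result can be printed the same way as before, for example with echo nl2br($text) after checking that it is not false.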

With this method you can extract the text of DOCX files in PHP. It does not apply to legacy binary DOC files, which are not ZIP archives.
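
One limitation of the string replacement in read_file_docx is that it only detects a paragraph end when `</w:p>` directly follows `</w:r>`, so a paragraph that ends with another element (a hyperlink, for instance) gets no line break. A possible alternative, sketched here under the assumption that the DOM extension is available, is to parse the XML and collect the text nodes per paragraph. The function name docx_xml_to_paragraphs is hypothetical; its input is the raw contents of "word/document.xml".

```php
<?php
// Hypothetical helper: turns the XML of word/document.xml into an array of
// paragraph strings. Returns an empty array if the XML cannot be parsed.
function docx_xml_to_paragraphs(string $xml): array
{
    $dom = new DOMDocument();

    // Suppress parser warnings; a false return signals malformed XML.
    if ($xml === '' || !@$dom->loadXML($xml)) {
        return [];
    }

    // Namespace URI of the WordprocessingML "w:" prefix.
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    $paragraphs = [];
    foreach ($dom->getElementsByTagNameNS($ns, 'p') as $p) {
        $text = '';
        // Each w:t element holds a fragment of the paragraph's visible text.
        foreach ($p->getElementsByTagNameNS($ns, 't') as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }

    return $paragraphs;
}
```

Joining the result with implode("\r\n", $paragraphs) gives text comparable to what read_file_docx returns. This sketch ignores tabs, explicit line breaks, and paragraphs nested inside other paragraphs (such as text boxes), so it is a starting point rather than a full converter.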