How to get the content in docx with PHP

(*-*)浩
2019-09-04 14:09:37

Reading docx files

A docx file is actually a zip archive made up of many XML files; the document's text is stored in word/document.xml.

To see this, take a docx file and open it with a zip tool (or rename the .docx extension to .zip and unzip it).

The word directory contains document.xml, and the content of the docx file lives there, so reading that one file is enough.
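The archive layout described above can be checked from PHP without unzipping by hand. The following is a minimal sketch, assuming the zip extension is enabled; the function name listDocxEntries is hypothetical and not part of the original article.

```php
<?php
// Hypothetical helper (an illustration, not the article's code):
// returns the names of all entries stored in a docx archive.
function listDocxEntries(string $file): array
{
    $zip = new ZipArchive();
    // open() returns true on success and an error code otherwise
    if ($zip->open($file) !== true) {
        return [];
    }
    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();
    return $names;
}
```

For a file saved by Word, the returned list should include entries such as [Content_Types].xml and word/document.xml.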

The code is as follows:

function parseWord($file) {
    $content = "";
    $zip = new ZipArchive();
    // open() returns TRUE on success and an error code otherwise
    if ($zip->open($file) !== TRUE) {
        echo 'no';
        return false;
    }
    // Extract into a folder named after the file, located next to it
    $dir = pathinfo($file, PATHINFO_DIRNAME) . "/" . pathinfo($file, PATHINFO_FILENAME);
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $entry = $zip->getNameIndex($i);
        if (pathinfo($entry, PATHINFO_BASENAME) == "document.xml") {
            $zip->extractTo($dir, array($entry));
            // Drop the XML markup and keep only the text
            $content = strip_tags(file_get_contents($dir . "/" . $entry));
            break;
        }
    }
    $zip->close();
    return $content;
}
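parseWord writes the extracted XML to disk, and strip_tags alone runs neighbouring paragraphs together. As a possible variant, the sketch below reads word/document.xml in memory with ZipArchive::getFromName and inserts a line break at each closing </w:p> tag before stripping the markup. The name docxToText is hypothetical, and this is only one way to approach it, not the article's method.

```php
<?php
// Hypothetical helper (an illustration, not the article's code):
// returns the plain text of a docx file, one paragraph per line,
// or null if the archive or its main document part cannot be read.
function docxToText(string $file): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        return null;
    }
    // getFromName() returns the entry's contents, or false if it is missing
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return null;
    }
    // Each paragraph ends with </w:p>; turn it into a newline first so
    // strip_tags() does not glue adjacent paragraphs together.
    $text = strip_tags(str_replace('</w:p>', "\n", $xml));
    // strip_tags() leaves XML entities such as &amp; untouched, so decode them
    $text = html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return rtrim($text);
}
```

A call such as echo docxToText('docs/report.docx'); would print the text, where docs/report.docx is a placeholder path. Nothing is extracted to disk, so no extra folder is created next to the file.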

It is worth noting:

The file passed as $file must not sit in the same directory as the script itself; store it in a separate folder.
