golang document to pdf

王林
2023-05-13 11:06:37

In day-to-day work we often need to convert documents to PDF so they are easier to share and print. Plenty of ready-made converters exist, but building one yourself is a more instructive exercise. This article shows how to implement a simple document-to-PDF tool in Golang.

  1. Install dependencies

First, install the two libraries that handle document reading and PDF generation:

  • github.com/SebastiaanKlippert/go-wkhtmltopdf: a Go wrapper around the wkhtmltopdf command-line tool, used to convert HTML into PDF. The wkhtmltopdf binary itself must be installed separately.
  • github.com/unidoc/unioffice: a Go library for reading and writing Office Open XML files (docx, xlsx, pptx).

Both libraries can be installed with go get:

go get -u github.com/SebastiaanKlippert/go-wkhtmltopdf
go get -u github.com/unidoc/unioffice
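
Because go-wkhtmltopdf drives the external wkhtmltopdf program, it can help to check up front that the binary is reachable. The following is a minimal sketch using only the standard library; findBinary is a hypothetical helper name, not part of either library, and it assumes the binary is expected on PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findBinary is a hypothetical helper: it looks up an executable on PATH
// and returns where it was found, or an error if it is missing.
func findBinary(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := findBinary("wkhtmltopdf")
	if err != nil {
		fmt.Println("install wkhtmltopdf first:", err)
		return
	}
	fmt.Println("using", path)
}
```

Running a check like this at startup turns a missing installation into a clear message instead of a failure in the middle of a conversion.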

  2. Convert the document to HTML

With both libraries installed, the next step is to convert the document to HTML. We use a docx file as the example, since the unioffice library can open docx documents directly. The following is a simple implementation:

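package main

import (
    "html"
    "log"
    "strings"

    "github.com/unidoc/unioffice/document"
)

// DocxToHtml opens a docx file and returns its paragraph text as simple HTML.
// The Document type does not appear to offer a one-call HTML export, so each
// paragraph's runs are joined and wrapped in a <p> element. Formatting,
// tables, and images are not carried over.
func DocxToHtml(inputFilePath string) (string, error) {
    doc, err := document.Open(inputFilePath)
    if err != nil {
        return "", err
    }
    defer func() {
        // A failed Close is logged rather than aborting the whole program.
        if err := doc.Close(); err != nil {
            log.Printf("unable to close document: %s", err)
        }
    }()

    var sb strings.Builder
    sb.WriteString("<!DOCTYPE html><html><head><meta charset=\"utf-8\"></head><body>\n")
    for _, para := range doc.Paragraphs() {
        sb.WriteString("<p>")
        for _, run := range para.Runs() {
            // Escape the text so characters such as < and & stay literal.
            sb.WriteString(html.EscapeString(run.Text()))
        }
        sb.WriteString("</p>\n")
    }
    sb.WriteString("</body></html>\n")

    return sb.String(), nil
}

DocxToHtml takes the path to a docx file and returns an HTML string and an error value. It opens the file with document.Open, walks the paragraphs returned by doc.Paragraphs, and writes the escaped text of each paragraph's runs into a <p> element. The result is a plain-text rendering of the document: styling, tables, and images are left out, which keeps the example short.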
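
The HTML-assembly step can also be sketched on its own, without any docx library, which makes it easy to test. The block below is an illustration under that assumption; paragraphsToHTML is a hypothetical helper that takes paragraph text that has already been extracted.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// paragraphsToHTML is a hypothetical helper: it wraps each paragraph in a
// <p> element and escapes characters that are special in HTML.
func paragraphsToHTML(paragraphs []string) string {
	var sb strings.Builder
	for _, p := range paragraphs {
		sb.WriteString("<p>")
		sb.WriteString(html.EscapeString(p))
		sb.WriteString("</p>\n")
	}
	return sb.String()
}

func main() {
	fmt.Print(paragraphsToHTML([]string{"Totals", "a < b & c"}))
}
```

Escaping matters here because document text such as "a < b" would otherwise be read as markup by the HTML renderer.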

  3. Convert HTML to PDF

With the docx-to-HTML step in place, the next step is to convert the HTML to PDF using the go-wkhtmltopdf library. The following is a simple implementation:

package main

import (
    "strings"

    "github.com/SebastiaanKlippert/go-wkhtmltopdf"
)

// HtmlToPdf renders an HTML string to a PDF file at outputFilePath.
// The wkhtmltopdf binary must be installed for the generator to work.
func HtmlToPdf(htmlContent string, outputFilePath string) error {
    pdfg, err := wkhtmltopdf.NewPDFGenerator()
    if err != nil {
        return err
    }

    // The page is fed from memory through a reader, so no intermediate
    // HTML file needs to be written to disk.
    pdfg.AddPage(wkhtmltopdf.NewPageReader(strings.NewReader(htmlContent)))
    pdfg.Dpi.Set(300)
    pdfg.Orientation.Set(wkhtmltopdf.OrientationPortrait)
    pdfg.PageSize.Set(wkhtmltopdf.PageSizeA4)

    // Create runs wkhtmltopdf and keeps the resulting PDF in memory.
    if err := pdfg.Create(); err != nil {
        return err
    }

    // WriteFile saves the in-memory PDF to the output path.
    return pdfg.WriteFile(outputFilePath)
}

HtmlToPdf takes an HTML string and the path of the output PDF file, and returns an error value. The HTML string is wrapped in a strings.Reader and handed to AddPage through wkhtmltopdf.NewPageReader, so it never has to be written to a temporary file. The generator is configured for 300 DPI, portrait orientation, and A4 paper. Create then renders the PDF, and pdfg.WriteFile writes it to the specified path.
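
If the HTML passed to the converter is only a fragment, non-ASCII text may render incorrectly because the renderer has to guess the encoding. A common precaution is to wrap the fragment in a complete document that declares UTF-8. The sketch below assumes that situation; wrapHTML is a hypothetical helper, not part of go-wkhtmltopdf.

```go
package main

import (
	"fmt"
	"html"
)

// wrapHTML is a hypothetical helper: it places an HTML fragment inside a
// minimal document that declares UTF-8 encoding. The title is escaped;
// the body is inserted as-is because it is expected to be HTML already.
func wrapHTML(title, body string) string {
	return "<!DOCTYPE html>\n" +
		"<html>\n<head>\n" +
		"<meta charset=\"utf-8\">\n" +
		"<title>" + html.EscapeString(title) + "</title>\n" +
		"</head>\n<body>\n" +
		body + "\n" +
		"</body>\n</html>\n"
}

func main() {
	fmt.Print(wrapHTML("Report & notes", "<p>你好, PDF</p>"))
}
```

The wrapped string can then be passed to the HTML-to-PDF step in place of the bare fragment.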

  4. Full code

That completes the two main steps of converting a document to PDF. The following is the complete example:

package main

import (
    "html"
    "log"
    "strings"

    "github.com/SebastiaanKlippert/go-wkhtmltopdf"
    "github.com/unidoc/unioffice/document"
)

func main() {
    inputFilePath := "input.docx"
    outputFilePath := "output.pdf"

    htmlContent, err := DocxToHtml(inputFilePath)
    if err != nil {
        log.Fatalf("unable to convert docx to html: %s", err)
    }

    if err := HtmlToPdf(htmlContent, outputFilePath); err != nil {
        log.Fatalf("unable to convert html to pdf: %s", err)
    }
}

// DocxToHtml opens a docx file and returns its paragraph text as simple HTML.
// The Document type does not appear to offer a one-call HTML export, so each
// paragraph's runs are joined and wrapped in a <p> element. Formatting,
// tables, and images are not carried over.
func DocxToHtml(inputFilePath string) (string, error) {
    doc, err := document.Open(inputFilePath)
    if err != nil {
        return "", err
    }
    defer func() {
        // A failed Close is logged rather than aborting the whole program.
        if err := doc.Close(); err != nil {
            log.Printf("unable to close document: %s", err)
        }
    }()

    var sb strings.Builder
    sb.WriteString("<!DOCTYPE html><html><head><meta charset=\"utf-8\"></head><body>\n")
    for _, para := range doc.Paragraphs() {
        sb.WriteString("<p>")
        for _, run := range para.Runs() {
            // Escape the text so characters such as < and & stay literal.
            sb.WriteString(html.EscapeString(run.Text()))
        }
        sb.WriteString("</p>\n")
    }
    sb.WriteString("</body></html>\n")

    return sb.String(), nil
}

// HtmlToPdf renders an HTML string to a PDF file at outputFilePath.
// The wkhtmltopdf binary must be installed for the generator to work.
func HtmlToPdf(htmlContent string, outputFilePath string) error {
    pdfg, err := wkhtmltopdf.NewPDFGenerator()
    if err != nil {
        return err
    }

    // The page is fed from memory through a reader, so no intermediate
    // HTML file needs to be written to disk.
    pdfg.AddPage(wkhtmltopdf.NewPageReader(strings.NewReader(htmlContent)))
    pdfg.Dpi.Set(300)
    pdfg.Orientation.Set(wkhtmltopdf.OrientationPortrait)
    pdfg.PageSize.Set(wkhtmltopdf.PageSizeA4)

    // Create runs wkhtmltopdf and keeps the resulting PDF in memory.
    if err := pdfg.Create(); err != nil {
        return err
    }

    // WriteFile saves the in-memory PDF to the output path.
    return pdfg.WriteFile(outputFilePath)
}

This example defines two functions. DocxToHtml converts the input docx file into an HTML string, and HtmlToPdf converts that string into a PDF file. The main function calls them in that order and stops with a log message if either step fails. Because the HTML is passed to the PDF generator in memory, no intermediate input.html file is created and nothing needs to be cleaned up afterwards.
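
The example hard-codes its input and output paths. One small extension is to derive the PDF path from the input path, so the tool can be pointed at any file. The sketch below shows one way to do that with the standard library; pdfPathFor is a hypothetical helper introduced only for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// pdfPathFor is a hypothetical helper: it replaces the input file's
// extension with ".pdf", keeping the directory and base name.
func pdfPathFor(inputPath string) string {
	ext := filepath.Ext(inputPath)
	return strings.TrimSuffix(inputPath, ext) + ".pdf"
}

func main() {
	fmt.Println(pdfPathFor("reports/q1.docx")) // prints "reports/q1.pdf"
	fmt.Println(pdfPathFor("notes"))           // prints "notes.pdf"
}
```

In main, the input path could then come from os.Args and the output path from this helper, instead of the fixed "input.docx" and "output.pdf".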

  5. Summary

This article showed how to implement a simple document-to-PDF tool in Golang with two libraries. The unioffice library reads the docx file so its content can be turned into an HTML string, and the go-wkhtmltopdf library converts that HTML into a PDF. The example is deliberately simple, but it is a workable starting point that can be extended, for example to handle more of the document's formatting or other input formats.
