Golang generates consistent hashes for jpeg images without writing to disk

WBOY
2024-02-11 16:33:08

When comparing or deduplicating images, a common approach is to compute a hash for each image. JPEG adds a catch: the format is lossy, so the pixels decoded from a saved file are generally not the pixels that were encoded, and a hash of the in-memory pixels will not match a hash taken after reloading the file. The obvious fix is to write the file, read it back, and hash what was read. In Go, you can instead hash the encoded JPEG bytes while they are being written, which avoids the extra read. This article shows how.

Question content

I'm new to image processing in Go.

I'm trying to generate consistent hashes for JPEG images. I hash the raw pixel bytes, write the image to disk as a JPEG, then load it again and hash the raw pixel bytes a second time, and the two hashes differ. Writing the RGBA image as a JPEG modifies the pixels (which is expected for a lossy format), and that invalidates the hash I calculated earlier.

Simply hashing the file, as in hash("abc.jpeg"), means I have to write to disk, read the file back, generate the hash, and so on.

  • Is there any setting that controls how the pixels change when a JPEG is written and read back?
  • Should I use *image.RGBA? The input image is an *image.YCbCr.

// Open the input image file
inputFile, _ := os.Open("a.jpg")
defer inputFile.Close()

// Decode the input image
inputImage, _, _ := image.Decode(inputFile)

// Get the dimensions of the input image
width := inputImage.Bounds().Dx()
height := inputImage.Bounds().Dy()
subWidth := width / 4
subHeight := height / 4

// Create a new image
subImg := image.NewRGBA(image.Rect(0, 0, subWidth, subHeight))
draw.Draw(subImg, subImg.Bounds(), inputImage, image.Point{0, 0}, draw.Src)

// I'd like the hashes to match before writing and after reading, but they always differ
hash1 := sha256.Sum256(imageToBytes(subImg))
fmt.Printf("<---OUT [%s] %x\n", filename, hash1)
jpg, _ := os.Create("mytest.jpg")
_ = jpeg.Encode(jpg, subImg, nil)
jpg.Close()

// after reading the file back in, the pixels are ever so slightly different
f, _ := os.Open("mytest.jpg")
img, _, _ := image.Decode(f)
jpg_input := image.NewRGBA(img.Bounds())
draw.Draw(jpg_input, img.Bounds(), img, image.Point{0, 0}, draw.Src)
hash2 := sha256.Sum256(imageToBytes(jpg_input))
fmt.Printf("--->IN  [%s] %x\n", filename, hash2)

// Real-world use case:
// generate a sub-tile of a large image, plus its hash
// if the hash is already in the database
//    walk the pixels to check whether a hash collision occurred
//    if the pixels are different
//       deal with it...
//    else
//       object.filename = dbase.filename
// else
//    add the filename to the database with the hash as the lookup key
//    write the JPEG to disk
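
The mismatch described in the question can be reproduced entirely in memory. The sketch below is an illustration, not the asker's code: testImage, pixelHash and roundTrip are hypothetical helpers, and pixelHash is only one plausible reading of the question's imageToBytes. It encodes a synthetic image into a bytes.Buffer, decodes it, and compares pixel hashes. jpeg.Options exposes only a Quality setting, and even at 100 the decoded pixels typically do not match the originals.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"image/jpeg"
)

// testImage builds a synthetic RGBA gradient (a stand-in for the question's subImg).
func testImage(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetRGBA(x, y, color.RGBA{R: uint8(x * 4), G: uint8(y * 4), B: 128, A: 255})
		}
	}
	return img
}

// pixelHash hashes the raw RGBA pixel buffer.
func pixelHash(img *image.RGBA) [sha256.Size]byte {
	return sha256.Sum256(img.Pix)
}

// roundTrip encodes img as a JPEG in memory, decodes it again,
// and converts the decoded image back to RGBA.
func roundTrip(img image.Image, quality int) (*image.RGBA, error) {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: quality}); err != nil {
		return nil, err
	}
	decoded, err := jpeg.Decode(&buf)
	if err != nil {
		return nil, err
	}
	out := image.NewRGBA(decoded.Bounds())
	draw.Draw(out, out.Bounds(), decoded, decoded.Bounds().Min, draw.Src)
	return out, nil
}

func main() {
	src := testImage(64, 64)
	back, err := roundTrip(src, 100)
	if err != nil {
		panic(err)
	}
	fmt.Printf("before: %x\n", pixelHash(src))
	fmt.Printf("after:  %x\n", pixelHash(back))
	fmt.Println("identical pixels:", bytes.Equal(src.Pix, back.Pix))
}
```

Because the loss happens in the encoder, switching between *image.RGBA and *image.YCbCr is unlikely to make the pixel hash survive a save. Hashing something that does not change on reload, such as the encoded bytes, avoids the problem.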

Workaround

Hash the encoded JPEG bytes rather than the pixels. A hash.Hash is an io.Writer, so io.MultiWriter can send the encoder's output to both the file and the hash in a single pass:

hash := sha256.New()
jpeg.Encode(io.MultiWriter(file, hash), img, nil)
hashValue := hash.Sum(nil)

Statement:
This article is reproduced from stackoverflow.com. In case of infringement, please contact admin@php.cn to have it removed.