The following tutorial column will introduce you to golang's efficient processing of large files_Using Pandas to process large files in chunks. I hope it will be helpful to friends in need! Use Pandas to process large files in chunksProblem: Today when processing user data of Kuaishou, I encountered a txt text of almost 600M. It crashed when I opened it with sublime. I used pandas. It took nearly 2 minutes to read with read_table(). Finally, when I opened it, I found almost 30 million rows of data. It's just opening, I don't know how hard it would be to handle.
Solution: I looked through the documentation. This type of function that reads files has two parameters:
chunksize,
iteratorThe principle is The file data is not read into the memory at once, but multiple times. 1. Specify chunksize to read files in chunks
read_csv and read_table have a chunksize parameter to specify a chunk size (how many lines to read each time) and return an iterable TextFileReader object.
table=pd.read_table(path+'kuaishou.txt',sep='t',chunksize=1000000) for df in table: 对df处理 #如df.drop(columns=['page','video_id'],axis=1,inplace=True) #print(type(df),df.shape)打印看一下信息
I have divided the file here again and divided it into several sub-files for separate processing (yes, to_csv also has the chunksize parameter)
2. Specify iterator=True
iterator=True also returns a TextFileReader object
reader = pd.read_table('tmp.sv', sep='t', iterator=True) df=reader.get_chunk(10000) #通过get_chunk(size),返回一个size行的块 #接着同样可以对df处理
Let’s take a look at the content of the pandas document in this aspect.
The above is the detailed content of How to efficiently process large files in golang. For more information, please follow other related articles on the PHP Chinese website!

Go's "strings" package provides rich features to make string operation efficient and simple. 1) Use strings.Contains() to check substrings. 2) strings.Split() can be used to parse data, but it should be used with caution to avoid performance problems. 3) strings.Join() is suitable for formatting strings, but for small datasets, looping = is more efficient. 4) For large strings, it is more efficient to build strings using strings.Builder.

Go uses the "strings" package for string operations. 1) Use strings.Join function to splice strings. 2) Use the strings.Contains function to find substrings. 3) Use the strings.Replace function to replace strings. These functions are efficient and easy to use and are suitable for various string processing tasks.

ThebytespackageinGoisessentialforefficientbyteslicemanipulation,offeringfunctionslikeContains,Index,andReplaceforsearchingandmodifyingbinarydata.Itenhancesperformanceandcodereadability,makingitavitaltoolforhandlingbinarydata,networkprotocols,andfileI

Go uses the "encoding/binary" package for binary encoding and decoding. 1) This package provides binary.Write and binary.Read functions for writing and reading data. 2) Pay attention to choosing the correct endian (such as BigEndian or LittleEndian). 3) Data alignment and error handling are also key to ensure the correctness and performance of the data.

The"bytes"packageinGooffersefficientfunctionsformanipulatingbyteslices.1)Usebytes.Joinforconcatenatingslices,2)bytes.Bufferforincrementalwriting,3)bytes.Indexorbytes.IndexByteforsearching,4)bytes.Readerforreadinginchunks,and5)bytes.SplitNor

Theencoding/binarypackageinGoiseffectiveforoptimizingbinaryoperationsduetoitssupportforendiannessandefficientdatahandling.Toenhanceperformance:1)Usebinary.NativeEndianfornativeendiannesstoavoidbyteswapping.2)BatchReadandWriteoperationstoreduceI/Oover

Go's bytes package is mainly used to efficiently process byte slices. 1) Using bytes.Buffer can efficiently perform string splicing to avoid unnecessary memory allocation. 2) The bytes.Equal function is used to quickly compare byte slices. 3) The bytes.Index, bytes.Split and bytes.ReplaceAll functions can be used to search and manipulate byte slices, but performance issues need to be paid attention to.

The byte package provides a variety of functions to efficiently process byte slices. 1) Use bytes.Contains to check the byte sequence. 2) Use bytes.Split to split byte slices. 3) Replace the byte sequence bytes.Replace. 4) Use bytes.Join to connect multiple byte slices. 5) Use bytes.Buffer to build data. 6) Combined bytes.Map for error processing and data verification.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SublimeText3 Mac version
God-level code editing software (SublimeText3)

SublimeText3 English version
Recommended: Win version, supports code prompts!

SublimeText3 Linux new version
SublimeText3 Linux latest version
