A brief analysis of how Nodejs reads and writes large files
The author has recently been working on file reading, writing, and multi-part uploads in Node. In the process I found that if a file read by Node exceeds 2 GB, it exceeds the maximum Blob size that can be read and a read exception is thrown. In addition, reading and writing files in Node is also limited by the server's RAM, so files need to be read in slices. This article records the problems encountered and how they were solved.
- File reading and writing in node
- RAM and Blob size limits when reading and writing files in node
- Others
Normally, if we want to read a relatively small file, we can read it directly:
```js
const fs = require('fs')

let data = fs.readFileSync("./test.png")
console.log(data, 123) // output: data = <Buffer 89 50 4e ...>
```
Generally speaking, the synchronous API is not recommended, because js/nodejs is single-threaded and synchronous methods block the main thread. The latest versions of Node provide fs.promises, which can be used directly with async/await:
```js
const fs = require('fs')

const readFileAsync = async () => {
  let data = await fs.promises.readFile("./test.png")
  console.log(data, 123)
}
readFileAsync() // output: data = <Buffer 89 50 4e ...>
```
The asynchronous call here does not block the main thread, and the I/O for multiple file reads can be performed in parallel.
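As a minimal sketch of what parallel reads look like with fs.promises and Promise.all (the three file names below are placeholders, not files from the article):

```js
const fs = require('fs')

// Hypothetical example: a.png, b.png and c.png are placeholder file names.
const readManyFiles = async () => {
  const [a, b, c] = await Promise.all([
    fs.promises.readFile('./a.png'),
    fs.promises.readFile('./b.png'),
    fs.promises.readFile('./c.png'),
  ])
  console.log(a.length, b.length, c.length) // all three reads are issued concurrently
}

readManyFiles()
```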
With conventional file reading we load the whole file into memory at once, which is inefficient in both time and memory: time-inefficient because the entire file must be read before any processing can start, and memory-inefficient because the whole file is held in memory at once, which consumes a lot of RAM. In such cases we generally read the file with a Stream instead:
```js
const fs = require('fs')

const readFileTest = () => {
  var data = ''
  var rs = fs.createReadStream('./test.png');
  rs.on('data', function (chunk) {
    data += chunk;
    console.log(chunk)
  });
  rs.on('end', function () {
    console.log(data);
  });
  rs.on('error', function (err) {
    console.log(err.stack);
  });
}

readFileTest() // data = <Buffer 89 50 64 ...>
```
Reading and writing files through a Stream improves both memory efficiency and time efficiency.
Streams also support a second style of reading:
```js
const fs = require('fs')

const readFileTest = () => {
  var data = ''
  var chunk;
  var rs = fs.createReadStream('./test.png');
  rs.on('readable', function () {
    while ((chunk = rs.read()) != null) {
      data += chunk;
    }
  });
  rs.on('end', function () {
    console.log(data)
  });
};

readFileTest()
```
When reading large files, there is a limit on the size of the file that can be read. For example, suppose we read a 2.5 GB video file:
```js
const fs = require('fs')

const readFileTest = async () => {
  let data = await fs.promises.readFile("./video.mp4")
  console.log(data)
}

readFileTest()
```
Executing the above code will report an error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (2246121911) is greater than 2 GB
We might expect that setting NODE_OPTIONS='--max-old-space-size=5000' would help, since 5000 MB > 2.5 GB, but the error does not go away: the size limit on files read by Node cannot be changed through this option.
The above is the conventional way to read a large file. Is there a file size limit when reading through a Stream? For example:
```js
const fs = require('fs')

const readFileTest = () => {
  var data = ''
  var rs = fs.createReadStream('./video.mp4');
  rs.on('data', function (chunk) {
    data += chunk;
  });
  rs.on('end', function () {
    console.log(data);
  });
  rs.on('error', function (err) {
    console.log(err.stack);
  });
}

readFileTest()
```
Reading the 2.5 GB file this way does not raise the exception above, but note that a different error occurs:
```
data += chunk;
        ^
RangeError: Invalid string length
```
This happens because the accumulated data exceeds the maximum allowed length (for example around 2048 MB). So when processing with a Stream and saving the result, keep the file size in mind: it must not exceed the default maximum Buffer/string size. In the case above we do not actually need data += chunk to collect everything into one large variable; we can process the data as it is read.
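For instance, a minimal sketch of processing each chunk as it arrives instead of accumulating the whole file — here, hypothetically, computing a hash of the file (hashLargeFile is a made-up helper for this example):

```js
const fs = require('fs')
const crypto = require('crypto')

// Hypothetical example: hash a large file chunk by chunk instead of
// concatenating all chunks into one huge string.
const hashLargeFile = (filepath) =>
  new Promise((resolve, reject) => {
    const hash = crypto.createHash('md5')
    const rs = fs.createReadStream(filepath)
    rs.on('data', chunk => hash.update(chunk))          // process each chunk immediately
    rs.on('end', () => resolve(hash.digest('hex')))     // only the digest is kept in memory
    rs.on('error', reject)
  })

hashLargeFile('./video.mp4').then(digest => console.log(digest))
```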
When reading a file, createReadStream can in fact read a specific segment, and this segmented reading can be used as an alternative way of reading large files. It is especially advantageous when segments are read concurrently, since it can speed up file reading and processing.
createReadStream accepts a second parameter, {start, end}. We can get the file size through fs.promises.stat, split it into segments, and then read each segment, for example:
1. Get the file size:

```js
const info = await fs.promises.stat(filepath)
const size = info.size
```
2. Determine the segment size and boundaries:

```js
const SIZE = 128 * 1024 * 1024 // 128 MB per segment
let sizeLen = Math.floor(size / SIZE)
let total = sizeLen + 1;
for (let i = 0; i <= sizeLen; i++) {
  if (sizeLen === i) {
    console.log(i * SIZE, size, total, 123)
    readStremfunc(i * SIZE, size, total)
  } else {
    console.log(i * SIZE, (i + 1) * SIZE, total, 456)
    readStremfunc(i * SIZE, (i + 1) * SIZE - 1, total)
  }
}
// after slicing: [0, 128M], [128M, 256M] ...
```
3. Implement the read function
```js
const readStremfunc = (start, end, total) => {
  const readStream = fs.createReadStream(filepath, { start: start, end: end })
  readStream.setEncoding('binary')
  let data = ''
  readStream.on('data', chunk => {
    data = data + chunk
  })
  readStream.on('end', () => {
    // ...
  })
}
```
It is worth noting that in fs.createReadStream(filepath, { start, end }), start and end are both inclusive. For example, fs.createReadStream(filepath, { start: 0, end: 1023 }) reads [0, 1023], a total of 1024 bytes.
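To illustrate the concurrent reading mentioned above, here is a minimal sketch; readSegment and readInSegments are hypothetical helpers and the 128 MB segment size is only an example, and in practice each segment would typically be processed or uploaded rather than all kept in memory:

```js
const fs = require('fs')

// Hypothetical sketch: read one inclusive [start, end] segment into a Buffer.
const readSegment = (filepath, start, end) =>
  new Promise((resolve, reject) => {
    const chunks = []
    const rs = fs.createReadStream(filepath, { start, end })
    rs.on('data', chunk => chunks.push(chunk))
    rs.on('end', () => resolve(Buffer.concat(chunks)))
    rs.on('error', reject)
  })

// Read all segments of a file concurrently.
const readInSegments = async (filepath, segmentSize = 128 * 1024 * 1024) => {
  const { size } = await fs.promises.stat(filepath)
  const tasks = []
  for (let start = 0; start < size; start += segmentSize) {
    const end = Math.min(start + segmentSize - 1, size - 1) // end is inclusive
    tasks.push(readSegment(filepath, start, end))
  }
  const buffers = await Promise.all(tasks) // segments are read concurrently
  return buffers // process or upload each segment; holding all of them keeps the whole file in RAM
}
```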
The discussion above covers handling large files in nodejs. Is there any problem with reading large files on the browser side?
When reading a large local file in the browser, there used to be solutions such as FileSaver and StreamSaver. Since then the File specification has been added to browsers, so streamed reading of files is supported and optimized by the browser by default and no extra work is needed on our part; the related work is at github.com/whatwg/fs. Different browser versions still have compatibility issues, however, so we can fall back to FileSaver and similar libraries for compatibility.
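As a rough sketch of what this looks like, a locally selected File can be consumed chunk by chunk through its stream; readLocalFileInChunks is a made-up name for illustration:

```js
// Hypothetical sketch: read a user-selected large file chunk by chunk
// via the Blob.stream() API available in modern browsers.
const readLocalFileInChunks = async (file) => {
  const reader = file.stream().getReader()
  let received = 0
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    received += value.byteLength // process each Uint8Array chunk here
  }
  console.log(`read ${received} bytes in chunks`)
}
```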
If we are fetching a large static resource in the browser, in general all we need is segmented Range requests. Common CDN-accelerated domains, whether on Alibaba Cloud or Tencent Cloud, support range requests very well, so we can serve the resource through a CDN and then request the CDN-accelerated resource directly from the browser.
The steps for fetching a large CDN static resource in segments are as follows. First, obtain the file size via a HEAD request:
```ts
const getHeaderInfo = async (url: string) => {
  const res: any = await axios.head(url + `?${Math.random()}`);
  return res?.headers;
};

const header = await getHeaderInfo(source_url)
const size = header['content-length']
```
We can read the file size from the content-length response header, then split it into segments and finally issue Range requests:
```ts
const getRangeInfo = async (url: string, start: number, end: number) => {
  const data = await axios({
    method: 'get',
    url,
    headers: {
      range: `bytes=${start}-${end}`,
    },
    responseType: 'blob',
  });
  return data?.data;
};
```
By specifying range: bytes=${start}-${end} in the headers, we can issue a range request to fetch one segment of the resource; here start and end are again both inclusive.
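Tying the pieces together, a minimal sketch of fetching all segments concurrently and reassembling them; downloadInRanges and the 10 MB chunk size are assumptions for this example, while getHeaderInfo and getRangeInfo are the functions defined above:

```js
// Hypothetical sketch: split by content-length, fetch every range
// concurrently, then merge the resulting Blobs in order.
const downloadInRanges = async (url, CHUNK_SIZE = 10 * 1024 * 1024) => {
  const headers = await getHeaderInfo(url)
  const size = Number(headers['content-length'])

  const tasks = []
  for (let start = 0; start < size; start += CHUNK_SIZE) {
    const end = Math.min(start + CHUNK_SIZE - 1, size - 1) // end is inclusive
    tasks.push(getRangeInfo(url, start, end))
  }

  const parts = await Promise.all(tasks) // each part is a Blob
  return new Blob(parts)                 // reassemble in order
}
```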