How to Efficiently Retrieve the Last N Lines of a Large File?

Patricia Arquette
2024-11-30 10:39:10

Get Last N Lines of a File, Simulating 'Tail'

Introduction:

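When analyzing large log files, you often need only the last N lines, either for inspection or for paging backwards through the log. The goal is a tail-like function that returns N lines ending a given number of lines (the offset) before the end of the file, without reading the whole file.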
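To pin down what "N lines with an offset" means, here is a minimal reference sketch. The name naive_tail is hypothetical and not part of the original solutions. It reads the entire file, so it only illustrates the expected result and is not suitable for large logs.

```python
import io

def naive_tail(f, n, offset=0):
    """Reference behaviour only: reads the whole file into memory."""
    lines = f.read().splitlines()
    end = len(lines) - offset      # skip the newest `offset` lines
    start = max(end - n, 0)        # then take up to n lines before that point
    return lines[start:max(end, 0)]

# A ten-line in-memory "log" stands in for a real file here.
log = io.StringIO("".join(f"line {i}\n" for i in range(1, 11)))
print(naive_tail(log, 3))            # ['line 8', 'line 9', 'line 10']
log.seek(0)
print(naive_tail(log, 3, offset=3))  # ['line 5', 'line 6', 'line 7']
```

With offset=0 this is a plain tail; increasing the offset by N each time pages backwards through the log.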

Candidate Solution 1:

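def tail(f, n, offset=0):
    """Return n lines ending `offset` lines before EOF. f must be opened in binary mode ('rb')."""
    to_read = n + offset
    if to_read <= 0:
        return []
    avg_line_length = 74  # initial guess, grown by 30% on every retry
    while True:
        try:
            # whence=2 seeks relative to EOF; the offset must be an int
            f.seek(-int(avg_line_length * to_read), 2)
        except OSError:
            # the estimate reaches past the start of the file: read it all
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        # require one extra line, because the first line read may be a fragment
        if len(lines) > to_read or pos == 0:
            return lines[-to_read:-offset if offset else None]
        avg_line_length *= 1.3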

Evaluation:

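This approach guesses the average line length, seeks back from the end of the file by that estimate, and reads everything from there to the end. If the read does not contain enough lines, it grows the estimate by 30% and tries again. Every retry re-reads the entire tail of the file, so a poor initial guess (for example, a log with very long lines) costs several seeks and repeated reads of the same data.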
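The cost of a bad guess can be estimated with a small model. The sketch below is an illustration under simplifying assumptions, not a measurement: the helper name seek_windows is hypothetical, every line is assumed to have the same length, and the file is assumed to be larger than every window.

```python
def seek_windows(to_read, true_avg_len, guess=74.0, growth=1.3):
    """Byte windows requested by successive attempts under the simplified model."""
    needed = to_read * true_avg_len   # bytes that actually hold `to_read` lines
    windows = []
    while True:
        window = int(guess * to_read)  # distance seeked back from EOF
        windows.append(window)
        if window >= needed:           # this attempt would contain enough lines
            return windows
        guess *= growth                # same 30% growth as Candidate Solution 1

windows = seek_windows(to_read=100, true_avg_len=200)
print(len(windows), "attempts,", sum(windows), "bytes read in total")
```

In this model, asking for 100 lines that really average 200 bytes each takes five attempts, and the attempts together read several times the 20,000 bytes that hold the wanted lines.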

Candidate Solution 2:

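def tail(f, lines=20):
    """Return the last `lines` lines of f as bytes. f must be opened in binary mode ('rb')."""
    if lines <= 0:
        return b''
    BLOCK_SIZE = 1024
    f.seek(0, 2)  # whence=2: jump to the end of the file
    block_end_byte = f.tell()
    # look for one extra newline so the first returned line is never a fragment
    lines_to_go = lines + 1
    block_number = -1
    blocks = []  # blocks are collected from the end of the file backwards
    while lines_to_go > 0 and block_end_byte > 0:
        if block_end_byte - BLOCK_SIZE > 0:
            # a full block fits before block_end_byte: read it relative to EOF
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # at most one block remains: read it from the start of the file
            f.seek(0, 0)
            blocks.append(f.read(block_end_byte))
        lines_to_go -= blocks[-1].count(b'\n')
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-lines:])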

Explanation:

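This method walks backwards through the file in fixed-size blocks, counting newlines in each block until it has seen enough of them. It makes no assumption about line length. When less than one full block remains before the current position, it reads that remainder from the start of the file instead of seeking past the beginning.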
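The backward block walk can be looked at in isolation. The following is a minimal sketch, assuming a file object opened in binary mode; the generator name blocks_from_end is hypothetical, and it uses absolute offsets rather than the EOF-relative seeks of Candidate Solution 2.

```python
import io

def blocks_from_end(f, block_size=1024):
    """Yield the contents of a binary file in blocks, last block first."""
    f.seek(0, io.SEEK_END)
    end = f.tell()
    while end > 0:
        start = max(end - block_size, 0)  # never seek before the file start
        f.seek(start)
        yield f.read(end - start)         # the block nearest the start may be short
        end = start

# An in-memory file stands in for a real log opened with open(path, "rb").
data = b"".join(b"entry %d\n" % i for i in range(500))
blocks = list(blocks_from_end(io.BytesIO(data), block_size=1024))
print(len(blocks), "blocks")
print(b"".join(reversed(blocks)) == data)  # reversing the blocks restores file order
```

A block boundary can fall in the middle of a line, which is why the blocks are joined back together before being split into lines.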

Comparison:

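Candidate Solution 2 is generally more robust than Candidate Solution 1: it does not depend on a line-length estimate, and it reads each block at most once while working backwards from the end of the file, whereas Candidate Solution 1 re-reads the whole tail on every retry. As written, Candidate Solution 2 takes no offset parameter. To page through a log with it, request lines + offset lines and discard the last offset of them.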
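One way to combine the block-by-block walk with an offset is sketched below. This is an illustrative variant rather than the original author's code: the name tail_with_offset and the block_size parameter are assumptions, the file must be opened in binary mode, and lines are returned as a list of bytes objects.

```python
import io

def tail_with_offset(f, n, offset=0, block_size=1024):
    """Return n lines (as bytes) that end `offset` lines before the end of f."""
    if n <= 0 or offset < 0:
        return []
    wanted = n + offset
    f.seek(0, io.SEEK_END)
    end = f.tell()
    blocks = []
    newlines = 0
    # Keep reading until wanted + 1 newlines are seen, so that the oldest
    # wanted line is complete even if a block boundary falls mid-line.
    while end > 0 and newlines <= wanted:
        start = max(end - block_size, 0)
        f.seek(start)
        block = f.read(end - start)
        blocks.append(block)
        newlines += block.count(b"\n")
        end = start
    lines = b"".join(reversed(blocks)).splitlines()
    stop = len(lines) - offset            # drop the newest `offset` lines
    return lines[max(stop - n, 0):max(stop, 0)]

# An in-memory file stands in for open("app.log", "rb").
log = io.BytesIO(b"".join(b"line %d\n" % i for i in range(1, 101)))
print(tail_with_offset(log, 3))            # [b'line 98', b'line 99', b'line 100']
print(tail_with_offset(log, 3, offset=3))  # [b'line 95', b'line 96', b'line 97']
```

Decoding is left to the caller (for example, line.decode("utf-8")), because seeking to arbitrary byte positions is only safe on files opened in binary mode.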
