
Detailed introduction to file processing in python

高洛峰
Original, 2017-03-22 10:38:21

Working with a file on the hard disk in Python can be roughly divided into three steps:

Use the open function to open the file and assign the returned file handle to a variable.

Operate on the file through that file handle.

Close the file once the operations are complete. Closing the file flushes any buffered content to the disk.
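The three steps above can be sketched as follows. This is a minimal illustration that assumes Python 3; the file name demo.txt and the temporary directory are placeholders chosen for the example, not something from the article.

```python
import os
import tempfile

# Hypothetical file path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Step 1: open the file and bind the file handle to a variable.
f = open(path, mode="w", encoding="utf-8")

# Step 2: operate on the file through the handle.
f.write("hello\n")

# Step 3: close the file; buffered data is flushed at this point.
f.close()

# Reopen the file to confirm that the content was written.
f = open(path, mode="r", encoding="utf-8")
content = f.read()
f.close()
print(content)
```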

The open function is used as follows.

open('file path',mode='mode to open file',encoding='file encoding method')

File path: The file path can be an absolute path or a relative path. For a relative path in Python you only need to write the file name: if the Python program and the file to be opened are in the same directory, just use the relative path directly.

Note: the file path is a required argument of the open function and cannot be omitted.

File opening mode: The open function has an optional parameter, mode, which defines how the file is opened. If no mode is specified, the file is opened in r (read-only) mode by default.

The common file opening modes provided in Python are as follows:

'r' Read-only mode. The file can be read but not written. (If the file does not exist, an exception is raised.)

'w' Write-only mode. The file can be written but not read. Take special care: opening a file that already has content with w clears that content immediately. If you do not want the original content to be cleared, do not use w mode. (If the file does not exist, it is created. If the file exists, its content is cleared first.)

'a' Append mode. This is also a write-only mode; whatever is written is added to the end of the file.

'b' Binary mode. Note that 'b' must be combined with one of the three modes above (r, w, a), for example rb, wb or ab. (This mode is recommended for code that runs across platforms and operating systems.)

'r+' Readable and writable. When the file is opened, the seek pointer is at the head of the file. If you start writing without first moving the seek pointer, you overwrite the existing content, starting from the head of the file. When using r+ mode, always pay attention to where the seek pointer is in the file.

'w+' Writable and readable. (This mode is rarely used, because the file is cleared as soon as it is opened.)

'a+' Appends at the end of the file; writable and readable.
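The difference between w, a and r+ is easiest to see by applying them to the same file in turn. The following is a small sketch that assumes Python 3; the file name modes.txt is a placeholder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file

# 'w' creates the file (or empties an existing one) before writing.
with open(path, "w", encoding="utf-8") as f:
    f.write("first\n")

# 'a' keeps the existing content and writes at the end of the file.
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

# 'r+' starts at offset 0, so writing right away overwrites from the head.
with open(path, "r+", encoding="utf-8") as f:
    f.write("FIRST")

with open(path, "r", encoding="utf-8") as f:
    after_r_plus = f.read()

# 'w' again: the old content is discarded as soon as the file is opened.
with open(path, "w", encoding="utf-8") as f:
    pass

with open(path, "r", encoding="utf-8") as f:
    after_w = f.read()

print(repr(after_append), repr(after_r_plus), repr(after_w))
```

After the r+ step only the first five characters have changed, and after the final w step the file is empty even though nothing was written.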

1. Common methods for operating file objects

Reading files:

readable() checks whether the file is readable. It returns True if the file is readable and False otherwise.

readline() reads the file one line at a time and returns the line as a string.

read() reads the whole content of the file at once and returns it as a single string.

readlines() reads the whole content of the file and returns a list in which each line of the file is one element.
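The reading methods above can be compared side by side. This sketch assumes Python 3 and a small three-line file created just for the example (lines.txt is a placeholder name). Note that read() returns everything from the current pointer position onward, so calling it after readline() yields only the remaining lines.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("line 1\nline 2\nline 3\n")

with open(path, "r", encoding="utf-8") as f:
    can_read = f.readable()   # True for a file opened in 'r' mode
    first = f.readline()      # one line, including its trailing newline
    rest = f.read()           # everything from the current position to the end

with open(path, "r", encoding="utf-8") as f:
    all_lines = f.readlines() # a list with one string per line

print(can_read, repr(first), repr(rest), all_lines)
```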

Writing files:

writable(): Checks whether the file is writable. It returns True if the file is writable and False otherwise.

write(): Writes content to the file. This method can only be used when the file was opened in a writable mode. Where the content lands in the file depends on the opening mode (r+, a+ or w+) and on where the seek pointer currently points. (Also, write does not add a newline character. You need to add one yourself, otherwise all the content runs together.)

Example: f1.write('hello!\n') #\n is the newline character.

writelines(): Similar to write, it writes content to the file. The difference is that writelines takes a list: Python loops over the list and writes each element to the file.

Note: writelines does not add newline characters either. If you put a newline character at the end of each element, each element in the list becomes one line in the file.

Note: in text mode, only strings can be written to a file. Any other type raises an exception. Even a number must first be converted to a string. (In binary mode, Python 3 expects bytes instead.)
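A short sketch of the writing rules just described, assuming Python 3 and a placeholder file name out.txt: neither write nor writelines adds a newline, and a number has to be converted with str before it can be written in text mode.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical file

with open(path, "w", encoding="utf-8") as f:
    can_write = f.writable()        # True for a file opened in 'w' mode
    f.write("hello!\n")             # the newline must be added by hand
    f.writelines(["a\n", "b\n"])    # each list element is written as-is
    f.write(str(42) + "\n")         # numbers must be converted to str first
    try:
        f.write(42)                 # writing an int directly is rejected
    except TypeError:
        rejected = True
    else:
        rejected = False

with open(path, "r", encoding="utf-8") as f:
    written = f.read()

print(repr(written), rejected)
```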

Other operations:

close() closes the file. When you have finished reading or writing a file, you must close it with close. (The exception is the with syntax: a file opened with the with keyword is closed automatically once the operations on it are complete.)

If you do not close a file after reading it, the program keeps occupying system resources.

If you do not close a file after writing to it, the content in memory is not synchronized to the hard disk in time. To make sure the content is completely written, either close the file with close or use the flush method to force the data in memory out to the hard disk.

flush() forces data that is still in memory and has not yet been written to the hard disk to be written out.

encoding: Shows the encoding of the opened file. It is an attribute rather than a method, so it is read without parentheses. (The encoding argument of open is available in Python 3; the built-in open in Python 2 does not accept it.)
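The following sketch illustrates flush, the encoding attribute and the automatic close performed by with. It assumes Python 3; flush.txt is a placeholder file name. One caveat worth knowing: flush hands Python's buffer to the operating system, and os.fsync can additionally be called if the data must be physically on disk.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "flush.txt")  # hypothetical file

f = open(path, "w", encoding="utf-8")
f.write("buffered")
f.flush()                                  # push buffered data out without closing
size_after_flush = os.path.getsize(path)   # the 8 bytes are now visible in the file
f.close()

with open(path, "r", encoding="utf-8") as g:
    enc = g.encoding       # attribute, read without parentheses
    inside = g.closed      # False while the with block is running
outside = g.closed         # True: with closed the file automatically

print(size_after_flush, enc, inside, outside)
```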

tell(): Returns the current position of the seek pointer.

seek(pointer position, mode) moves the seek pointer to a position in the file, measured in bytes.

The seek method in Python has three modes. The following is a detailed introduction to them:

file.seek(n,0) #n is the pointer position; the number 0 after it is the mode number.

file.seek(n,0): (Mode 0) Absolute position. The pointer is moved to byte n, counting from the beginning of the file. (If no mode is given, seek uses mode 0 by default.)

For example, file.seek(3,0) moves the pointer to the position three bytes from the beginning of the file (byte 0).

# Before moving the pointer

f1 = open('seasons.lrc',mode='r')

print f1.readline()

>>>浜崎あゆみ- Seasons

#The following uses mode 0 to move the pointer to byte 3 of the file. Once the pointer is there, reading starts from the position just after the pointer.

f1 = open('seasons.txt',mode='r')

print f1.tell() #Display the current position of the seek pointer

>>>0 #(The position is 0, which means the pointer is at the beginning of the file)

f1.seek(3,0) #(Move the pointer to byte 3, using mode 0, absolute position)

print f1.tell() #Check the pointer position again to verify that it has indeed moved to byte 3.

>>>3

print f1.readline() #Read a line starting from the current pointer position.

>>>崎あゆみ- Seasons

At this point you may ask: reading from the pointer position makes sense, but the pointer was moved three bytes, so why was exactly one character skipped?

This comes down to the difference between characters and bytes, and you need to be clear about it. In UTF-8, a Chinese character typically occupies three bytes. The first line of the file is "浜崎あゆみ- Seasons". Moving the pointer three bytes therefore skips exactly one character and leaves the pointer just after "浜". Reading starts from there, so the output is "崎あゆみ- Seasons". (The Japanese kana in this line also occupy three bytes each in UTF-8.)
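The byte counts can be checked directly. The sketch below assumes Python 3 and assumes that the first line of the file is the string "浜崎あゆみ- Seasons"; it encodes that string to UTF-8 and shows what skipping three bytes, or only one byte, does.

```python
# Assumed first line of the lyrics file used in the example above.
line = "浜崎あゆみ- Seasons"

encoded = line.encode("utf-8")

# Number of UTF-8 bytes taken by each character of the line.
bytes_per_char = [len(ch.encode("utf-8")) for ch in line]

# Skipping 3 bytes lands exactly on a character boundary.
after_three_bytes = encoded[3:].decode("utf-8")

# Skipping 1 byte lands inside a character, which cannot be decoded.
try:
    encoded[1:].decode("utf-8")
    broken = False
except UnicodeDecodeError:
    broken = True

print(len(line), len(encoded), bytes_per_char)
print(after_three_bytes, broken)
```

The five CJK characters take three bytes each and the ASCII characters take one, which is why a byte offset only makes sense if it falls on a character boundary.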

The following example should help you understand what "absolute position" means.

f1 = open('seasons.lrc',mode='r')

print f1.tell() #After opening the file, the default position of the pointer is 0.

>>>0

f1.seek(3,0) #Move the pointer to byte 3 of the file.

print f1.tell()

>>>3

f1.seek(3,0) #Seek to byte 3 again. Mode 0 always counts from the beginning of the file, so the position stays at 3.

print f1.tell()

>>>3

file.seek(n,1): (Mode 1) Relative position. n is the number of bytes the pointer moves forward (toward the end of the file) from its current position.

If that is not easy to follow, the following example should make it clear.

f1 = open('seasons.lrc',mode='r')

print f1.tell() #After opening the file, the default position of the pointer is 0.

>>>0

f1.seek(3,1) #Move the pointer forward by 3 bytes.

print f1.tell()

>>>3 #The pointer has moved to byte 3.

f1.seek(3,1) #This is the key point: move the pointer forward another 3 bytes (compare this with mode 0).

print f1.tell()

>>>6 #The pointer is now at byte 6. This shows that a mode 1 move does not start from the beginning of the file; it continues from the pointer's previous position. (This is what "relative position" means.)

If that is still unclear, here is a summary:

f1.seek(3,0) moves the pointer to byte 3 of the file. (Absolute position)

f1.seek(3,1) moves the pointer 3 bytes forward from its current position. (Relative position)
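The examples above use Python 2 syntax. As a hedged sketch of the same comparison on Python 3, the file below is opened in binary mode ('rb'), because Python 3 text mode does not allow non-zero seeks relative to the current position. The file name seek.bin and its ten-byte content are made up for the illustration; os.SEEK_SET and os.SEEK_CUR are the named constants for modes 0 and 1.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek.bin")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"0123456789")

positions = []
with open(path, "rb") as f:
    f.seek(3, os.SEEK_SET)      # mode 0: absolute, counted from the start
    positions.append(f.tell())
    f.seek(3, os.SEEK_SET)      # same call again: still byte 3
    positions.append(f.tell())
    f.seek(3, os.SEEK_CUR)      # mode 1: 3 bytes on from the current position
    positions.append(f.tell())
    f.seek(-2, os.SEEK_CUR)     # a negative offset moves back toward the start
    positions.append(f.tell())
    next_byte = f.read(1)       # the byte at the final position

print(positions, next_byte)
```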

file.seek(-n,2): (Mode 2) Position relative to the end of the file: the pointer starts at the end of the file and moves toward the beginning. (In mode 2 the offset is normally a negative number, because the pointer moves back from the end.)

The following is an example:

#Open a file and use the read method to read it from beginning to end. Once the file has been read, the pointer is at the end of the file.

f1 = open('seasons.txt',mode='r')

print f1.tell()

>>>0

f1.read()

print f1.tell()

>>>756

From the above we now know that the end of the file is at byte 756.

Next, let's test whether mode 2 of the seek method works as described, starting from the end of the file and moving toward the beginning.

f1 = open('seasons.txt',mode='r')

print f1.tell()

>>>0

f1.seek(-1,2) #Use mode 2 of seek to move the pointer 1 byte back from the end of the file

print f1.tell()

>>>755

#The end of the file is at 756 and one byte before it is 755, which is the result we wanted.
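A Python 3 version of this check might look like the sketch below. It builds a placeholder file of 756 bytes (the size seen in the example above; the content is arbitrary) and opens it in binary mode, since Python 3 text mode raises io.UnsupportedOperation for a non-zero seek relative to the end. In Python 3, seek also returns the new absolute position.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "end.bin")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"x" * 756)   # same size as the file in the example above

with open(path, "rb") as f:
    end = f.seek(0, os.SEEK_END)          # offset 0 from the end: the file size
    before_end = f.seek(-1, os.SEEK_END)  # one byte back from the end
    last = f.read()                       # only the final byte remains

# The same seek in text mode is rejected by Python 3.
with open(path, "r", encoding="utf-8") as f:
    try:
        f.seek(-1, os.SEEK_END)
        text_mode_rejected = False
    except io.UnsupportedOperation:
        text_mode_rejected = True

print(end, before_end, last, text_mode_rejected)
```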

In fact, mode 2 of the seek method is quite useful. The following describes one use of it in detail.

To get the last line of a file, or the Nth line from the end, you can use mode 2 of the seek method.

At this point you may ask: isn't taking the last line of a file a very simple operation? Just read the file with the readlines method and take the last element of the list. Wouldn't that be simpler?

Like this:

f1 = open('seasons.txt',mode='r')

print f1.readlines()[-1]

The last line of the file is indeed retrieved this way. But consider what readlines does: it reads every line of the file into memory. If the file is extremely large, say 10 GB or 100 GB, and does not fit in the available memory, this method is not applicable.

The following method is particularly suitable for large files.

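f1 = open('seasons.txt',mode='r')

#Note: looping directly over the file handle (for i in f1) reads one line at a time instead of loading the whole file into memory. No such loop is needed here, because seek jumps straight to the end of the file.

line_bytes = -36 #Initial guess: the last line is no longer than 36 bytes.

while True:

    f1.seek(line_bytes,2) #Move the pointer line_bytes bytes back from the end of the file (this fails if the file is shorter than that).

    data = f1.readlines() #Read only the tail of the file into a list.

    if len(data) > 1: #Two or more elements means the last line was read completely.

        print data[-1]

        break

    else:

        line_bytes *= 2 #The window was too short: double it and try again.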
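The loop above uses Python 2 syntax. As a rough Python 3 sketch of the same idea, the hypothetical helper last_line below (its name and its initial_guess parameter are illustrative, not from the article) opens the file in binary mode, doubles the window until it contains more than one line, and never seeks back further than the start of the file.

```python
import os
import tempfile

def last_line(path, initial_guess=36):
    """Return the last line of a file as bytes without reading the whole file."""
    size = os.path.getsize(path)
    step = initial_guess
    with open(path, "rb") as f:
        while True:
            # Never seek back further than the beginning of the file.
            offset = min(step, size)
            f.seek(-offset, os.SEEK_END)
            lines = f.readlines()
            # More than one line in the window means the last one is complete;
            # if the window already covers the whole file there is nothing more to read.
            if len(lines) > 1 or offset == size:
                return lines[-1] if lines else b""
            step *= 2

# Usage with a throwaway 1000-line file (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(("line %d\n" % i).encode("utf-8"))

tail = last_line(path).decode("utf-8")
print(tail)
```

Only the final few dozen bytes are read in the common case, so memory use stays small however large the file is.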
