Dynamically add attributes and generate objects in Python classes
This article works through the problem in the following parts:
1. Main function of the program
2. Implementation process
3. Class definition
4. Using a generator to dynamically update each object and return it
5. Using strip to remove unnecessary characters
6. Using re.match to match strings
7. Using time.strptime to convert a string into a time object
8. Complete code
The main function of the program
There is a table-like document that stores user information. The first line lists the attributes, separated by commas (,). From the second line on, each line holds the values for those attributes, and each line represents one user. How can we read this document and output one user object per line?
There are also 4 small requirements:
Each document is very large. If the objects generated from all rows were stored in a list and returned at once, memory would be exhausted. The program may hold only one line-generated object at a time.
Each comma-separated string may be wrapped in double quotes (") or single quotes ('). For example, for "Zhang San" the quotes need to be removed. A number may appear as +000000001.24, in which case the leading + and 0s need to be removed to get 1.24.
The document contains times, which may be in the form 2013-10-29 or in the form 2013/10/29 2:23:56. Such strings need to be converted into a time type.
There are many such documents, each with different attributes. For example, one holds user information and another holds call records. The attributes of the class therefore have to be generated dynamically from the first line of the document.
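To make the second requirement concrete, here is a minimal sketch of the clean-up it describes. The helper name clean_field and the regular expression are assumptions made for illustration, not code from the original program; the sketch only rewrites strings that are entirely numeric, so dates and names pass through unchanged.

```python
import re

# Hypothetical pattern: optional '+', padding zeros, then the number itself.
_NUMBER = re.compile(r'^\+?0*(\d+(?:\.\d+)?)$')

def clean_field(raw):
    """Strip whitespace and quotes, then drop a leading '+' and padding zeros."""
    text = raw.strip().strip('"').strip("'")
    match = _NUMBER.match(text)
    # Only purely numeric fields are rewritten; everything else is returned as-is.
    return match.group(1) if match else text

print(clean_field('"Zhang San"'))     # Zhang San
print(clean_field('+000000001.24'))   # 1.24
print(clean_field('2013-10-29'))      # 2013-10-29
```

Matching the whole field keeps values such as 100 or 0.5 intact, which a plain character strip would not guarantee.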
Implementation process
1. Class definition
Since attributes are added dynamically, attribute-value pairs are added dynamically too, so the class needs two member functions: updateAttributes() and updatePairs(). In addition, the list __attributes stores the attribute names, and the dictionary attrilist stores the attribute-value mapping. The __init__() function is the constructor. The two leading underscores in __attributes mark it as a private variable: Python name-mangles it, so it cannot be reached as a.__attributes from outside the class. Instantiating the class only requires a=UserInfo(), without any parameters.
<pre class="brush:py;">class UserInfo(object):
    'Class to store user information'
    def __init__(self):
        self.attrilist={}      # attribute name -> value
        self.__attributes=[]   # attribute names taken from the first line
    def updateAttributes(self,attributes):
        self.__attributes=attributes
    def updatePairs(self,values):
        for i in range(len(values)):
            self.attrilist[self.__attributes[i]]=values[i]</pre>
2. Use a generator to dynamically update each object and return it
A generator only needs to be initialized once. After that it can be run repeatedly, and each iteration returns one result. The difference from an ordinary function is that a function uses return to hand back its result, while a generator uses yield. Each run stops at yield and returns a value, and the next run resumes from the statement after yield. For example, we implement the Fibonacci sequence first as a function and then as a generator.
The function version is:
<pre class="brush:py;">def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1
    return 'done'</pre>
We calculate the first 6 numbers of the sequence:
<pre class="brush:py;">>>> fib(6)
1
1
2
3
5
8
'done'</pre>
If you use a generator, just change print to yield, as follows:
<pre class="brush:py;">def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1</pre>
Usage:
<pre class="brush:py;">>>> f = fib(6)
>>> f
<generator object fib at 0x104feaaa0>
>>> for i in f:
...     print(i)
...
1
1
2
3
5
8
>>></pre>
As you can see, the generator fib itself is an object. Each time execution reaches yield, it pauses and returns a result. The next time, it continues from the line after yield. A generator can also be advanced with next(generator) (generator.next() in Python 2).
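The pause-and-resume behaviour can be checked by stepping a generator by hand. The following is a small sketch that reuses the generator version of fib and assumes Python 3, where the built-in next() advances a generator and StopIteration signals that it is exhausted.

```python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1

gen = fib(3)
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield and runs to the next one
third = next(gen)

try:
    next(gen)        # the while loop has ended, so nothing is left to yield
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)   # 1 1 2 True
```

A for loop performs the same next() calls internally and stops quietly when StopIteration is raised.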
In my program, the generator part of the code is as follows:
<pre class="brush:py;">def ObjectGenerator(maxlinenum):
    filename='/home/thinkit/Documents/usr_info/USER.csv'
    attributes=[]
    linenum=1
    a=UserInfo()
    file=open(filename,encoding='gb2312')  # the document is gb2312 encoded
    while linenum < maxlinenum:
        values=[]
        line=file.readline()
        if line=='':
            print('reading fail! Please check filename!')
            break
        str_list=line.split(',')
        for item in str_list:
            item=item.strip()
            item=item.strip('\"')
            item=item.strip('\'')
            item=item.lstrip('+0')
            item=catchTime(item)
            if linenum==1:
                attributes.append(item)
            else:
                values.append(item)
        if linenum==1:
            a.updateAttributes(attributes)
        else:
            a.updatePairs(values)
            yield a.attrilist #change to ' a ' to use
        linenum = linenum +1</pre>
Here, a=UserInfo() instantiates the class UserInfo. Because the document is gb2312 encoded, the file is opened with that encoding. The first line holds the attributes, so UserInfo has a function that stores the attribute list, updateAttributes(). The following lines are read as attribute-value pairs into a dictionary. P.S. A dictionary in Python is equivalent to a map.
3. Use strip to remove unnecessary characters
From the above code, you can see that str.strip(somechar) removes the somechar characters before and after str. somechar is a set of characters, not a regular expression: every leading and trailing character that appears in the set is removed, in any order and any number of times. str.lstrip(somechar) works the same way but only on the left end. As above:
<pre class="brush:py;">item=item.strip()       # remove all whitespace before and after the string, such as \t and \n
item=item.strip('\"')   # remove " before and after
item=item.strip('\'')   # remove ' before and after
item=item.lstrip('+0')  # remove the leading +00...00; note that 0.5 would become .5</pre>
4. Use re.match to match strings
Function syntax: re.match(pattern, string, flags=0)
Function parameter description:
pattern: the regular expression to match
string: the string to be matched
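flags: flag bits that control how the regular expression is matched, such as case sensitivity and multi-line matching.
If the match succeeds, re.match returns a match object; otherwise it returns None.
>>> import re
>>> s='2015-09-18'
>>> matchObj=re.match(r'\d{4}-\d{2}-\d{2}',s, flags= 0)
>>> print(matchObj)
<re.Match object; span=(0, 10), match='2015-09-18'>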
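One detail of re.match is worth seeing in code: it only anchors at the start of the string, so extra text after the date still counts as a match. The sketch below assumes Python 3.4 or later, where re.fullmatch is available for matching the whole string; the variable names are only for illustration.

```python
import re

DATE = r'\d{4}-\d{2}-\d{2}'

exact = re.match(DATE, '2015-09-18') is not None              # whole string is a date
trailing = re.match(DATE, '2015-09-18 extra') is not None     # only the prefix is checked
leading = re.match(DATE, 'on 2015-09-18') is not None         # match must start at index 0
strict = re.fullmatch(DATE, '2015-09-18 extra') is not None   # whole string must match

print(exact, trailing, leading, strict)   # True True False False
```

When the matched string is passed on to a stricter parser, ending the pattern with $ or using re.fullmatch avoids accepting strings with trailing text.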
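5. Use time.strptime to convert a string into a time object
In the time module, time.strptime(str,format) converts str into a time object according to format. Commonly used directives in format are:
%y two-digit year (00-99)
%Y four-digit year (for example, 2013)
%m month (01-12)
%d day of the month (01-31)
%H hour on the 24-hour clock (0-23)
%I hour on the 12-hour clock (01-12)
%M minute (00-59)
%S second (00-59)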
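As a quick illustration of these directives, the sketch below parses the two time layouts mentioned in the requirements. It uses only time.strptime and time.strftime from the standard library; the variable names are chosen for the example.

```python
import time

# Date only: year, month, day separated by '-'
t1 = time.strptime('2013-10-29', '%Y-%m-%d')

# Date and time: '/' separators, then hour:minute:second
t2 = time.strptime('2013/10/29 2:23:56', '%Y/%m/%d %H:%M:%S')

print(t1.tm_year, t1.tm_mon, t1.tm_mday)        # 2013 10 29
print(t2.tm_hour, t2.tm_min, t2.tm_sec)         # 2 23 56
print(time.strftime('%Y-%m-%d %H:%M:%S', t2))   # 2013-10-29 02:23:56
```

The result is a time.struct_time, whose fields can be read by name or formatted back into a string with time.strftime.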
In addition, the re module is needed: a regular expression is used to check whether the string has a common time format, such as YYYY/MM/DD H:M:S or YYYY-MM-DD.
In the code above, the function catchTime checks whether item has a time format and, if so, converts it into a time object. Otherwise item is returned unchanged.
The code is as follows:
<pre class="brush:py;">import time
import re

def catchTime(item):
    # check if it's time; the trailing $ rejects strings with extra text after the time
    matchObj=re.match(r'\d{4}-\d{2}-\d{2}$',item, flags= 0)
    if matchObj is not None:
        item=time.strptime(item,'%Y-%m-%d')
        return item
    else:
        matchObj=re.match(r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+$',item,flags=0)
        if matchObj is not None:
            item=time.strptime(item,'%Y/%m/%d %H:%M:%S')
        return item</pre>
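The same check can also be written without regular expressions, by trying each format and falling back to the original string. This is only an alternative sketch, not the approach used in the article; the names TIME_FORMATS and parse_time_or_keep are hypothetical.

```python
import time

# The two layouts named in the requirements.
TIME_FORMATS = ('%Y-%m-%d', '%Y/%m/%d %H:%M:%S')

def parse_time_or_keep(item):
    """Return a struct_time if item fits one of TIME_FORMATS, else item itself."""
    for fmt in TIME_FORMATS:
        try:
            return time.strptime(item, fmt)
        except ValueError:
            # strptime raises ValueError when the string does not fit the format
            continue
    return item

print(parse_time_or_keep('2013/10/29 2:23:56').tm_hour)   # 2
print(parse_time_or_keep('Zhang San'))                    # Zhang San
```

Adding another layout then only requires a new entry in TIME_FORMATS, at the cost of one failed parse attempt per non-matching format.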
Complete code:
<pre class="brush:py;">import collections
import time
import re

class UserInfo(object):
    'Class to store user information'
    def __init__(self):
        self.attrilist=collections.OrderedDict()# ordered
        self.__attributes=[]
    def updateAttributes(self,attributes):
        self.__attributes=attributes
    def updatePairs(self,values):
        for i in range(len(values)):
            self.attrilist[self.__attributes[i]]=values[i]

def catchTime(item):
    # check if it's time; the trailing $ rejects strings with extra text after the time
    matchObj=re.match(r'\d{4}-\d{2}-\d{2}$',item, flags= 0)
    if matchObj is not None:
        item=time.strptime(item,'%Y-%m-%d')
        return item
    else:
        matchObj=re.match(r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+$',item,flags=0)
        if matchObj is not None:
            item=time.strptime(item,'%Y/%m/%d %H:%M:%S')
        return item

def ObjectGenerator(maxlinenum):
    filename='/home/thinkit/Documents/usr_info/USER.csv'
    attributes=[]
    linenum=1
    a=UserInfo()
    file=open(filename,encoding='gb2312')  # the document is gb2312 encoded
    while linenum < maxlinenum:
        values=[]
        line=file.readline()
        if line=='':
            print('reading fail! Please check filename!')
            break
        str_list=line.split(',')
        for item in str_list:
            item=item.strip()
            item=item.strip('\"')
            item=item.strip('\'')
            item=item.lstrip('+0')
            item=catchTime(item)
            if linenum==1:
                attributes.append(item)
            else:
                values.append(item)
        if linenum==1:
            a.updateAttributes(attributes)
        else:
            a.updatePairs(values)
            yield a.attrilist #change to ' a ' to use
        linenum = linenum +1

if __name__ == '__main__':
    for n in ObjectGenerator(10):
        print(n) # print the dictionary to check whether it is correct</pre>
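As a variation on the complete code, the values can be attached as real instance attributes with setattr instead of being kept in a dictionary. The sketch below is an assumption-laden illustration, not the author's program: the class Record, the function record_generator, and the in-memory sample data are all hypothetical, and the standard csv module is swapped in for the manual split and quote stripping.

```python
import csv
import io

class Record(object):
    """Hypothetical container whose attributes come from the header row."""
    def update(self, names, values):
        for name, value in zip(names, values):
            # setattr creates or overwrites the attribute called `name`
            setattr(self, name, value)

def record_generator(lines):
    """Yield one Record per data row, reusing a single object."""
    reader = csv.reader(lines, skipinitialspace=True)
    names = next(reader, None)   # first row holds the attribute names
    if names is None:
        return                   # empty input: nothing to yield
    record = Record()            # only one object is alive at a time
    for values in reader:
        record.update(names, values)
        yield record

# Made-up sample data standing in for a real file.
sample = io.StringIO('name,balance\n"Zhang San",+000000001.24\n"Li Si",+000000020.50\n')
for rec in record_generator(sample):
    print(rec.name, rec.balance)
```

Because the same Record is updated on every iteration, each row has to be used before the next one is requested. The csv reader removes the double quotes, while numeric clean-up and time conversion would still have to be applied to each value.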
Summary
That is all for this article. I hope it helps with your study or work. If you have any questions, feel free to leave a comment.