Batch importing data through the Django admin
In a production environment you rarely deal with just a handful, or even a few hundred, records. If you need to load, say, the employee numbers or account passwords of every employee in a company into the admin backend, adding the records one at a time is not practical.
How to batch import SVN records from XML
Step 1:
Build a model for the data
    from django.db import models
    from django.utils.encoding import python_2_unicode_compatible


    @python_2_unicode_compatible
    class SVNLog(models.Model):
        # The field is named "revision" because Meta.ordering, __str__ and the import code all use that name
        revision = models.IntegerField(verbose_name=u"revision", blank=False, null=False)
        author = models.CharField(verbose_name=u"author", max_length=60, blank=True, null=True)
        date = models.DateTimeField(verbose_name=u"revision time", null=True)
        msg = models.TextField(verbose_name=u"commit message", blank=False, null=False, default=u"")
        paths = models.TextField(verbose_name=u"affected files", blank=False, null=False, default=u"")
        created_time = models.DateTimeField(verbose_name=u"created time", auto_now_add=True)
        update_time = models.DateTimeField(verbose_name=u"modified time", auto_now=True)

        class Meta:
            ordering = ['revision']

        def __str__(self):
            return u'r%s' % (self.revision or u"", )
With the data model in place, let's create a model that accepts our uploaded XML files:
    @python_2_unicode_compatible
    class ImportLogFile(models.Model):
        LogFile = models.FileField(upload_to='LogFile')
        FileName = models.CharField(max_length=50, verbose_name=u'file name')

        class Meta:
            ordering = ['FileName']

        def __str__(self):
            return self.FileName
OK. The code above defines both the data model and the model for uploaded files.
Synchronize the database:
    python manage.py makemigrations
    python manage.py migrate
Next, modify admin.py so that files can be uploaded from the admin backend:
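    from django.contrib import admin

    from .models import ImportLogFile
    from .utils import update_svn_log


    class ImportLogAdmin(admin.ModelAdmin):
        list_display = ('LogFile', 'FileName',)
        list_filter = ['FileName', ]

        def save_model(self, request, obj, form, change):
            # Save the object first so that the uploaded file is written to disk
            result = super(ImportLogAdmin, self).save_model(request, obj, form, change)
            # Then parse the file and write its entries to the database
            update_svn_log(self, request, obj, change)
            return result


    admin.site.register(ImportLogFile, ImportLogAdmin)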
Pay attention to save_model in the code above: this is the key step. Here I override the save_model method of ModelAdmin,
because we want to upload the file, read it, parse it, and write to the database in a single step. If you step through it in a debugger, you will see that the obj argument passed to save_model carries the path of the uploaded file. This path is what we need in order to parse the file in the next step. Now create a new utils.py in the app folder to hold the helper code that works with files and the database. For simplicity, I wrote it as a single function, shown further down.
First, here is the XML file we will use for testing:
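    <log>
      <logentry revision="2">
        <author>qwert</author>
        <date>2016-09-27T07:16:37.396449Z</date>
        <paths>
          <path action="A">/aaa/README</path>
        </paths>
        <msg>20160927 151630</msg>
      </logentry>
      <logentry revision="1">
        <author>VisualSVN Server</author>
        <date>2016-09-20T05:03:12.861315Z</date>
        <paths>
          <path action="A">/branches</path>
          <path action="A">/tags</path>
          <path action="A">/trunk</path>
        </paths>
        <msg>hello word</msg>
      </logentry>
    </log>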
Output format:
    r2 | qwer | 2016-09-27 15:16:37 +0800 (Tue, 27 Sep 2016) | 1 line
    Changed paths:
       A /xxx/README

    20160927 151630
    ------------------------------------------------------------------------
    r1 | VisualSVN Server | 2016-09-20 13:03:12 +0800 (Tue, 20 Sep 2016) | 1 line
    Changed paths:
       A /branches
       A /tags
       A /trunk

    Initial structure.
The function in utils.py:
    from .models import SVNLog
    import xmltodict


    def update_svn_log(self, request, obj, change):
        """Parse the uploaded SVN log XML and store its entries as SVNLog rows."""
        headers = ['r', 'a', 'd', 'm', 'p']
        filepath = obj.LogFile.path
        with open(filepath, 'rb') as f:
            xmlfile = xmltodict.parse(f)
        xml_logentry = xmlfile.get('log').get('logentry')
        # A file with a single <logentry> yields a dict instead of a list
        if not isinstance(xml_logentry, list):
            xml_logentry = [xml_logentry]

        info_list = []
        sql_insert_list = []
        sql_update_list = []

        for j in xml_logentry:
            data_dict = {}
            # get path: one <path> is a dict, several are a list
            paths = j.get('paths').get('path')
            if isinstance(paths, list):
                pathlist = []  # reset per entry so paths do not leak between revisions
                for path in paths:
                    action = path.get('@action')
                    pathtext = path.get('#text')
                    pathlist.append(action + ' ' + pathtext)
                _filelist = u'\n'.join(pathlist)
            else:
                _filelist = paths.get('@action') + ' ' + paths.get('#text')
            _paths = u"Changed paths:\n {}".format(_filelist)
            # get revision
            revision = j.get('@revision')
            # get author
            author = j.get('author')
            # get date
            date = j.get('date')
            # get msg (an empty <msg/> is parsed as None)
            msg = j.get('msg') or u''
            data_dict[headers[0]] = int(revision)
            data_dict[headers[1]] = author
            data_dict[headers[2]] = date
            data_dict[headers[3]] = msg
            data_dict[headers[4]] = _paths
            info_list.append(data_dict)

        # Highest revision already stored; queried once, outside the loop
        _svnlog = SVNLog.objects.order_by('-revision').first()
        _last_version = _svnlog.revision if _svnlog else 0

        for value in info_list:
            if value['r'] > _last_version:
                sql_insert_list.append(SVNLog(revision=value['r'], author=value['a'],
                                              date=value['d'], msg=value['m'],
                                              paths=value['p']))
            else:
                sql_update_list.append(value)

        # New revisions are inserted in a single query
        SVNLog.objects.bulk_create(sql_insert_list)
        # bulk_create only inserts, so existing revisions are updated in place
        for value in sql_update_list:
            SVNLog.objects.filter(revision=value['r']).update(
                author=value['a'], date=value['d'], msg=value['m'], paths=value['p'])
We use the third-party library xmltodict to parse the XML. It parses the content into an OrderedDict, that is, a dictionary that preserves the order of its keys.
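For readers who cannot install xmltodict, roughly the same fields can be read with the standard library's xml.etree.ElementTree instead. This is a swapped-in technique, not the approach the tutorial uses. The sketch below is written for Python 3 and assumes the element layout that the tutorial's code reads (a log root, logentry elements with a revision attribute, and path elements with an action attribute); the function name parse_svn_log and the embedded sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative sample in the layout the tutorial's code expects
SAMPLE_XML = """<log>
<logentry revision="2">
<author>qwert</author>
<date>2016-09-27T07:16:37.396449Z</date>
<paths><path action="A">/aaa/README</path></paths>
<msg>20160927 151630</msg>
</logentry>
<logentry revision="1">
<author>VisualSVN Server</author>
<date>2016-09-20T05:03:12.861315Z</date>
<paths>
<path action="A">/branches</path>
<path action="A">/tags</path>
<path action="A">/trunk</path>
</paths>
<msg>hello word</msg>
</logentry>
</log>"""


def parse_svn_log(xml_text):
    """Return one plain dict per <logentry> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall('logentry'):
        # findall always returns a list, so one path and several paths look the same
        paths = [
            '{} {}'.format(p.get('action'), p.text)
            for p in entry.findall('paths/path')
        ]
        entries.append({
            'revision': int(entry.get('revision')),
            'author': entry.findtext('author'),
            'date': entry.findtext('date'),
            'msg': entry.findtext('msg') or '',
            'paths': paths,
        })
    return entries


entries = parse_svn_log(SAMPLE_XML)
for item in entries:
    print(item['revision'], item['author'], item['paths'])
```

With ElementTree, attributes are read with get() and element text with findtext(), whereas xmltodict exposes them as '@name' and '#text' dictionary keys.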
The trickiest part of this XML is the path elements inside paths. The file contains two logentry elements: the paths of the first one holds a single path, while the paths of the second one holds three. We therefore have to check which case we are dealing with when parsing.
    paths = j.get('paths').get('path')
    if isinstance(paths, list):
        pass
We check whether paths is a list. If it is, we process it as a list; if not, we process it as a single item. Once we have the paths, we format them according to the output format shown earlier, and then fetch the remaining fields:
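The single-versus-list check can also be folded into a small normalising helper, so that the formatting code has only one branch. The following is a minimal sketch on plain dictionaries that mimic the '@action' and '#text' keys read by the tutorial's code; the helper names as_list and format_paths are hypothetical and not part of xmltodict.

```python
def as_list(value):
    """Wrap a single item in a list, pass lists through, treat None as empty."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def format_paths(path_node):
    """Build the 'Changed paths' text from one path dict or a list of them."""
    lines = [
        u'   {} {}'.format(p.get('@action'), p.get('#text'))
        for p in as_list(path_node)
    ]
    return u'Changed paths:\n' + u'\n'.join(lines)


# Hand-written dictionaries standing in for parsed <path> elements
single = {'@action': 'A', '#text': '/aaa/README'}
several = [
    {'@action': 'A', '#text': '/branches'},
    {'@action': 'A', '#text': '/tags'},
    {'@action': 'A', '#text': '/trunk'},
]

print(format_paths(single))
print(format_paths(several))
```

Treating None as an empty list is a design choice made here so that an entry without any path would not raise an error; whether such entries occur in your log files is an assumption to verify.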
    # get revision (an XML attribute, hence the '@' prefix)
    revision = j.get('@revision')
    # get author
    author = j.get('author')
    # get date
    date = j.get('date')
    # get msg
    msg = j.get('msg')
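The date value arrives as a UTC string such as 2016-09-27T07:16:37.396449Z, while the output format shows local time with a +0800 offset. The conversion might look roughly like the sketch below, written for Python 3 with only the standard library. The helper name parse_svn_date and the fixed +0800 offset are assumptions taken from the sample data, not something the tutorial's code does.

```python
from datetime import datetime, timedelta, timezone


def parse_svn_date(text):
    """Parse a timestamp like '2016-09-27T07:16:37.396449Z' into an aware UTC datetime."""
    naive = datetime.strptime(text, '%Y-%m-%dT%H:%M:%S.%fZ')
    return naive.replace(tzinfo=timezone.utc)


# Assumption: display in UTC+8, the offset that appears in the sample output
LOCAL_TZ = timezone(timedelta(hours=8))

utc_time = parse_svn_date('2016-09-27T07:16:37.396449Z')
local_text = utc_time.astimezone(LOCAL_TZ).strftime('%Y-%m-%d %H:%M:%S %z')
print(local_text)  # 2016-09-27 15:16:37 +0800
```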
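Finally, we store the extracted fields in a dictionary.
In the second loop we compare each entry's revision number with the highest revision number already in the database.
If it is not greater than the stored one, the revision already exists and we perform an update; otherwise we perform an insert.
At the end, bulk_create writes the new records in a single operation, which avoids the cost of hitting the database on every loop iteration. Note that bulk_create only inserts rows, so entries meant to update existing revisions need a real update such as filter(...).update(...); passing them to bulk_create would create duplicate rows.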
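The insert-or-update decision can be tried out without a database. Below is a minimal sketch on plain dictionaries; the function name split_new_and_existing and the sample entries are hypothetical, and last_revision stands in for the highest revision already stored.

```python
def split_new_and_existing(entries, last_revision):
    """Separate entries newer than last_revision from those that already exist."""
    new_entries, existing_entries = [], []
    for entry in entries:
        if entry['revision'] > last_revision:
            new_entries.append(entry)       # candidates for a bulk insert
        else:
            existing_entries.append(entry)  # candidates for an update
    return new_entries, existing_entries


# Made-up entries for illustration only
parsed = [
    {'revision': 3, 'author': 'alice'},
    {'revision': 2, 'author': 'bob'},
    {'revision': 1, 'author': 'carol'},
]

to_insert, to_update = split_new_and_existing(parsed, last_revision=2)
print([e['revision'] for e in to_insert])
print([e['revision'] for e in to_update])
```

In the Django code, the first list would be turned into model instances and passed to bulk_create, which also accepts a batch_size argument if the list is very large.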