Home  >  Article  >  Backend Development  >  Django batch imports xml data

Django batch imports xml data

不言
不言Original
2018-05-23 11:04:501863browse

Django background batch import data

In a production environment, there are often not a few or hundreds of pieces of data. So for example, if you import the employee numbers or account passwords of all the company's employees into the background, it is not recommended. Records are added one by one in the background

How to batch import svn records from xml

Step 1:

Build a model for the data

@python_2_unicode_compatible
class SVNLog(models.Model):

  vision = models.IntegerField(verbose_name=u"修订版本", blank=False, null=False,)
  author = models.CharField(verbose_name=u"作者", max_length=60, blank=True, null=True)
  date = models.DateTimeField(verbose_name=u"修订时间",null=True )
  msg = models.TextField(verbose_name=u"注释消息", blank=False, null=False, default=u"")
  paths = models.TextField(verbose_name=u"影响的文件", blank=False, null=False, default=u"")
  created_time = models.DateTimeField(verbose_name=u"创建时间", auto_now_add=True, )
  update_time = models.DateTimeField(verbose_name=u"修改时间", auto_now=True, )

  class Meta:
    ordering = ['revision']

  def __str__(self):
    return u'r%s' % (self.revision or u"", )

Now that the model has been established, let’s go Create models that accept our xml files

@python_2_unicode_compatible
class ImportLogFile(models.Model):

  LogFile = models.FileField(upload_to='LogFile')
  FileName = models.CharField(max_length=50, verbose_name=u'文件名')

  class Meta:
    ordering = ['FileName']

  def __str__(self):
    return self.FileName

ok. In the above code, we have defined the model of data and uploaded files

Synchronize the database

python manage.py makemigrations
python manage.py migrate

Then we modify admin.py Let us upload files from the background,

class ImportLogAdmin(admin.ModelAdmin):

  list_display = ('LogFile','FileName',)
  list_filter = ['FileName',]

  def save_model(self, request, obj, form, change):

    re = super(YDImportLogAdmin,self).save_model(request, obj, form, change)
    update_svn_log(self, request, obj, change)
    return re

Pay attention to save_model in the above code, here is the key, here I rewrite the save_model method in ModelAdmin
Because we need to upload files and read files, Parse the file and operate the database in one step. You can turn on debug. When uploading a file, the return parameter obj includes the path to upload the file. This path is also the key to our next step of parsing the file. Okay, let's do this Create a new utils.py under this app folder to operate the tool class we use to operate files and databases. For simplicity, I wrote the function as follows
First, paste the xml file we want to test

qwert2016-09-27T07:16:37.396449Z/aaa/README20160927 151630VisualSVN Server2016-09-20T05:03:12.861315Z/branches/tags/trunkhello word

Output result format

r2 | qwer | 2016-09-27 15:16:37 +0800 (二, 27 9 2016) | 1 line
Changed paths:
  A /xxx/README

20160927 151630
------------------------------------------------------------------------
r1 | VisualSVN Server | 2016-09-20 13:03:12 +0800 (二, 20 9 2016) | 1 line
Changed paths:
  A /branches
  A /tags
  A /trunk

Initial structure.
from .models import SVNLog
import xmltodict
def update_svn_log(self, request, obj, change):

  headers = ['r','a','d','m','p']
  filepath = obj.LogFile.path
  xmlfile = xmltodict.parse(open(filepath, 'r'))
  xml_logentry = xml.get('log').get('logentry')
  info_list = []
  pathlist = []
  sql_insert_list = []
  sql_update_list = []
  for j in xml:
    data_dict = {}
    # get path
    paths = j.get('paths').get('path')
    if isinstance(paths,list):
      for path in paths:
        action = path.get('@action')
        pathtext = path.get('#text')
        pathtext = action + ' ' + pathtext
        pathlist.append(pathtext)
        
      _filelist = u'\n'.join(pathlist)
      _paths = u"Changed paths:\n {}".format(_filelist)
      print _paths
    else:
      _filelist = paths.get('@action') + ' ' + paths.get('#text')
      _paths = u"Changed paths:\n {}".format(_filelist)
      print _paths
    # get revision
    vision = j.get('@vision')
    # get auth
    author = j.get('author')
    #get date
    date = j.get('date')
    #get msg
    msg = j.get('msg')

    data_dict[headers[0]] = int(vision)
    data_dict[headers[1]] = author
    data_dict[headers[2]] = date
    data_dict[headers[3]] = msg
    data_dict[headers[4]] = _paths
    info_list.append(data_dict)

  _svnlog = SVNLog.objects.filter().order_by('-vision').first()
  _last_version = _svnlog.vision if _svnlog else 0

  for value in info_list:
    vision = value['r']
    author = value['a']
    date = value['d']
    msg = value['m']
    paths = value['p']
    print vision,author
    _svnlog = YDSVNLog.objects.filter().order_by('-revision').first()
    _last_version = _svnlog.revision if _svnlog else 0
    if vision > _last_version:
      sql_insert_list.append(SVNLog(revision=revision, author=author, date=date, msg = msg, paths = paths))
    else:
      sql_update_list.append(SVNLog(revision=revision, author=author, date=date, msg = msg, paths = paths))

  SVNLog.objects.bulk_create(sql_insert_list)
  SVNLog.objects.bulk_create(sql_update_list)

We use the third-party library xmltodict to parse xml. It parses the content into an efficient orderdict type, which is a sequenced dictionary
The more complicated thing in this xml is the path in the paths, because This xml contains two elements. The path of the first element only contains one path, and the paths in the second element contain three paths. Therefore, we need to judge when parsing and obtaining.

paths = j.get('paths').get('path')
if isinstance(paths,list):
  pass

We judge Is this path a list type? If so, then we will process it in a list way. If not, then we will process it in a single way. After obtaining it, we will process the result according to the output result format and then get other content

revision = j.get('@vision')
# get auth
author = j.get('author')
#get date
date = j.get('date')
#get msg
msg = j.get('msg')

Finally, we will store the obtained elements in the dictionary
Judge the current version number and the version number in the database in the loop,
If it is smaller than the original one, then we will perform the update operation, otherwise we will perform the insertion operation

Finally, bulk_create is used to operate the database, which avoids the waste of resources caused by database operations every time in the loop

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn