How can I convert XML to a Pandas DataFrame efficiently?

Barbara Streisand
2024-11-30 19:46:11

Converting XML to a Pandas DataFrame Efficiently

XML files often hold data worth analyzing with Pandas. One effective way to convert an XML document to a DataFrame is to walk the tree with ElementTree and build one dictionary per record:

import pandas as pd
import xml.etree.ElementTree as ET
import io

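def iter_docs(author):
    # Yield one dict per <document> element, merged with the author's attributes
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)  # document attributes override same-named author attributes
        doc_dict['data'] = doc.text  # text content of the <document> element
        yield doc_dict

xml_data = io.StringIO('''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))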

Explanation:

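  • The iter_docs generator function takes an author element and iterates over every document element beneath it. For each one it yields a dictionary that combines the author's attributes, the document's attributes, and the document's text content (stored under the 'data' key).
  • The Pandas DataFrame is then constructed from the list of dictionaries produced by iter_docs, so each document becomes one row and each attribute becomes one column.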
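
To make the steps above concrete, here is a minimal, self-contained sketch. The sample XML is invented for illustration (an author root with type and language attributes, and two document children with a hypothetical id attribute); substitute your own data and tag names.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample input: a single <author> root with <document> children.
SAMPLE_XML = '''<author type="novelist" language="EN">
    <documents>
        <document id="d1">First document text</document>
        <document id="d2">Second document text</document>
    </documents>
</author>'''


def iter_docs(author):
    """Yield one dict per <document>, merged with the author's attributes."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


root = ET.parse(io.StringIO(SAMPLE_XML)).getroot()
doc_df = pd.DataFrame(list(iter_docs(root)))

print(list(doc_df.columns))  # ['type', 'language', 'id', 'data']
print(doc_df.shape)          # (2, 4)
```

Each row repeats the author attributes, which keeps the DataFrame flat at the cost of some redundancy.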

Additional Notes:

The code above assumes the XML contains a single author, which is also the root element. If there are multiple authors, an additional generator function, iter_author, can iterate over each author element and yield all of that author's document dictionaries. The last line of the example then becomes:

doc_df = pd.DataFrame(list(iter_author(etree)))
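
The article does not show iter_author itself, so the following is only one possible sketch of it. It assumes the authors are author elements somewhere under a common root (here a hypothetical authors wrapper) and reuses iter_docs unchanged; the sample XML and its name and id attributes are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample input: several <author> elements under one root.
SAMPLE_XML = '''<authors>
    <author name="alice" language="EN">
        <document id="a1">Alpha</document>
        <document id="a2">Beta</document>
    </author>
    <author name="bob" language="FR">
        <document id="b1">Gamma</document>
    </author>
</authors>'''


def iter_docs(author):
    """Yield one dict per <document>, merged with the author's attributes."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


def iter_author(etree):
    """Assumed helper: yield the document dicts of every <author> in the tree."""
    for author in etree.iter('author'):
        yield from iter_docs(author)


etree = ET.parse(io.StringIO(SAMPLE_XML))
doc_df = pd.DataFrame(list(iter_author(etree)))

print(doc_df.shape)  # (3, 4)
```

Because both functions are generators, no intermediate per-author lists are built; rows are only materialized once, in the list passed to pd.DataFrame.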

For further guidance on working with XML in Python, refer to the xml.etree.ElementTree tutorial in the Python standard library documentation.
