Parsing XML the Easy Way with PHP5

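With the SAX approach you have to write three handler functions yourself (for start tags, end tags, and character data) and build the returned data directly inside them, which takes fairly involved logic. And whenever the XML has a different structure, you have to write those three functions all over again. A hassle!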
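For comparison, here is a minimal sketch of the plain SAX approach with PHP's xml_parser functions: three handlers (start tag, end tag, character data) that collect leaf text into a flat array keyed by element path. The function name `saxLeafValues` and the sample document are hypothetical; the point is that all the bookkeeping lives inside the handlers and is tied to the output shape you want.

```php
<?php
// Plain SAX: three handlers cooperate through shared state to collect
// the text of leaf elements, keyed by their element path ("a/b/c").
function saxLeafValues($xml)
{
    $stack = array();   // names of the currently open elements
    $values = array();  // path => trimmed text
    $text = '';         // character data of the current element

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack, &$text) {
            $stack[] = $name;   // start tag: descend one level
            $text = '';
        },
        function ($p, $name) use (&$stack, &$values, &$text) {
            if (trim($text) !== '') {
                $values[implode('/', $stack)] = trim($text);
            }
            array_pop($stack);  // end tag: go back up one level
            $text = '';
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data;
    });
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $values;
}

$values = saxLeafValues(
    '<shop><name>Hualian</name><cat id="food"><goods><price>12.90</price></goods></cat></shop>'
);
print_r($values);
```

Wanting a different result shape (nested arrays, attributes, ids) means rewriting all three handlers, which is the inconvenience described above.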

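The DOM approach is somewhat better, but it treats every piece of the document as a node, so even simple operations take a lot of code. Also a hassle!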
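A sketch of a similar lookup with PHP's DOM extension (`DOMDocument`), assuming the extension is enabled. The whitespace between tags becomes text nodes of its own, so the loop has to filter child nodes by type before it can read a field; the sample document is hypothetical.

```php
<?php
// DOM: elements, attributes and text runs are all nodes, so reading the
// price of each <goods> means walking child node lists.
$doc = new DOMDocument();
$doc->loadXML(
    '<shop><cat id="food">'
    . '<goods id="food11"> <name>food11</name> <price>12.90</price> </goods>'
    . '<goods id="food12"> <name>food12</name> <price>22.10</price> </goods>'
    . '</cat></shop>'
);

$prices = array();
foreach ($doc->getElementsByTagName('goods') as $goods) {
    $id = $goods->getAttribute('id');
    foreach ($goods->childNodes as $child) {
        // Skip the whitespace text nodes; only element children hold fields.
        if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'price') {
            $prices[$id] = $child->textContent;
        }
    }
}
print_r($prices);
```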

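There are plenty of open-source XML parsing libraries online, and I have looked at a few, but I never felt comfortable relying on them; it always felt like following in someone else's footsteps.

I've been working on Java these past few days, which is tiring, so I decided to switch gears and write some PHP. So that XML parsing would not give me trouble again, I spent a day writing the XML parsing class below.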

It works by wrapping the results of SAX parsing. For my own purposes it is quite practical, performance is acceptable, and it handles most processing needs.
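The idea of wrapping SAX results can be sketched on its own. The function below is a hypothetical, simplified stand-in, not the classes that follow: a leaf element (one with no child elements) becomes a value of its parent, and an element with children becomes a sub-array keyed "tag_id", or "tag_N" (N = its position among the parent's child nodes, from 0) when it has no id attribute.

```php
<?php
// Sketch only: turn SAX events into a nested array.
function saxToTree($xml)
{
    $frames = array();   // one frame per currently open element
    $text = '';          // character data of the element being read
    $result = null;

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $tag, $attrs) use (&$frames, &$text) {
            if ($frames) {
                // The enclosing element has a child element, so it is not a leaf.
                $frames[count($frames) - 1]['hasChild'] = true;
            }
            $frames[] = array('attrs' => $attrs, 'data' => array(), 'nodes' => 0, 'hasChild' => false);
            $text = '';
        },
        function ($p, $tag) use (&$frames, &$text, &$result) {
            $f = array_pop($frames);
            if (!$frames) {                       // the root element closed
                $result = $f['data'];
                return;
            }
            $parent = &$frames[count($frames) - 1];
            if (!$f['hasChild']) {                // leaf: becomes a value
                $value = trim($text);
                $parent['data'][$tag] = $f['attrs']
                    ? array('value' => $value, 'attrs' => $f['attrs'])
                    : $value;
            } else {                              // has children: becomes a node
                $key = isset($f['attrs']['id'])
                    ? $tag . '_' . $f['attrs']['id']
                    : $tag . '_' . $parent['nodes'];
                $parent['data'][$key] = $f['attrs']
                    ? array('attrs' => $f['attrs']) + $f['data']
                    : $f['data'];
                $parent['nodes']++;
            }
            $text = '';
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data;
    });
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $result;
}

$tree = saxToTree(
    '<shop><name>Hualian</name>'
    . '<cat id="food"><goods id="food11"><name>food11</name><price>12.90</price></goods></cat>'
    . '<cat><goods><name>tel21</name></goods></cat></shop>'
);
print_r($tree);
```

The handlers are written once; any document shape comes out as the same kind of nested array, which is what makes a generic wrapper convenient.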

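Features:
1. Query, add, modify, and delete nodes in a basic XML file.
2. Export all the data of an XML file to an array.
3. The whole design is object-oriented; working with the result set feels similar to DOM.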

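Limitations:
1. Every node should preferably have an id attribute (see the example below). A node's name is "tag_id", i.e. the node's tag and its id joined by an underscore. If no id is set, the program generates one automatically: the node's position among its parent's child nodes, counted from 0.
2. To look up a node, join node names with "|". The names are listed in order, from the outermost ancestor down to the node itself, e.g. cat_food|goods_food11.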
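The naming and path rules above can be sketched on a plain nested array shaped like the getSaveData() output. `nodeKey()` and `findByPath()` are hypothetical helpers for illustration only; they are not part of the SimpleDocument classes.

```php
<?php
// Key of a child node: "tag_id", or "tag_<position>" when there is no id.
function nodeKey($tag, array $attrs, $position)
{
    return isset($attrs['id']) ? $tag . '_' . $attrs['id'] : $tag . '_' . $position;
}

// Follow a "|"-separated path such as "cat_food|goods_food11" through a
// nested array; returns null as soon as one step is missing.
function findByPath(array $tree, $path)
{
    $node = $tree;
    foreach (explode('|', $path) as $name) {
        if (!is_array($node) || !array_key_exists($name, $node)) {
            return null;
        }
        $node = $node[$name];
    }
    return $node;
}

$tree = array(
    nodeKey('cat', array('id' => 'food'), 0) => array(
        'attrs' => array('id' => 'food'),
        nodeKey('goods', array('id' => 'food11'), 0) => array('name' => 'food11', 'price' => '12.90'),
    ),
    nodeKey('cat', array(), 1) => array(
        nodeKey('goods', array('id' => 'tel21'), 0) => array('name' => 'tel21', 'price' => '1290'),
    ),
);

$goods = findByPath($tree, 'cat_food|goods_food11');
echo $goods['price'], "\n";
```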

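Usage:
Run the example below; the result page shows how the functions are used.

The code is written for PHP5 and does not run under PHP4.

Since I only just finished it, there is no documentation yet. The example below demonstrates only part of the functionality. The code is not difficult; to learn about the other features, read the source.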

Directory layout:

test.php
test.xml
xml/SimpleDocumentBase.php
xml/SimpleDocumentNode.php
xml/SimpleDocumentRoot.php
xml/SimpleDocumentParser.php
 
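File: test.xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- The root element's tag name did not survive in this copy of the article; "shop" is a placeholder. -->
<shop>
 <name>华联</name>
 <address>北京长安街-9999号</address>
 <desc>连锁超市</desc>
 <cat id="food">
  <goods id="food11">
   <name>food11</name>
   <price>12.90</price>
  </goods>
  <goods id="food12">
   <name>food12</name>
   <price>22.10</price>
   <desc creator="hahawen">好东西推荐</desc>
  </goods>
 </cat>
 <cat>
  <goods id="tel21">
   <name>tel21</name>
   <price>1290</price>
  </goods>
 </cat>
 <cat id="coat">
  <goods id="coat31">
   <name>coat31</name>
   <price>112</price>
  </goods>
  <goods id="coat32">
   <name>coat32</name>
   <price>45</price>
  </goods>
 </cat>
 <special id="hot">
  <goods>
   <name>hot41</name>
   <price>99</price>
  </goods>
 </special>
</shop>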

File: test.php

<?php
require_once "xml/SimpleDocumentParser.php";
require_once "xml/SimpleDocumentBase.php";
require_once "xml/SimpleDocumentRoot.php";
require_once "xml/SimpleDocumentNode.php";

$test = new SimpleDocumentParser();
$test->parse("test.xml");
$dom = $test->getSimpleDocument();

// <pre> keeps print_r() output and the line breaks readable in a browser.
echo "<pre>";

echo "Below is the array of the whole XML document returned by getSaveData():\n\n";
print_r($dom->getSaveData());

echo "\n\nBelow, setValue() adds a value to the root node, and the resulting XML is displayed:\n\n";
$dom->setValue("telphone", "123456789");
echo htmlspecialchars($dom->getSaveXml());

echo "\n\nBelow, getNode() returns all products in one category:\n\n";
$obj = $dom->getNode("cat_food");
$nodeList = $obj->getNode();
foreach ($nodeList as $node) {
    $data = $node->getValue();
    echo "Product name: " . $data['name'] . "\n";
    print_r($data);
    print_r($node->getAttribute());
}

echo "\n\nBelow, findNodeByPath() returns the information for one product:\n\n";
$obj = $dom->findNodeByPath("cat_food|goods_food11");
if (!is_object($obj)) {
    echo "This product does not exist";
} else {
    $data = $obj->getValue();
    echo "Product name: " . $data['name'] . "\n";
    print_r($data);
    print_r($obj->getAttribute());
}

echo "\n\nBelow, setValue() adds a value with attributes to product \"food11\", and the result is displayed:\n\n";
$obj = $dom->findNodeByPath("cat_food|goods_food11");
$obj->setValue("leaveword", array(
    "value" => "这个商品不错",
    "attrs" => array("author" => "hahawen", "date" => date('Y-m-d')),
));
echo htmlspecialchars($dom->getSaveXml());

echo "\n\nBelow, setValue()/removeValue() change and delete values of product \"food12\", and the result is displayed:\n\n";
$obj = $dom->findNodeByPath("cat_food|goods_food12");
$obj->setValue("name", "new food12");
$obj->removeValue("desc");
echo htmlspecialchars($dom->getSaveXml());

echo "\n\nBelow, createNode() adds a product, and the result is displayed:\n\n";
$obj = $dom->findNodeByPath("cat_food");
$newObj = $obj->createNode("goods", array("id" => "food13"));
$newObj->setValue("name", "food13");
$newObj->setValue("price", 100);
echo htmlspecialchars($dom->getSaveXml());

echo "\n\nBelow, removeNode() deletes a product, and the result is displayed:\n\n";
$obj = $dom->findNodeByPath("cat_food");
$obj->removeNode("goods_food12");
echo htmlspecialchars($dom->getSaveXml());

echo "</pre>";
?>
 
File: SimpleDocumentParser.php
 
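<?php
/**
 *================================================
 *
 * @author     hahawen(大龄青年)
 * @since      2004-12-04
 * @copyright  Copyright (c) 2004, NxCoder Group
 *
 *================================================
 */
/**
 * class SimpleDocumentParser
 * Uses SAX to parse an XML file and builds a SimpleDocument object tree.
 * Every class in this package works on XML files, with DOM-like methods.
 *
 * @package SmartWeb.common.xml
 * @version 1.0
 */
class SimpleDocumentParser
{
    private $domRootObject = null;
    private $currentNO = null;        // node currently being filled
    private $currentName = null;      // tag of the most recently opened element
    private $currentKey = null;       // key under which that element is registered in its parent
    private $currentValue = null;     // character data of that element
    private $currentAttribute = null; // attributes of that element

    public function getSimpleDocument()
    {
        return $this->domRootObject;
    }

    public function parse($file)
    {
        $xmlParser = xml_parser_create();
        xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, 0);
        xml_parser_set_option($xmlParser, XML_OPTION_SKIP_WHITE, 1);
        xml_parser_set_option($xmlParser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
        // Handlers are registered as array($object, 'method') callables,
        // so the methods below must be public.
        xml_set_element_handler($xmlParser,
            array($this, 'startElement'), array($this, 'endElement'));
        xml_set_character_data_handler($xmlParser, array($this, 'characterData'));
        if (!xml_parse($xmlParser, file_get_contents($file), true)) {
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($xmlParser)),
                xml_get_current_line_number($xmlParser)));
        }
        xml_parser_free($xmlParser);
    }

    // Opening tag: the first one creates the document root; every later one
    // becomes a child node of the current node.
    public function startElement($parser, $name, $attrs)
    {
        $this->currentName = $name;
        $this->currentAttribute = $attrs;
        $this->currentValue = '';
        if ($this->currentNO === null) {
            $this->domRootObject = new SimpleDocumentRoot($name);
            $this->domRootObject->setAttributes($attrs);
            $this->currentNO = $this->domRootObject;
        } else {
            // Same key rule as SimpleDocumentBase::createNodeByName().
            $this->currentKey = isset($attrs['id'])
                ? $name . '_' . $attrs['id']
                : $name . '_' . $this->currentNO->getNodesSize();
            $this->currentNO = $this->currentNO->createNode($name, $attrs);
        }
    }

    // Closing tag.
    public function endElement($parser, $name)
    {
        if ($this->currentName === $name
            && !($this->currentNO instanceof SimpleDocumentRoot)) {
            // No child element was opened since this tag started, so it is a
            // leaf: store it as a value of the parent and drop its temporary node.
            $this->currentNO = $this->currentNO->getPNodeObject();
            if (!empty($this->currentAttribute)) {
                $this->currentNO->setValue($name, array(
                    'value' => $this->currentValue,
                    'attrs' => $this->currentAttribute,
                ));
            } else {
                $this->currentNO->setValue($name, $this->currentValue);
            }
            $this->currentNO->removeNode($this->currentKey);
            $this->currentName = null;
        } else {
            // Element with children: move back up to its parent
            // (to null once the root element closes).
            $this->currentNO = ($this->currentNO instanceof SimpleDocumentRoot)
                ? null
                : $this->currentNO->getPNodeObject();
        }
    }

    // Character data can arrive in several chunks, so append rather than overwrite.
    // Text stays in UTF-8 (the target encoding set above); if the page must be
    // served as GB2312, convert at output time, e.g. with iconv().
    public function characterData($parser, $data)
    {
        $this->currentValue .= $data;
    }

    function __destruct()
    {
        unset($this->domRootObject);
    }
}
?>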
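Whether characterData() assigns or appends to currentValue matters, because the parser may deliver the text of a single element in several handler calls (for example around an entity reference such as `&amp;`). This small sketch records every chunk for one hypothetical element and compares keeping only the last chunk with concatenating them all.

```php
<?php
// Record every character-data callback for one short text node.
$chunks = array();
$parser = xml_parser_create('UTF-8');
xml_set_character_data_handler($parser, function ($p, $data) use (&$chunks) {
    $chunks[] = $data;
});
if (!xml_parse($parser, '<a>fish &amp; chips</a>', true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

$overwritten = end($chunks);      // what "$value = $data;" would end up holding
$appended = implode('', $chunks); // what "$value .= $data;" ends up holding
printf("%d chunk(s); overwrite keeps \"%s\", append keeps \"%s\"\n",
    count($chunks), $overwritten, $appended);
```

Only the appended value is guaranteed to be the complete text, which is why the value should be reset when a tag opens and extended on every callback.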
 
File: SimpleDocumentBase.php
 
<?php
/**
 *=================================================
 *
 * @author     hahawen(大龄青年)
 * @since      2004-12-04
 * @copyright  Copyright (c) 2004, NxCoder Group
 *
 *=================================================
 */
/**
 * abstract class SimpleDocumentBase
 * Base class for XML file parsing. Every class in this package works on
 * XML files, with DOM-like methods.
 *
 * 1\ add/update/remove data of an XML file.
 * 2\ export the data to an array.
 * 3\ rebuild the XML file.
 *
 * @package SmartWeb.common.xml
 * @abstract
 * @version 1.0
 */
abstract class SimpleDocumentBase
{
    private $nodeTag = null;
    private $attributes = array();
    private $values = array(); // leaf children: name => text, or array('value' => ..., 'attrs' => ...)
    private $nodes = array();  // child nodes: "tag_id" => sequence number in the root's node list

    function __construct($nodeTag)
    {
        $this->nodeTag = $nodeTag;
    }

    public function getNodeTag()
    {
        return $this->nodeTag;
    }

    public function setValues($values)
    {
        $this->values = $values;
    }

    public function setValue($name, $value)
    {
        $this->values[$name] = $value;
    }

    // Returns all values, or the named value (null if it does not exist).
    public function getValue($name = null)
    {
        if ($name === null) {
            return $this->values;
        }
        return isset($this->values[$name]) ? $this->values[$name] : null;
    }

    public function removeValue($name)
    {
        unset($this->values[$name]);
    }

    public function setAttributes($attributes)
    {
        $this->attributes = $attributes;
    }

    public function setAttribute($name, $value)
    {
        $this->attributes[$name] = $value;
    }

    // Returns all attributes, or the named attribute (null if it does not exist).
    public function getAttribute($name = null)
    {
        if ($name === null) {
            return $this->attributes;
        }
        return isset($this->attributes[$name]) ? $this->attributes[$name] : null;
    }

    public function removeAttribute($name)
    {
        unset($this->attributes[$name]);
    }

    public function getNodesSize()
    {
        return count($this->nodes);
    }

    protected function setNode($name, $nodeId)
    {
        $this->nodes[$name] = $nodeId;
    }

    public abstract function createNode($name, $attributes);
    public abstract function removeNode($name);
    public abstract function getNode($name = null);

    // Returns the whole name => id map, or one node id (null if unknown).
    protected function getNodeId($name = null)
    {
        if ($name === null) {
            return $this->nodes;
        }
        return isset($this->nodes[$name]) ? $this->nodes[$name] : null;
    }

    // Creates a child node and registers it as "tag_id", or as
    // "tag_<position>" (counted from 0) when there is no id attribute.
    protected function createNodeByName($rootNodeObj, $name, $attributes, $pId)
    {
        $tmpObject = $rootNodeObj->createNodeObject($pId, $name, $attributes);
        $key = isset($attributes['id'])
            ? $name . '_' . $attributes['id']
            : $name . '_' . $this->getNodesSize();
        $this->setNode($key, $tmpObject->getSeq());
        return $tmpObject;
    }

    protected function removeNodeByName($rootNodeObj, $name)
    {
        $id = $this->getNodeId($name);
        if ($id !== null) {
            $rootNodeObj->removeNodeById($id);
        }
        unset($this->nodes[$name]);
    }

    // Without a name, returns all child nodes keyed by node name. With a
    // name, returns that child; if there is no exact match, falls back to
    // the first child whose name starts with $name. Returns null if none.
    protected function getNodeByName($rootNodeObj, $name = null)
    {
        if ($name === null) {
            $tmpList = array();
            foreach ($this->getNodeId() as $key => $id) {
                $tmpList[$key] = $rootNodeObj->getNodeById($id);
            }
            return $tmpList;
        }
        $id = $this->getNodeId($name);
        if ($id === null) {
            foreach ($this->getNodeId() as $tkey => $tid) {
                if (strpos($tkey, $name) === 0) {
                    $id = $tid;
                    break;
                }
            }
        }
        return ($id === null) ? null : $rootNodeObj->getNodeById($id);
    }

    // Looks a node up by a "|"-separated path such as "cat_food|goods_food11".
    public function findNodeByPath($path)
    {
        $pos = strpos($path, '|');
        if ($pos === false) {
            return $this->getNode($path);
        }
        $tmpObj = $this->getNode(substr($path, 0, $pos));
        return is_object($tmpObj)
            ? $tmpObj->findNodeByPath(substr($path, $pos + 1))
            : null;
    }

    // Exports values, attributes (under "attrs") and child nodes as a nested array.
    public function getSaveData()
    {
        $data = $this->values;
        if (count($this->attributes) > 0) {
            $data['attrs'] = $this->attributes;
        }
        foreach ($this->getNode() as $key => $node) {
            $data[$key] = $node->getSaveData();
        }
        return $data;
    }

    // Serializes this node, its values and its child nodes as tab-indented XML.
    // Text and attribute values are escaped so the output stays well-formed.
    public function getSaveXml($level = 0)
    {
        $prefixSpace = str_pad("", $level, "\t");
        $str = "$prefixSpace<$this->nodeTag";
        foreach ($this->attributes as $key => $value) {
            $str .= " $key=\"" . htmlspecialchars($value) . "\"";
        }
        $str .= ">\r\n";

        foreach ($this->values as $key => $value) {
            $str .= "$prefixSpace\t<$key";
            if (is_array($value)) {
                if (isset($value['attrs'])) {
                    foreach ($value['attrs'] as $attkey => $attvalue) {
                        $str .= " $attkey=\"" . htmlspecialchars($attvalue) . "\"";
                    }
                }
                $tmpStr = isset($value['value']) ? $value['value'] : '';
            } else {
                $tmpStr = $value;
            }
            $tmpStr = trim((string) $tmpStr);
            $str .= ($tmpStr === "")
                ? " />\r\n"
                : ">" . htmlspecialchars($tmpStr) . "</$key>\r\n";
        }

        foreach ($this->getNode() as $node) {
            $str .= $node->getSaveXml($level + 1) . "\r\n";
        }
        $str .= "$prefixSpace</$this->nodeTag>";
        return $str;
    }

    function __destruct()
    {
        unset($this->nodes, $this->attributes, $this->values);
    }
}
?>
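getSaveXml() builds markup by string concatenation, so a value containing `&` or `<` produces broken XML unless it is escaped. The sketch below writes a single value element in the same style and escapes text and attribute values with `htmlspecialchars()` (ENT_QUOTES, UTF-8 assumed); `valueElement()` is a hypothetical helper, and re-parsing with `simplexml_load_string()` confirms the result is well-formed.

```php
<?php
// Write <key attr="...">text</key>, or <key /> for empty text, escaping
// the text and attribute values.
function valueElement($key, $value, array $attrs = array())
{
    $str = "<$key";
    foreach ($attrs as $name => $attr) {
        $str .= " $name=\"" . htmlspecialchars($attr, ENT_QUOTES, 'UTF-8') . "\"";
    }
    $text = trim((string) $value);
    return $text === ''
        ? "$str />"
        : "$str>" . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</$key>";
}

$xml = valueElement('desc', 'fish & chips', array('creator' => 'hahawen'));
echo $xml, "\n";

// Round trip through a real parser to check well-formedness.
$parsed = simplexml_load_string($xml);
echo (string) $parsed, "\n";
```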

 
File: SimpleDocumentRoot.php
 
<?php
/**
 *==============================================
 *
 * @author     hahawen(大龄青年)
 * @since      2004-12-04
 * @copyright  Copyright (c) 2004, NxCoder Group
 *
 *==============================================
 */
/**
 * class SimpleDocumentRoot
 * XML root class; holds values, attributes and child nodes, and keeps the
 * flat list of every node in the document.
 * Every class in this package works on XML files, with DOM-like methods.
 *
 * @package SmartWeb.common.xml
 * @version 1.0
 */
class SimpleDocumentRoot extends SimpleDocumentBase
{
    // XML declaration written before the root element by getSaveXml().
    private $prefixStr = '<?xml version="1.0" encoding="UTF-8"?>';
    private $nodeLists = array(); // every node in the document, indexed by sequence number
    private $nextSeq = 0;         // next sequence number; numbers are never reused

    function __construct($nodeTag)
    {
        parent::__construct($nodeTag);
    }

    public function createNodeObject($pNodeId, $name, $attributes)
    {
        $seq = $this->nextSeq++;
        $tmpObject = new SimpleDocumentNode($this, $pNodeId, $name, $seq);
        $tmpObject->setAttributes($attributes);
        $this->nodeLists[$seq] = $tmpObject;
        return $tmpObject;
    }

    public function removeNodeById($id)
    {
        unset($this->nodeLists[$id]);
    }

    public function getNodeById($id)
    {
        return isset($this->nodeLists[$id]) ? $this->nodeLists[$id] : null;
    }

    public function createNode($name, $attributes)
    {
        // -1 marks "the parent is the document root".
        return $this->createNodeByName($this, $name, $attributes, -1);
    }

    public function removeNode($name)
    {
        return $this->removeNodeByName($this, $name);
    }

    public function getNode($name = null)
    {
        return $this->getNodeByName($this, $name);
    }

    // Same signature as the parent method, as PHP requires for overrides.
    public function getSaveXml($level = 0)
    {
        return $this->prefixStr . "\r\n" . parent::getSaveXml(0);
    }
}
?>
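SimpleDocumentRoot keeps every node in one flat list indexed by a sequence number. Deriving the next number from the list's current size can hand out a number that is still in use once an earlier entry has been removed; a counter that only ever increases cannot. A hypothetical illustration with plain arrays standing in for the node list:

```php
<?php
// Size-based: after removing an entry in the middle, the next number
// derived from count() is one that is still in use.
$bySize = array(0 => 'a', 1 => 'b', 2 => 'c');
unset($bySize[1]);
$nextBySize = count($bySize);              // 2, which 'c' still occupies
$collides = isset($bySize[$nextBySize]);   // true: assigning would overwrite 'c'

// Counter-based: numbers only ever increase, so they are never reused.
$byCounter = array();
$nextSeq = 0;
foreach (array('a', 'b', 'c') as $tag) {
    $byCounter[$nextSeq++] = $tag;
}
unset($byCounter[1]);
$byCounter[$nextSeq++] = 'd';              // gets 3; 'c' is untouched

var_dump($collides);
print_r($byCounter);
```

In the example script this matters as soon as a node is removed and another is created afterwards.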
 
File: SimpleDocumentNode.php
 
<?php
/**
 *===============================================
 *
 * @author     hahawen(大龄青年)
 * @since      2004-12-04
 * @copyright  Copyright (c) 2004, NxCoder Group
 *
 *===============================================
 */
/**
 * class SimpleDocumentNode
 * XML node class; holds values, attributes and child nodes.
 * Every class in this package works on XML files, with DOM-like methods.
 *
 * @package SmartWeb.common.xml
 * @version 1.0
 */
class SimpleDocumentNode extends SimpleDocumentBase
{
    private $seq = null;        // this node's sequence number in the root's node list
    private $rootObject = null; // the SimpleDocumentRoot that stores every node
    private $pNodeId = null;    // parent's sequence number; -1 if the parent is the root

    function __construct($rootObject, $pNodeId, $nodeTag, $seq)
    {
        parent::__construct($nodeTag);
        $this->rootObject = $rootObject;
        $this->pNodeId = $pNodeId;
        $this->seq = $seq;
    }

    // Returns the parent node (the root object when pNodeId is -1).
    public function getPNodeObject()
    {
        return ($this->pNodeId === -1)
            ? $this->rootObject
            : $this->rootObject->getNodeById($this->pNodeId);
    }

    public function getSeq()
    {
        return $this->seq;
    }

    public function createNode($name, $attributes)
    {
        return $this->createNodeByName($this->rootObject, $name, $attributes, $this->getSeq());
    }

    public function removeNode($name)
    {
        return $this->removeNodeByName($this->rootObject, $name);
    }

    public function getNode($name = null)
    {
        return $this->getNodeByName($this->rootObject, $name);
    }
}
?>
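Each node stores only its parent's sequence number, with -1 standing for the document root, and the parent object is looked up in the root's flat list. Following those links upward gives a node's ancestry. In this sketch a hypothetical array stands in for the root's node list, and `ancestry()` is an illustrative helper, not part of the classes.

```php
<?php
// seq => (tag, parent seq); -1 means "the parent is the document root".
$nodes = array(
    0 => array('tag' => 'cat', 'parent' => -1),
    1 => array('tag' => 'goods', 'parent' => 0),
    4 => array('tag' => 'goods', 'parent' => 0),
);

// Walk parent links from a node up to the root and return the chain
// from the root down.
function ancestry(array $nodes, $seq)
{
    $tags = array();
    while ($seq !== -1) {
        $tags[] = $nodes[$seq]['tag'] . '#' . $seq;
        $seq = $nodes[$seq]['parent'];
    }
    $tags[] = 'root';
    return array_reverse($tags);
}

print_r(ancestry($nodes, 4));
```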

 
Below is the output of running the example:
 

Below is the array of the whole XML document returned by getSaveData():
Array
(
    [name] => 华联
    [address] => 北京长安街-9999号
    [desc] => 连锁超市
    [cat_food] => Array
        (
            [attrs] => Array
                (
                    [id] => food
                )

            [goods_food11] => Array
                (
                    [name] => food11
                    [price] => 12.90
                    [attrs] => Array
                        (
                            [id] => food11
                        )

                )

            [goods_food12] => Array
                (
                    [name] => food12
                    [price] => 22.10
                    [desc] => Array
                        (
                            [value] => 好东西推荐
                            [attrs] => Array
                                (
                                    [creator] => hahawen
                                )

                        )

                    [attrs] => Array
                        (
                            [id] => food12
                        )

                )

        )

    [cat_1] => Array
        (
            [goods_tel21] => Array
                (
                    [name] => tel21
                    [price] => 1290
                    [attrs] => Array
                        (
                            [id] => tel21
                        )

                )

        )

    [cat_coat] => Array
        (
            [attrs] => Array
                (
                    [id] => coat
                )

            [goods_coat31] => Array
                (
                    [name] => coat31
                    [price] => 112
                    [attrs] => Array
                        (
                            [id] => coat31
                        )

                )

            [goods_coat32] => Array
                (
                    [name] => coat32
                    [price] => 45
                    [attrs] => Array
                        (
                            [id] => coat32
                        )

                )

        )

    [special_hot] => Array
        (
            [attrs] => Array
                (
                    [id] => hot
                )

            [goods_0] => Array
                (
                    [name] => hot41
                    [price] => 99
                )

        )

)

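Below, setValue() adds a value to the root node, and the resulting XML is displayed:

<?xml version="1.0" encoding="UTF-8"?>
<shop>
	<name>华联</name>
	<address>北京长安街-9999号</address>
	<desc>连锁超市</desc>
	<telphone>123456789</telphone>
	<cat id="food">
		<goods id="food11">
			<name>food11</name>
			<price>12.90</price>
		</goods>
		<goods id="food12">
			<name>food12</name>
			<price>22.10</price>
			<desc creator="hahawen">好东西推荐</desc>
		</goods>
	</cat>
	<cat>
		<goods id="tel21">
			<name>tel21</name>
			<price>1290</price>
		</goods>
	</cat>
	<cat id="coat">
		<goods id="coat31">
			<name>coat31</name>
			<price>112</price>
		</goods>
		<goods id="coat32">
			<name>coat32</name>
			<price>45</price>
		</goods>
	</cat>
	<special id="hot">
		<goods>
			<name>hot41</name>
			<price>99</price>
		</goods>
	</special>
</shop>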

Below, getNode() returns all products in one category:
Product name: food11
Array
(
    [name] => food11
    [price] => 12.90
)
Array
(
    [id] => food11
)
Product name: food12
Array
(
    [name] => food12
    [price] => 22.10
    [desc] => Array
        (
            [value] => 好东西推荐
            [attrs] => Array
                (
                    [creator] => hahawen
                )

        )

)
Array
(
    [id] => food12
)

Below, findNodeByPath() returns the information for one product:
Product name: food11
Array
(
    [name] => food11
    [price] => 12.90
)
Array
(
    [id] => food11
)

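Below, setValue() adds a value with attributes to product "food11", and the result is displayed:

<?xml version="1.0" encoding="UTF-8"?>
<shop>
	<name>华联</name>
	<address>北京长安街-9999号</address>
	<desc>连锁超市</desc>
	<telphone>123456789</telphone>
	<cat id="food">
		<goods id="food11">
			<name>food11</name>
			<price>12.90</price>
			<leaveword author="hahawen" date="YYYY-MM-DD">这个商品不错</leaveword>
		</goods>
		<goods id="food12">
			<name>food12</name>
			<price>22.10</price>
			<desc creator="hahawen">好东西推荐</desc>
		</goods>
	</cat>
	<cat>
		<goods id="tel21">
			<name>tel21</name>
			<price>1290</price>
		</goods>
	</cat>
	<cat id="coat">
		<goods id="coat31">
			<name>coat31</name>
			<price>112</price>
		</goods>
		<goods id="coat32">
			<name>coat32</name>
			<price>45</price>
		</goods>
	</cat>
	<special id="hot">
		<goods>
			<name>hot41</name>
			<price>99</price>
		</goods>
	</special>
</shop>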

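Below, setValue()/removeValue() change and delete values of product "food12", and the result is displayed:

<?xml version="1.0" encoding="UTF-8"?>
<shop>
	<name>华联</name>
	<address>北京长安街-9999号</address>
	<desc>连锁超市</desc>
	<telphone>123456789</telphone>
	<cat id="food">
		<goods id="food11">
			<name>food11</name>
			<price>12.90</price>
			<leaveword author="hahawen" date="YYYY-MM-DD">这个商品不错</leaveword>
		</goods>
		<goods id="food12">
			<name>new food12</name>
			<price>22.10</price>
		</goods>
	</cat>
	<cat>
		<goods id="tel21">
			<name>tel21</name>
			<price>1290</price>
		</goods>
	</cat>
	<cat id="coat">
		<goods id="coat31">
			<name>coat31</name>
			<price>112</price>
		</goods>
		<goods id="coat32">
			<name>coat32</name>
			<price>45</price>
		</goods>
	</cat>
	<special id="hot">
		<goods>
			<name>hot41</name>
			<price>99</price>
		</goods>
	</special>
</shop>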

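Below, createNode() adds a product, and the result is displayed:

<?xml version="1.0" encoding="UTF-8"?>
<shop>
	<name>华联</name>
	<address>北京长安街-9999号</address>
	<desc>连锁超市</desc>
	<telphone>123456789</telphone>
	<cat id="food">
		<goods id="food11">
			<name>food11</name>
			<price>12.90</price>
			<leaveword author="hahawen" date="YYYY-MM-DD">这个商品不错</leaveword>
		</goods>
		<goods id="food12">
			<name>new food12</name>
			<price>22.10</price>
		</goods>
		<goods id="food13">
			<name>food13</name>
			<price>100</price>
		</goods>
	</cat>
	<cat>
		<goods id="tel21">
			<name>tel21</name>
			<price>1290</price>
		</goods>
	</cat>
	<cat id="coat">
		<goods id="coat31">
			<name>coat31</name>
			<price>112</price>
		</goods>
		<goods id="coat32">
			<name>coat32</name>
			<price>45</price>
		</goods>
	</cat>
	<special id="hot">
		<goods>
			<name>hot41</name>
			<price>99</price>
		</goods>
	</special>
</shop>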

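Below, removeNode() deletes a product, and the result is displayed:

<?xml version="1.0" encoding="UTF-8"?>
<shop>
	<name>华联</name>
	<address>北京长安街-9999号</address>
	<desc>连锁超市</desc>
	<telphone>123456789</telphone>
	<cat id="food">
		<goods id="food11">
			<name>food11</name>
			<price>12.90</price>
			<leaveword author="hahawen" date="YYYY-MM-DD">这个商品不错</leaveword>
		</goods>
		<goods id="food13">
			<name>food13</name>
			<price>100</price>
		</goods>
	</cat>
	<cat>
		<goods id="tel21">
			<name>tel21</name>
			<price>1290</price>
		</goods>
	</cat>
	<cat id="coat">
		<goods id="coat31">
			<name>coat31</name>
			<price>112</price>
		</goods>
		<goods id="coat32">
			<name>coat32</name>
			<price>45</price>
		</goods>
	</cat>
	<special id="hot">
		<goods>
			<name>hot41</name>
			<price>99</price>
		</goods>
	</special>
</shop>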