PHP regular expression problem
1. How can I correctly parse the bookmark backup files (HTML format) exported by Google Chrome and Firefox, and correctly read the parent-child relationship between folders and bookmarks?
2. Format of the exported bookmark backup:
<code><DT><H3 FOLDED ADD_DATE="1467374152" FAV_POS="0">技术性网页</H3>
<DL><p>
    <DT><H3 FOLDED ADD_DATE="1467374152" FAV_POS="0">Hacker</H3>
    <DL><p>
        <DT><A HREF="http://blog.knowsky.com/192993.htm" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="2" >SQL 手工注入大全</A>
        <DT><A HREF="http://www.2cto.com/Article/201207/139493.html" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="6" >Burp Suite详细使用教程-Intruder模块详解 - 软件工具 - 红黑联盟</A>
    </DL><p>
    <DT><H3 FOLDED ADD_DATE="1467374152" FAV_POS="0">安卓开发</H3>
    <DL><p>
        <DT><A HREF="http://www.2cto.com/kf/201310/249684.html" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="1" >adb not responding you can wait more - Android移动开发技术文章_手机开发 - 红黑联盟</A>
    </DL><p>
    <DT><A HREF="http://www.oschina.net/code/list?lang=php&catalog=&show=time" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="3" >代码分享列表 -- PHP - 开源中国社区</A>
    <DT><A HREF="http://www.yiibai.com/" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="38" >易百教程 - 专注于IT教程和实例</A>
</DL><p>
<DT><A HREF="https://www.apachefriends.org/zh_cn/index.html" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="45" >XAMPP Installers and Downloads for Apache Friends</A>
<DT><A HREF="https://www.apachefriends.org/zh_cn/index.html" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="45" >XAMPP Installers and Downloads for Apache Friends</A>
<DT><A HREF="https://www.apachefriends.org/zh_cn/index.html" ADD_DATE="1467374152" LAST_VISIT="0" LAST_MODIFIED="1467374152" LOVEFAV="0" FAV_POS="45" >XAMPP Installers and Downloads for Apache Friends</A>
</code>
I tried matching with regular expressions, but I don't know them well enough and couldn't recover the parent-child relationship correctly. Could you please give me some advice?
All I can say is that for markup like this it is better to parse it with DOM than with regular expressions...
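To make the DOM suggestion concrete, here is a minimal sketch, not a definitive implementation. It assumes PHP's bundled `DOMDocument` and `DOMXPath` classes are available and that the file is UTF-8; the function name `parseBookmarks` and the shape of the returned array are made up for illustration. It relies only on the explicitly closed `<DL>` elements for nesting and treats the nearest `<H3>` before each `<DL>` as that folder's name, because how the parser places the unclosed `<DT>` tags may vary.

```php
<?php
/**
 * Hypothetical helper: returns one entry per bookmark link, each with its
 * title, URL, and the list of folder names leading to it (outermost first).
 */
function parseBookmarks(string $html): array
{
    // Bookmark exports are not valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Assumption: the input is UTF-8. The meta tag tells libxml so, which
    // matters for fragments that lack their own charset declaration.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $bookmarks = [];

    foreach ($xpath->query('//a[@href]') as $link) {
        $folders = [];

        // Walk up from the link; every enclosing <dl> is one folder level.
        for ($node = $link->parentNode; $node !== null; $node = $node->parentNode) {
            if (strtolower($node->nodeName) !== 'dl') {
                continue;
            }
            // The folder name is the closest <h3> that ends before this <dl>.
            $heading = $xpath->query('preceding::h3[1]', $node)->item(0);
            if ($heading !== null) {
                // We are walking inside-out, so prepend to keep outermost first.
                array_unshift($folders, trim($heading->textContent));
            }
        }

        $bookmarks[] = [
            'title'   => trim($link->textContent),
            'url'     => $link->getAttribute('href'),
            'folders' => $folders,
        ];
    }

    return $bookmarks;
}
```

You could call it as `parseBookmarks(file_get_contents('bookmarks.html'))` (the file name is just a placeholder) and then use `implode(' / ', $entry['folders'])` to print each bookmark's folder path. Links that sit outside any folder come back with an empty `folders` array.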