python - Why does lxml.etree automatically add a closing </i>?

I am learning lxml. My code is as follows:

from lxml import etree
text = '''
<i class="cell maincell">
    <p class="title">
        <a target="_blank" href="https://itjuzi.com/company/60321">
            <span>洋鼹鼠</span>
        </a>
    </p>
    <p>
        <span class="tags t-small c-gray-aset">
            <a href="https://itjuzi.com/investevents?scope=145">电子商务</a>
        </span>
        <span class="loca c-gray-aset t-small">
            <a href="https://itjuzi.com/investevents?prov=天津">天津</a>
        </span>
    </p>
</i>
'''
html = etree.HTML(text)
print(etree.tostring(html,encoding='utf-8').decode('utf-8'))

The output is as follows:

<html><body><i class="cell maincell">
    </i><p class="title">
        <a target="_blank" href="https://itjuzi.com/company/60321">
            <span>洋鼹鼠</span>
        </a>
    </p>
    <p>
        <span class="tags t-small c-gray-aset">
            <a href="https://itjuzi.com/investevents?scope=145">电子商务</a>
        </span>
        <span class="loca c-gray-aset t-small">
            <a href="https://itjuzi.com/investevents?prov=天津">天津</a>
        </span>
    </p>

</body></html>

What I mainly don't understand is why the <i> tag comes out wrong: it is closed right after its opening tag, and the original closing </i> is gone. How can I solve this problem? Thank you~

Asked by 学习ing, 2757 days ago, 793 views

All replies (1)

  • PHP中文网, 2017-06-22 11:54:40

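    The main reason is the content model of the two elements:

    p element
    Content categories: flow content, palpable content.
    Permitted content: phrasing content.
    Permitted parents: any element that accepts flow content.

    i element
    Content categories: flow content, phrasing content, palpable content.
    Permitted content: phrasing content.

    A p element is only allowed inside a parent that accepts flow content, but i permits only phrasing content, and p is not phrasing content. Nesting p inside i therefore does not comply with the HTML specification. etree.HTML uses a lenient HTML parser that repairs such markup: as the output in the question shows, it closes <i> before the first <p> and discards the now unmatched </i>.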
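The repair can be reproduced with a much smaller input. The following is a minimal sketch, not part of the original answer; the `snippet` string and the variable names are made up for illustration, and it assumes an lxml/libxml2 build that behaves like the one in the question.

```python
from lxml import etree

# Minimal invalid fragment: a <p> (flow content) nested inside an <i>.
snippet = '<i class="cell"><p>text</p></i>'

# etree.HTML uses the lenient HTML parser, which repairs the markup.
root = etree.HTML(snippet)
i_elem = root.find('.//i')
p_elem = root.find('.//p')

# After the repair, <p> is expected to be a sibling of <i>, not its child.
p_is_inside_i = p_elem.getparent() is i_elem

print(etree.tostring(root, encoding='unicode'))
print('p inside i:', p_is_inside_i)
```

Checking `getparent()` rather than comparing strings makes it easy to see where the parser actually placed the paragraph.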
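    The solution is to replace i with an element that accepts flow content, such as div. Replacing i with p does not help, because p also permits only phrasing content, so a p inside a p is just as invalid.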
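As a rough sketch of how this could look in practice, the example below uses a shortened version of the markup from the question with a div as the wrapper (the `fixed` and `original` strings are illustrative). It also shows an alternative that is an assumption on my part, not something from the question: if the input happens to be well-formed XML, `etree.fromstring` parses it with the XML parser, which does not apply HTML nesting rules and leaves the <i> wrapper as it is.

```python
from lxml import etree

# Shortened version of the question's markup, with a <div> wrapper,
# which accepts flow content such as <p>.
fixed = '''
<div class="cell maincell">
    <p class="title"><a href="https://itjuzi.com/company/60321"><span>洋鼹鼠</span></a></p>
    <p><span class="loca c-gray-aset t-small">天津</span></p>
</div>
'''

root = etree.HTML(fixed)
div = root.find('.//div')
# Both paragraphs should still be children of the wrapper.
children = [child.tag for child in div]
print(children)

# Alternative (assumes well-formed XML input): the XML parser does not
# apply HTML content rules, so <i> keeps its <p> child.
original = '<i class="cell maincell"><p class="title">x</p></i>'
xml_root = etree.fromstring(original)
xml_children = [child.tag for child in xml_root]
print(xml_root.tag, xml_children)
```

The XML route only works when the markup is strictly well-formed, so for pages scraped from the web, changing the wrapper element (or selecting the inner elements directly) is usually the more robust choice.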
