How to write a regular expression that matches the URL in an a tag (Python or JS)

<a target="blank"href="http://a.b.c.d/abc.php?viewkey=11111111111d5c2a51d1e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>

<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>


<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp"></a>

<a target="blank"href="http://a.b.c.d/abc"></a>


<a target="blank"href="http://a.b.c.d/123"></a>

I want to extract the link from each href.

Of the 6 links, only the first three meet the condition: the link must contain all four parameters viewkey, page, viewtype and category. How do I write the regular expression for this?

The second and third links are identical. How do I remove the duplicates (in Python)?

怪我咯 2757 days ago 673

All replies (2)

  • 黄舟

    黄舟 2017-05-18 10:53:20

    # python 2.7
    
    import re
    
    a = '''<a target="blank"href="http://a.b.c.d/abc.php?viewkey=11111111111d5c2a51d1e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
    <a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
    
    <a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
    
    
    <a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp"></a>
    
    <a target="blank"href="http://a.b.c.d/abc"></a>
    
    
    <a target="blank"href="http://a.b.c.d/123"></a>'''
    
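    # Each lookahead requires one parameter name to appear later on the same line
    # ('.' does not match newlines), so only links with all four names are captured.
    pattern = r'''(?=.*viewkey)(?=.*page)(?=.*viewtype)(?=.*category)href=["']([^'"]+)'''
    # set() drops the duplicate link; print(...) works in Python 2.7 and Python 3
    print(set(re.findall(pattern, a)))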
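The lookaheads above test for substrings, so `page` would also be satisfied by a parameter such as `homepage`, and the captured links still contain `&amp;`. A minimal sketch of a stricter check, assuming Python 3: pull out every href with a simple regex, decode the entities with `html.unescape`, and compare the parsed query keys against the four required names. The helper name `links_with_params` and the constant `REQUIRED` are hypothetical, and `sample` is just the first four links from the question.

```python
import re
from html import unescape
from urllib.parse import urlsplit, parse_qs

# The four parameters the question says a link must carry
REQUIRED = {"viewkey", "page", "viewtype", "category"}

def links_with_params(html_text, required=REQUIRED):
    """Return unique hrefs, in first-seen order, whose query has every required key."""
    seen = {}
    for raw in re.findall(r'''href=["']([^'"]+)''', html_text):
        url = unescape(raw)  # turn &amp; back into &
        keys = set(parse_qs(urlsplit(url).query))
        if required <= keys:  # every required name is a real query key
            seen.setdefault(url, None)  # dict keeps insertion order (Python 3.7+)
    return list(seen)

sample = '''<a target="blank"href="http://a.b.c.d/abc.php?viewkey=11111111111d5c2a51d1e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp"></a>'''

for link in links_with_params(sample):
    print(link)
```

Because the keys are compared as whole names, this version does not depend on the order of the parameters or on which one comes last.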

  • 伊谢尔伦

    伊谢尔伦 2017-05-18 10:53:20

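    Extract the first three links (l_string is the HTML text; this works only because category=rf is the last parameter of every wanted link):

    import re
    links = re.findall(r'href="(.*?=rf)"', l_string, re.S)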

    Remove duplicates:

    new_links = set(links)
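A `set` does not keep the links in document order, and `.*?` combined with `re.S` is allowed to run across tag boundaries while it searches for `=rf"`. A small sketch of a variant, assuming Python 3.7+ (where `dict` preserves insertion order): the `[^"]*` character class is a swapped-in alternative to the answer's `.*?`, and `l_string` here is assumed to hold the first four links from the question.

```python
import re

# l_string: the first four links from the question (assumed to be the page HTML)
l_string = '''<a target="blank"href="http://a.b.c.d/abc.php?viewkey=11111111111d5c2a51d1e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp;page=1&amp;viewtype=basic&amp;category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&amp"></a>'''

# [^"]* cannot cross the closing quote, so each match stays inside one href
links = re.findall(r'href="([^"]*=rf)"', l_string)

# dict.fromkeys drops duplicates while keeping first-seen order
new_links = list(dict.fromkeys(links))
print(new_links)
```

Like the original pattern, this still relies on `category=rf` being the final parameter, and the returned links keep their `&amp;` entities.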
    
    
