<a target="blank"href="http://a.b.c.d/abc.php?viewkey=11111111111d5c2a51d1e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&"></a>
<a target="blank"href="http://a.b.c.d/abc"></a>
<a target="blank"href="http://a.b.c.d/123"></a>
I want to extract the links from the href attributes.
Only the first three of the six links meet the condition: the link must contain all of the parameters viewkey, page, viewtype, and category. How do I write a regular expression for this?
The second and third links are identical. How do I remove the duplicates (in Python)?
黄舟2017-05-18 10:53:20
# python 2.7
import re
a = '''<a target="blank"href="http://a.b.c.d/abc.php?viewkey=11111111111d5c2a51d1e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&"></a>
<a target="blank"href="http://a.b.c.d/abc"></a>
<a target="blank"href="http://a.b.c.d/123"></a>'''
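# Each lookahead requires one parameter name somewhere in the rest of the line
# ('.' does not cross newlines here); set() drops the duplicate link.
# print(...) with parentheses runs on both Python 2.7 and Python 3.
print(set(re.findall(r'''(?=.*viewkey)(?=.*page)(?=.*viewtype)(?=.*category)href=["']([^'"]+)''', a)))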
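The lookahead idea above does not depend on the order of the parameters, but set() discards the order in which the links appeared. A minimal sketch for Python 3.7+, assuming you want the links in their original order; the names html, PATTERN, and unique_links are illustrative and not from the original answer, and the `[?&]name=` form is a slightly stricter variant of the lookaheads:

```python
import re

# Sample input copied from the question (shortened to four anchors).
html = '''<a target="blank"href="http://a.b.c.d/abc.php?viewkey=11111111111d5c2a51d1e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&page=1&viewtype=basic&category=rf"></a>
<a target="blank"href="http://a.b.c.d/abc.php?viewkey=6d7a7f6a6e9c2a5191e2&"></a>'''

# One lookahead per required parameter; each must appear as "?name=" or "&name="
# somewhere in the rest of the same line.
PATTERN = re.compile(
    r'''(?=.*[?&]viewkey=)(?=.*[?&]page=)(?=.*[?&]viewtype=)(?=.*[?&]category=)'''
    r'''href=["']([^'"]+)'''
)

def unique_links(text):
    # dict.fromkeys keeps the first occurrence of each link, in order.
    return list(dict.fromkeys(PATTERN.findall(text)))

for link in unique_links(html):
    print(link)
```

On the sample this prints two links: the one with viewkey=11111111111d5c2a51d1e2 first, then the one with viewkey=6d7a7f6a6e9c2a5191e2.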
伊谢尔伦2017-05-18 10:53:20
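Extract the first three links (l_string is the HTML text; this relies on the wanted links ending in =rf, and [^"]* keeps each match inside a single href value):
links = re.findall(r'href="([^"]*=rf)"', l_string)
Remove duplicates:
new_links = set(links)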
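Matching on the =rf suffix only works while every wanted link ends that way. An alternative sketch, assuming Python 3: pull out every href with a simple regex, then check the query string with the standard urllib.parse module. The names REQUIRED, has_required_params, and matching_links are hypothetical helpers, not part of the answers above.

```python
import re
from urllib.parse import urlparse, parse_qs

# Parameters that every accepted link must carry.
REQUIRED = {"viewkey", "page", "viewtype", "category"}

def has_required_params(url):
    # parse_qs returns a dict of parameter name -> list of values;
    # parameters with empty values are dropped by default.
    query = parse_qs(urlparse(url).query)
    return REQUIRED.issubset(query)

def matching_links(html):
    # Collect every href value, then keep the first occurrence of each
    # link that has all required parameters.
    hrefs = re.findall(r'''href=["']([^'"]+)''', html)
    result = []
    for url in hrefs:
        if has_required_params(url) and url not in result:
            result.append(url)
    return result
```

Calling matching_links(l_string) on the HTML from the question should return the two distinct links in their original order. The list lookup is quadratic, which is fine for a page of links; for very large inputs a separate set of seen URLs would be faster.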