<script type="application/ld+json">{
"@context": "http://schema.org",
"@type": "SaleEvent",
"name": "10% Off First Orders",
"url": "https://www.myvouchercodes.co.uk/coggles",
"image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
"startDate": "2017-02-17",
"endDate": "2017-12-31",
"location": {
"@type": "Place",
"name": "Coggles",
"url": "coggles.co.uk",
"address": "Coggles"
},
"description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
"eventStatus": "EventScheduled"
}</script>
How can I use a Python regular expression to extract the coggles.co.uk domain name from this script? I would appreciate any approaches people can suggest.
ringa_lee 2017-06-22 11:53:53
When writing a regular expression, make sure the feature you anchor on is unique. Here the "url" key is not unique: it appears more than once in the script. In this situation @prolifes' method is a very good choice.
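To illustrate why "url" alone is not a safe anchor, here is a minimal sketch run against a trimmed copy of the JSON-LD from the question. The variable names snippet and urls are introduced only for this illustration.

```python
import re

# Trimmed copy of the JSON-LD from the question; only the relevant keys are kept.
snippet = '''{
"url": "https://www.myvouchercodes.co.uk/coggles",
"location": {
"@type": "Place",
"url": "coggles.co.uk"
}
}'''

# A pattern keyed only on "url" finds both entries, not just the one we want.
urls = re.findall(r'"url": "([^"]+)"', snippet)
print(urls)
```

Because the list has two entries, the pattern needs some extra context to pick out the second one.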
If you must use a regular expression, you need zero-width assertions. The literal translation of this term tends to cause misunderstanding. It simply means asserting a condition at a given position, and that position itself has a width of 0, so the assertion consumes no characters.
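A small sketch of what "width 0" means in practice, using a made-up string (text and the "price: " prefix are assumptions for illustration, not part of the question):

```python
import re

text = "price: 42"

# The lookbehind requires "price: " immediately before the match,
# but that prefix is not included in the matched text.
m = re.search(r'(?<=price: )\d+', text)
print(m.group(0))
print(m.span())

# A lookbehind on its own matches an empty string at a position.
pos = re.search(r'(?<=price: )', text)
print(pos.span())
```

The second match starts and ends at the same index, which is the sense in which the assertion has zero width.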
Here the "url" we need sits inside "location", so "location" can be used as the positional anchor.
The code is as follows:
re.search('(?<=location).+?"url": "([^"]+)"', string, re.DOTALL).group(1)
A brief explanation: (?<=location) is a lookbehind. It requires "location" to appear immediately before the position where the match starts. To require it after the position instead, use a lookahead: (?=location).
re.DOTALL is necessary because the text spans several lines. It widens what . matches so that newlines are included.
"([^"]+)" is a habit of mine: [^"] matches any character other than ", so the group captures everything between the two double quotes.
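For reference, a self-contained sketch of the same pattern run against the script from the question. The variable names string and result are chosen here for illustration; the pattern is the one given above, written as a raw string.

```python
import re

string = '''<script type="application/ld+json">{
"@context": "http://schema.org",
"@type": "SaleEvent",
"name": "10% Off First Orders",
"url": "https://www.myvouchercodes.co.uk/coggles",
"startDate": "2017-02-17",
"endDate": "2017-12-31",
"location": {
"@type": "Place",
"name": "Coggles",
"url": "coggles.co.uk",
"address": "Coggles"
},
"eventStatus": "EventScheduled"
}</script>'''

# Start right after the word "location", skip lazily to the next "url" key,
# then capture the quoted value. re.DOTALL lets . cross line breaks.
match = re.search(r'(?<=location).+?"url": "([^"]+)"', string, re.DOTALL)
result = match.group(1) if match else None
print(result)
```

Checking for None before calling group(1) avoids an AttributeError if the page has no "location" block.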
世界只因有你 2017-06-22 11:53:53
This is fairly standard JSON. If you want a blunter approach, just parse it directly as JSON:
import json
import re

# Named html rather than str so the built-in str type is not shadowed.
html = '''
<script type="application/ld+json">{
"@context": "http://schema.org",
"@type": "SaleEvent",
"name": "10% Off First Orders",
"url": "https://www.myvouchercodes.co.uk/coggles",
"image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
"startDate": "2017-02-17",
"endDate": "2017-12-31",
"location": {
"@type": "Place",
"name": "Coggles",
"url": "coggles.co.uk",
"address": "Coggles"
},
"description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
"eventStatus": "EventScheduled"
}</script>
'''
# Take everything from the first { to the last } and parse it as JSON.
d = json.loads(re.search(r'(\{[\s\S]*\})', html).group(1))
print(d['location']['url'])
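A slightly more defensive variant of the same idea, as a sketch only: it targets the ld+json script tag explicitly instead of the outermost braces, and returns None when something is missing. The function name extract_location_url and the trimmed page string are hypothetical, and the sketch assumes "location" is a JSON object when present.

```python
import json
import re

def extract_location_url(page):
    """Return location.url from the first JSON-LD script block, or None."""
    m = re.search(
        r'<script type="application/ld\+json">(.*?)</script>',
        page,
        re.DOTALL,
    )
    if m is None:
        return None
    data = json.loads(m.group(1))
    # .get() avoids a KeyError when "location" or "url" is absent.
    return data.get("location", {}).get("url")

# Trimmed copy of the script from the question.
page = '''<script type="application/ld+json">{
"@type": "SaleEvent",
"url": "https://www.myvouchercodes.co.uk/coggles",
"location": {
"@type": "Place",
"name": "Coggles",
"url": "coggles.co.uk"
}
}</script>'''

print(extract_location_url(page))
```

For real pages, an HTML parser would be a sturdier way to locate the script tag than a regular expression, since attribute order and quoting can vary.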