How do I extract a domain name with a Python regex?

<script type="application/ld+json">{
    "@context": "http://schema.org",
    "@type": "SaleEvent",
    "name": "10% Off First Orders",
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
    "startDate": "2017-02-17",
    "endDate": "2017-12-31",
    "location": {
        "@type": "Place",
        "name": "Coggles",
        "url": "coggles.co.uk",
        "address": "Coggles"
    },
    "description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
    "eventStatus": "EventScheduled"
}</script>

How can I use a Python regex to extract the domain coggles.co.uk from this script? Any pointers from the experts here would be appreciated.

淡淡烟草味 · 2758 days ago · 1012

All replies (2)

  • ringa_lee

    ringa_lee 2017-06-22 11:53:53

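    With a regex, all you need is for your marker (the feature you anchor on) to be unique. The "url" key, however, is not unique here: it appears twice in the JSON. In that situation @prolifes's method is a good one.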
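    To see why "url" alone is not a unique marker, here is a minimal sketch. It assumes a shortened copy of the question's JSON, and the names snippet, urls and first are only illustrative.

```python
import re

# Shortened copy of the JSON-LD from the question
# (assumption: only the two "url" keys matter for this demo).
snippet = '''{
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "location": {
        "@type": "Place",
        "url": "coggles.co.uk"
    }
}'''

# "url" occurs twice, so a pattern keyed only on it matches both values.
urls = re.findall(r'"url": "([^"]+)"', snippet)
print(urls)

# re.search stops at the first hit, which is the voucher page, not the shop domain.
first = re.search(r'"url": "([^"]+)"', snippet).group(1)
print(first)
```

    The pattern needs some extra context, such as the enclosing "location" key, to pick out the second value.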

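    If you really must do it with a regex, you need zero-width assertions. The usual Chinese rendering of that term is quite literal and causes a lot of confusion. What it actually means is matching at a specified position, and a position has a width of 0.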
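    A small sketch of what "zero width" means in practice, using a made-up string (text and the two match variables are assumptions for illustration, not part of the question):

```python
import re

text = "price: $42, width: 100px"

# Lookbehind: the digits must be preceded by "$",
# but "$" itself is not part of the match.
after_dollar = re.search(r'(?<=\$)\d+', text)
print(after_dollar.group(), after_dollar.span())

# Lookahead: the digits must be followed by "px",
# but "px" is not consumed by the match.
before_px = re.search(r'\d+(?=px)', text)
print(before_px.group(), before_px.span())
```

    In both cases the assertion only checks the position next to the digits; the matched text contains the digits alone.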

    Here we can see that the "url" we need sits inside "location", so "location" can serve as the positional anchor.

    The code is as follows:

    re.search('(?<=location).+?"url": "([^"]+)"', string, re.DOTALL).group(1)

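    A brief explanation:
    (?<=location) means that "location" must come immediately before this position. To require it after the position instead, write (?=location).
    re.DOTALL is required because the text spans multiple lines. It widens what . matches so that newline characters are included.
    "([^"]+)" is just my habit: [^"] means any character other than ", so this captures everything between the double quotes.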

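    The one-liner can be tried on a trimmed copy of the question's script. This is only a sketch: snippet, pattern, domain and no_dotall are names chosen here, and the JSON is shortened.

```python
import re

# Trimmed copy of the <script> block from the question.
snippet = '''<script type="application/ld+json">{
    "name": "10% Off First Orders",
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "location": {
        "@type": "Place",
        "name": "Coggles",
        "url": "coggles.co.uk"
    }
}</script>'''

# Start right after the word "location", then lazily skip ahead to the next "url" key.
pattern = r'(?<=location).+?"url": "([^"]+)"'

domain = re.search(pattern, snippet, re.DOTALL).group(1)
print(domain)

# Without re.DOTALL, "." cannot cross the line breaks between
# "location" and "url", so the search finds nothing.
no_dotall = re.search(pattern, snippet)
print(no_dotall)
```

    Note that this relies on "url" being the first such key after "location"; if the layout changes, the pattern may need adjusting.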
  • 世界只因有你

    世界只因有你 2017-06-22 11:53:53

    This is fairly standard JSON. A cruder approach is to pull out the braces and parse the whole thing as JSON directly:

    import json
    import re

    # Page fragment from the question; named `html` to avoid shadowing the built-in `str`.
    html = '''
    <script type="application/ld+json">{
        "@context": "http://schema.org",
        "@type": "SaleEvent",
        "name": "10% Off First Orders",
        "url": "https://www.myvouchercodes.co.uk/coggles",
        "image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
        "startDate": "2017-02-17",
        "endDate": "2017-12-31",
        "location": {
            "@type": "Place",
            "name": "Coggles",
            "url": "coggles.co.uk",
            "address": "Coggles"
        },
        "description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
        "eventStatus": "EventScheduled"
    }</script>
    '''

    # Take everything from the first "{" to the last "}" and parse it as JSON.
    d = json.loads(re.search(r'(\{[\s\S]*\})', html).group(1))
    print(d['location']['url'])
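    One caveat with grabbing everything between the first "{" and the last "}": on a full page, any other brace outside the script would end up in the captured text and break json.loads. A possible alternative, sketched here with the standard library's html.parser, is to collect only the body of the ld+json script. The class name LdJsonCollector and the sample page are assumptions made up for this example.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append rather than assign.
        if self._inside:
            self.blocks[-1] += data


# Hypothetical page: the stray braces in <p> would confuse a first-"{"-to-last-"}" regex.
page = '''<p>{not json}</p>
<script type="application/ld+json">{
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "location": {"@type": "Place", "url": "coggles.co.uk"}
}</script>'''

collector = LdJsonCollector()
collector.feed(page)
collector.close()

data = json.loads(collector.blocks[0])
location_url = data["location"]["url"]
print(location_url)
```

    Once the text is parsed as JSON, the nested key lookup picks the right "url" without any reliance on key order.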
