
Use lexical analysis to extract domain names and IPs

Dec 25, 2019, 01:08 PM


Background

When analyzing logs, I found that some request parameters contain other URLs, for example a parameter whose value points to xss.ha.ckers.org.

The goal is to extract the URL (xss.ha.ckers.org) from the request parameters and compare it against a threat intelligence database. If it hits the blacklist, it is flagged as malicious. If it appears in neither the blacklist nor the company's whitelist, it can be marked first and analyzed in detail later.
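The triage rule described above can be sketched as follows. This is only an illustration: the function name `classify_host`, the labels it returns, and the two sample sets are assumptions, not part of the original tool.

```python
def classify_host(host, blacklist, whitelist):
    """Return 'blocked', 'trusted', or 'review' for an extracted host."""
    # Normalize case and a trailing dot so lookups are consistent.
    host = host.lower().rstrip(".")
    if host in blacklist:
        return "blocked"
    if host in whitelist:
        return "trusted"
    # Neither list knows the host: mark it for later analysis.
    return "review"


# Hypothetical data, for illustration only.
BLACKLIST = {"xss.ha.ckers.org"}
WHITELIST = {"www.example.com"}

print(classify_host("xss.ha.ckers.org", BLACKLIST, WHITELIST))  # blocked
```

In practice the two sets would be loaded from the threat intelligence source and the company's own allow list.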

Extract URL

There are many articles online about URL extraction, and most of them use regular expressions. That approach is simple but not very accurate. Here I offer another method: use lexical analysis to extract domain names and IPs. The idea is borrowed from this article:

https://blog.csdn.net/breaksoftware/article/details/7009209. Take a look if you are interested. Learning from an expert really does improve your skills.

The original implementation is in C; I wrote a similar one in Python for reference.

Common URL classification


Two observations: the IP form of a URL has the simplest structure, four numbers no greater than 255 separated by dots. The domain form is more complex, but all domain-form URLs have one thing in common: they end with a top-level domain name such as .com.

Define legal characters:
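As a rough sketch, a legal character set and a scanner over it might look like this. The exact set used by the original code is not shown here, so the choice of letters, digits, hyphen, and dot is an assumption, as are the names `LEGAL_CHARS` and `scan_legal_run`.

```python
import string

# Assumed character set: ASCII letters, digits, hyphen, and dot.
LEGAL_CHARS = frozenset(string.ascii_letters + string.digits + "-.")


def scan_legal_run(text, start=0):
    """Return the end index of the run of legal characters beginning at start."""
    i = start
    while i < len(text) and text[i] in LEGAL_CHARS:
        i += 1
    return i


sample = "www.baidu.com/index"
end = scan_legal_run(sample)
print(sample[:end])  # www.baidu.com
```

The scanner stops at the first character outside the set, which is what lets a lexer cut a host name out of a longer parameter value.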


Top-level domain name list:
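A minimal sketch of such a list and a suffix check follows. The entries are a small assumed sample, not the list from the original code, and `has_known_tld` is a hypothetical helper name.

```python
# A deliberately tiny, assumed list; a real deployment would load a fuller one.
TOP_LEVEL_DOMAINS = frozenset({"com", "net", "org", "cn", "edu", "gov"})


def has_known_tld(host):
    """True if the last dot-separated label of host is in TOP_LEVEL_DOMAINS."""
    labels = host.lower().rstrip(".").split(".")
    # Require at least one label in front of the top-level domain.
    return len(labels) >= 2 and labels[-1] in TOP_LEVEL_DOMAINS


print(has_known_tld("www.baidu.com"))  # True
print(has_known_tld("192.168.1.1"))    # False
```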


Domain name form extraction, for example www.baidu.com.
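One way to sketch domain-form extraction is a single left-to-right scan that collects runs of host characters and keeps the runs ending in a known top-level domain. This is not the original implementation; `HOST_CHARS`, `TLDS`, and `extract_domains` are assumed names, and the TLD list is abbreviated.

```python
import string

HOST_CHARS = frozenset(string.ascii_letters + string.digits + "-.")
TLDS = frozenset({"com", "net", "org", "cn"})  # assumed, abbreviated list


def extract_domains(text):
    """Scan text once and return candidate domain names in order of appearance."""
    found = []
    i = 0
    while i < len(text):
        if text[i] not in HOST_CHARS:
            i += 1
            continue
        # Consume a maximal run of host characters.
        start = i
        while i < len(text) and text[i] in HOST_CHARS:
            i += 1
        token = text[start:i].strip(".-").lower()
        labels = token.split(".")
        # Accept only multi-label tokens with no empty label and a known TLD.
        if len(labels) >= 2 and all(labels) and labels[-1] in TLDS:
            found.append(token)
    return found


print(extract_domains("GET /?u=http://www.baidu.com/a&x=1"))  # ['www.baidu.com']
```

Tokens such as `xss.js` are rejected because `js` is not in the assumed TLD list, which is the main advantage over a loose regular expression.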


IP form extraction, for example 192.168.1.1:

            # Fragment from inside the tokenizer method:
            # z is the input string, i is the current scan position.

            # First octet
            while (i < len(z) and z[i].isdigit()):
                i = i + 1
                ip_v1 = True
                reti = i
            if i < len(z) and z[i] == '.':
                i = i + 1
                reti = i
            else:
                tokenType = TK_OTHER
                reti = 1

            # Second octet
            while (i < len(z) and z[i].isdigit()):
                i = i + 1
                ip_v2 = True
            if i < len(z) and z[i] == '.':
                i = i + 1
            else:
                if tokenType != TK_DOMAIN:
                    tokenType = TK_OTHER
                    reti = 1

            # Third octet
            while (i < len(z) and z[i].isdigit()):
                i = i + 1
                ip_v3 = True
            if i < len(z) and z[i] == '.':
                i = i + 1
            else:
                if tokenType != TK_DOMAIN:
                    tokenType = TK_OTHER
                    reti = 1

            # Fourth octet
            while (i < len(z) and z[i].isdigit()):
                i = i + 1
                ip_v4 = True

            # Optional port after a colon
            if i < len(z) and z[i] == ':':
                i = i + 1
            while (i < len(z) and z[i].isdigit()):
                i = i + 1

            # Record the match only if all four octets were seen
            if ip_v1 and ip_v2 and ip_v3 and ip_v4:
                self.urls.append(z[0:i])
                return reti, tokenType
            else:
                if tokenType != TK_DOMAIN:
                    tokenType = TK_OTHER
                    reti = 1
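The fragment above depends on state from the surrounding class. For experimenting with the idea in isolation, here is a self-contained sketch of the same four-octet scan. It is not the original code: `match_ipv4` is a hypothetical name, and it adds a range check (each octet at most 255) that the fragment does not perform.

```python
DIGITS = "0123456789"


def match_ipv4(text, start=0):
    """Try to match an IPv4 address, with optional :port, at text[start:].

    Returns the end index of the match, or -1 if there is no match.
    """
    i = start
    for octet_index in range(4):
        begin = i
        while i < len(text) and text[i] in DIGITS:
            i += 1
        digits = text[begin:i]
        # Each octet needs 1 to 3 digits and a value of at most 255.
        if not digits or len(digits) > 3 or int(digits) > 255:
            return -1
        # The first three octets must be followed by a dot.
        if octet_index < 3:
            if i < len(text) and text[i] == ".":
                i += 1
            else:
                return -1
    # Optional port: consume ':' only when at least one digit follows it.
    if i + 1 < len(text) and text[i] == ":" and text[i + 1] in DIGITS:
        i += 1
        while i < len(text) and text[i] in DIGITS:
            i += 1
    return i


s = "192.168.1.1:8080/x"
print(s[:match_ipv4(s)])  # 192.168.1.1:8080
```

Returning -1 on failure instead of raising keeps the caller free to retry the same position as a domain name.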

Mixed form extraction, for example 1234.com.

Scanning the leading 1234 matches the characteristics of the IP form, but the IP-only code then raises an exception. The IP-handling code segment therefore needs an extra check for whether the suffix is a top-level domain name.
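A minimal sketch of that fallback, assuming an abbreviated TLD list and the hypothetical helpers `is_ipv4` and `classify_token`: try the IP form first, and if it fails, check whether the token ends in a top-level domain before giving up.

```python
TLDS = frozenset({"com", "net", "org", "cn"})  # assumed, abbreviated list


def is_ipv4(token):
    """True if token is four dot-separated decimal numbers, each at most 255."""
    parts = token.split(".")
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
        for p in parts
    )


def classify_token(token):
    """Classify a scanned token as 'ip', 'domain', or 'other'."""
    token = token.lower()
    if is_ipv4(token):
        return "ip"
    # Not an IP: a digit-leading token can still be a domain such as 1234.com.
    labels = token.split(".")
    if len(labels) >= 2 and all(labels) and labels[-1] in TLDS:
        return "domain"
    return "other"


print(classify_token("1234.com"))     # domain
print(classify_token("192.168.1.1"))  # ip
```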


Result test

[Figures: test data and the corresponding running result]

This is only a preliminary version. Please point out any bugs you find.

Conclusion

In the past, I kept my head down writing code and neglected to reflect on and summarize the work afterwards. I am now trying to change that by refining and summarizing as I go. When I come across something useful, I try to turn it into a tool and open source it to share with everyone.

Source code:

https://github.com/skskevin/UrlDetect/blob/master/tool/domainExtract/domainExtract.py


Statement
This article is reproduced from FreeBuf.COM. If there is any infringement, please contact admin@php.cn to have it removed.
