最近得到一个接近12亿的全球ns节点的数据,本来想用来做一个全国通过dns反查域名然后进行全国范围的网站收集和扫描的,后来发现网站的数量不是很准确,加上一个人的精力和财力实在难以完成这样一个庞大的任务,就没有做下去,只留下了这个搭建的笔记。
文本格式,简单的文本搜索,速度太慢,一次搜索接近花掉5-10分钟时间,决定将其倒入数据库进行一次优化,速度应该能提升不到,电脑上只有AMP的环境,那么就决定将其倒入到mysql中,
一开始使用Navicat进行倒入,刚好数据的格式是 ip,ns 这样的格式,倒入了接近5个小时发现还没有倒入到百分之一,这可是纯文本格式化的时候大小为54G的数据文件啊!
后来发现用mysql自带的load data local infile只话了30分钟左右,第一次导入的时候忘记新建键了,只好重新导入一次
mysql> load data local infile 'E:\\dns\\rite\\20141217-rdns.txt' into table dnsfields terminated by ',';Query OK, 1194674130 rows affected, 1700 warnings (29 min 26.65 sec)Records: 1194674130 Deleted: 0 Skipped: 0 Warnings: 1700
因为添加了一个id字段,所以导入速度明显下降,不过大概也只花了1个半小时左右的时间就完成了55G数据的导入。
接着是建立索引,因为我需要的模糊查询,所以在这里建立的是Full Text+Btree,差不多花了3天时间索引才建立完成,期间因为一不小心把mysql的执行窗口关闭了,以为就这么完蛋了,最后发现其实mysql还在后台默默的建立索引。
建立了索引之后发现查询速度也就比没有建立索引快那么一点,执行了一条
select * from ns where ns like '%weibo.com'
花掉了210秒的时间,还是太慢了。
然后就开始使用SPhinx来做索引提升速度,
从官方下载了64位的SPHINX MYSQL SUPPORT的包下载地址
接着配置配置文件,src里配置要mysql的账号密码
source src1{ sql_host = localhost sql_user = root sql_pass = root sql_db = ns sql_port = 3306 sql_query = \ SELECT id,ip,ns from ns //这里写上查询语句 sql_attr_uint = id
然后searchd里也需要配置一下,端口和日志,pid文件的路径配置好即可
searchd{ listen = 9312 listen = 9306:mysql41 log = E:/phpStudy/splinx/file/log.log query_log = E:/phpStudy/splinx/file/query.log pid_file = E:/phpStudy/splinx/file/searchd.pid
然后切换到sphinx的bin目录进行建立索引,执行
searchd test1 #test1是你source的名称
我大概建立了不到2个小时的时间就建立完成了,
然后切换到api目录下执行
E:\phpStudy\splinx\api>test.py asdDEPRECATED: Do not call this method or, even better, use SphinxQL instead of anAPIQuery 'asd ' retrieved 1000 of 209273 matches in 0.007 secQuery stats: 'asd' found 209291 times in 209273 documentsMatches:1. doc_id=20830, weight=12. doc_id=63547, weight=13. doc_id=96147, weight=14. doc_id=1717000, weight=15. doc_id=2213385, weight=16. doc_id=3916825, weight=17. doc_id=3981791, weight=18. doc_id=5489598, weight=19. doc_id=9348383, weight=110. doc_id=18194414, weight=111. doc_id=18194415, weight=112. doc_id=18195126, weight=113. doc_id=18195517, weight=114. doc_id=18195518, weight=115. doc_id=18195519, weight=116. doc_id=18195520, weight=117. doc_id=18195781, weight=118. doc_id=18195782, weight=119. doc_id=18200301, weight=120. doc_id=18200303, weight=1
进行了测试,发现速度真的很快,写了一个PHP脚本进行调用
<?phpinclude 'sphinxapi.php';$conn=mysql_connect('127.0.0.1','root','root');mysql_select_db('ns',$conn);$sphinx = new SphinxClient();$now=time();$sphinx->SetServer ( '127.0.0.1', 9312 );$result = $sphinx->query ('weibo.com', 'test1'); foreach($result['matches'] as $key => $val){ $sql="select * from ns where id='{$key}'"; $res=mysql_query($sql); $res=mysql_fetch_array($res); echo "{$res['ip']}:{$res['ns']}";}echo time()-$now;?>
基本实现了秒查!,最后输出的时间只花掉了0!
123.125.104.176:w-176.service.weibo.com123.125.104.178:w-178.service.weibo.com123.125.104.179:w-179.service.weibo.com123.125.104.207:w-207.service.weibo.com123.125.104.208:w-208.service.weibo.com123.125.104.209:w-209.service.weibo.com123.125.104.210:w-210.service.weibo.com202.106.169.235:staff.weibo.com210.242.10.56:weibo.com.tw218.30.114.174:w114-174.service.weibo.com219.142.118.228:staff.weibo.com60.28.2.221:w-221.hao.weibo.com60.28.2.222:w-222.hao.weibo.com60.28.2.250:w-222.hao.weibo.com61.135.152.194:sina152-194.staff.weibo.com61.135.152.212:sina152-212.staff.weibo.com65.111.180.3:pr1.cn-weibo.com160.34.0.155:srm-weibo.us2.cloud.oracle.com202.126.57.40:w1.weibo.vip.hk3.tvb.com202.126.57.41:w1.weibo.hk3.tvb.com0

Reasons for PHPSession failure include configuration errors, cookie issues, and session expiration. 1. Configuration error: Check and set the correct session.save_path. 2.Cookie problem: Make sure the cookie is set correctly. 3.Session expires: Adjust session.gc_maxlifetime value to extend session time.

Methods to debug session problems in PHP include: 1. Check whether the session is started correctly; 2. Verify the delivery of the session ID; 3. Check the storage and reading of session data; 4. Check the server configuration. By outputting session ID and data, viewing session file content, etc., you can effectively diagnose and solve session-related problems.

Multiple calls to session_start() will result in warning messages and possible data overwrites. 1) PHP will issue a warning, prompting that the session has been started. 2) It may cause unexpected overwriting of session data. 3) Use session_status() to check the session status to avoid repeated calls.

Configuring the session lifecycle in PHP can be achieved by setting session.gc_maxlifetime and session.cookie_lifetime. 1) session.gc_maxlifetime controls the survival time of server-side session data, 2) session.cookie_lifetime controls the life cycle of client cookies. When set to 0, the cookie expires when the browser is closed.

The main advantages of using database storage sessions include persistence, scalability, and security. 1. Persistence: Even if the server restarts, the session data can remain unchanged. 2. Scalability: Applicable to distributed systems, ensuring that session data is synchronized between multiple servers. 3. Security: The database provides encrypted storage to protect sensitive information.

Implementing custom session processing in PHP can be done by implementing the SessionHandlerInterface interface. The specific steps include: 1) Creating a class that implements SessionHandlerInterface, such as CustomSessionHandler; 2) Rewriting methods in the interface (such as open, close, read, write, destroy, gc) to define the life cycle and storage method of session data; 3) Register a custom session processor in a PHP script and start the session. This allows data to be stored in media such as MySQL and Redis to improve performance, security and scalability.

SessionID is a mechanism used in web applications to track user session status. 1. It is a randomly generated string used to maintain user's identity information during multiple interactions between the user and the server. 2. The server generates and sends it to the client through cookies or URL parameters to help identify and associate these requests in multiple requests of the user. 3. Generation usually uses random algorithms to ensure uniqueness and unpredictability. 4. In actual development, in-memory databases such as Redis can be used to store session data to improve performance and security.

Managing sessions in stateless environments such as APIs can be achieved by using JWT or cookies. 1. JWT is suitable for statelessness and scalability, but it is large in size when it comes to big data. 2.Cookies are more traditional and easy to implement, but they need to be configured with caution to ensure security.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

SublimeText3 Linux new version
SublimeText3 Linux latest version
