


Process:
1. Obtain the CSDN user database and import it locally
Opening the file in EditPlus failed with an out-of-memory error, and I found no way around it. A colleague inspected the file under Linux; the basic format is as follows:
Username # Password # Email
Username # Password # Email
Corresponding data structure:
CREATE TABLE IF NOT EXISTS `csdn_userdb` (
`id` int(10) NOT NULL auto_increment,
`username` varchar(50) character set gbk NOT NULL,
`password` varchar(50) character set gbk NOT NULL,
`email` varchar(50) character set gbk NOT NULL,
PRIMARY KEY (`id`),
KEY `username` (`username`),
KEY `email` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=gbk AUTO_INCREMENT=1;
I had always suspected that fopen reads the whole file into memory, but in practice it opens very quickly, so it evidently does not. The following is the code for importing the data:
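<?php
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn', $link);
$handle = fopen("C:/Users/zhudong/Desktop/www.csdn.net.sql", "r");
$i = 0;
// Read one line at a time; fgets() returns false at end of file.
while (($buffer = fgets($handle)) !== false) {
    $i++;
    // Each line is "username # password # email"; strip the line ending first.
    list($u, $p, $e) = explode(" # ", rtrim($buffer, "\r\n"));
    // Values are interpolated unescaped, so a row containing a quote fails.
    mysql_query("INSERT INTO csdn_userdb(username,password,email) VALUES ('$u','$p','$e')", $link);
    if ($i % 1000 == 0) echo $i."\n";
}
fclose($handle);
?>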
The code above is very slow because it issues one INSERT per row, so I modified it to insert rows in batches:
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn', $link);
$handle = fopen("C:/Users/zhudong/Desktop/www.csdn.net.sql", "r");
$perpage = 50;          // rows per INSERT statement
$i = 0;
$insertValue = array();
while (($buffer = fgets($handle)) !== false) {
    $i++;
    list($u, $p, $e) = explode(" # ", rtrim($buffer, "\r\n"));
    $insertValue[] = "('$u','$p','$e')";   // values are not escaped
    if ($i % $perpage == 0) {
        // Send the accumulated rows as one multi-row INSERT.
        $insertValueString = implode(',', $insertValue);
        mysql_query("INSERT INTO csdn_userdb(username,password,email) VALUES $insertValueString", $link);
        echo $i."\n";
        $insertValue = array();
    }
}
// Rows left over after the last full batch are not inserted by this loop.
fclose($handle);
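The batching step can also be sketched on its own, without a database. The helper below is hypothetical (the name `batchRows` and the sample lines are not from the original script); it assumes the `username # password # email` line format described earlier and, unlike the loop above, also hands back the final partial batch.

```php
<?php
// Hypothetical helper, not part of the original script: group parsed lines
// into batches of $size and yield the final partial batch as well.
function batchRows(iterable $lines, int $size): Generator
{
    $batch = [];
    foreach ($lines as $line) {
        // Strip the line ending, then split "username # password # email".
        // The limit of 3 keeps any further " # " inside the last field.
        $fields = explode(' # ', rtrim($line, "\r\n"), 3);
        if (count($fields) !== 3) {
            continue; // skip malformed lines instead of inserting partial rows
        }
        $batch[] = $fields;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // leftover rows after the last full batch
    }
}

// Made-up sample data for illustration only.
$lines = ["a # 1 # a@x.com\n", "b # 2 # b@x.com\n", "broken line\n", "c # 3 # c@x.com\n"];
foreach (batchRows($lines, 2) as $n => $batch) {
    echo "batch $n: " . count($batch) . " rows\n";
}
```

Each yielded batch could then be turned into one multi-row INSERT, ideally with escaped or bound values.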
To find out which factors affect import speed, we ran tests with different settings.
Total CSDN user records: 6,428,600
When $perpage=500: rows after import: 5,902,000; rows lost: 526,600; loss rate: 8%; table engine: MyISAM; index: yes; total time: 15 minutes
When $perpage=200: rows after import: 6,210,200; rows lost: 218,400; loss rate: 3.3%; table engine: MyISAM; index: yes; total time: 30 minutes
When $perpage=200: rows after import: 6,210,200; rows lost: 218,400; loss rate: 3.3%; table engine: InnoDB; index: yes; total time: 65 minutes
When $perpage=200: rows after import: 6,210,200; rows lost: 218,400; loss rate: 3.3%; table engine: MyISAM; index: none; total time: 14 minutes (index rebuilt separately after the import)
When $perpage=50: rows after import: 6,371,200; rows lost: 57,400; loss rate: 0.8%; table engine: MyISAM; index: none; total time: 20 minutes
Based on the results above, the summary is as follows:
1. Importing the data first and adding the index afterwards is about twice as fast as creating the index first and then importing
2. For single-process inserts, InnoDB is much slower than MyISAM
3. With $perpage=50, the data loss rate is below 1%
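The loss figures in the test runs above turn out to be whole multiples of the batch size once the rows after the last full batch are set aside. The sketch below checks that arithmetic using only the reported numbers; the function name `failedBatches` is made up for illustration. It assumes the import loop only inserts when the row counter reaches a multiple of `$perpage`. Losses in whole batches are consistent with entire INSERT statements being rejected; one possible cause, which the original does not confirm, is a value containing a quote character.

```php
<?php
// Hypothetical helper: split a reported loss into "rows never flushed"
// and "whole batches that failed", given the batch size.
function failedBatches(int $total, int $perpage, int $lost): array
{
    $unflushed = $total % $perpage;            // rows after the last full batch
    $inBatches = $lost - $unflushed;           // rows lost inside full batches
    return [
        'unflushed' => $unflushed,
        'batches'   => intdiv($inBatches, $perpage),
        'remainder' => $inBatches % $perpage,  // 0 means only whole batches were lost
    ];
}

$total = 6428600;                              // total records reported above
$runs = [500 => 526600, 200 => 218400, 50 => 57400]; // batch size => rows lost

foreach ($runs as $perpage => $lost) {
    $r = failedBatches($total, $perpage, $lost);
    echo "perpage=$perpage unflushed={$r['unflushed']} "
       . "failed_batches={$r['batches']} remainder={$r['remainder']}\n";
}
// perpage=500 unflushed=100 failed_batches=1053 remainder=0
// perpage=200 unflushed=0 failed_batches=1092 remainder=0
// perpage=50 unflushed=0 failed_batches=1148 remainder=0
```

The number of failed statements stays roughly constant across batch sizes, so a smaller batch loses fewer rows per failure, which matches the falling loss rate.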
Running the script through the browser runs into timeouts and is slow, so I ran it from the command line. I hit a small snag along the way that cost me quite some time.
At first I executed the following command:
php.exe E:/usr/www/importcsdndb.php
But it kept reporting an error: call to undefined function mysql_connect
After much trouble, I found that php.ini was not being loaded
The correct command is:
php.exe -c E:/usr/local/apache2/php.ini importcsdndb.php
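When the CLI reports an undefined function like this, a quick check of which php.ini was loaded and whether the extension is present can save time. The snippet below is a minimal sketch; `describeRuntime` is a made-up name, while `php_ini_loaded_file()` and `extension_loaded()` are standard PHP functions. Note that the `mysql` extension used throughout this article was removed in PHP 7.0, so on newer versions it will be reported as missing regardless of php.ini.

```php
<?php
// Hypothetical helper: report the loaded php.ini and the state of one extension.
function describeRuntime(string $extension): string
{
    $ini = php_ini_loaded_file();              // false when no php.ini was loaded
    $iniText = $ini === false ? '(none)' : $ini;
    $state = extension_loaded($extension) ? 'loaded' : 'missing';
    return "php.ini: $iniText; $extension: $state";
}

echo describeRuntime('mysql'), "\n";
```

Running it with and without the `-c` option shows whether the configuration file is the difference.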
2. Import the user data to be matched into the local database
Enter mysql from the command line (if you do not know how, search Baidu)
Then execute: mysql> source C:/Users/zhudong/Desktop/userdb.sql
3. Compare and filter users
Once the comparison program is written, remember to run it from the command line:
<?php
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn', $link);
$handle_username = fopen("E:/records_username.txt", "a");
//$handle_email = fopen("E:/records_email.txt","a");
$username_num = $email_num = $uid = 0;
// Assumption: the loop's upper bound was lost from the source; MAX(uid) stands in for it.
$maxuid = (int) mysql_result(mysql_query("SELECT MAX(uid) FROM pw_members"), 0);
while ($uid < $maxuid) {
    $nextuid = $uid + 10000;
    // Walk the members table in windows of 10000 uids.
    $query = mysql_query("SELECT * FROM pw_members WHERE uid>'$uid' AND uid<='$nextuid'");
    while ($rt = mysql_fetch_array($query, MYSQL_ASSOC)) {
        $username = $rt['username'];
        $email = $rt['email'];
        // One lookup per member: this is what makes the script slow.
        $query2 = mysql_query("SELECT * FROM csdn_userdb WHERE username='$username' OR email='$email'");
        while ($rt2 = mysql_fetch_array($query2, MYSQL_ASSOC)) {
            // Compare (==, not =) the stored hash with the MD5 of the CSDN password.
            if ($rt['password'] == md5($rt2['password'])) {
                if ($rt2['username'] == $username) {
                    $username_num++;
                    fwrite($handle_username,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."\r\n");
                    echo 'username_num='.$username_num."\r\n";
                    continue;
                }
                /*
                if ($rt2['email'] == $email) {
                    $email_num++;
                    fwrite($handle_email,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."\r\n");
                    echo 'email_num='.$email_num."\r\n";
                }
                */
            }
        }
        mysql_free_result($query2);
    }
    $uid = $nextuid;
}
?>
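The code you see above is very clumsy, because it is extremely inefficient: with several million rows it takes more than 10 hours to run. How could I forget something as basic as a join? The corrected approach follows: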
<?php
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn', $link);
$handle_username = fopen("E:/records_username.txt", "a");
$username_num = $uid = 0;
// Assumption: the loop's upper bound was lost from the source; MAX(uid) stands in for it.
$maxuid = (int) mysql_result(mysql_query("SELECT MAX(uid) FROM own_members"), 0);
while ($uid < $maxuid) {
    $nextuid = $uid + 10000;
    // One joined query per window of 10000 uids replaces the per-member lookups.
    $query = mysql_query("SELECT m.uid,m.username,m.password,m.email,u.password as csdn_password,u.email as csdn_email FROM own_members m LEFT JOIN csdn_userdb u USING(username) WHERE m.uid>'$uid' AND m.uid<='$nextuid'");
    while ($rt = mysql_fetch_array($query, MYSQL_ASSOC)) {
        if ($rt['password'] == md5($rt['csdn_password'])) {
            $username_num++;
            fwrite($handle_username,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt['username'].'|'.$rt['csdn_password'].'|'.$rt['csdn_email']."\r\n");
            echo 'username_num='.$username_num."\r\n";
        }
    }
    $uid = $nextuid;
    echo 'uid='.$uid;
}
?>
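Both comparison scripts walk the members table in fixed-size uid windows. That windowing can be sketched separately from the database code; `uidWindows` is a hypothetical helper, and the maximum uid of 25000 in the usage example is made up. Using `uid > lower AND uid <= upper` for each pair ensures that no uid is visited twice or skipped at a window boundary.

```php
<?php
// Hypothetical helper: yield [lower, upper] pairs covering (0, $maxUid]
// in steps of $step, with the last window clipped to $maxUid.
function uidWindows(int $maxUid, int $step): Generator
{
    for ($lower = 0; $lower < $maxUid; $lower += $step) {
        yield [$lower, min($lower + $step, $maxUid)];
    }
}

// Illustrative values only.
foreach (uidWindows(25000, 10000) as [$lower, $upper]) {
    echo "uid > $lower AND uid <= $upper\n";
}
// uid > 0 AND uid <= 10000
// uid > 10000 AND uid <= 20000
// uid > 20000 AND uid <= 25000
```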
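Total comparison time: 25 minutes, a big improvement over the previous run of more than 10 hours
Total users with matching usernames: 34175
Share of total members: 1.7%
A 1.7% rate of matching users is fairly serious. I hope this article helps webmasters identify the affected users on their own sites.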
