Use PHPdig to Create Your Own Google [Illustrated Tutorial]
Jul 21, 2016 03:57 PM

I. What is PHPdig?

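PHPdig is a vertical search engine product that is very popular outside China (it is less a product than a search technique distinct from traditional search engines). It is written in PHP and relies on PHP's efficient execution for fast search responses. Like Google, Baidu and other search engines, it can search the Internet, and besides ordinary web pages it also indexes files such as txt, doc, xls and pdf, with strong content search and file parsing capabilities. Like traditional search engines, PHPdig is built on three basic techniques: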

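1. Spider (crawler) technology

2. Extraction of structured information from web pages, or metadata collection

3. Word segmentation and indexing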

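Unlike traditional search engines, PHPdig is aimed at more specialized, deeper, personalized search, which makes it a strong choice for building a vertical search engine for a particular field.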
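The third technique, indexing, is easiest to picture as an inverted index: a map from each word to the pages that contain it. The Python sketch below is a toy illustration under stated assumptions, not PHPdig's own code (PHPdig is written in PHP and stores its data in MySQL). The function names and example.com URLs are hypothetical, and the tokenizer, which lowercases text and splits on non-alphanumeric characters, only suits space-delimited languages such as English.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-alphanumeric ASCII characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each token maps to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for token in tokenize(text):
            index[token].add(url)
    return dict(index)


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical pages; the URLs are placeholders.
docs = {
    "http://example.com/a": "PHPdig is a search engine written in PHP",
    "http://example.com/b": "A spider crawls every page of the site",
}
index = build_index(docs)
print(sorted(lookup(index, "search engine")))  # ['http://example.com/a']
```

Answering a query then costs one dictionary lookup per word plus a set intersection, instead of a scan over every stored page.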

II. How do I get PHPdig?

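PHPdig is free (the copyright notice must be kept). At the time of writing, the latest version is phpdig-1.8.9, but to avoid compatibility problems with your Apache and MySQL versions, an older version is recommended. The website is http://www.phpdig.net and the download page is http://www.phpdig.net/navigation.php?action=download. For the record, I tried phpdig-1.8.9 and ran into many problems; switching to PHPdig-1.8.8 caused far fewer.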

III. Step by step

1. Get the software

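Go to http://www.phpdig.net/navigation.php?action=download, download PHPdig-1.8.8 to your desktop, and unzip it into the html directory of your Apache server, typically D:\usr\www\html\. (If you do not have Apache installed, install it first. Mappm-Server v1.1.9 Final is recommended: its one-click installer sets everything up in one go and makes it easy to debug and run PHP/CGI and MySQL programs.)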

2. Run PHPdig and configure its database

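Open a browser, enter http://localhost/phpdig/ and press Enter. The page lists all of PHPdig's files and folders; there is no default home page (default or index). Clicking search.php produces the error "Unable to connect to database : Check the connection script", meaning the database connection failed because PHPdig's database has not been configured yet. Go back, open the admin directory and run install.php. The interface is entirely in English (no current version of PHPdig ships with a Chinese interface). That is not a problem: if you have localization experience, you can translate it yourself. I have provided my own Chinese localization file, cn-language.php (copy it into the locales directory). You also need to edit config.php in the includes directory (to change the language) and the style.css file (to change fonts and styles).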

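When you open install.php, the system asks for the PHPdig admin username and password, both admin by default. After logging in you see the following screen (shown localized):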


(Figure 1)

The information required is:

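If you are testing locally, enter the default server name localhost (localhost is the default server name under Mappm-Server, which is also MySQL's default server name; Mappm-Server ships with a built-in MySQL database). The database port defaults to 3126 and can be left blank. The database socket is empty by default. The username defaults to root (Mappm-Server's default user), and the password is the one you entered when installing Mappm-Server. The PHPdig database name defaults to phpdig and can be changed freely. You can also add a prefix to the table names in the database; it is empty by default.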

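If you are deploying to a web server connected to the Internet, ask your hosting provider for the MySQL server name or IP address, the database port, socket, username and password. The database name and table prefix are set as described above.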

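As for the four radio buttons on the right, choose according to your situation; for a first-time installation, keep the default, "Create database".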
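The form fields above map naturally onto a small settings object. The Python sketch below is purely illustrative: the class name `PhpdigDbSettings`, its field names, the `table` helper and the table name `sites` are hypothetical, not PHPdig's configuration format. It records the defaults described above and shows how a table prefix changes every table name, which is what lets several installations share one database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhpdigDbSettings:
    """Hypothetical container for the values entered on the install page."""
    host: str = "localhost"       # MySQL server name; localhost for local testing
    port: Optional[int] = None    # optional; may be left blank
    socket: str = ""              # socket, empty by default
    user: str = "root"            # Mappm-Server's default MySQL user
    password: str = ""            # the password chosen when installing Mappm-Server
    database: str = "phpdig"      # can be renamed freely
    table_prefix: str = ""        # optional prefix added to every table name

    def table(self, name: str) -> str:
        """Return the full table name with the configured prefix applied."""
        return f"{self.table_prefix}{name}"


local = PhpdigDbSettings(password="secret")
hosted = PhpdigDbSettings(host="db.example.com", user="site_user",
                          password="secret", table_prefix="pd_")
print(local.table("sites"), hosted.table("sites"))  # prints: sites pd_sites
```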

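Once you have checked the information above, click the Install button. If the connection fails, an error message says the database cannot be connected; if it succeeds, you go straight to the admin page shown below: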
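When the installer cannot connect, it helps to check whether the MySQL server is reachable at all before suspecting the credentials. The Python sketch below (the function name is hypothetical) only tests whether a TCP connection to the given host and port can be opened. It does not log in, so a True result says nothing about the username or password, and it does not apply if you connect through a socket file rather than TCP. Port 3306 is MySQL's standard default; substitute whatever port your server actually uses.

```python
import socket


def can_reach_db(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if can_reach_db("localhost", 3306):
    print("Port is open: check the username, password and database name next.")
else:
    print("Nothing answered: check that MySQL is running and the host/port are right.")
```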


(Figure 2)

3. The areas of the admin interface

Area 1 is a text input box. Its default text has three lines, each starting with http; this is where you enter the address of the website to spider (spidering only one site at a time is recommended).

Area 2 holds the spider options. Search depth is how many directory levels of the website are spidered, and links per page is the maximum number of linked pages crawled from any single page. Both default to 0, which means the entire site is spidered.

Area 3 shows database status information, including the sites already spidered, keywords, indexes, and the site currently being spidered.

Area 4 is a drop-down list of the URLs of spidered sites. Select a site here to clear or update it in Area 5.

Besides clearing and updating the site selected in Area 4, Area 5 also provides links to related statistics and controls for the spider.
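The two Area 2 options amount to a breadth-first crawl with two limits, where 0 means "no limit". The Python sketch below illustrates that idea and is not PHPdig's spider. It treats depth as the number of link hops from the starting URL (an assumption; PHPdig describes depth in terms of directory levels), and it reads links from a hypothetical in-memory `site` map instead of fetching real pages, so it runs offline.

```python
from collections import deque
from typing import Callable


def crawl(start: str,
          get_links: Callable[[str], list[str]],
          max_depth: int = 0,
          max_links_per_page: int = 0) -> list[str]:
    """Breadth-first crawl from `start`; a limit of 0 means unlimited, like the admin defaults."""
    seen = {start}
    visited: list[str] = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if max_depth and depth >= max_depth:
            continue  # depth limit reached: keep this page but follow none of its links
        links = get_links(url)
        if max_links_per_page:
            links = links[:max_links_per_page]  # follow at most N links from this page
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical link structure of a small site.
site = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a/1"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}


def get_links(url: str) -> list[str]:
    return site.get(url, [])


print(crawl("/", get_links))               # whole site, seven pages
print(crawl("/", get_links, max_depth=1))  # ['/', '/a', '/b', '/c']
```

With both limits left at 0 the crawl only stops when no unseen links remain, which is why spidering a large site with the defaults can take a long time.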

4. Spider a specific site

If you are very interested in the content of the Tianji Software channel, you can build a search engine dedicated to it that is more specialized than Google, covering that channel more completely and more deeply. The following walks through spidering the Tianji Software channel as an example of how to spider a website.

1) Enter http://soft.yesky.com in Area 1 of Figure 2, and leave search depth and links per page at their default of 0.

2) Click the spider button. The page switches to the spider information page, and the program starts spidering http://soft.yesky.com automatically.

Note: spidering is very slow. If the site has a lot of content, it can take anywhere from a few hours to a day. You do not need to worry about the script timing out, because the system timeout is set to a maximum of 48 hours. You can interrupt the spider during this process and later restart it to continue the unfinished site. Be aware that if you accidentally close the spider page, the spider does not actually stop and keeps consuming system resources; reopen the spider page and click the Stop spider link to release them.
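Stopping the spider and resuming it later only works if the unfinished work is saved somewhere. The Python sketch below shows one general way to do that. It saves the pending queue and the finished URLs to a JSON file whenever the run is interrupted or exceeds its time budget. This is not how PHPdig stores its state (PHPdig keeps its data in MySQL), and the file name, function names and `should_stop` callback are hypothetical. The 48-hour ceiling from the note above works out to 48 × 60 × 60 = 172,800 seconds.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable

MAX_RUNTIME_SECONDS = 48 * 60 * 60  # the 48-hour ceiling: 172800 seconds


def load_state(path: Path, start_url: str) -> tuple[list[str], set[str]]:
    """Resume from a saved state file if one exists, otherwise start from start_url."""
    if path.exists():
        state = json.loads(path.read_text())
        return state["pending"], set(state["done"])
    return [start_url], set()


def save_state(path: Path, pending: list[str], done: set[str]) -> None:
    """Persist the unfinished queue and the finished URLs for a later run."""
    path.write_text(json.dumps({"pending": pending, "done": sorted(done)}))


def run_spider(path: Path, start_url: str,
               fetch_links: Callable[[str], list[str]],
               should_stop: Callable[[], bool],
               max_runtime: float = MAX_RUNTIME_SECONDS) -> bool:
    """Crawl until finished (True) or until stopped or out of time (state saved, False)."""
    pending, done = load_state(path, start_url)
    started = time.monotonic()
    while pending:
        if should_stop() or time.monotonic() - started > max_runtime:
            save_state(path, pending, done)
            return False
        url = pending.pop(0)
        done.add(url)
        for link in fetch_links(url):
            if link not in done and link not in pending:
                pending.append(link)
    path.unlink(missing_ok=True)  # finished: remove the state so the next run starts fresh
    return True


# Demo with a hypothetical four-page site; the first run is stopped after two pages.
site = {"/": ["/a", "/b"], "/a": ["/c"]}
fetched: list[str] = []


def fetch_links(url: str) -> list[str]:
    fetched.append(url)
    return site.get(url, [])


state_file = Path(tempfile.mkdtemp()) / "spider_state.json"
checks = iter([False, False, True])
first = run_spider(state_file, "/", fetch_links, lambda: next(checks))
second = run_spider(state_file, "/", fetch_links, lambda: False)
print(first, second, fetched)  # False True ['/', '/a', '/b', '/c']
```

The second run fetches only the two pages the first run never reached, which is the behaviour the note describes for restarting an unfinished site.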


(Figure 3)

5. Search with PHPdig

After a while, the spider will have captured the information on http://soft.yesky.com into the server's database, mainly the titles, keywords and page addresses of the site's content. At this point you can search by opening search.php.
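The spider's main output is a record of each page's title, keywords and address. Below is a Python sketch of extracting those fields with the standard library's `html.parser`. It assumes the keywords come from a `<meta name="keywords">` tag, which is a common convention but an assumption about what PHPdig reads; the class, function and example URL are hypothetical.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collect the <title> text and the <meta name="keywords"> content of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.keywords: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute names arrive lowercased; values may be None
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page_info(url: str, html: str) -> dict:
    """Return the record a spider might store for one page: address, title and keywords."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "keywords": parser.keywords}


sample = ('<html><head><title> QQ2006 download </title>'
          '<meta name="Keywords" content="QQ, IM, download"/></head>'
          '<body>text</body></html>')
print(extract_page_info("http://soft.example.com/qq2006.html", sample))
```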


(Figure 4)

You can choose how many results to display and whether to use fuzzy or exact search. You can also restrict the search to a single site; by default, all spidered sites are searched.
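These options can be modelled as a filter over the stored records. In the Python sketch below, "exact" means every query term must match a whole word in the title or keywords, and "fuzzy" means every term may appear anywhere as a substring. These definitions, the record layout, the function name and the example.com URLs are assumptions for illustration, not PHPdig's actual matching rules.

```python
from typing import Optional


def search(pages: list[dict], query: str, mode: str = "exact",
           site: Optional[str] = None, limit: int = 10) -> list[str]:
    """Return up to `limit` matching URLs, optionally restricted to URLs under `site`."""
    if mode not in ("exact", "fuzzy"):
        raise ValueError("mode must be 'exact' or 'fuzzy'")
    terms = query.lower().split()
    if not terms:
        return []
    results: list[str] = []
    for page in pages:
        if site and not page["url"].startswith(site):
            continue  # restricted search: skip pages from other sites
        text = " ".join([page["title"], *page["keywords"]]).lower()
        if mode == "exact":
            words = set(text.split())
            hit = all(term in words for term in terms)  # whole-word match
        else:
            hit = all(term in text for term in terms)  # substring match
        if hit:
            results.append(page["url"])
            if len(results) >= limit:
                break
    return results


# Hypothetical records as a spider might have stored them.
pages = [
    {"url": "http://soft.example.com/qq2006.html", "title": "QQ2006 download", "keywords": ["QQ", "IM"]},
    {"url": "http://soft.example.com/qq-tips.html", "title": "QQ tips", "keywords": ["QQ"]},
    {"url": "http://news.example.com/qq2006-review.html", "title": "QQ2006 review", "keywords": []},
]
print(search(pages, "QQ2006", site="http://soft.example.com/"))
```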


(Figure 5)

The figure above shows the results page for a search for "QQ2006".

6. Problems

Because of PHPdig's language settings, the system's word segmentation and MySQL's character-set handling, PHPdig's search for Chinese words is still unreliable in many respects. These issues need further work, and comments are welcome. Interested readers can discuss them in the Taoba-PHPdig community.
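One reason Chinese is hard for an engine built around space-delimited words is that written Chinese puts no spaces between words. Splitting on whitespace therefore turns a whole phrase into a single token, and a query for any word inside it finds nothing. A common workaround in CJK search (not something PHPdig is described as doing) is to index overlapping two-character sequences, or bigrams. The Python sketch below contrasts the two approaches; its regular expression covers only the main CJK Unified Ideographs block (U+4E00 to U+9FFF), a simplifying assumption, and the function names are hypothetical.

```python
import re

# Runs of characters from the CJK Unified Ideographs block (U+4E00 to U+9FFF only).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def whitespace_tokens(text: str) -> list[str]:
    """Naive tokenizer: split on whitespace, as one would for English."""
    return text.split()


def cjk_bigrams(text: str) -> list[str]:
    """Turn each run of Chinese characters into overlapping two-character tokens."""
    tokens: list[str] = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)  # a lone character is kept as its own token
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


phrase = "垂直搜索引擎"  # "vertical search engine"
print(whitespace_tokens(phrase))      # ['垂直搜索引擎']: one token, so a query for 搜索 misses
print(cjk_bigrams(phrase))            # ['垂直', '直搜', '搜索', '索引', '引擎']
print("搜索" in cjk_bigrams(phrase))  # True
```

Bigrams trade precision for recall: they also produce non-words such as 直搜, which is why dictionary-based segmentation is often preferred when a good dictionary is available.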
