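I have a set of uncommon English words, and I need to find which of them occur most often in an English article.
My first idea was to loop over the word array and call substr_count for each word in turn, but that rescans the whole article once per word. Alternatively, I could split the article into words as well and use the array functions to count the intersection, but that still doesn't feel ideal.
Does anyone have ideas? This is essentially keyword extraction.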
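For reference, the repeated-scan approach described in the question might look like the sketch below. The sample article and word list are hypothetical and only serve to make the snippet runnable.

```php
<?php
// Hypothetical sample data; neither the article nor the word list comes from the thread.
$article = 'The lexer feeds tokens to the parser. The parser builds a syntax tree; a second lexer is not needed.';
$lexicon = ['lexer', 'parser', 'tokenizer'];

$haystack = strtolower($article);
$counts = [];
foreach ($lexicon as $word) {
    // Each call rescans the whole article, so the total work grows with count($lexicon).
    // substr_count() matches substrings, so "parser" would also be counted inside "parsers".
    $counts[$word] = substr_count($haystack, strtolower($word));
}

arsort($counts); // highest count first
print_r($counts);
```

The substring behaviour is worth keeping in mind: a short word such as "art" would also be counted inside "start".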
Replies and discussion (solutions)
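What's wrong with splitting it into an array? Getting English text into an array is easy, far simpler than Chinese at any rate.
I don't fully understand your requirement, though. For plain counting, array_count_values is convenient enough.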
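A minimal sketch of that suggestion, assuming the article is plain English text and that str_word_count's idea of a word is acceptable (the sample text is hypothetical):

```php
<?php
// Hypothetical sample text, for illustration only.
$article = 'The lexer feeds tokens to the parser. The parser builds a tree.';

// With format 1, str_word_count() returns the words found in the string as a list.
$words = str_word_count(strtolower($article), 1);

// array_count_values() maps each distinct word to its number of occurrences.
$frequency = array_count_values($words);
arsort($frequency); // most frequent first

print_r(array_slice($frequency, 0, 3, true));
```

The article is read once to split it, and every later lookup is a key access on `$frequency`.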
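So you already have a word list, and now you need to count how often its words occur in an article.
If so, you can use a trie (I have posted one before).
The article only needs to be scanned once, although the word list has to be built into the trie first.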
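To illustrate the idea, here is a minimal array-based trie. It is a sketch written for this thread, not the implementation referred to above, and the names `trieBuild` and `trieCount` are hypothetical. The text is scanned left to right; at each position the trie is walked as far as it matches and the longest word found is counted. Matching is byte-wise and ignores word boundaries, so "token" is also found inside "tokens".

```php
<?php
// Build a nested-array trie: one level per byte, '$end' marks a complete word.
function trieBuild(array $words): array
{
    $root = [];
    foreach ($words as $word) {
        if ($word === '') {
            continue; // nothing to index
        }
        $node = &$root;
        foreach (str_split($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['$end'] = $word;
        unset($node); // drop the reference before handling the next word
    }
    return $root;
}

// Scan $text from left to right and count the longest match at each position.
function trieCount(array $trie, string $text): array
{
    $counts = [];
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $node = $trie;
        $found = null;
        for ($j = $i; $j < $length && isset($node[$text[$j]]); $j++) {
            $node = $node[$text[$j]];
            if (isset($node['$end'])) {
                $found = $node['$end']; // longest match so far
            }
        }
        if ($found !== null) {
            $counts[$found] = ($counts[$found] ?? 0) + 1;
            $i += strlen($found) - 1; // resume right after the matched word
        }
    }
    return $counts;
}

// Hypothetical word list and text.
$trie = trieBuild(['lexer', 'parser', 'token']);
$counts = trieCount($trie, 'the parser calls the lexer; the lexer returns tokens');
print_r($counts);
```

The cost of the scan depends on the text length and the longest word, not on how many words the list contains, which is the advantage over calling substr_count once per word.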
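What format is best for storing the word list? MySQL, JSON, XML, or a plain array?
Say an article is 5 KB and the word list has 1000 words, and each of the 1000 words is matched against the article in a foreach loop. Out of
mysql_query,
json_decode()
simplexml_load_file()
a plain array
which is more efficient and uses fewer resources (CPU, RAM)?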
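5 KB is unlikely to contain 1000 words, unless they are all articles?
Even at 1000 it isn't a large amount, and after removing duplicates there should be far fewer, so a single array intersection is enough.
My approach is to split the article into a word array; array_count_values then does both the counting and the de-duplication.
Then keep only the words that reach a certain count (too few occurrences aren't worth matching, right?). Very few remain after that, and intersecting them with the existing word list is sufficient.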
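The steps in this reply can be sketched as follows. The sample data and the threshold `$minCount` are assumptions chosen for illustration, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical inputs; the threshold is an assumed value, not one given in the thread.
$article  = 'The lexer feeds tokens to the parser. The parser builds a tree, and the tokenizer is idle.';
$lexicon  = ['lexer', 'parser', 'tokenizer'];
$minCount = 2;

// 1. Split into words and count them (this also removes duplicates).
$frequency = array_count_values(str_word_count(strtolower($article), 1));

// 2. Keep only words that occur at least $minCount times.
$frequent = array_filter($frequency, fn (int $n): bool => $n >= $minCount);

// 3. Intersect with the word list; array_flip() turns the list into keys for the comparison.
$keywords = array_intersect_key($frequent, array_flip($lexicon));
arsort($keywords);

print_r($keywords);
```

Filtering by count before intersecting keeps the second step small, which is the point made above.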
The original poster did specify English words, but if your algorithm only works for English words, it doesn't mean much.
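You have a point.
I just think a simple problem deserves a simple solution. Since he said English, I approached it that way; there's no need to spend much time on the algorithm.
If he had said mixed languages, I would probably have just watched and not replied to this thread, haha.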
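Quoting snmr_com's reply in post #4: 5 KB is unlikely to contain 1000 words, unless they are all articles?
Even at 1000 it isn't a large amount, and after removing duplicates there should be far fewer, so a single array intersection is enough.
My approach is to split the article into a word array; array_count_values then does both the counting and the de-duplication.
Then keep only the words that reach……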
I couldn't make sense of the prefix tree the moderator posted, so for now I've gone with scanning the article multiple times.
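A simple example
include 'TTrie.php';

class wordkey extends TTrie {
    // Handler named in set() below: wraps the most recently buffered segment in <b> tags.
    function b() {
        $t = array_pop($this->buffer);
        $this->buffer[] = "<b>$t</b>";
    }
}

$p = new wordkey;
// Register each word together with the name of the handler to use for it.
$p->set('秦始皇', 'b');
$p->set('洛阳', 'b');
$t = $p->match('秦始皇东巡洛阳');
echo join('', $t);

Output:
秦始皇东巡洛阳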
