A Study of the Memory Overhead of Huge Data Objects in PHP


WBOY

Jun 13, 2016 pm 12:32 PM

First, please don't misunderstand: I am not here to publish research results on this problem. I would like your help analyzing it :)

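A simplified description of the background: on a website implemented in PHP, every script includes a shared file, common.php, at the very start. Because of a business requirement, common.php now defines a "huge" data object (an array holding about 500k int values), for example $huge_array = array(1,2,3,...,500000), and the whole system only ever reads $huge_array. Assume the system has to run steadily under 100 concurrent requests.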

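Question 1: $huge_array takes up roughly 10M in the common.php source file (let's not count that as a problem) and perhaps 4M once loaded into memory (a rough estimate only; measuring its exact size is not the point of this post). The problem is that PHP handles each HTTP request in a separate process (or thread), so does each one have to load this roughly 4M block into memory again? If the memory cannot be shared, about 400M of physical memory (100 × 4M) could be in use at once, which is bad for both memory footprint and memory access efficiency.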

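Question 2: With a caching mechanism such as APC or XCache enabled, we know these caches can cache opcodes, but that seems only to eliminate repeated script compilation. Can it also share memory for variables loaded at run time? I hope it helps at least somewhat; after all, that big pile of int values must exist as part of the opcodes.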

Question 3: If XCache's opcode caching cannot achieve this, would using XCache directly work? That is, instead of writing $huge_array in common.php, store it in the cache engine.

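The goal of this post is to determine, through analysis, whether this usage causes a memory bottleneck and, if so, how to optimize it. Shrinking the huge data object itself is of course the first thing to consider and a must, but that is a data structures and algorithms topic and is out of scope here. Your analysis of the problem is welcome, and better still if you can propose a workable test plan to verify your view. If you have no time to write code, I am happy to write the test code as long as the plan looks reasonable ^_^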

------Solution--------------------
Since it is 500-odd k of data, why not index it and process it in segments? Why load all of it???
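------Solution--------------------
1. Yes, it has to be loaded into memory again every time.
2. Since every script includes it, it should be cached, because it is also part of the opcodes.
3. That would be rather pointless; you would be better off optimizing the data structure and algorithm.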
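------Solution--------------------
1. As the previous reply said.
2. It is part of the opcodes, so of course it should be optimized.
3. If compiling takes longer than fetching the data by index, why compile at all? You compile only so that you can fetch data by key; once you have another way to fetch data by key, why insist on PHP's array structure? The question is a bit like asking whether to read the data from a txt file or from an included PHP file.

For example, 1.txt contains: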
1111
2222
3333
4444
……

1.php contains:
$a = array(
    1111,
    2222,
    3333,
    4444,
    ……
);

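With 1.php, every process has to initialize $a and then read $a[3].
With 1.txt, you only need to optimize how to fetch the 4th (3+1) line quickly, and all processes can share the file.

But there is a catch: concurrency. How do you handle concurrent access sensibly?

There is no general solution to this kind of problem. Either you trade space for time (400M, with no waiting under concurrency) or time for space (4M, but fetching data involves waiting).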
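
One way to make "fetch the 4th line quickly" cheap is to give every record the same width, so the byte offset can be computed instead of scanned for. The following is only a sketch under assumptions that are not in the thread: the helper names write_int_file and read_int_at are hypothetical, and the values are assumed to be non-negative and to fit in 32 bits.

```php
<?php
// Sketch only: fixed-width binary records instead of a text file.
// Assumption: every value is a non-negative integer below 2^32.

/** Write the integers to $path as 4-byte little-endian records. */
function write_int_file(string $path, array $values): void
{
    $fh = fopen($path, 'wb');
    foreach ($values as $v) {
        fwrite($fh, pack('V', $v));
    }
    fclose($fh);
}

/** Read the value at zero-based $index without loading the rest of the file. */
function read_int_at(string $path, int $index): int
{
    $fh = fopen($path, 'rb');
    fseek($fh, $index * 4);          // each record is exactly 4 bytes
    $bytes = fread($fh, 4);
    fclose($fh);
    if ($bytes === false || strlen($bytes) !== 4) {
        throw new OutOfRangeException("No record at index $index");
    }
    return unpack('V', $bytes)[1];   // unpack() returns a 1-based array
}

$path = tempnam(sys_get_temp_dir(), 'ints');
write_int_file($path, [1111, 2222, 3333, 4444]);
$fourth = read_int_at($path, 3);     // same element as $a[3] in 1.php
echo $fourth, PHP_EOL;
unlink($path);
```

Each process holds only a 4-byte buffer, and because the file is read-only the operating system's page cache can serve all processes from the same physical pages. The price is one system call per lookup, which is the time-for-space trade described above.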

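------Solution--------------------
I think using shared memory for large data is necessary; otherwise every HTTP process adds memory pressure because of this big data. XCache and APC are both opcode caches, right? So even if PHP skips lexical analysis each time and goes straight to executing opcodes, it still has to allocate memory for the big array, and the key point is that memory is allocated once per HTTP request. If you store the big array in memcache, multiple HTTP requests share one block of memory. Since there are no writes, 100 concurrent reads should be no problem for memcache, and even with concurrent reads and writes, memcached already supports optimistic locking (check-and-set).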
------Solution--------------------
First of all, put this thing in memcache.

Fetch it wherever you need it. That's it, that's enough.
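
A rough sketch of what "put it in memcache and fetch where needed" could look like. Fetching the whole array on every request would copy it back into each process, and memcached's default item size limit is 1 MB, so this sketch stores the array in chunks and fetches only the chunk that holds the requested index. Everything named here is an assumption for illustration: the KeyValueStore interface, the InMemoryStore stand-in (used so the example runs without a server), the huge_array: key prefix, and the chunk size. With the Memcached extension, Memcached::set() and Memcached::get() would take the stand-in's place.

```php
<?php
// Sketch only: chunked storage behind a minimal key-value interface.

interface KeyValueStore
{
    public function set(string $key, $value): bool;
    public function get(string $key);   // returns false on a miss
}

/** In-process stand-in for a real cache client. */
final class InMemoryStore implements KeyValueStore
{
    private array $items = [];

    public function set(string $key, $value): bool
    {
        $this->items[$key] = $value;
        return true;
    }

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }
}

const CHUNK_SIZE = 1000; // assumed; small enough to stay well under 1 MB per item

/** Split the array into chunks and store each under its own key. */
function store_in_chunks(KeyValueStore $store, array $values): void
{
    foreach (array_chunk($values, CHUNK_SIZE) as $n => $chunk) {
        $store->set("huge_array:$n", $chunk);
    }
}

/** Fetch one element by loading only the chunk that contains it. */
function fetch_value(KeyValueStore $store, int $index)
{
    $chunk = $store->get('huge_array:' . intdiv($index, CHUNK_SIZE));
    if ($chunk === false || !array_key_exists($index % CHUNK_SIZE, $chunk)) {
        throw new OutOfRangeException("No value at index $index");
    }
    return $chunk[$index % CHUNK_SIZE];
}

$store = new InMemoryStore();
store_in_chunks($store, range(1, 500000));
$value = fetch_value($store, 123456);
echo $value, PHP_EOL;
```

With a real cache server, each request would then hold at most one chunk at a time instead of the full array.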
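------Solution--------------------

Quote:

The initial value of that array with its many int values is constant data, so it must be part of the opcodes. What I am not sure about is whether assigning it to a variable in the program first allocates a block of memory to hold it. If not, the variable would have to point directly at the address where the opcodes are stored, which raises another question: is the memory holding the opcodes shared between processes?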

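Don't count on that. Opcodes only skip the compilation step. For example:
$a = 1 + 1;
becomes, as opcodes,
ZEND_ADD ~0 1 1
Here ~0 is a temporary standing in for $a. When the code actually runs, ~0 still needs its own storage; otherwise, how would PHP get at the data? And if another process modified the value, would that affect other processes? Then the logic of each process could no longer be guaranteed correct: every process runs the same code, so I could not tell whether my $a is still in its freshly initialized state.

So an opcode cache does not cache data; it only caches the code (script) in another format! The actual data still has to be allocated in memory again at run time!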
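
The quoted question, whether assigning the array to a variable allocates a new block of memory, can at least be checked inside a single process. PHP arrays are copy-on-write: an assignment shares the existing array, and a copy is made only on the first write. The sketch below measures that with memory_get_usage(); the function name measure_copy_on_write is hypothetical, and the test says nothing about sharing between processes, which is the part the reply above disputes.

```php
<?php
// Sketch only: observe copy-on-write for arrays within one process.

function measure_copy_on_write(int $count): array
{
    $a = range(1, $count);

    $before = memory_get_usage();
    $b = $a;                          // assignment: $a and $b share one array
    $afterAssign = memory_get_usage();
    $b[0] = -1;                       // first write: $b gets its own copy
    $afterWrite = memory_get_usage();

    return [
        'on_assign'       => $afterAssign - $before,
        'on_write'        => $afterWrite - $afterAssign,
        'original_intact' => $a[0] === 1,
    ];
}

$result = measure_copy_on_write(100000);
printf(
    "bytes allocated on assignment: %d, on first write: %d\n",
    $result['on_assign'],
    $result['on_write']
);
```

For a read-only $huge_array this means passing it around or assigning it inside one request costs almost nothing extra; the per-request cost is in building the array in the first place.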


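As for my earlier example (1.txt could of course be replaced by any tool: memcache, SQL, and so on), the point is that you can build an interface that, say, allocates one buffer for a read and reuses that same buffer to hold the next item when you need it. That is how you keep each process's memory footprint down.

Pseudocode example:

while($data = get()) {
// do something
}

takes the place of
foreach($datas as $data){
}

By spending more time to fetch each item, you reduce memory consumption.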
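
In PHP the get() loop above maps naturally onto a generator. This is a minimal sketch, assuming the data lives one item per line in a text file like 1.txt; the function name read_items and the temporary file are hypothetical.

```php
<?php
// Sketch only: stream items one at a time instead of building a full array.

/** Yield one line at a time; only the current line is held in memory. */
function read_items(string $path): Generator
{
    $fh = fopen($path, 'rb');
    try {
        while (($line = fgets($fh)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($fh);                  // runs even if the caller stops early
    }
}

$path = tempnam(sys_get_temp_dir(), 'items');
file_put_contents($path, "1111\n2222\n3333\n4444\n");

$sum = 0;
foreach (read_items($path) as $item) {
    $sum += (int) $item;              // "do something" with each item
}
echo $sum, PHP_EOL;
unlink($path);
```

The caller still writes an ordinary foreach, but memory use stays flat regardless of file size. Unlike while($data = get()), the generator also does not stop early when an item is a falsy value such as "0".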

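------Solution--------------------
I would love to join this discussion, but my skills are limited and I don't know much about caching, so I can't say much about it.

Let me offer my view from another angle.

Web programming must always take concurrency into account, including server connections, rather than just solving the problem at hand.
This is what many desktop developers fail to grasp, which leads to claims like "PHP is terrible".

PHP's strength is solving simple problems quickly: finish, hand the result to the client, close the connection, and free it up for the next request.
Complex computation should really be done in a lower-level language paired with a higher-grade server.

For a lower-level language and a high-performance server, the resources a complex computation consumes are worth it, because the main goal is to compute the result.
NASA's servers and the programs used on them, for example, must strive for perfection; an error in some far-off decimal place might mean "Mars crashing into Earth".

For an information-distribution network, however, the loss is huge, because every extra second spent means your information may reach one fewer person, or even dozens fewer.
Consuming memory is just like consuming time: using too much memory also makes TCP connections unstable.

So large, numerous parameters may make a program flexible and suit more users, but for ordinary web access they are a losing choice.
You should split those parameters up and segment the user base, so that each user selects fewer parameters to get the job done.
For example, a site serving the whole world should not bundle every language together as parameters and have the program adapt automatically; let the user choose a language and write code for just that language.

Complex computations should be handed to another language or component that completes a fixed procedure, shortening the web program's response path.
What #6 describes is one such case. It may not fit your needs, but the approach matches the business logic of a content-distribution site.

After all that, I probably still haven't solved your problem. I'm honored if you read this far...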
------Solution--------------------
4M, heh, that's not very big.

Reduce it to a single value, then measure the peak memory of your current PHP run and compare....
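
A minimal sketch of such a measurement, using memory_get_usage() around the construction of the array and memory_get_peak_usage() for the peak. The function name measure_array_bytes is hypothetical, and range() stands in for the real contents of $huge_array. The number it reports depends heavily on the PHP version and platform, so it is best run on the production build rather than trusted from an estimate.

```php
<?php
// Sketch only: measure how much memory a 500k-int array really takes.

/** Return the bytes allocated while building an array of $count integers. */
function measure_array_bytes(int $count): int
{
    $before = memory_get_usage();
    $huge = range(1, $count);         // stand-in for $huge_array
    $after = memory_get_usage();      // $huge is still alive here
    return $after - $before;
}

$bytes = measure_array_bytes(500000);
printf(
    "array of 500000 ints: %.1f MB, script peak: %.1f MB\n",
    $bytes / 1048576,
    memory_get_peak_usage() / 1048576
);
```

Running the same script with the array replaced by a single value gives the baseline to compare against, as suggested above.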

------Solution--------------------
See whether this code helps; it does not need to load the file into memory.

<?php
/*
111
222
333
444
555

*/


$file = new SplFileObject(__FILE__);
$file->seek(3); // line number, counted from 0
echo $file->current().'<br>';
?>

Tested on an XML file of about 50M, it takes less than 0.01 seconds. The larger the line number, the longer it takes.
