Tracing a weird precision diff

1. Problems found in a Query-diff test

Query-diff is a testing method commonly used for retrieval back ends. The same set of retrieval requests is sent to two versions of a system or module: the baseline version and the version under test. Usually the two versions differ only slightly, for example in program functionality or configuration. The search results returned by both versions are then compared to check whether that difference affects the final computed results.
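The comparison step can be sketched in C, the language of the module in this case. This is a hypothetical helper, not the team's actual tooling. `query_diff` and `diff_report` are illustrative names, and the sketch assumes that each request produces one float result at the same index in both versions. Because the upgrade here was expected to be diff-free, any pair that compares unequal counts as a diff, and the helper records the largest absolute difference.

```c
#include <stddef.h>

/* Hypothetical summary of one query-diff run. */
typedef struct {
    size_t n_diff;      /* number of results that compare unequal */
    float max_abs_diff; /* largest |baseline - test| among them */
} diff_report;

/* Compare baseline[i] with test[i] for every request i. */
diff_report query_diff(const float *baseline, const float *test, size_t n)
{
    diff_report r = {0, 0.0f};
    for (size_t i = 0; i < n; ++i) {
        if (baseline[i] != test[i]) {
            /* absolute difference without needing libm */
            float d = baseline[i] > test[i] ? baseline[i] - test[i]
                                            : test[i] - baseline[i];
            r.n_diff++;
            if (d > r.max_abs_diff)
                r.max_abs_diff = d;
        }
    }
    return r;
}
```

Dividing `n_diff` by `n` gives the diff proportion, which was about 1% in this case.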


The module under test in this case, module A, is written in C. Its core output is a single-precision floating-point number, denoted Q.

After a certain upgrade of module A, the query-diff test found a precision diff in the Q value in about 1% of the results, with the largest diff at the fourth decimal place. This upgrade was expected to produce no diff at all.

2. In-depth investigation

When a diff occurs, the first step is to choose a direction for the investigation. If the cause is not obvious at a glance, use the process of elimination: check the suspects one by one, narrow the scope, and avoid wasted effort. We therefore listed two main directions: the environment and the program.

Look at the environment first:

- We checked the configuration and dictionary files of the old and new environments on site. They matched expectations, which ruled out problems caused by the environment-building tools.

- Because this upgrade is forward compatible, we unified the configuration and dictionary files of the two environments and retested. The diff still reproduced, which ruled out configuration differences.

The environment seemed fine, so we turned back to the program itself:

- We had run several rounds of tests and the results never changed. This ruled out diffs caused by randomized strategies.

- We printed debug logs and checked the intermediate results of every processing step. Everything matched; the diff appeared only in the last step, the calculation of Q. We then ruled out thread-level dirty data, dirty data in process-level caches, variable type conversions, and other risk points, one by one.

- To be completely sure, we replaced the programs in both the old and new environments with the new version and retested. If the program itself were the cause, there should have been no diff. Yet the diff reappeared, even though we had already ruled out randomness!

At this point the investigation hit a bottleneck: neither the environment nor the program seemed to be the cause.

After calming down and thinking it over, I realized that our investigation had used a narrow definition of "environment": just the configuration and dictionary files. We had assumed that if those matched, the environments were identical. That view is one-sided. The environment also includes the compilation environment and the runtime environment, meaning the operating system and the hardware. This suggested a new line of verification:

- Both the old and new programs were built on the company's cloud compilation cluster, so compilation should not have been a problem. Still, to avoid taking anything for granted, we checked the compiler flags carefully and recompiled both versions on the same local machine. The diff still reproduced, which ruled out compilation factors.

- We copied both environments onto the same machine and replayed the requests, and the diff disappeared! This confirmed that the runtime environment was the cause.

The runtime environment covers both the operating system and the hardware. Striking while the iron was hot, we kept digging:

- We confirmed that the two machines showing the diff ran the same operating system, CentOS 4.3. This ruled out the operating system.

- Differences in disk and memory models were unlikely to cause a diff, so we did not verify them for now.

- The machine hosting the new environment had a Xeon E5645 CPU, while the machine hosting the old environment had a Xeon E5-2620. Suspecting the CPU model, I found another machine with the same CPU as the old environment, deployed the new environment on it, and retested. The diff disappeared, which pinned the cause on the CPU.


3. Revealing the truth

I analyzed the two CPUs and quickly ruled out the number of cores, the maximum thread count, and the L1/L2/L3 caches. What caught my attention was the difference in the instruction sets listed in the two CPUs' feature lists.

Background knowledge 1: The role of CPU instruction sets

An instruction set is the collection of operations that a CPU can execute directly in hardware. Extensions such as SSE and AVX add specialized instructions that let the CPU handle certain workloads more efficiently. To explain how they speed things up, we need two concepts: SISD (Single Instruction, Single Data) and SIMD (Single Instruction, Multiple Data).

Take addition as an example. On a SISD path, an add instruction fetches its two operands and produces one sum, so adding n pairs of numbers takes n add instructions. With SIMD, a single instruction works on wide registers that each hold several operands: four single-precision floats in a 128-bit SSE register, or eight in a 256-bit AVX register. It produces all the sums at once. This makes SIMD particularly well suited to data-intensive computation.
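Here is a minimal C sketch of the contrast, using the SSE intrinsics from `<xmmintrin.h>` (x86 only). The function names are hypothetical. In source terms the first function performs four scalar additions and the second performs one packed addition, although an optimizing compiler may vectorize the scalar loop on its own.

```c
#include <xmmintrin.h> /* SSE intrinsics; x86 only */

/* SISD in source terms: four separate scalar additions. */
void add4_scalar(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}

/* SIMD: one _mm_add_ps adds all four pairs packed in 128-bit registers. */
void add4_sse(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a); /* unaligned load of a[0..3] */
    __m128 vb = _mm_loadu_ps(b); /* unaligned load of b[0..3] */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```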

The SSE series and AVX are the instruction sets used for floating-point operations. AVX support is one of the differences between the two CPUs, which made it highly suspicious. The next step was to find evidence that the program was optimized with AVX.

However, the ASQ module itself contains no code that is optimized directly with instruction sets. The code that computes Q calls an interface of the static library libA, and libA's own code does not use these instruction sets. libA, however, is built on the static library libB, so we traced the dependency chain all the way down. At the fourth level of build dependencies we found libX, provided by IDL. Its code is confidential, so we could not inspect it.

So I asked the responsible R&D engineer. He told me that libX does use SSE instruction optimizations and Intel's math library MKL, but does not use AVX.


Was this another dead end? With the last bit of hope, I read Intel's official introduction to MKL and found an unexpected clue: MKL includes AVX optimizations [1]!



One last step remained: confirming that AVX was the source of the diff. Further evidence soon turned up in Intel's product documentation [2].


The FMA (fused multiply-add) instructions in AVX2 improve both the speed and the accuracy of floating-point operations such as matrix multiplication, dot products, and polynomial evaluation compared with earlier instruction sets. The reason is that FMA completes the multiplication and the accumulation in a single operation. Posts by technical staff on Intel's official forum support this [3].


Background knowledge 2: How floating-point numbers are stored in computers

Both float and double are stored according to the IEEE 754 standard. float uses the 32-bit format with 24 bits of precision (R32.24), and double uses the 64-bit format with 53 bits of precision (R64.53).

In both single and double precision, the storage is divided into three parts:

1. Sign bit (Sign): 0 means positive, 1 means negative.

2. Exponent bits (Exponent): store the exponent of the number in binary scientific notation, using biased (offset) encoding.

3. Mantissa (Mantissa): stores the fractional part of the significand. For float and double, the leading 1 of a normalized number is implicit and not stored.

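The bit layouts are shown in the following table:

| Format           | Total length | Mantissa bits | Exponent bits | Sign bit |
|------------------|--------------|---------------|---------------|----------|
| Single precision | 32 bits      | 0-22          | 23-30         | 31       |
| Double precision | 64 bits      | 0-51          | 52-62         | 63       |
| Extended double  | 80 bits      | 0-63          | 64-78         | 79       |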
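The single-precision row of the table can be checked directly. This sketch uses hypothetical names `float_parts` and `split_float`. It copies a float's bits into a `uint32_t` with `memcpy`, which is a well-defined way to reinterpret bits in C, and then extracts the three fields.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three IEEE 754 single-precision fields. */
typedef struct {
    unsigned sign;     /* bit 31 */
    unsigned exponent; /* bits 23-30, biased by 127 */
    uint32_t mantissa; /* bits 0-22; the implicit leading 1 is not stored */
} float_parts;

float_parts split_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* reinterpret the 32 bits safely */
    float_parts p;
    p.sign = (unsigned)(bits >> 31);
    p.exponent = (unsigned)((bits >> 23) & 0xFFu);
    p.mantissa = bits & 0x7FFFFFu;
    return p;
}
```

For example, -2.5f = -1.25 * 2^1 splits into sign 1, biased exponent 128 (1 + 127), and mantissa 0x200000 (the binary fraction .01).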

At the hardware level, x86 CPUs have two floating-point paths. The legacy x87 FPU (Floating-Point Unit) computes internally with 80-bit extended precision by default. SSE and AVX instructions instead operate directly on 32-bit floats (for single precision) and round the result after each instruction. Suppose two code paths compute intermediate results differently: with extra precision, with a fused multiply-add that rounds once instead of twice, or by adding terms in a different order. Then their final results, once rounded to 32 bits and stored in memory, can differ in the last bits.

Intel's underlying implementation is confidential, so we can only guess exactly how the AVX and SSE code paths differ internally. The existence of a precision difference between them, however, is certain.

The truth was now clear. AVX's FMA computes a multiply-add with one rounding instead of two, so its result can differ from the SSE path by 1 bit in the last place. In iterative calculations these differences accumulate. Computing Q involves complex matrix operations, and this tiny 1-bit difference was magnified until it showed up at the fourth decimal place. Meanwhile, to stay compatible with all machines, MKL falls back to its SSE code path on CPUs that do not support AVX. That is why the same binary produced different results on the two machines.
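Wider vectors also change the order in which terms are added, and for floats that alone can change the result. The sketch below is a simplified model, not MKL's code. It sums the same array the way a scalar loop, a 4-lane (SSE-width) loop, and an 8-lane (AVX-width) loop would. `sum_lanes` and `MAX_LANES` are hypothetical names. The sketch assumes float arithmetic is evaluated in single precision (`FLT_EVAL_METHOD` 0, as on x86-64 with SSE).

```c
#include <stddef.h>

#define MAX_LANES 16 /* hypothetical upper bound for this sketch */

/* Model of a `lanes`-wide SIMD sum: lane l accumulates x[l], x[l+lanes], ...
   over complete groups; the lanes are then added left to right, followed
   by the leftover tail. lanes == 1 is an ordinary sequential loop. */
float sum_lanes(const float *x, size_t n, size_t lanes)
{
    float partial[MAX_LANES] = {0.0f};
    size_t i = 0;
    if (lanes == 0 || lanes > MAX_LANES)
        return 0.0f; /* unsupported width in this sketch */
    for (; i + lanes <= n; i += lanes)
        for (size_t l = 0; l < lanes; ++l)
            partial[l] += x[i + l];
    float total = 0.0f;
    for (size_t l = 0; l < lanes; ++l)
        total += partial[l];
    for (; i < n; ++i)
        total += x[i];
    return total;
}
```

Take x = {1e8f, 0, 0, 0, 1, 0, 0, 0, -1e8f, 0, 0, 0, 1, 0, 0, 0}. The scalar and 4-lane orders both return 1, because the first 1 is added to 1e8 (whose float spacing is 8) and is lost. The 8-lane order adds the two 1s to each other and returns 2.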

Background knowledge 3: Optimizing programs with SSE and AVX

Again taking addition as an example. Including the relevant header files and setting the compiler flags are not covered here; please refer to related material.

Basic version:

Simple loop to accumulate and sum.
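The original code for this version is not preserved here. A minimal C sketch of the basic version (`sum_scalar` is a hypothetical name) might look like this:

```c
#include <stddef.h>

/* Basic version: add the elements one at a time, left to right. */
float sum_scalar(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}
```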


SSE optimized version

An SSE register is 128 bits (16 bytes) wide and holds 4 single-precision floats. The array is loaded into the register 4 floats at a time and accumulated with the built-in addition function. The 4 lane sums are then added together, and finally the leftover elements that do not fill a complete group of 4 are added to get the result.
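Here is a sketch of the SSE version as described, using a single 128-bit accumulator. It is not the code from [4]. `sum_sse` is a hypothetical name, and it uses unaligned loads so that the input needs no special alignment.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics; x86 only */

/* Accumulate 4 floats per step in one 128-bit register, add the 4 lanes
   together, then add the n % 4 leftover elements. */
float sum_sse(const float *x, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i)
        s += x[i];
    return s;
}
```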


AVX optimized version

The AVX optimization is similar to SSE, but an AVX register is 256 bits (32 bytes) wide and holds 8 single-precision floats. Each group loaded into the register therefore contains 8 floats.
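The AVX version follows the same pattern with 256-bit registers. This sketch (`sum_avx` is a hypothetical name) uses the GCC/Clang `target("avx")` function attribute so that only this function is compiled for AVX. Callers should check `__builtin_cpu_supports("avx")` first. Running AVX instructions on a CPU without AVX, such as the Xeon E5645 in this case, raises an illegal-instruction fault.

```c
#include <stddef.h>
#include <immintrin.h> /* AVX intrinsics; x86 only */

/* Compiled for AVX regardless of global flags (GCC/Clang attribute).
   Call only after __builtin_cpu_supports("avx") returns nonzero. */
__attribute__((target("avx")))
float sum_avx(const float *x, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int l = 0; l < 8; ++l) /* add the 8 lane sums */
        s += lanes[l];
    for (; i < n; ++i)          /* leftover n % 8 elements */
        s += x[i];
    return s;
}
```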


We then generated a random input array and wrote a simple test case to measure the effect of the optimizations, comparing the three versions in floats accumulated per second. The SSE version ran about 4 times as fast as the basic version, and the AVX version about 8 times as fast [4]!
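The measurement can be sketched as follows. This is not the test code from [4], and all names are hypothetical. It measures CPU time with `clock()` and reports floats accumulated per second. Any function with the same signature, such as an SSE or AVX variant, can be passed in to compare versions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

typedef float (*sum_fn)(const float *, size_t);

/* Reference scalar sum used as the baseline. */
float sum_scalar(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Fill x with pseudo-random floats in [0, 1) (multiples of 0.001). */
void fill_random(float *x, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; ++i)
        x[i] = (float)(rand() % 1000) / 1000.0f;
}

/* Floats accumulated per second of CPU time over `reps` calls.
   Returns 0.0 if the run was too short to measure; raise `reps` then.
   The volatile sink keeps the compiler from discarding the calls. */
double floats_per_second(sum_fn f, const float *x, size_t n, int reps)
{
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        sink += f(x, n);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    (void)sink;
    return secs > 0.0 ? (double)n * reps / secs : 0.0;
}
```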


4. Summary and lessons learned

Problem Summary:

- During a Query-diff compatibility test, a diff was found between the Q values computed by the old and new versions of module A.

- The investigation showed that the precision diff comes from a difference in the floating-point instruction sets (AVX vs. SSE) supported by the CPUs of the machines running the program.

- In this case both the proportion and the magnitude of the diff are small, and online services are not currently affected. However, if the algorithm grows more complex and the diff accumulates to the hundredths place, it could make the strategy fail.

- If other modules use instruction-set optimizations for floating-point operations, they should also be checked for the same problem.

Solutions:

- When allocating test resources, make sure the machines hosting the old and new environments have the same CPU model.

- Add an environment check before running query-diff to confirm that the hardware does not differ.

- When deploying services online, also make sure the machines support the AVX instruction set, to get the best performance and accuracy.

- Check whether other modules use instruction-set optimizations in a similar way, to head off the risk in advance.
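A pre-run environment check could start by recording each machine's SIMD level. This sketch uses the GCC/Clang x86 builtin `__builtin_cpu_supports`. `simd_level` and `simd_levels_match` are hypothetical helper names. Collecting the string from the baseline and test machines (for example, by running the check on each) is assumed to be handled by the test harness.

```c
#include <string.h>

/* Highest SIMD extension the current CPU supports (x86, GCC/Clang).
   Calling __builtin_cpu_init first is only needed before main runs. */
const char *simd_level(void)
{
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("avx"))    return "avx";
    if (__builtin_cpu_supports("sse4.2")) return "sse4.2";
    if (__builtin_cpu_supports("sse2"))   return "sse2";
    return "none";
}

/* Hypothetical gate: run query-diff only if both machines report the
   same SIMD level. Returns nonzero when they match. */
int simd_levels_match(const char *baseline, const char *test)
{
    return strcmp(baseline, test) == 0;
}
```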

Lessons and suggestions:

- Floating-point-intensive programs can consider using instruction sets such as SSE/AVX to optimize performance. This usually improves efficiency significantly (about 4x with SSE and 8x with AVX in the test above).

- When using instruction-set functions, limit the number of iterations, that is, cases where the output of an instruction-set function is fed back in as its input. This keeps precision diffs from accumulating to a level that can no longer be ignored.

- Query-diff testing can be applied to more compatibility scenarios, such as measuring how differences in the underlying system and hardware (CPU, operating system, base libraries) affect applications.

Software engineering cannot be separated from the hardware it runs on. Differences in the compilation and runtime environments can change both service performance and final results, so these issues deserve special attention at every stage of development, testing, and release. Be a programmer who understands both software and hardware!

References:

[1] https://software.intel.com/zh-cn/articles/whats-new-in-intel-mkl

[2] https://software.intel.com/zh-cn/articles/intel-xeon-processor-e7-88004800-v3-product-family-technical-overview

[3] https://software.intel.com/en-us/forums/topic/507004

[4] http://www.cnblogs.com/zyl910/archive/2012/10/22/simdsumfloat.html

Baidu MTC is a leading mobile application testing service platform. It offers solutions to the cost, technical, and efficiency problems developers face in mobile app testing, and it shares Baidu's technology; its authors are Baidu employees and industry experts.
