Preface: At work, when it comes to addition, subtraction, multiplication and division on numbers with decimal points, most people know to reach for BigDecimal, but many are still confused about why double or float loses precision, and about how BigDecimal solves the problem. Without further ado, let's get started.
1. What is a floating point number?
Floating point numbers are a data type that computers use to represent numbers with a fractional part, stored in a binary form of scientific notation. In Java, double is a double precision, 64-bit floating point number whose default value is 0.0d; float is a single precision, 32-bit floating point number whose default value is 0.0f.
Layout in memory:
float: sign bit (1 bit), exponent (8 bits), mantissa (23 bits)
double: sign bit (1 bit), exponent (11 bits), mantissa (52 bits)
The exponent field of a float occupies 8 bits. What is stored there is not the exponent itself but a biased (offset) exponent: if the true exponent is e and the stored code is E, then E = e + (2^(n-1) - 1), where n is the width of the exponent field and 2^(n-1) - 1 is the bias specified by the IEEE 754 standard. For float this gives 2^(8-1) - 1 = 127, and for double 2^(11-1) - 1 = 1023. The all-zeros and all-ones exponent codes are reserved (for zero and subnormal numbers, and for infinity and NaN), so the exponent of a normal float ranges from -126 to 127, while that of a normal double ranges from -1022 to 1023. The negative exponent determines the non-zero number with the smallest absolute value that a floating point number can express, while the positive exponent determines the number with the largest absolute value, and therefore the value range of the type.
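The layout and the bias formula above can be checked directly in Java. The following is only a minimal sketch: the class name FloatBits and its helper methods are made up for illustration, while Float.floatToIntBits is the standard library call that exposes the raw 32 bits.

```java
final class FloatBits {
    // IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits.
    static final int EXPONENT_BITS = 8;
    static final int MANTISSA_BITS = 23;
    // Bias = 2^(n-1) - 1, which is 127 for n = 8.
    static final int BIAS = (1 << (EXPONENT_BITS - 1)) - 1;

    // Sign bit: 0 for positive, 1 for negative.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Stored (biased) exponent code E.
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> MANTISSA_BITS) & 0xFF;
    }

    // True exponent e = E - bias (meaningful for normal numbers only).
    static int trueExponent(float f) {
        return biasedExponent(f) - BIAS;
    }

    // The 23 stored mantissa bits, without the implicit leading 1.
    static int mantissa(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = -6.5f; // -6.5 = -1.625 * 2^2 = -1.101 (binary) * 2^2
        System.out.println("sign     = " + sign(f));           // 1
        System.out.println("E        = " + biasedExponent(f)); // 129
        System.out.println("e        = " + trueExponent(f));   // 2
        System.out.println("mantissa = " + Integer.toBinaryString(mantissa(f)));
    }
}
```

Here E = 129 and e = 129 - 127 = 2, which matches E = e + 127.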
The range of float is -(2 - 2^-23) × 2^127 ~ (2 - 2^-23) × 2^127, that is, about -3.40E+38 ~ 3.40E+38;
The range of double is -(2 - 2^-52) × 2^1023 ~ (2 - 2^-52) × 2^1023, that is, about -1.797E+308 ~ 1.797E+308.
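These limits are available as constants in Java, so they do not have to be memorized. A small sketch follows; the class name FloatRange and its two helper methods are assumptions made for this example, while the constants and Math.scalb are standard.

```java
final class FloatRange {
    // Largest finite float, rebuilt from its parts: (2 - 2^-23) * 2^127.
    static float largestFloat() {
        return Math.scalb(2.0f - 0x1p-23f, 127);
    }

    // Largest finite double, rebuilt from its parts: (2 - 2^-52) * 2^1023.
    static double largestDouble() {
        return Math.scalb(2.0 - 0x1p-52, 1023);
    }

    public static void main(String[] args) {
        // Exponent range of normal numbers.
        System.out.println(Float.MIN_EXPONENT + " .. " + Float.MAX_EXPONENT);   // -126 .. 127
        System.out.println(Double.MIN_EXPONENT + " .. " + Double.MAX_EXPONENT); // -1022 .. 1023

        // Largest finite values.
        System.out.println(Float.MAX_VALUE);   // 3.4028235E38
        System.out.println(Double.MAX_VALUE);  // 1.7976931348623157E308
        System.out.println(largestFloat() == Float.MAX_VALUE);    // true
        System.out.println(largestDouble() == Double.MAX_VALUE);  // true

        // Going past the range does not wrap around, it overflows to infinity.
        System.out.println(Float.MAX_VALUE * 2); // Infinity
    }
}
```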
2. Into distortion: scientific notation
Let's talk about scientific notation first. Scientific notation is a way of writing numbers compactly, used to represent (often approximately) a very large or very small number that has many digits. It has no advantage for values with few digits, but for values with many digits the advantage is obvious. For example, the speed of light is 300000000 meters/second, and the world's population is approximately 6100000000. Numbers like these are inconvenient to read and write, so the speed of light can be written as 3*10^8 and the world's population as 6.1*10^9. A calculator shows the same thing in E notation: the speed of light is 3E8, and the world's population is approximately 6.1E9.
When we were kids, we used to play with calculators and add or subtract like crazy. In the end, the calculator would switch to displaying its result in scientific notation.
For example, a displayed value of -4.86*10^11 stands for -486000000000. Decimal scientific notation requires the integer part of the significand to be a single digit from 1 to 9, that is, the significand lies in the interval [1, 10).
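Java behaves much like the calculator: Double.toString switches to E notation once a value is large enough. The sketch below uses the numbers from this section; the class name ScientificNotationDemo and the helper expand are made up for this example.

```java
import java.math.BigDecimal;

final class ScientificNotationDemo {
    // Expands a value written in E notation into plain decimal digits.
    static String expand(String eNotation) {
        return new BigDecimal(eNotation).toPlainString();
    }

    public static void main(String[] args) {
        // Large doubles are printed in E notation automatically.
        System.out.println(Double.toString(300000000.0));  // 3.0E8
        System.out.println(Double.toString(6100000000.0)); // 6.1E9

        // Going the other way: from E notation back to all the digits.
        System.out.println(expand("3E8"));       // 300000000
        System.out.println(expand("6.1E9"));     // 6100000000
        System.out.println(expand("-4.86E11"));  // -486000000000
    }
}
```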
3. Into distortion: precision
When computers process data, they perform conversions and complex operations, such as conversions between units and between number bases (for example binary and decimal). Many divisions do not come out even, such as 10÷3=3.3333..., which goes on forever; because precision is limited, 3.3333333×3 is not equal to 10. Decimal data obtained after complex processing is therefore not exact, and the higher the precision, the closer the result is to the true value. The precision of float and double is determined by the number of bits in the mantissa. For normal numbers the integer part of the significand is always an implicit "1"; since it never changes, it does not need to be stored, and it contributes one extra bit of precision.
float: 2^23 = 8388608, a seven-digit number. Counting the omitted leading bit, there are 24 significant bits: 2 × 8388608 = 2^24 = 16777216, an eight-digit number. So the precision of float is about 7 significant decimal digits: some values keep 8, but only 6 are guaranteed for every value.
double: 2^52 = 4503599627370496, a 16-digit number; counting the omitted leading bit, 2^53 = 9007199254740992. Similarly, the precision of double is about 15~16 significant decimal digits, of which 15 are guaranteed for every value.
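The mantissa limit is easy to observe: once an integer needs more bits than the significand has, adding 1 no longer changes the value. A minimal sketch (the class name PrecisionLimit and its methods are assumptions made for this example):

```java
final class PrecisionLimit {
    // float has 24 significant bits, so 2^24 + 1 cannot be represented.
    static boolean floatAbsorbsOne() {
        float big = 16_777_216f; // 2^24
        return big + 1f == big;
    }

    // double has 53 significant bits, so 2^53 + 1 cannot be represented.
    static boolean doubleAbsorbsOne() {
        double big = 9_007_199_254_740_992d; // 2^53
        return big + 1d == big;
    }

    // Stores a 9-digit integer in a float and reads it back as a long.
    static long throughFloat(int value) {
        return (long) (float) value;
    }

    public static void main(String[] args) {
        System.out.println(floatAbsorbsOne());       // true
        System.out.println(doubleAbsorbsOne());      // true
        // Only the leading digits survive the trip through float.
        System.out.println(throughFloat(123456789)); // 123456792
    }
}
```

The last line shows the decimal view of the same limit: 123456789 has nine digits, which is more than a float can keep, so it comes back as 123456792.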
When a value becomes large enough (or small enough), it is automatically displayed in scientific notation, keeping only the significant figures that the precision allows, so the result is an approximation; the exponent is always an integer. In addition, some decimal fractions cannot be represented exactly in binary. They can only be approximated with a limited number of bits, so an error may already be introduced when the value is stored. To convert a decimal fraction to binary, use the multiply-by-2 method: multiply the fraction by 2, take the integer part as the next binary digit, and keep multiplying the remaining fraction by 2 until the fractional part becomes 0.
For example, computing 0.3-0.1 with the double type gives the output 0.19999999999999998. During the operation, 0.3 first has to be converted into binary:
0.3 * 2 = 0.6 => .0 (.6) take 0 and leave 0.6
0.6 * 2 = 1.2 => .01 (.2) take 1 and leave 0.2
0.2 * 2 = 0.4 => .010 (.4) take 0 and leave 0.4
0.4 * 2 = 0.8 => .0100 (.8) take 0 and leave 0.8
0.8 * 2 = 1.6 => .01001 (.6) take 1 and leave 0.6
.............
The remainder 0.6 has already appeared, so from here on the digits 1001 repeat forever and the fractional part never becomes 0.
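The multiply-by-2 steps above can be sketched in Java as follows. This is only an illustration: the class name FractionToBinary and the method toBinaryFraction are made up for this example, and BigDecimal is used for the intermediate steps so that the demonstration itself does not suffer from the rounding it is meant to show.

```java
import java.math.BigDecimal;

final class FractionToBinary {
    // Returns at most maxBits binary digits of a decimal fraction in [0, 1).
    // Stops early if the fractional part becomes 0 (an exact representation).
    static String toBinaryFraction(String decimalFraction, int maxBits) {
        BigDecimal two = BigDecimal.valueOf(2);
        BigDecimal rest = new BigDecimal(decimalFraction);
        StringBuilder bits = new StringBuilder("0.");
        for (int i = 0; i < maxBits && rest.signum() != 0; i++) {
            rest = rest.multiply(two);
            if (rest.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');                  // integer part is 1: take 1
                rest = rest.subtract(BigDecimal.ONE);
            } else {
                bits.append('0');                  // integer part is 0: take 0
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinaryFraction("0.625", 12)); // 0.101 (exact, stops early)
        System.out.println(toBinaryFraction("0.3", 12));   // 0.010011001100 (never ends)
        System.out.println(0.3 - 0.1);                     // 0.19999999999999998
    }
}
```

0.625 terminates after three bits, while 0.3 would keep producing 1001 for as long as maxBits allows, which is why a double can only store an approximation of it.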
4. Summary
After reading the above, it is probably clear why floating point numbers have precision problems. Simply put, the float and double types are mainly designed for scientific and engineering calculations. They perform binary floating point arithmetic, which is carefully designed to provide fast, close approximations over a wide range of values. However, they do not provide exact results and should not be used where exact results are required. Floating point numbers beyond a certain size are automatically shown in scientific notation, and such a representation is only an approximation of the real number, not equal to it. When a decimal fraction is converted to binary, the digits may also repeat forever or exceed the length of the floating point mantissa, and the value is then rounded.
5. So how do we use BigDecimal to solve it?
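Look at the two outputs below. The first is what you get when a BigDecimal is built from the double value 0.3, as in new BigDecimal(0.3); the second is what you get when it is built from the string "0.3", as in new BigDecimal("0.3"). The double already carries the binary rounding error, and BigDecimal faithfully preserves it, so always construct BigDecimal values from strings.
Output results:
0.299999999999999988897769753748434595763683319091796875
0.3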
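The code that produced these outputs is not shown, so the following is a reconstruction under that assumption rather than the original listing. The class name BigDecimalDemo and the helper subtractExact are made up for this example; the BigDecimal constructors, valueOf, subtract and divide are standard.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class BigDecimalDemo {
    // Exact decimal subtraction: both operands are parsed from strings.
    static BigDecimal subtractExact(String a, String b) {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    public static void main(String[] args) {
        // Built from a double: keeps the binary rounding error of 0.3.
        System.out.println(new BigDecimal(0.3));
        // Built from a string: exactly 0.3.
        System.out.println(new BigDecimal("0.3"));    // 0.3
        // valueOf goes through Double.toString, so it also yields 0.3.
        System.out.println(BigDecimal.valueOf(0.3));  // 0.3

        System.out.println(0.3 - 0.1);                 // 0.19999999999999998
        System.out.println(subtractExact("0.3", "0.1")); // 0.2

        // Divisions that never terminate need an explicit scale and rounding mode,
        // otherwise divide throws ArithmeticException.
        BigDecimal third = new BigDecimal("10").divide(new BigDecimal("3"), 4, RoundingMode.HALF_UP);
        System.out.println(third);                     // 3.3333
    }
}
```

When comparing results, compareTo is usually the safer choice, because equals on BigDecimal also compares the scale (0.2 and 0.20 are not equal under equals).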