Smoothing Curves with Dataset Noise: A Practical Guide
Smoothing a curve through noisy data is a common task in data analysis. As an example, consider a sine wave with uniform random noise of up to 0.2 (20% of the sine's amplitude) added to each sample:
<code class="python">import numpy as np

x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2</code>
For this situation, the Savitzky-Golay filter is an effective choice. It fits a low-degree polynomial, by least squares, to a window of neighboring data points and uses that polynomial to estimate the value at the center of the window. The window then shifts forward by one point and the process repeats, producing a smoothed curve.
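The sliding-window procedure described above can be sketched by hand with NumPy. This is only an illustration under stated assumptions, not SciPy's implementation: the helper name `savgol_manual` is hypothetical, it requires an odd window length, and it leaves the first and last few points unsmoothed instead of handling the edges.

```python
import numpy as np

def savgol_manual(y, window_length, polyorder):
    """Hypothetical helper: smooth interior points with a sliding polynomial fit."""
    if window_length % 2 == 0:
        raise ValueError("this sketch assumes an odd window_length")
    if polyorder >= window_length:
        raise ValueError("polyorder must be smaller than window_length")
    y = np.asarray(y, dtype=float)
    half = window_length // 2
    offsets = np.arange(-half, half + 1)   # positions relative to the window center
    smoothed = y.copy()                    # edge points are left as-is in this sketch
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]
        # Least-squares polynomial fit over the window
        coeffs = np.polyfit(offsets, window, polyorder)
        # Evaluate the fitted polynomial at the window center (offset 0)
        smoothed[i] = np.polyval(coeffs, 0.0)
    return smoothed

# Example on a noisy sine wave (fixed seed chosen only for reproducibility)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 2 * np.pi, 100)
y_demo = np.sin(x_demo) + rng.random(100) * 0.2
y_demo_smooth = savgol_manual(y_demo, 51, 3)
```

A loop like this refits a polynomial at every point, which is slow for long signals. It is meant to make the idea concrete rather than to replace a library routine.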
Here's how to apply the Savitzky-Golay filter in Python with SciPy's savgol_filter:
<code class="python">import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter</code>
<code class="python">yhat = savgol_filter(y, 51, 3) # window size 51, polynomial order 3</code>
<code class="python">plt.plot(x, y)                 # noisy data
plt.plot(x, yhat, color='red') # smoothed curve
plt.show()</code>
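The red curve is visibly smoother than the raw data while still following the shape of the underlying sine wave. A larger window smooths more aggressively, while a higher polynomial order tracks sharp features more closely; the polynomial order must be smaller than the window length.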
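To put a number on "smoother", one option is to compare the noisy and filtered curves against the known noise-free signal. The sketch below makes a few assumptions: it uses a fixed random seed, the helper names `rmse` and `roughness` are hypothetical, and it compares against `sin(x) + 0.1` because uniform noise in [0, 0.2) shifts the data up by 0.1 on average, an offset that smoothing does not remove.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)         # fixed seed, assumed for reproducibility
x = np.linspace(0, 2 * np.pi, 100)
clean = np.sin(x)
y = clean + rng.random(100) * 0.2      # uniform noise in [0, 0.2), mean 0.1
yhat = savgol_filter(y, 51, 3)         # window size 51, polynomial order 3

# The noise has mean 0.1, so the fair reference is the clean signal plus 0.1
target = clean + 0.1

def rmse(a, b):
    """Root-mean-square difference between two curves."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def roughness(v):
    """Sum of squared second differences: larger means a more jagged curve."""
    return float(np.sum(np.diff(v, n=2) ** 2))

print("RMSE noisy    :", rmse(y, target))
print("RMSE smoothed :", rmse(yhat, target))
print("roughness noisy    :", roughness(y))
print("roughness smoothed :", roughness(yhat))
```

Both measures should come out lower for the filtered curve. Repeating the comparison with other window sizes is a simple way to choose parameters for your own data.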
Note: savgol_filter is part of SciPy (in the scipy.signal module). If SciPy is not installed, you can install it with the following command:
pip install scipy