Curve fitting
{{Short description|Process of constructing a curve that has the best fit to a series of data points}} {{Redirect|Best fit|placing ("fitting") variable-sized objects in storage|Fragmentation (computing)}} [[File:Regression pic assymetrique.gif|thumb|upright=1.5|Fitting of a noisy curve by an asymmetrical peak model, with an iterative process ([[Gauss–Newton algorithm]] with variable damping factor α).]] '''Curve fitting'''<ref>Sandra Lach Arlinghaus, PHB Practical Handbook of Curve Fitting. CRC Press, 1994.</ref><ref>William M. Kolb. [https://books.google.com/books?id=ZiLYAAAAMAAJ&q=%22Curve+fitting%22 Curve Fitting for Programmable Calculators]. Syntec, Incorporated, 1984.</ref> is the process of constructing a [[curve]], or [[function (mathematics)|mathematical function]], that has the best fit to a series of [[data points]],<ref>S.S. Halli, K.V. Rao. 1992. Advanced Techniques of Population Analysis. {{ISBN|0306439972}} Page 165 (''cf''. ... functions are fulfilled if we have a good to moderate fit for the observed data.)</ref> possibly subject to constraints.<ref>[https://books.google.com/books?id=SI-VqAT4_hYC ''The Signal and the Noise: Why So Many Predictions Fail-but Some Don't.''] By Nate Silver</ref><ref>[https://books.google.com/books?id=hhdVr9F-JfAC Data Preparation for Data Mining]: Text. By Dorian Pyle.</ref> Curve fitting can involve either [[interpolation]],<ref>Numerical Methods in Engineering with MATLAB®. By Jaan Kiusalaas. Page 24.</ref><ref>[https://books.google.com/books?id=YlkgAwAAQBAJ&q=%22curve+fitting%22 Numerical Methods in Engineering with Python 3]. By Jaan Kiusalaas. Page 21.</ref> where an exact fit to the data is required, or [[smoothing]],<ref>[https://books.google.com/books?id=UjnB0FIWv_AC&q=smoothing Numerical Methods of Curve Fitting]. By P. G. Guest, Philip George Guest. Page 349.</ref><ref>See also: [[Mollifier]]</ref> in which a "smooth" function is constructed that approximately fits the data. 
A related topic is [[regression analysis]],<ref>[https://books.google.com/books?id=g1FO9pquF3kC&q=%22regression+analysis%22 Fitting Models to Biological Data Using Linear and Nonlinear Regression]. By Harvey Motulsky, Arthur Christopoulos.</ref><ref>[https://books.google.com/books?id=Us4YE8lJVYMC&q=%22regression+analysis%22 Regression Analysis] By Rudolf J. Freund, William J. Wilson, Ping Sa. Page 269.</ref> which focuses more on questions of [[statistical inference]] such as how much uncertainty is present in a curve that is fitted to data observed with random errors. Fitted curves can be used as an aid for data visualization,<ref>Visual Informatics. Edited by Halimah Badioze Zaman, Peter Robinson, Maria Petrou, Patrick Olivier, Heiko Schröder. Page 689.</ref><ref>[https://books.google.com/books?id=rdJvXG1k3HsC&q=%22Curve+fitting%22 Numerical Methods for Nonlinear Engineering Models]. By John R. Hauser. Page 227.</ref> to infer values of a function where no data are available,<ref>Methods of Experimental Physics: Spectroscopy, Volume 13, Part 1. By Claire Marton. Page 150.</ref> and to summarize the relationships among two or more variables.<ref>Encyclopedia of Research Design, Volume 1. Edited by Neil J. Salkind. Page 266.</ref> [[Extrapolation]] refers to the use of a fitted curve beyond the [[range (statistics)|range]] of the observed data,<ref>[https://books.google.com/books?id=ba0hAQAAQBAJ&q=%22Curve+fitting%22+OR+extrapolation Community Analysis and Planning Techniques]. By Richard E. Klosterman. Page 1.</ref> and is subject to a [[Uncertainty|degree of uncertainty]]<ref>An Introduction to Risk and Uncertainty in the Evaluation of Environmental Investments. DIANE Publishing. [https://books.google.com/books?id=rJ23LWaZAqsC&pg=PA69 Pg 69]</ref> since it may reflect the method used to construct the curve as much as it reflects the observed data. 
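How much an extrapolation can depend on the chosen model, rather than on the data, can be illustrated with a short sketch. It assumes Python 3 and only its standard library. It fits a straight line and a quadratic by least squares (via the normal equations) to five noise-free, purely illustrative samples of <math>y = \sqrt{x}</math> on 1 ≤ ''x'' ≤ 5, then compares the two fitted curves inside and outside that range. The helpers <code>solve</code>, <code>polyfit</code> and <code>polyval</code> are hypothetical local functions, not NumPy's functions of the same name.

```python
# Sketch: two models fitted to the same data agree inside the data range
# but disagree strongly when extrapolated.
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.sqrt(x) for x in xs]  # illustrative, noise-free data

line = polyfit(xs, ys, 1)
quad = polyfit(xs, ys, 2)

# Largest disagreement between the two fits inside the observed range [1, 5].
grid = [1.0 + 0.01 * i for i in range(401)]
gap_inside = max(abs(polyval(line, x) - polyval(quad, x)) for x in grid)

# Disagreement when both fits are extrapolated to x = 20.
gap_outside = abs(polyval(line, 20.0) - polyval(quad, 20.0))

print(f"max gap on [1, 5]: {gap_inside:.3f}")
print(f"gap at x = 20:     {gap_outside:.3f}")
```

Both fits agree to within 0.1 everywhere on [1, 5], yet at ''x'' = 20 they differ by more than 5, with the true value √20 ≈ 4.47 lying between them, even though nothing in the observed data distinguishes one model from the other.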
For linear-algebraic analysis of data, "fitting" usually means trying to find the curve that minimizes the vertical (''y''-axis) displacement of a point from the curve (e.g., [[ordinary least squares]]). However, for graphical and image applications, geometric fitting seeks to provide the best visual fit, which usually means trying to minimize the [[orthogonal distance]] to the curve (e.g., [[total least squares]]), or otherwise to include both axes of displacement of a point from the curve. Geometric fits are less common because they usually require non-linear and/or iterative calculations, although they have the advantage of a more aesthetic and geometrically accurate result.<ref>{{citation |first=Sung-Joon |last=Ahn |title=Geometric Fitting of Parametric Curves and Surfaces |journal=Journal of Information Processing Systems |volume=4 |issue=4 |pages=153–158 |date=December 2008 |doi=10.3745/JIPS.2008.4.4.153 |url=http://jips-k.org/dlibrary/JIPS_v04_no4_paper4.pdf |url-status=dead |archiveurl=https://web.archive.org/web/20140313084307/http://jips-k.org/dlibrary/JIPS_v04_no4_paper4.pdf |archivedate=2014-03-13 }}</ref><ref>{{citation |first1=N. |last1=Chernov |first2=H. |last2=Ma |year=2011 |contribution=Least squares fitting of quadratic curves and surfaces |title=Computer Vision |editor-first=Sota R. |editor-last=Yoshida |publisher=Nova Science Publishers |isbn=9781612093994 |pages=285–302 |url=<!-- http://people.cas.uab.edu/~mosya/papers/CM1nova.pdf No indication of copyright --> }}</ref><ref>{{citation |first1=Yang |last1=Liu |first2=Wenping |last2=Wang |year=2008 |contribution=A Revisit to Least Squares Orthogonal Distance Fitting of Parametric Curves and Surfaces |editor1-first=F. |editor1-last=Chen |editor2-first=B. |editor2-last=Juttler |title=Advances in Geometric Modeling and Processing |series=Lecture Notes in Computer Science |volume=4975 |pages=384–397 |doi=10.1007/978-3-540-79246-8_29 |isbn=978-3-540-79245-1|citeseerx=10.1.1.306.6085 }}</ref>

==Algebraic fitting of functions to data points{{anchor|Functions|Algebraic}}==
Most commonly, one fits a function of the form {{math|''y''{{=}}''f''(''x'')}}.

===Fitting lines and polynomial functions to data points{{anchor|Polynomials}}===
{{main|Polynomial regression}}
{{See also|Polynomial interpolation}}
[[File:Curve fitting.svg|alt=Polynomial curves fitting a sine function|thumb|upright=1.3|Polynomial curves fitting points generated with a sine function. The black dotted line is the "true" data, the red line is a <span style="color:red">first degree polynomial</span>, the green line is <span style="color:green">second degree</span>, the orange line is <span style="color:orange">third degree</span> and the blue line is <span style="color:blue">fourth degree.</span>]]
The first degree [[polynomial]] equation
:<math>y = ax + b\;</math>
is a line with [[slope]] ''a''. A line will connect any two points, so a first degree polynomial equation is an exact fit through any two points with distinct ''x'' coordinates.

If the degree of the equation is increased to a second degree polynomial, the following results:
:<math>y = ax^2 + bx + c\;.</math>
This will exactly fit any three points with distinct ''x'' coordinates. If the degree of the equation is increased to a third degree polynomial, the following is obtained:
:<math>y = ax^3 + bx^2 + cx + d\;.</math>
This will exactly fit four points with distinct ''x'' coordinates. A more general statement would be to say it will exactly fit four '''constraints'''. Each constraint can be a point, [[angle]], or [[curvature]] (which is the reciprocal of the radius of an [[osculating circle]]). Angle and curvature constraints are most often added to the ends of a curve, and in such cases are called '''end conditions'''.
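Because a point constraint ''y''(''x''<sub>0</sub>) = ''y''<sub>0</sub> and an angle constraint written as a slope ''y''′(''x''<sub>0</sub>) = tan ''θ'' are both linear in the polynomial's coefficients, ''n'' + 1 such constraints give a square linear system for a polynomial of degree ''n''. The following is a minimal sketch, assuming Python 3 and its standard library; the function names are hypothetical. It builds one equation per constraint and solves the system exactly with rational arithmetic, using as an example a cubic through (0, 0) and (1, 1) with zero slope at both ends. Curvature constraints are left out because curvature, <math>\kappa = y''/(1+y'^2)^{3/2}</math>, is non-linear in the coefficients, although it becomes a linear condition on ''y''″ when the slope at the same point is also fixed.

```python
# Sketch: each point or slope ("angle") constraint is one linear equation in the
# coefficients c0..cn of y = c0 + c1*x + ... + cn*x**n.
from fractions import Fraction


def point_row(x0, degree):
    """Coefficients of c0..cn in the equation y(x0) = y0."""
    return [Fraction(x0) ** i for i in range(degree + 1)]


def slope_row(x0, degree):
    """Coefficients of c0..cn in the equation y'(x0) = slope."""
    return [i * Fraction(x0) ** (i - 1) if i > 0 else Fraction(0)
            for i in range(degree + 1)]


def solve_exact(rows, rhs):
    """Gauss-Jordan elimination over the rationals; assumes a unique solution."""
    n = len(rhs)
    m = [list(r) + [Fraction(v)] for r, v in zip(rows, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]


degree = 3
# Two point constraints plus a slope end condition at each end: 4 constraints.
rows = [point_row(0, degree), point_row(1, degree),
        slope_row(0, degree), slope_row(1, degree)]
rhs = [0, 1, 0, 0]
coeffs = solve_exact(rows, rhs)
print(coeffs)
```

The coefficients come out as 0, 0, 3 and −2, i.e. ''y'' = 3''x''<sup>2</sup> − 2''x''<sup>3</sup>. A fifth independent constraint on this cubic would make the system overdetermined, which is where an approximate method such as least squares becomes necessary.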
Identical end conditions are frequently used to ensure a smooth transition between polynomial curves contained within a single [[spline (mathematics)|spline]]. Higher-order constraints, such as "the change in the rate of curvature", could also be added. This would be useful, for example, in highway [[Cloverleaf interchange|cloverleaf]] design, to understand the rate of change of the forces applied to a car as it follows the cloverleaf (see [[Jerk (physics)|jerk]]), and to set reasonable speed limits accordingly.

The first degree polynomial equation could also be an exact fit for a single point and an angle, while the third degree polynomial equation could also be an exact fit for two points, an angle constraint, and a curvature constraint. Many other combinations of constraints are possible for these and for higher order polynomial equations.

If there are more than ''n'' + 1 constraints (''n'' being the degree of the polynomial), the polynomial curve can still be fitted to those constraints, but an exact fit to all of them is not guaranteed (although it might happen, for example, in the case of a first degree polynomial exactly fitting three [[collinear points]]). In general, however, some method is then needed to evaluate each approximation. The [[least squares]] method is one way to compare the deviations.

There are several reasons to prefer an approximate fit even when it is possible to simply increase the degree of the polynomial equation and obtain an exact match:
* Even if an exact match exists, it does not necessarily follow that it can be readily discovered. Depending on the algorithm used, there may be a divergent case, where the exact fit cannot be calculated, or it might take too much computer time to find the solution. This situation might require an approximate solution.
* The effect of averaging out questionable data points in a sample, rather than distorting the curve to fit them exactly, may be desirable.
* [[Runge's phenomenon]]: high-degree polynomials can be highly oscillatory. If a curve runs through two points ''A'' and ''B'', it would be expected that the curve would run somewhat near the midpoint of ''A'' and ''B'' as well. This may not happen with high-degree polynomial curves; they may even take values that are very large in positive or negative [[magnitude (mathematics)|magnitude]]. With low-degree polynomials, the curve is more likely to fall near the midpoint (a first degree polynomial is even guaranteed to pass exactly through the midpoint).
* Low-degree polynomials tend to be smooth, and high-degree polynomial curves tend to be "lumpy". To define this more precisely, the maximum number of [[inflection point]]s possible in a polynomial curve is ''n'' − 2, where ''n'' is the degree of the polynomial equation. An inflection point is a location on the curve where the curvature changes sign. We can also say this is where it transitions from "holding water" to "shedding water". Note that it is only "possible" that high-degree polynomials will be lumpy; they could also be smooth, but there is no guarantee of this, unlike with low-degree polynomial curves. A fifteenth degree polynomial could have, at most, thirteen inflection points, but could also have eleven, or nine, or any odd number down to one. (Polynomials of even degree could have any even number of inflection points from ''n'' − 2 down to zero.)

A polynomial of higher degree than needed for an exact fit is undesirable for all the reasons listed previously for high-degree polynomials, but it also leads to a case where there are infinitely many solutions. For example, a first degree polynomial (a line) constrained by only a single point, instead of the usual two, would give infinitely many solutions. This raises the problem of how to compare and choose just one solution, which can be a problem for both software and humans.
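The contrast between an exact high-degree fit and a low-degree approximate fit can be sketched with Runge's function <math>f(x) = 1/(1+25x^2)</math>, the standard example of Runge's phenomenon. Assuming Python 3 and only its standard library, the code samples ''f'' at 11 equally spaced points on [−1, 1], evaluates the exact degree-10 interpolating polynomial in Lagrange form, computes a degree-4 least-squares polynomial from the same samples via the normal equations, and measures the largest error of each against ''f'' on a fine grid. The function names are illustrative and do not come from any library.

```python
# Sketch: exact degree-10 interpolation vs. degree-4 least squares on
# 11 equally spaced samples of Runge's function.


def runge(x):
    """Runge's function, the classic example of Runge's phenomenon."""
    return 1.0 / (1.0 + 25.0 * x * x)


def lagrange_eval(xs, ys, x):
    """Value at x of the unique polynomial of degree len(xs) - 1 through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lstsq_poly(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] via the normal equations."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


xs = [-1.0 + 0.2 * i for i in range(11)]  # 11 equally spaced nodes on [-1, 1]
ys = [runge(x) for x in xs]
ls_coeffs = lstsq_poly(xs, ys, 4)  # approximate fit: degree 4 through 11 points

grid = [-1.0 + 0.001 * i for i in range(2001)]
err_interp = max(abs(lagrange_eval(xs, ys, x) - runge(x)) for x in grid)
err_ls = max(abs(polyval(ls_coeffs, x) - runge(x)) for x in grid)
print(f"degree-10 interpolant, max error:  {err_interp:.3f}")
print(f"degree-4 least squares, max error: {err_ls:.3f}")
```

The interpolant matches every sample exactly, but its error exceeds 1 near the ends of the interval, whereas the least-squares polynomial misses the samples yet stays several times closer to ''f'' overall, so here the exact high-degree fit is the poorer description of the underlying curve.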
For these reasons, it is usually best to choose as low a degree as possible for an exact match on all constraints, and perhaps an even lower degree if an approximate fit is acceptable.

[[File:Gohana inverted S-curve.png|thumb|upright=1.25|Relation between wheat yield and soil salinity<ref>[https://www.waterlog.info/sigmoid.htm Calculator for sigmoid regression]</ref>]]

===Fitting other functions to data points===
Other types of curves, such as [[trigonometric functions]] (such as sine and cosine), may also be used in certain cases.

In spectroscopy, data may be fitted with [[Normal distribution|Gaussian]], [[Cauchy distribution|Lorentzian]], [[Voigt function|Voigt]] and related functions.

In biology, ecology, demography, epidemiology, and many other disciplines, the [[Population growth|growth of a population]], the spread of infectious disease, etc. can be fitted using the [[logistic function]].

In [[agriculture]], the inverted logistic [[sigmoid function]] (S-curve) is used to describe the relation between crop yield and growth factors. The figure of wheat yield against soil salinity was made by a sigmoid regression of data measured in farmland. It shows that at low soil salinity the crop yield decreases slowly as salinity increases, while at higher salinity the decrease is faster.

==Geometric fitting of plane curves to data points{{anchor|Plane curves|Geometric}}==
If a function of the form <math>y=f(x)</math> cannot be postulated, one can still try to fit a [[plane curve]].

Other types of curves, such as [[conic sections]] (circular, elliptical, parabolic, and hyperbolic arcs) or [[trigonometric functions]] (such as sine and cosine), may also be used in certain cases. For example, trajectories of objects under the influence of gravity follow a parabolic path when air resistance is ignored. Hence, matching trajectory data points to a parabolic curve would make sense.
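As an illustration of matching trajectory data to a parabola, the following sketch generates noise-free positions of a projectile launched from the origin at 10 m/s and 45° (illustrative values, with ''g'' = 9.81 m/s² assumed known). It fits <math>y = c + bx + ax^2</math> to them and reads the launch angle and speed back from the coefficients, using <math>b = \tan\theta</math> and <math>a = -g/(2v^2\cos^2\theta)</math>. For simplicity it uses ordinary (vertical-distance) least squares rather than the orthogonal-distance geometric fitting described in this section; it assumes Python 3 with only the standard library, and the helper names are hypothetical.

```python
# Sketch: fit a parabola to synthetic projectile positions and recover the
# launch parameters. Ordinary least squares, not a geometric (orthogonal) fit.
import math

G = 9.81  # gravitational acceleration in m/s^2, assumed known


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_parabola(xs, ys):
    """Least-squares coefficients (c, b, a) of y = c + b*x + a*x**2."""
    ata = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve(ata, aty)


# Synthetic, noise-free trajectory: launch from the origin at 10 m/s, 45 degrees.
speed, angle = 10.0, math.radians(45.0)
vx, vy = speed * math.cos(angle), speed * math.sin(angle)
times = [0.1 * k for k in range(15)]  # 0.0 .. 1.4 s, before the projectile lands
xs = [vx * t for t in times]
ys = [vy * t - 0.5 * G * t * t for t in times]

c, b, a = fit_parabola(xs, ys)

# Recover launch parameters: b = tan(theta), a = -G / (2 * vx**2).
theta_fit = math.atan(b)
vx_fit = math.sqrt(-G / (2.0 * a))
speed_fit = vx_fit / math.cos(theta_fit)
print(f"launch angle ~ {math.degrees(theta_fit):.3f} deg, speed ~ {speed_fit:.3f} m/s")
```

With real measurements the positions would carry noise, and the recovered angle and speed would only be estimates; the noise-free data here simply make it easy to check that the fitted coefficients have the expected physical meaning.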
Tides follow sinusoidal patterns, so tidal data points can be matched to a sine wave, or to the sum of two sine waves of different periods if the effects of both the Moon and the Sun are considered.

For a [[parametric curve]], it is effective to fit each of its coordinates as a separate function of [[arc length]]; assuming that data points can be ordered, the [[chord distance]] may be used.<ref>p.51 in Ahlberg & Nilson (1967) ''The theory of splines and their applications'', Academic Press, 1967 [https://books.google.com/books?id=S7d1pjJHsRgC&pg=PA51]</ref>

===Fitting a circle by geometric fit{{anchor|Circles}}===
[[File:Regression circulaire coope arc de cercle.svg|thumb|Circle fitting with the Coope method, the points describing a circle arc, centre (1, 1), radius 4.]]
[[File:Wp ellfitting.png|thumb|Different models of ellipse fitting]]
[[File:Regression elliptique distance algebrique donnees gander.svg|thumb|Ellipse fitting minimising the algebraic distance (Fitzgibbon method).]]
Coope<ref>{{cite journal|author=Coope, I.D.|title=Circle fitting by linear and nonlinear least squares|journal=Journal of Optimization Theory and Applications |volume =76|issue =2|year=1993|doi=10.1007/BF00939613|pages=381–388|hdl=10092/11104|s2cid=59583785 |hdl-access=free}}</ref> approaches the problem of trying to find the best visual fit of a circle to a set of 2D data points. The method transforms the ordinarily non-linear problem into a linear problem that can be solved without iterative numerical methods, and is therefore much faster than previous techniques.

===Fitting an ellipse by geometric fit{{anchor|Ellipses}}===
The above technique is extended to general ellipses<ref>Paul Sheer, [http://wiredspace.wits.ac.za/bitstream/handle/10539/22434/Sheer%20Paul%201997.pdf?sequence=1&isAllowed=y A software assistant for manual stereo photometrology], M.Sc. thesis, 1997</ref> by adding a non-linear step, resulting in a method that is fast, yet finds visually pleasing ellipses of arbitrary orientation and displacement.

==Fitting surfaces==
{{Further|Computer representation of surfaces}}
{{See also|Multivariate interpolation|Smoothing}}
Note that while this discussion was in terms of 2D curves, much of this logic also extends to 3D surfaces, each patch of which is defined by a net of curves in two parametric directions, typically called '''u''' and '''v'''. A surface may be composed of one or more surface patches in each direction.

==Software==
Many [[List of statistical packages|statistical packages]] such as [[R (programming language)|R]] and [[List of numerical-analysis software|numerical software]] such as [[gnuplot]], [[GNU Scientific Library]], [[Igor Pro]], [[MLAB]], [[Maple (software)|Maple]], [[MATLAB]], TK Solver 6.0, [[Scilab]], [[Mathematica]], [[GNU Octave]], and [[SciPy]] include commands for doing curve fitting in a variety of scenarios. There are also programs specifically written to do curve fitting; they can be found in the [[List of statistical software|lists of statistical]] and [[List of numerical-analysis software|numerical-analysis programs]] as well as in [[:Category:Regression and curve fitting software]].
==See also==
{{div col|colwidth=25em}}
* [[Calibration curve]]
* [[Curve-fitting compaction]]
* [[Discretization]]
* [[Estimation theory]]
* [[Function approximation]]
* [[Genetic programming]]
* [[Goodness of fit]]
* [[Least-squares adjustment]]
* [[Levenberg–Marquardt algorithm]]
* [[Line fitting]]
* [[Linear interpolation]]
* [[Linear trend estimation]]
* [[Mathematical model]]
* [[Multi expression programming]]
* [[Multi-curve framework]] and [[Bootstrapping (finance)]]
* [[Nonlinear regression]]
* [[Overfitting]]
* [[Plane curve]]
* [[Probability distribution fitting]]
* [[Progressive-iterative approximation method]]
* [[Sinusoidal model]]
* [[Smoothing]]
* [[Spline (mathematics)|Splines]] ([[Spline interpolation|interpolating]], [[Smoothing spline|smoothing]])
* [[Time series]]
* [[Total least squares]]
{{div col end}}
{{clear}}

==References==
{{Reflist|30em}}

==Further reading==
{{Commons category|Curve fitting}}
*N. Chernov (2010), ''Circular and linear regression: Fitting circles and lines by least squares'', Chapman & Hall/CRC, Monographs on Statistics and Applied Probability, Volume 117 (256 pp.). [http://people.cas.uab.edu/~mosya/cl/]

{{Authority control}}

[[Category:Curve fitting| ]]