What is “Least Squares” Regression?
Web resource used: http://www.keypress.com/sketchpad/java_gsp/squares.html
Objective: To gain a conceptual understanding of the principle of least squares regression; to understand the meaning of a “line of best fit.”
Created by: Felice Shore (Baltimore City Community College) for Project Synergy
This site illustrates the principle of least squares regression. You see six points, labeled P(1), P(2), …, P(6), with a line drawn through them (not necessarily the line of best fit!). But what are those squares? And what makes a line one of “best fit”?
On the graph, you see a line that is “going through the neighborhood” of six points. Ideally, we’d have a line that goes through all six points, but that is impossible if the points are not collinear. So each point has an error associated with the line: the vertical distance from that point to the line. This is called the residual, or error. That error is visually depicted as the vertical side of the square coming off of each point. If we square that error, we get a number that is visually depicted as the area of that square. The SSE is the sum of squared errors for all points in a data set. That is, imagine finding the squared residual for every data point in a set, and then adding up those squared residuals to get the SSE. In the graph, the area of the large red square equals the total area of all six smaller squares; it is a visual depiction of the SSE.
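To make the arithmetic behind the squares concrete, here is a minimal sketch of the SSE calculation. The six points and the line y = 1x + 1 are made up for illustration and are not the points shown in the applet; the function name sse is also just a label chosen here.

```python
# Hypothetical data: six made-up points, NOT the ones in the applet.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7)]

def sse(slope, intercept, data):
    """Add up the squared vertical distances from each point to the line."""
    total = 0.0
    for x, y in data:
        predicted = slope * x + intercept   # height of the line at this x
        residual = y - predicted            # vertical error for this point
        total += residual ** 2              # area of this point's square
    return total

# Try the line y = 1x + 1 (an arbitrary guess, like a first drag of the line).
print(sse(1.0, 1.0, points))  # prints 2.0
```

Four of these made-up points sit exactly on the line, so their squares have zero area; the other two are each one unit away, giving 1 + 1 = 2 for the total.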
A line of “best fit” is constructed in such a way as to minimize the SSE. That is, the line that fits the data best is the line associated with the smallest red square you can possibly make. Notice that the total area of the red square is given to you.
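The line with the smallest possible SSE does not have to be found by trial and error; the standard least squares formulas give its slope and y-intercept directly. The sketch below applies those formulas to six made-up points (again, not the applet’s points) and compares the resulting SSE with that of an arbitrary guess. The names best_fit and sse are chosen here for illustration only.

```python
# Hypothetical data: six made-up points, NOT the ones in the applet.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7)]

def sse(slope, intercept, data):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in data)

def best_fit(data):
    """Slope and y-intercept of the least squares line (standard formulas)."""
    n = len(data)
    x_mean = sum(x for x, _ in data) / n
    y_mean = sum(y for _, y in data) / n
    # Slope = sum of (x - x_mean)(y - y_mean) divided by sum of (x - x_mean)^2
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in data)
    sxx = sum((x - x_mean) ** 2 for x, _ in data)
    slope = sxy / sxx
    # The least squares line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept

best_slope, best_intercept = best_fit(points)
print("best slope:", best_slope)
print("best y-intercept:", best_intercept)
print("smallest SSE:", sse(best_slope, best_intercept, points))
print("SSE of the guess y = 1x + 1:", sse(1.0, 1.0, points))
```

For these made-up points the guess y = 1x + 1 is already close, but the least squares line still produces a slightly smaller total area, and no other line can do better.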
First, be aware of the six data points in this graph. They are labeled P(1) through P(6). Two control points, a “y-intercept” point and a “slope” point, mark the line itself. You can change the location and orientation of the line by clicking and dragging either of those two control points. Those are the only two points you should move.
If you click and drag the “y-intercept” point up and down the y-axis, you will move your line up or down without changing the slope. If you click on and drag the “slope” point on the line, you will rotate the line, thereby changing only its slope, but not the y-intercept. Remember, the large red square in the bottom right is showing you the total area of all six smaller squares.
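The two drags can be imitated in a few lines of code: change only the y-intercept while holding the slope fixed, then change only the slope while holding the y-intercept fixed, and watch the total area each time. This is a rough sketch using six made-up points and an arbitrary starting line y = 1x + 1; none of these values come from the applet.

```python
# Hypothetical data: six made-up points, NOT the ones in the applet.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7)]

def sse(slope, intercept, data):
    """Total area of the squares: sum of squared vertical errors."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in data)

# "Drag the y-intercept": the slope stays at 1.0, only the intercept changes.
intercept_sweep = [(b, sse(1.0, b, points)) for b in [0.0, 0.5, 1.0, 1.5, 2.0]]

# "Drag the slope point": the intercept stays at 1.0, only the slope changes.
slope_sweep = [(m, sse(m, 1.0, points)) for m in [0.5, 0.75, 1.0, 1.25, 1.5]]

for b, area in intercept_sweep:
    print("slope 1.0, y-intercept", b, "-> total area", area)
for m, area in slope_sweep:
    print("slope", m, ", y-intercept 1.0 -> total area", area)
```

In each sweep the total area shrinks as the line approaches the points and grows again once it moves past them, which is the same behavior to look for in the red square while dragging.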