Conceptualizing Correlation and Regression Equation
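Correlation coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

Slope of regression line ($y = mx + b$):
$$m = r \cdot \frac{s_y}{s_x}$$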
Open the Fathom file called CorrelParts.ftm. Working with a partner, complete the following steps and answer the questions:
Part I:
1. Examine Collection 1. The first two columns are the numerical values of the data points shown in the scatterplot. If you change any of the numerical values in the table, the graph will update. Likewise, if you move one of the data points in the scatterplot, the numerical values in the table will change accordingly. The rest of the columns in Collection 1 are calculated from the X and Y values and supply the intermediate quantities needed in the formula for r above.

2. The next two columns represent the deviations of each X and Y value from the respective means. To help make sense of these deviations graphically, plot a horizontal line (Plot function) to indicate the Ymean and a vertical line (Plot value) to indicate the Xmean. Drag any of the data points in the scatterplot and observe how the means update accordingly as well as the calculated values in Collection 1.
3. What data point is represented at the intersection of the vertical and horizontal lines?
· Is it possible for all points to be on one side of either the Xmean line or the Ymean line? Why or why not?
· Can nine of the 10 data points be on one side of either the Xmean line or the Ymean line? Why or why not?
· Can the data be changed in such a way that nine of the data points lie in the new “third quadrant” with the last point in the “first quadrant”? What would this result say about the mean?
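The deviation columns from Part I can also be reproduced outside Fathom. The following is a minimal Python sketch, not part of CorrelParts.ftm; the ten data points and the function name `deviation_table` are made up for illustration. It computes each point's deviations from Xmean and Ymean and prints the sum of each deviation column, which is worth keeping in mind when reasoning about the questions in item 3.

```python
from statistics import mean

def deviation_table(xs, ys):
    """Return (x_mean, y_mean, rows); each row is (x, y, x_dev, y_dev)."""
    x_mean, y_mean = mean(xs), mean(ys)
    rows = [(x, y, x - x_mean, y - y_mean) for x, y in zip(xs, ys)]
    return x_mean, y_mean, rows

# Hypothetical data: ten made-up points, not the values in Collection 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3, 1, 4, 6, 5, 9, 7, 8, 12, 10]

x_mean, y_mean, rows = deviation_table(xs, ys)
print(f"Xmean = {x_mean}, Ymean = {y_mean}")
for x, y, x_dev, y_dev in rows:
    print(f"{x:>3} {y:>3} {x_dev:>6.1f} {y_dev:>6.1f}")

# Deviations from a mean cancel out, so each column sums to zero
# (up to floating-point rounding).
print(sum(row[2] for row in rows), sum(row[3] for row in rows))
```

Changing a single value in `xs` or `ys` and rerunning mirrors dragging a point in the scatterplot: both means shift, and every deviation is recalculated.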
Part II:
Double-click the Collection 1 icon to display the calculated measures. These measures are based on the calculated values in Collection 1 and correspond to parts of the formula for r. Make sense of what these measures represent before proceeding.
1. Move the points on the graph so they are approximately on a line with positive slope.
· What do you notice about the magnitude and sign of Xdeviations and Ydeviations?
· What do you notice about the magnitude and sign of XdevSquared and YdevSquared?
· How are these values influencing the value of r? Think about the formula.
· Click on the Graph. Then under the Graph menu, choose Least-Squares Line. Does the equation show a positive slope?
2. Move the points so they are approximately on a line with negative slope.
· What do you notice about the magnitude and sign of Xdeviations and Ydeviations?
· What do you notice about the magnitude and sign of XdevSquared and YdevSquared?
· How are these values influencing the value of r?
3. Move the points so they appear to have no association.
· What do you notice about the magnitude and sign of Xdeviations and Ydeviations?
· What do you notice about the magnitude and sign of XdevSquared and YdevSquared?
· How are these values influencing the value of r?
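To connect the columns to the value of r in items 1 through 3, the sketch below builds r from the same pieces: deviations, their products, and their squares. It assumes the standard deviation-based form of the formula for r (sum of the products of the deviations divided by the square root of the product of the two sums of squared deviations). The function name `correlation` and the data are illustrative only.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson r built from the same pieces as the Collection 1 columns."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    x_devs = [x - x_mean for x in xs]            # Xdeviations
    y_devs = [y - y_mean for y in ys]            # Ydeviations
    # Products are positive when both deviations share a sign.
    sum_products = sum(dx * dy for dx, dy in zip(x_devs, y_devs))
    sum_x_sq = sum(dx * dx for dx in x_devs)     # sum of XdevSquared
    sum_y_sq = sum(dy * dy for dy in y_devs)     # sum of YdevSquared
    return sum_products / sqrt(sum_x_sq * sum_y_sq)

# Hypothetical data: one rising line and one falling line.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rising = [2 * x + 1 for x in xs]
falling = [20 - 3 * x for x in xs]

print(correlation(xs, rising))
print(correlation(xs, falling))
```

Only the numerator can be negative, since the squared deviations in the denominator are never negative. The sign of r therefore comes entirely from the products of the deviations.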
4. Place the points in a positive linear trend. Drag one of the points on the graph so that it is clearly an outlier. Observe the effects on the regression line and the value of r.
· Based on the formula for r, describe why the value of r is affected so greatly by an outlier.
· Pick up the outlier point and drag it to different locations on the graph. Find three different locations of an outlier that cause the regression line to drastically change. Where did you have to place the outlier for this effect? Why does this make sense?
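The outlier experiment in item 4 can be tried numerically as well. The sketch below is a hedged illustration, assuming the usual least-squares slope (sum of deviation products divided by the sum of squared X deviations). The helper name `fit`, the nine points on y = x, and the candidate positions for the tenth point are all made up.

```python
from math import sqrt

def fit(points):
    """Return (slope, r) of the least-squares line for (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    syy = sum((y - y_mean) ** 2 for _, y in points)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical data: nine points exactly on y = x, plus one movable point.
trend = [(x, x) for x in range(1, 10)]

# Try the tenth point on the trend, above the middle, and far to the right.
for tenth_point in [(10, 10), (5, 30), (10, 0), (30, 0)]:
    slope, r = fit(trend + [tenth_point])
    print(tenth_point, f"slope = {slope:.2f}", f"r = {r:.2f}")
```

Comparing the printed lines for different positions is a rough stand-in for dragging the point in Fathom and watching the regression line respond.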
5. In the lower left corner of the coordinate plane, place 9 of your points in a “cloud” that appears to have no trend. Then move one point to the upper right corner.
· Is this scatterplot linear?
· What is the value of r? Reason from the formula for r and the displayed measures to make sense of this value.
· Find two other sets of data points that give a high r value but show no linear trend.
· Does a high r value necessarily mean that the data are generally linear?
· Does a low r value necessarily mean that the data are NOT generally linear?
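The arrangement in item 5 can be checked with a small calculation. This sketch uses a made-up 3-by-3 grid as the trendless "cloud" and a single far-away point at (20, 20); the data and the function name `correlation` are illustrative, and the deviation-based formula for r is assumed.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson r from sums of deviation products and squared deviations."""
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical "cloud": a 3-by-3 grid of points with no trend at all.
cloud_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
cloud_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(correlation(cloud_x, cloud_y))  # 0.0 for this symmetric grid

# Add one far-away point in the upper right corner.
print(correlation(cloud_x + [20], cloud_y + [20]))  # about 0.98
```

The single distant point pulls both means toward itself, so its own deviation product dominates the numerator while the nine cloud points all end up with same-signed deviations.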
6. Place your points in a nearly horizontal line on the graph. What is the value of r? Why?
7. Place your points in a nearly vertical line on the graph. What is the value of r? Why?
8. Based on how the formula for r is computed, why do you think the values of r are constrained between –1 and 1?
9. Throughout the investigation in Part II, what did you notice about the relationship between the data point (Xmean, Ymean) and the regression line?
10. How is the correlation coefficient r related to the slope of the regression equation? Is the value of r the same as the slope of the line? Does an r value of 1 imply a y = x relationship? Why or why not?
11. Recall that the formula for the slope of a regression line can be expressed as $m = r \cdot \frac{s_y}{s_x}$, where $s_x$ and $s_y$ are the standard deviations of the X values and Y values. Thus, by calculating the means and standard deviations for the X values and Y values in a data set, as well as r, one can derive the line of best fit using algebraic techniques. Test this procedure with 10 data points.
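One way to test the procedure in item 11 is sketched below in Python. It assumes the slope is r times the ratio of the standard deviation of Y to that of X, and that the least-squares line passes through the point (Xmean, Ymean), so the intercept is Ymean minus slope times Xmean. The function names and the ten data points are hypothetical. The result is compared with the slope computed directly from the deviations.

```python
from math import sqrt
from statistics import mean, stdev

def regression_from_r(xs, ys):
    """Return (m, b) using m = r * s_y / s_x and b = y_mean - m * x_mean."""
    x_mean, y_mean = mean(xs), mean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    m = r * stdev(ys) / stdev(xs)   # sample standard deviations
    b = y_mean - m * x_mean         # line passes through the point of means
    return m, b

def least_squares_slope(xs, ys):
    """Slope computed directly from the deviations, for comparison."""
    x_mean, y_mean = mean(xs), mean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: ten made-up points with a roughly linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

m, b = regression_from_r(xs, ys)
print(f"y = {m:.4f}x + {b:.4f}")
print(f"direct least-squares slope = {least_squares_slope(xs, ys):.4f}")
```

The two slopes should agree up to rounding, and the resulting equation can be compared with the one Fathom reports under Least-Squares Line for the same ten points.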