Conceptualizing Correlation and the Regression Equation

 

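Correlation coefficient: $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$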

 

Slope of regression line (y=mx+b): $m = r \cdot \dfrac{s_y}{s_x}$

 

Open the Fathom file called CorrelParts.ftm. Working with a partner, complete the steps below and answer the questions:

Part I:

1.      Examine Collection 1. The first two columns are the numerical values of the data points shown in the scatterplot. If you change any of the numerical values in the table, the graph will update. Likewise, if you move one of the data points in the scatterplot, the numerical values in the table will change accordingly. The remaining columns in Collection 1 are calculated from the X and Y values and supply the quantities needed for the formula for r above.

 

2.      The next two columns represent the deviations of each X and Y value from the respective means. To help make sense of these deviations graphically, plot a horizontal line (Plot function) to indicate the Ymean and a vertical line (Plot value) to indicate the Xmean. Drag any of the data points in the scatterplot and observe how the means, and the calculated values in Collection 1, update accordingly.

 

3.      What data point is represented at the intersection of the vertical and horizontal lines?

 

·        Is it possible that all points can be on one side of either the Xmean line or the Ymean line? Why or why not?

 

·        Can nine of the 10 data points be on one side of either the Xmean line or the Ymean line? Why or why not?

 

·        Can the data be changed in such a way that nine of the data points lie in the new “third quadrant” with the last point in the “first quadrant”? What would this result say about the mean?
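
The calculated columns described in steps 1 and 2 can also be checked outside Fathom. The following is a minimal sketch, not the Fathom file's own formulas: it assumes the usual deviation form of Pearson's r, uses made-up data, and borrows the column names Xdeviations, Ydeviations, XdevSquared, and YdevSquared from Collection 1. The name `dev_product` is a hypothetical label for the product of the two deviations.

```python
import math

# Made-up data for illustration; substitute the X and Y columns from Collection 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def correlation_parts(xs, ys):
    """Return the means, the deviation columns, and r for paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    x_dev = [x - x_mean for x in xs]        # Xdeviations
    y_dev = [y - y_mean for y in ys]        # Ydeviations
    x_dev_sq = [d * d for d in x_dev]       # XdevSquared
    y_dev_sq = [d * d for d in y_dev]       # YdevSquared
    # dev_product is a hypothetical name, not taken from the Fathom file.
    dev_product = [dx * dy for dx, dy in zip(x_dev, y_dev)]
    r = sum(dev_product) / math.sqrt(sum(x_dev_sq) * sum(y_dev_sq))
    return {
        "x_mean": x_mean,
        "y_mean": y_mean,
        "x_dev": x_dev,
        "y_dev": y_dev,
        "x_dev_sq": x_dev_sq,
        "y_dev_sq": y_dev_sq,
        "dev_product": dev_product,
        "r": r,
    }


parts = correlation_parts(xs, ys)
print("Xdeviations:", parts["x_dev"])
print("Ydeviations:", parts["y_dev"])
print("r =", round(parts["r"], 4))
```

Changing one value in `xs` or `ys` and rerunning mirrors dragging a point in the scatterplot: both means shift, so every deviation in the table changes.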

Part II:  Double-click the Collection 1 icon to display the calculated measures. These measures are based on the calculated values in Collection 1 and correspond to parts of the formula for r. Make sense of what these measures represent before proceeding.

 

1.      Move the points on the graph so they are approximately on a line with positive slope.

·        What do you notice about the magnitude and sign of Xdeviations and Ydeviations?

 

·        What do you notice about the magnitude and sign of XdevSquared and YdevSquared?

 

·        How are these values influencing the value of r? Think about the formula.

 

·        Click on the Graph. Then under the Graph menu, choose Least-Squares Line. Does the equation show a positive slope?

 

 

2.      Move the points so they are approximately on a line with negative slope.

·        What do you notice about the magnitude and sign of Xdeviations and Ydeviations?

 

·        What do you notice about the magnitude and sign of XdevSquared and YdevSquared?

 

·        How are these values influencing the value of r?

 

 

3.      Move the points so they appear to have no association.

·        What do you notice about the magnitude and sign of Xdeviations and Ydeviations?

 

·        What do you notice about the magnitude and sign of XdevSquared and YdevSquared?

 

·        How are these values influencing the value of r?

 

 

4.      Place the points in a positive linear trend. Drag one of the points on the graph so that it is clearly an outlier. Observe the effects on the regression line and the value of r. 

·        Based on the formula for r, describe why the value of r is affected so greatly by an outlier.

 

·        Pick up the outlier point and drag it to different locations on the graph. Find three different locations of an outlier that cause the regression line to drastically change. Where did you have to place the outlier for this effect? Why does this make sense?
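
The outlier experiment in step 4 can be mirrored numerically as a cross-check on what you see in Fathom. The sketch below is only an illustration under stated assumptions: the nine trend points, the three outlier placements, and the helper name `fit` are all made up, and r and the slope are computed from the standard deviation-based formulas rather than taken from Fathom.

```python
import math


def fit(points):
    """Return (r, slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    syy = sum((y - y_mean) ** 2 for _, y in points)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return r, slope, intercept


# Nine made-up points lying exactly on y = x.
trend = [(float(k), float(k)) for k in range(1, 10)]

# Three hypothetical positions for the tenth point.
placements = {
    "on the trend": (10.0, 10.0),
    "above the middle": (5.0, 20.0),
    "far right, low": (20.0, 0.0),
}

results = {}
for label, point in placements.items():
    results[label] = fit(trend + [point])
    r, slope, intercept = results[label]
    print(f"{label:18s} r = {r:6.3f}  slope = {slope:6.3f}  intercept = {intercept:6.3f}")
```

Compare which placement mainly changes r and which one changes the slope, then relate that to the size of the outlier's X deviation.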

5.      In the lower left corner of the coordinate plane, place 9 of your points in a “cloud” that appears to have no trend. Then move one point to the upper right corner.

·        Is this scatterplot linear?

 

·        What is the value of r? Reason from the formula for r and the displayed measures to make sense of this value.

 

·        Find two other sets of data points that give a high r value but show no linear trend.

 

·        Does a high r value necessarily mean that the data are generally linear?

 

·        Does a low r value necessarily mean that the data are NOT generally linear?
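
The arrangement in step 5 can be tried with numbers as well. This is a hedged sketch with invented coordinates: a 3-by-3 grid stands in for the "cloud" (an assumption chosen because its correlation is exactly zero), and the helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(points):
    """Pearson correlation for a list of (x, y) pairs, from deviations."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    syy = sum((y - y_mean) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


# Nine points in the lower left with no trend (a 3-by-3 grid).
cloud = [(float(x), float(y)) for x in (1, 2, 3) for y in (1, 2, 3)]

# One additional point in the upper right corner.
far_point = (20.0, 20.0)

r_cloud = pearson_r(cloud)
r_with_far_point = pearson_r(cloud + [far_point])

print("r for the cloud alone:     ", round(r_cloud, 4))
print("r with the far point added:", round(r_with_far_point, 4))
```

Look at how large the far point's X and Y deviations are compared with those of the nine cloud points when reasoning about the result.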

 

 

6.      Place your points in a nearly horizontal line on the graph. What is the value of r? Why?

 

 

7.      Place your points in a nearly vertical line on the graph. What is the value of r? Why?

 

 

 

8.      Based on how the formula for r is computed, why do you think the values of r are constrained between –1 and 1?

 

 

 

9.      Throughout the investigation in Part II, what did you notice about the relationship between the data point (Xmean, Ymean) and the regression line?

 

 

10.  How is the correlation coefficient r related to the slope of the regression equation? Is the value of r the same as the slope of the line? Does an r value of 1 imply a y=x relationship? Why or why not?

 

 

 

 

11.  Recall that the slope of a regression line can be expressed as $m = r \cdot \dfrac{s_y}{s_x}$, where $s_x$ and $s_y$ are the standard deviations of the X and Y values. Thus, by calculating the means and standard deviations of the X values and Y values in a data set, as well as r, one can derive the line of best fit using algebraic techniques. Test this procedure with 10 data points.
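
One way to test the procedure in step 11 is sketched below. It assumes the slope is r times the ratio of the Y standard deviation to the X standard deviation, and that the line passes through the point of means, so the intercept is the Y mean minus the slope times the X mean. The ten data points are made up, and sample standard deviations are used; population standard deviations would give the same slope as long as both are the same kind.

```python
import statistics

# Ten made-up data points for illustration.
xs = [float(k) for k in range(1, 11)]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.8, 9.2, 9.9, 11.0]

n = len(xs)
x_mean = statistics.mean(xs)
y_mean = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviation of X
s_y = statistics.stdev(ys)  # sample standard deviation of Y

# Correlation from the sum of products of deviations.
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
r = sxy / ((n - 1) * s_x * s_y)

# Line of best fit built from r, the standard deviations, and the means.
slope = r * s_y / s_x
intercept = y_mean - slope * x_mean

# Cross-check: least-squares slope computed directly from deviations.
sxx = sum((x - x_mean) ** 2 for x in xs)
slope_direct = sxy / sxx

print(f"r = {r:.4f}")
print(f"y = {slope:.4f}x + {intercept:.4f}")
print(f"direct least-squares slope = {slope_direct:.4f}")
```

Entering the same ten points in Fathom and choosing Least-Squares Line gives an equation to compare against the printed one.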