Aron, Coups, & Aron




        Chapter 3
Correlation and Prediction



                       Copyright © 2011 by Pearson
                       Education, Inc. All rights reserved
Correlations
 Can be thought of as a descriptive statistic for
 the relationship between two variables
 Describes the relationship between two equal-
 interval numeric variables
 ◦ e.g., the correlation between amount of time
   studying and amount learned
 ◦ e.g., the correlation between number of years
   of education and salary




Scatter Diagram
Graphing a Scatter Diagram
To make a scatter diagram:
   Draw the axes and decide which variable goes on which axis.
      The values of one variable go along the horizontal axis and the values of the
      other variable go along the vertical axis.
   Determine the range of values to use for each variable and mark them on
   the axes.
      Numbers should go from low to high on each axis, starting from where the
      axes meet.
      Usually your low value on each axis is 0.
      Each axis should continue to the highest value your measure can possibly have.
   Make a dot for each pair of scores.
      Find the place on the horizontal axis for the first pair of scores on the
      horizontal-axis variable.
      Move up to the height for the score for the first pair of scores on the
      vertical-axis variable and mark a clear dot.
      Keep going until you have marked a dot for each person.
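
The steps above can be sketched in code. This is only an illustration, assuming whole-number scores and using the sleep and mood scores from the worked example later in these slides; the function name `ascii_scatter` and the axis maximums (12 hours, mood of 8) are assumptions made for the sketch, not values from the chapter.

```python
def ascii_scatter(xs, ys, x_max, y_max):
    """Return a text scatter diagram: X on the horizontal axis, Y on the vertical axis."""
    # Both axes start at 0 where they meet and run to the chosen maximum value.
    grid = [[" "] * (x_max + 1) for _ in range(y_max + 1)]
    for x, y in zip(xs, ys):
        grid[y][x] = "*"  # one dot for each pair of scores
    lines = []
    # List the highest Y value first so that low values sit at the bottom.
    for y in range(y_max, -1, -1):
        lines.append(f"{y:2d} |" + "".join(f" {c}" for c in grid[y]))
    lines.append("   +" + "--" * (x_max + 1))
    lines.append("    " + "".join(f"{x:2d}" for x in range(x_max + 1)))
    return "\n".join(lines)


hours_slept = [5, 7, 8, 6, 6, 10]  # horizontal-axis variable
mood = [2, 4, 7, 2, 3, 6]          # vertical-axis variable
print(ascii_scatter(hours_slept, mood, x_max=12, y_max=8))
```

The dots drift up and to the right, which is the pattern to look for before doing any calculation.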




Linear Correlation
A linear correlation
◦ relationship between two variables that shows
  up on a scatter diagram as dots roughly
  approximating a straight line
Curvilinear Correlation
 Curvilinear correlation
 ◦ any association between two variables other
   than a linear correlation
 ◦ relationship between two variables that shows
   up on a scatter diagram as dots following a
   systematic pattern that is not a straight line
No Correlation
 No correlation
 ◦ no systematic relationship between two
   variables




Positive and Negative Linear
Correlation
Positive Correlation
   High scores go with high scores.
   Low scores go with low scores.
   Medium scores go with medium scores.
   When graphed, the line goes up and to the right.
      e.g., level of education achieved and income
Negative Correlation
   High scores go with low scores.
       e.g., the relationship between fewer hours of
    sleep and higher levels of stress
Strength of the Correlation
   how close the dots on a scatter diagram fall to a simple straight line



Importance of Identifying the
Pattern of Correlation
 Use a scatter diagram to examine the pattern, direction,
 and strength of a correlation.
  ◦ First, determine whether it is a linear or curvilinear relationship.
  ◦ If linear, look to see if it is a positive or negative
    correlation.
  ◦ Then look to see if the correlation is large, small, or
    moderate.
 Approximating the direction and strength of a
 correlation allows you to double check your
 calculations later.




The Correlation Coefficient
A number that gives the exact correlation
between two variables

◦ can tell you both direction and strength of relationship
  between two variables (X and Y)
◦ uses Z scores to compare scores on different variables




The Correlation Coefficient
(r)
 The sign of r (Pearson correlation
 coefficient) tells the general trend of a
 relationship between two variables.
  + sign means the correlation is positive.
  - sign means the correlation is negative.
 The value of r ranges from -1 to 1.
     A correlation of 1 or -1 means that the variables are perfectly
     correlated.
     0 = no correlation
Strength of Correlation Coefficients


Correlation Coefficient Value            Strength of Relationship
+/- .70-1.00                             Strong
+/- .30-.69                              Moderate
+/- .00-.29                              None (.00) to Weak

    The absolute value of a correlation defines the strength of the
    correlation, regardless of the sign.
               e.g., -.99 is a stronger correlation than .75
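
As a rough sketch, the cutoffs in the table above can be written as a small function. The function name is hypothetical, and treating any value below .30 (other than exactly 0) as weak is an assumption, since the table does not say where values between .29 and .30 fall.

```python
def strength_of_relationship(r):
    """Label a correlation coefficient using the cutoffs in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # the sign gives the direction only, not the strength
    if size >= 0.70:
        return "Strong"
    if size >= 0.30:
        return "Moderate"
    if size > 0.0:
        return "Weak"
    return "None"


for r in (-0.99, 0.75, 0.45, -0.10, 0.0):
    print(f"{r:+.2f} -> {strength_of_relationship(r)}")
```

Both -.99 and .75 come out as strong, but -.99 is the stronger of the two because its absolute value is larger.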
Formula for a Correlation
Coefficient
 r = (∑ZxZy) / N
    Zx = Z score for each person on the X variable
    Zy = Z score for each person on the Y variable
    ZxZy = cross-product of Zx and Zy
    ∑ZxZy = sum of the cross-products of the Z scores over all
    participants in the study
    N = number of people in the study




Steps for Figuring the Correlation
Coefficient

Change all scores to Z scores.
◦ Figure the mean and the standard deviation of each variable.
◦ Change each raw score to a Z score.
Calculate the cross-product of the Z scores
for each person.
◦ Multiply each person’s Z score on one variable by his or her
  Z score on the other variable.
Add up the cross-products of the Z scores.
Divide by the number of people in the
study.
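
The four steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names and the example scores are hypothetical, and the standard deviation is assumed to divide by N (not N - 1), in line with dividing the summed cross-products by the number of people.

```python
from math import sqrt


def z_scores(scores):
    """Step 1: change raw scores to Z scores using their own mean and SD."""
    n = len(scores)
    mean = sum(scores) / n
    # Assumption: the SD divides by N, not N - 1.
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sd for s in scores]


def correlation(xs, ys):
    """Figure r as the sum of Z-score cross-products divided by N."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    cross_products = [a * b for a, b in zip(zx, zy)]  # step 2
    total = sum(cross_products)                       # step 3
    return total / len(xs)                            # step 4


# Hypothetical scores: Y rises exactly in step with X, then falls exactly in step.
print(round(correlation([1, 2, 3, 4], [2, 4, 6, 8]), 2))
print(round(correlation([1, 2, 3, 4], [8, 6, 4, 2]), 2))
```

A perfectly linear rising pattern gives r = 1 and a perfectly linear falling pattern gives r = -1, the two ends of the range.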
Calculating a Correlation Coefficient
Number of Hours Slept (X)   Level of Mood (Y)             Calculate r

   X         Zscore Sleep   Y           Zscore Mood   Cross Product ZxZy

   5             -1.23      2              -1.05             1.28
   7              0.00      4               0.00             0.00
   8              0.61      7               1.57             0.96
   6             -0.61      2              -1.05             0.64
   6             -0.61      3              -0.52             0.32
  10              1.84      6               1.05             1.93

   MEAN = 7                 MEAN = 4                  ∑ZxZy = 5.14
   SD = 1.63                SD = 1.91                 r = ∑ZxZy / N = 5.14 / 6
                                                      r = .85
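
The table above can be checked with a short script. This is a sketch using Python's standard `statistics` module, where `pstdev` is the standard deviation that divides by N. Because the script keeps full precision, individual Z scores and cross-products may differ in the last digit from the hand-rounded table entries.

```python
from statistics import mean, pstdev

sleep = [5, 7, 8, 6, 6, 10]  # X: number of hours slept
mood = [2, 4, 7, 2, 3, 6]    # Y: level of mood

mx, sdx = mean(sleep), pstdev(sleep)  # pstdev divides by N
my, sdy = mean(mood), pstdev(mood)

rows = []
for x, y in zip(sleep, mood):
    zx = (x - mx) / sdx
    zy = (y - my) / sdy
    rows.append((x, zx, y, zy, zx * zy))
    print(f"{x:3d} {zx:6.2f} {y:3d} {zy:6.2f} {zx * zy:6.2f}")

# r is the sum of the cross-products divided by the number of people.
r = sum(row[4] for row in rows) / len(rows)
print(f"SDx = {sdx:.2f}, SDy = {sdy:.2f}, r = {r:.2f}")
```

Without intermediate rounding, r is about .853, which rounds to the .85 reported on the slide.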
Issues in Interpreting the
Correlation Coefficient
 Direction of causality
  ◦ path of causal effect (e.g., X causes Y)
 You cannot determine the direction
 of causality just because two
 variables are correlated.




Reasons Why We Cannot Assume
Causality
 Variable X causes variable Y.
 ◦ e.g., less sleep causes more stress
 Variable Y causes variable X.
 ◦ e.g., more stress causes people to sleep less
 There is a third variable that causes both
 variable X and variable Y.
 ◦ e.g., working longer hours causes both stress
   and fewer hours of sleep


Ruling Out Some Possible
Directions of Causality
Longitudinal Study
◦ a study where people are measured at two or
  more points in time
   e.g., evaluating number of hours of sleep at one time point and
   then evaluating their levels of stress at a later time point
True Experiment
◦ a study in which participants are randomly
  assigned to a particular level of a variable and
  then measured on another variable
   e.g., exposing individuals to varying amounts of sleep in a
   laboratory environment and then evaluating their stress levels


The Statistical Significance of a Correlation
Coefficient
A correlation is statistically significant if it is
unlikely that you could have gotten a
correlation as big as you did if in fact there
was no relationship between variables.
◦ If the probability (p) is less than some small degree
  of probability (e.g., 5% or 1%), the correlation is
  considered statistically significant.
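
One way to make this idea concrete is a permutation test. This is an illustration chosen for this sketch and is not necessarily the procedure the chapter uses. It re-pairs the Y scores with the X scores in every possible order, which mimics "no relationship between variables", and counts how often the re-paired correlation is at least as big (in absolute value) as the one observed. The data are the sleep and mood scores from the earlier worked example; the function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from deviation scores (algebraically equal to sum(ZxZy) / N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys):
    """Share of all re-pairings of Y with X whose |r| is at least the observed |r|."""
    observed = abs(pearson_r(xs, ys))
    count = total = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance so that ties are not lost to floating-point error.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            count += 1
    return count / total


sleep = [5, 7, 8, 6, 6, 10]
mood = [2, 4, 7, 2, 3, 6]
p = permutation_p_value(sleep, mood)
print(f"p = {p:.3f}; significant at the 5% level: {p < 0.05}")
```

Enumerating every ordering is only practical for very small samples such as this one (6 people, 720 orderings).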
Prediction
Predictor Variable (X)
  variable being predicted from
     e.g., level of education achieved
Criterion Variable (Y)
  variable being predicted to
     e.g., income
If we expect level of education to predict income, the
predictor variable would be level of education and
the criterion variable would be income.



Prediction Using Z Scores
Prediction Model
  A person’s predicted Z score on the criterion
  variable is found by multiplying the standardized
  regression coefficient (β) by that person’s Z score
  on the predictor variable.
Formula for the prediction model using Z scores:
  Predicted Zy = (β)(Zx)
  Predicted Zy = predicted value of the particular person’s Z
   score on the criterion variable Y
   Zx = particular person’s Z score on the predictor
   variable X


Steps for Prediction Using Z Scores
 Determine the standardized regression
 coefficient (β).
 Multiply the standardized regression
 coefficient (β) by the person’s Z score on
 the predictor variable.
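
The two steps above amount to a single multiplication, sketched here in Python. The function name and the numbers are hypothetical. With a single predictor variable, the standardized regression coefficient (β) equals the correlation coefficient (r), so a known r can be used as β.

```python
def predicted_zy(beta, zx):
    """Predicted Z score on the criterion variable: (beta)(Zx)."""
    return beta * zx


beta = 0.5  # hypothetical standardized regression coefficient
for zx in (-1.5, 0.0, 2.0):
    print(f"Zx = {zx:+.2f} -> predicted Zy = {predicted_zy(beta, zx):+.2f}")
```

Whenever β is between -1 and 1, the predicted Zy is closer to the mean (0) than Zx is.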




How Are You Doing?
 So, let’s say that we want to try to predict a
 person’s oral presentation score based on a
 known relationship between self-confidence
 and presentation ability.
 Which is the predictor variable (Zx)? The
 criterion variable (Zy)?
 If r = .90 and Zx = 2.25, then Zy = ?


 So what? What does this predicted value
 tell us?
Prediction Using Raw Scores
Change the person’s raw score on the predictor
variable to a Z score.
Multiply the standardized regression coefficient (β)
by the person’s Z score on the predictor variable.
  Multiply β by Zx.
     This gives the predicted Z score on the criterion variable.
       Predicted Zy = (β)(Zx)
Change the person’s predicted Z score on the
criterion variable back to a raw score.
   Predicted Y = (SDy)(Predicted Zy) + My

Example of Prediction Using Raw
Scores: Change Raw Scores to Z
Scores
 From the sleep and mood study example, we know that the
 mean for sleep is 7 and the standard deviation is 1.63, and
 that the mean for happy mood is 4 and the standard
 deviation is 1.92.
 The correlation between sleep and mood is .85.
 Change the person’s raw score on the predictor variable
 to a Z score.
 ◦ Zx = (X - Mx) / SDx
 ◦ For a person who slept 4 hours: Zx = (4 - 7) / 1.63 = -3 / 1.63 = -1.84



Example of Prediction Using Raw
Scores: Find the Predicted Z Score
on the Criterion Variable
 Multiply the standardized regression coefficient
 (β) by the person’s Z score on the predictor
 variable.
 ◦ Multiply β by Zx.
    This gives the predicted Z score on the criterion variable.
      Predicted Zy = (β)(Zx) = (.85)(-1.84) = -1.56




Example of Prediction Using Raw
Scores: Change the Predicted Z Score
to a Raw Score
Change the person’s predicted Z score on the
criterion variable to a raw score.
◦ Predicted Y = (SDy)(Predicted Zy) + My
◦ Predicted Y = (1.92)(-1.56) + 4 = -3.00 + 4 = 1.00
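
The three steps of the raw-score example can be strung together in one function. This is a sketch with a hypothetical function name; it uses the means, standard deviations, and correlation quoted in the example, with r standing in for β.

```python
def predict_raw_score(x, mean_x, sd_x, mean_y, sd_y, beta):
    """Predict a raw score on the criterion variable from a raw predictor score."""
    zx = (x - mean_x) / sd_x            # change the raw score to a Z score
    predicted_zy = beta * zx            # predicted Z score on the criterion variable
    return sd_y * predicted_zy + mean_y  # change the predicted Z score to a raw score


# A person who slept 4 hours; values taken from the sleep and mood example.
predicted_mood = predict_raw_score(
    x=4, mean_x=7, sd_x=1.63, mean_y=4, sd_y=1.92, beta=0.85
)
print(round(predicted_mood, 2))
```

The result agrees with the hand-worked 1.00 to two decimal places; any tiny difference before rounding comes from the slides rounding each intermediate step.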




The Correlation Coefficient and the
Proportion of Variance Accounted For
Proportion of variance accounted for (r²)
◦ To compare correlations with each other, you
  have to square each correlation.
◦ This number represents the proportion of the
  total variance in one variable that can be
  explained by the other variable.
◦ If r = .2, then r² = .04.
◦ If r = .4, then r² = .16.
◦ So, a relationship with r = .4 is four times as strong
  as one with r = .2.
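
The comparison above is quick to verify. A minimal sketch, with a hypothetical function name:

```python
def proportion_of_variance(r):
    """Proportion of variance accounted for: r squared."""
    return r ** 2


small, large = 0.2, 0.4
ratio = proportion_of_variance(large) / proportion_of_variance(small)
print(round(proportion_of_variance(small), 2))
print(round(proportion_of_variance(large), 2))
print(round(ratio, 2))
```

Doubling r from .2 to .4 quadruples the proportion of variance accounted for, which is why correlations are compared on r² rather than on r.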
