Reliability, validity,
generalizability and the use of
multi-item scales
Edward Shiu (Dept of Marketing)
edward.shiu@strath.ac.uk
Reliable? Valid?
Generalizable?
Multi-item scales
How to use a questionnaire from
published work
• Appendix with items
• Methodology section
Existing multi-item scales
• Used by many
• Reliability and validity may be known
• Good starting block
• Basis to compare / contrast results
Development of a Multi-item Scale
(Doing it the HARD way!! See Malhotra & Birks, 2007)
1. Develop Theory
2. Generate Initial Pool of Items: Theory, Secondary Data, and Qualitative Research
3. Select a Reduced Set of Items Based on Qualitative Judgment
4. Collect Data from a Large Pretest Sample
5. Statistical Analysis
6. Develop Purified Scale
7. Collect More Data from a Different Sample
8. Evaluate Scale Reliability, Validity, and Generalizability
9. Final Scale
Example of Scale Development
• See Richins & Dawson (1992) “A Consumer
Values Orientation for Materialism and its
Measurement: Scale Development and
Validation,” Journal of Consumer Research, 19
(December), 303-316.
• Materialism scale (7 items)
– Marketing Scales Handbook (Vol IV) p. 352.
1. It is important to me to have really nice things.
2. I would like to be rich enough to buy anything I want.
3. I'd be happier if I could afford to buy more things.
4. ......
• Note: published scales are not always perfect!!!
Scale Evaluation
(See Malhotra & Birks, 2007)
• Reliability
– Test/retest
– Alternative forms
– Internal consistency
• Validity
– Content
– Criterion
– Construct (convergent, discriminant and nomological)
• Generalizability
Reliability & Validity
• Reliability - the extent to which a measuring procedure yields consistent results on repeated administrations of the scale
• Validity - the degree to which a measuring procedure accurately reflects, assesses or captures the specific concept that the researcher is attempting to measure
Reliable ≠ Valid (a reliable scale is not necessarily a valid one)
Reliability
• Internal consistency reliability
– DO THE ITEMS IN THE SCALE GEL WELL TOGETHER?
• Split-half reliability: the items on the scale are divided into two halves and the resulting half-scores are correlated
• Cronbach alpha (α)
– the average of all possible 'split-half' correlation coefficients resulting from different ways of splitting the scale items (a worked sketch follows this list)
– value varies from 0 to 1
– α < 0.6 indicates unsatisfactory internal consistency reliability (see Malhotra & Birks, 2007, p. 358)
– Note: alpha tends to increase with the number of items in the scale
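Purely as an illustration (the slides use SPSS for this), here is a minimal Python sketch of the alpha formula and of one split-half coefficient; the item scores below are made up.

```python
# Minimal sketch: Cronbach's alpha and one split-half coefficient.
# 'items' is made-up data: 5 respondents x 4 scale items.
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
], dtype=float)

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print("alpha:", round(cronbach_alpha(items), 3))

# One odd/even split-half coefficient for comparison.
half1 = items[:, ::2].sum(axis=1)
half2 = items[:, 1::2].sum(axis=1)
print("split-half r:", round(np.corrcoef(half1, half2)[0, 1], 3))
```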
• test-retest reliability
– identical scale items administered at two different
times to same set of respondents
– assess (via correlation) whether respondents give similar answers
• alternative-forms reliability
– two equivalent forms of the scale are constructed
– same respondents are measured at two different
times, with a different form being used each time
– assess (via correlation) whether respondents give similar answers (see the correlation sketch after this list)
– Note: hardly ever practical
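A small sketch of the "assess via correlation" step, using made-up scale scores from the same five respondents at two points in time:

```python
# Sketch: test-retest reliability as the correlation between two administrations.
# time1/time2 are made-up total scale scores for the same respondents.
from scipy.stats import pearsonr

time1 = [18, 14, 22, 11, 19]
time2 = [17, 15, 21, 12, 18]
r, p = pearsonr(time1, time2)
print("test-retest r:", round(r, 3))  # a high r indicates stable answers over time
```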
Construct Validity
• Construct validity is evidenced if we can establish
– convergent validity, discriminant validity and nomological validity
• Convergent validity: the extent to which the scale correlates positively with other measures of the same construct
• Discriminant validity: the extent to which the scale does not correlate with other, conceptually distinct constructs
• Nomological validity: the extent to which the scale correlates in theoretically predicted ways with other distinct but related constructs
• Also read Malhotra & Birks, 2007, pp. 358-359 on
– content (or face) validity and criterion (concurrent & predictive) validity
Generalizability
• Refers to the extent to which you can generalise from your specific observations beyond your limited study, situation, items used, method of administration, context.....
• Hardly ever possible!!!
Fun time
• Now onto the data (COCB.sav)!
• Read my forthcoming JBR article for
background on COCB and the scale
• First, SPSS and Cronbach alpha
• Next, Amos and CFA
• Followed by Excel, to calculate composite/construct reliability (CR) and AVE, as well as to establish discriminant validity
Cronbach alpha (α)
• SPSS (Analyze…Scale…Reliability Analysis)
• α < 0.6 indicates unsatisfactory internal
consistency reliability (see Malhotra &
Birks, 2007, p.358)
• α > 0.7 indicates satisfactory internal consistency reliability (Nunnally & Bernstein, 1994)
Ref: Nunnally JC & Bernstein IH. (1994) Psychometric Theory. New York: McGraw-Hill.
SPSS output for α
Alpha value for the Credibility dimension = 0.894 > 0.7, hence satisfactory
SPSS further output for α
• Note that the alpha value for the Credibility dimension would increase (from 0.894 to 0.902) if item cred4 were removed.
• However, unless the improvement is dramatic AND there are separate reasons (e.g. similar findings from other studies), we should leave the item as part of the dimension. (A sketch of this "alpha if item deleted" check follows.)
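A sketch of the same "alpha if item deleted" check in Python; it assumes COCB.sav really holds columns named cred1-cred4, and pandas.read_spss requires the pyreadstat package.

```python
# Sketch: reproduce SPSS's "Cronbach's Alpha if Item Deleted" column for the
# Credibility items. Assumes COCB.sav contains cred1-cred4 (needs pyreadstat).
import pandas as pd

df = pd.read_spss("COCB.sav")
cred = df[["cred1", "cred2", "cred3", "cred4"]].astype(float)

def cronbach_alpha(x):
    k = x.shape[1]
    return (k / (k - 1)) * (1 - x.var(ddof=1).sum() / x.sum(axis=1).var(ddof=1))

print("alpha (all items):", round(cronbach_alpha(cred), 3))
for item in cred.columns:
    reduced = cred.drop(columns=item)
    print(f"alpha without {item}:", round(cronbach_alpha(reduced), 3))
```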
Limitations of Cronbach alpha
• We should employ multiple measures of reliability: Cronbach alpha, composite/construct reliability (CR) and average variance extracted (AVE)
– alpha and CR values are often very similar, but AVEs can differ much more from alpha values
– AVEs are also used to assess discriminant validity between constructs
Composite/Construct Reliability
• CR = (sum of standardized loadings)² / [(sum of standardized loadings)² + (sum of indicator measurement errors)]
• AVE = Average Variance Extracted = Variance Extracted = (sum of squared standardized loadings) / [(sum of squared standardized loadings) + (sum of indicator measurement errors)]
• Note: recommended thresholds are CR > 0.6 and AVE > 0.5; when both are met, construct internal consistency is evidenced (Fornell & Larcker, 1981). (A worked sketch of both formulas follows this slide.)
Ref: Fornell, Claes and David G. Larcker (1981). “Evaluating Structural
Equation Models with Unobservable Variables and Measurement
Error,” Journal of Marketing Research, 18(1, February): 39-50.
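A short worked sketch of both formulas; the standardized loadings below are illustrative values, not output from COCB.sav, and each indicator's error variance is taken as 1 minus its squared standardized loading.

```python
# Sketch: composite reliability (CR) and average variance extracted (AVE)
# from standardized loadings, following Fornell & Larcker (1981).
loadings = [0.82, 0.79, 0.85, 0.71]          # illustrative standardized loadings
errors = [1 - l ** 2 for l in loadings]      # indicator error variances (1 - loading^2)

cr = sum(loadings) ** 2 / (sum(loadings) ** 2 + sum(errors))
ave = sum(l ** 2 for l in loadings) / (sum(l ** 2 for l in loadings) + sum(errors))

print("CR:", round(cr, 3), "AVE:", round(ave, 3))   # compare with CR > 0.6, AVE > 0.5
```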
Discriminant validity
• Discriminant validity is assessed by comparing the shared variance (squared correlation) between each pair of constructs against the minimum of the AVEs for those two constructs.
• If, for each possible pair of constructs, the observed shared variance is lower than the minimum of their AVEs, then discriminant validity is evidenced (Fornell & Larcker, 1981). (A sketch of this check follows.)
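A sketch of the Fornell-Larcker comparison for one pair of constructs; the AVEs and the latent correlation are made-up numbers, not COCB.sav results.

```python
# Sketch: Fornell-Larcker discriminant validity check for one pair of constructs.
# The shared variance (squared correlation) must be below the smaller AVE.
ave = {"Cred": 0.68, "Comm": 0.61}        # illustrative AVEs
corr_cred_comm = 0.54                     # illustrative latent construct correlation

shared_variance = corr_cred_comm ** 2
if shared_variance < min(ave.values()):
    print("Discriminant validity supported for Cred vs Comm")
else:
    print("Discriminant validity NOT supported for Cred vs Comm")
```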
Amos (Analysis of Moment Structures)
[Path diagram: CFA measurement model. Latent factors Comm (indicators comm1-comm2), Bene (bene1-bene3), Cred (cred1-cred4), COCB (ave_SSI, ave_POC, ave_Voice, ave_wom, ave_BAoSF, ave_DoRA, ave_Flex, ave_PiFA) and Loyalty (loy1-loy3); each indicator has its own error term.]
Rectangles = observed variables
– loy1, loy2, loy3, comm1, comm2, …, cred1, …, bene1, …, ave_PiFA = SPSS variables
Ellipses = unobserved variables
– e1 to e24 = error variances (uniqueness)
– Loyalty, Comm, Cred, Bene and COCB = latent (unobserved) factors
CFA and goodness of fit
• See Hair et al.'s book
• E.g.: The CFA resulted in an acceptable overall fit (GFI = .90, CFI = .94, TLI = .92, RMSEA = .068, and χ² = 524.64, df = 160, p < .001). All indicators load significantly (p < .001) and substantively (standardized coefficients > .5) onto their respective constructs, thus providing evidence of convergent validity.
(An illustrative sketch of a comparable CFA in Python follows.)
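The slides run the CFA in Amos. Purely as an illustrative alternative, the same measurement model could be sketched in Python with the semopy package (lavaan-style syntax); the exact API and output should be checked against semopy's documentation, and the COCB.sav column names are assumed to match the path diagram.

```python
# Illustrative only: a CFA comparable to the Amos model, sketched with semopy.
# Assumes COCB.sav column names match the path diagram (needs pandas + pyreadstat).
import pandas as pd
import semopy

desc = """
Cred    =~ cred1 + cred2 + cred3 + cred4
Comm    =~ comm1 + comm2
Bene    =~ bene1 + bene2 + bene3
COCB    =~ ave_SSI + ave_POC + ave_Voice + ave_wom + ave_BAoSF + ave_DoRA + ave_Flex + ave_PiFA
Loyalty =~ loy1 + loy2 + loy3
"""

df = pd.read_spss("COCB.sav")
model = semopy.Model(desc)
model.fit(df)
print(model.inspect())            # parameter estimates, including factor loadings
print(semopy.calc_stats(model))   # fit statistics (chi-square, CFI, TLI, RMSEA, ...)
```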
Refs
• Baumgartner H, Homburg C. (1996). “Applications of structural
equation modeling in marketing and consumer research: a review,”
International Journal of Research in Marketing,13(2):139–61.
• Churchill, Gilbert A., Jr. (1979). “A Paradigm for Developing Better
Measures of Marketing Constructs,” Journal of Marketing Research,
16(1, February): 64-73.
• Fornell, Claes and David G. Larcker (1981). “Evaluating Structural
Equation Models with Unobservable Variables and Measurement
Error,” Journal of Marketing Research, 18(1, February): 39-50.
• Hair, Joseph F., Jr., Rolph E. Anderson, Ronald L. Tatham, and
William C. Black (1998), Multivariate Data Analysis. 5th ed.
Englewood Cliffs, NJ: Prentice Hall.
• Nunnally JC & Bernstein IH. (1994) Psychometric Theory. New York: McGraw-Hill.
