Reliability and Validity
Sean P. Gyll, Ph.D., of JUNTORESEARCH conducted an independent, third-party reliability and validity analysis of BlueEQ. The results were impressive. Dr. Gyll’s full analysis follows:
Analysis
Item analysis was conducted on 150 items across five (5) skill domains and 25 dimensions; each skill domain contained five (5) dimensions, and each dimension contained six (6) items. Item-level analysis involved computing and examining statistical properties of candidates’ responses to each individual test item. The item parameters examined fell into two categories:
1. Inter-scale correlation: the degree of relationship between responses to the item and assessment scale scores.
2. Item reliability: a function of the item’s variance and its relationship to assessment scores.
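Both item parameters can be computed from a candidates-by-items score matrix. The sketch below is a minimal illustration, not Dr. Gyll’s actual procedure: it assumes numeric item scores, uses the Pearson item-total correlation for the first parameter, and uses one common classical-test-theory definition of the item reliability index (item standard deviation multiplied by the item-total correlation) for the second. The function name `item_statistics` and its `corrected` option, which removes the item from the total to avoid part-whole inflation, are hypothetical.

```python
import numpy as np


def item_statistics(responses, corrected=False):
    """Return (item-total correlations, item reliability indices), one value per item.

    responses: array-like of shape (n_candidates, n_items) with numeric item scores.
    corrected: if True, each item is removed from the total before correlating.
    Hypothetical helper for illustration only.
    """
    x = np.asarray(responses, dtype=float)
    total = x.sum(axis=1)
    n_items = x.shape[1]
    r_it = np.empty(n_items)
    for j in range(n_items):
        criterion = total - x[:, j] if corrected else total
        # Pearson correlation between item j and the (possibly corrected) total score
        r_it[j] = np.corrcoef(x[:, j], criterion)[0, 1]
    # Item reliability index: item standard deviation times item-total correlation
    item_sd = x.std(axis=0, ddof=1)
    return r_it, item_sd * r_it
```

An item with zero variance (every candidate gave the same response) yields `nan` here, because its correlation with any score is undefined.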
Each of these two parameters was examined in conjunction with the distribution of total scores to reach decisions about item and scale validity.
Inter-scale correlation indicates the correlation, or degree of relationship, between the skill domain/dimension score and the total score. Larger correlations indicate that performance on the skill domain/dimension discriminates well between low-scoring and high-scoring candidates (i.e., candidates with a high score on the skill domain/dimension tended to receive higher overall scores than candidates with a low score). Correlations above .30 are generally considered acceptable. Please refer to Figures 1 and 2.
Figure 1. Inter-scale correlations for skill domain scores. (* p<.05, ** p<.01)
Figure 2. Inter-scale correlations for dimension scores. (* p<.05, ** p<.01)
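Inter-scale correlations of this kind are Pearson correlations between each scale score and the total score. The sketch below assumes that scale scores are simple sums of their items and that the 150 item columns are ordered scale by scale; both are assumptions, as is the hypothetical function name `inter_scale_correlations`. Because a scale score is part of the total, its correlation with the total is somewhat inflated, so the optional `corrected` flag correlates the scale with the total of the remaining items instead; the source does not say which variant was used. Significance testing (the asterisks in the figures) is omitted.

```python
import numpy as np


def inter_scale_correlations(responses, items_per_scale, threshold=0.30, corrected=False):
    """Correlate each contiguous block of items (a scale) with the total score.

    Returns a list of (scale_index, r, meets_threshold) tuples.
    Hypothetical helper; assumes columns are grouped scale by scale.
    """
    x = np.asarray(responses, dtype=float)
    n_items = x.shape[1]
    if n_items % items_per_scale:
        raise ValueError("item count must be a multiple of items_per_scale")
    total = x.sum(axis=1)
    results = []
    for s in range(n_items // items_per_scale):
        block = x[:, s * items_per_scale:(s + 1) * items_per_scale]
        scale_score = block.sum(axis=1)
        # Optionally exclude the scale's own items from the criterion score
        criterion = total - scale_score if corrected else total
        r = np.corrcoef(scale_score, criterion)[0, 1]
        results.append((s, float(r), bool(r > threshold)))
    return results
```

Under the assumed column layout, `items_per_scale=30` would give the five skill-domain correlations and `items_per_scale=6` the 25 dimension correlations.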
Cronbach’s alpha is a measure of internal consistency; it can be interpreted as the average of all possible split-half reliability estimates. A value of zero (0) indicates that none of the variance in candidates’ observed scores reflects true-score variance, whereas a value of one (1) indicates that all of it does. Reliability for BlueEQ was .91 across all 150 items. Skill domain reliability estimates ranged from .58 to .81, as shown in Figure 3 below.¹
Figure 3. Reliability by skill domain.
¹ Reliability estimates above .70 are generally considered acceptable for overall assessment scores; estimates above .60 are generally considered acceptable at the dimension level.
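The link between Cronbach’s alpha and split-half reliability can be checked numerically. The sketch below computes alpha with the standard formula, alpha = k/(k-1) * (1 - sum of item variances / total-score variance), and separately averages the Flanagan-Rulon split-half coefficient, 2 * (1 - (var_A + var_B) / var_total), over every way of dividing the items into two equal halves; for an even number of items the two agree exactly. The function names and toy data are hypothetical, and enumerating splits is only practical for a handful of items (150 items have far too many splits), which is why alpha is computed directly in practice.

```python
from itertools import combinations

import numpy as np


def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - sum_item_var / total_var))


def mean_split_half(responses):
    """Average Flanagan-Rulon coefficient over every split into two equal halves."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    if k % 2:
        raise ValueError("this illustration needs an even number of items")
    total_var = x.sum(axis=1).var(ddof=1)
    coefficients = []
    for half_a in combinations(range(k), k // 2):
        if 0 not in half_a:
            continue  # each split appears twice (A/B swapped); keep the copy containing item 0
        half_b = [j for j in range(k) if j not in half_a]
        var_a = x[:, list(half_a)].sum(axis=1).var(ddof=1)
        var_b = x[:, half_b].sum(axis=1).var(ddof=1)
        coefficients.append(2 * (1 - (var_a + var_b) / total_var))
    return float(np.mean(coefficients))


# Toy data: 5 hypothetical candidates x 4 items
scores = [[1, 2, 2, 3], [2, 2, 3, 3], [3, 4, 3, 5], [4, 4, 5, 4], [2, 3, 2, 2]]
print(round(cronbach_alpha(scores), 4), round(mean_split_half(scores), 4))  # prints 0.9002 0.9002
```

A skill-domain estimate would be obtained by passing only that domain’s item columns to `cronbach_alpha` (assuming domain scores are simple item sums).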