
Abstract

Objective: This study aimed to determine the external validity of the psychometric properties of a two-factor Collegiate Learning Assessment Performance Task Diagnostic Instrument (CLAPTDI) for use in assessing learning skills among predominantly black college students. The construct validity of the two-factor CLAPTDI had been established in a previous study using exploratory and confirmatory factor analyses. Establishing the external validity involved conducting a multi-group test of the equivalence of the measurement instrument's factorial structure across panels of lower class and upper class students from a predominantly black college. Method: The study relied on a strict test of equivalence, focusing on tests for invariance across the two groups with respect to factor loadings, intercepts, and error variances, and estimating the difference in the chi-square goodness-of-fit statistic and the comparative fit index (CFI). Sets of measurement and structural parameters were tested in a logically ordered and increasingly restrictive manner. Results: The analyses found that the CLAPTDI scale's factorial measurement structure was invariant across lower class and upper class PBC students. Conclusion: The Collegiate Learning Assessment Performance Task Diagnostic Instrument, with two latent factors and five observed variables, is a valid measurement scale for assessing the level of analytic reasoning and problem solving learning among predominantly black college students.

Keywords: Factorial equivalence, Collegiate learning assessment, CLA, Collegiate learning assessment Task diagnostic instrument, CLAPTDI, Confirmatory factor analysis, Multi-group invariance, AMOS.

Received: 18 April 2017 / Revised: 12 May 2017 / Accepted: 25 May 2017 / Published: 5 June 2017

Contribution/ Originality

The study is one of very few studies which have investigated the external validity of the psychometric properties of a two-factor Collegiate Learning Assessment Performance Task Diagnostic Instrument (CLAPTDI) for use in assessing learning skills among predominantly black college students.


1. INTRODUCTION

The Collegiate Learning Assessment Performance Task Diagnostic Instrument (CLAPTDI) is an assessment tool used nationwide in the United States to measure the contribution of an educational institution to the learning gained by its students (Classroom Academy, 2008). The CLAPTDI measures a student's ability to perform cognitively demanding tasks, with the quality of responses scored on a 4-point scale ranging from 0 = not attempted to 4 = mastery (Classroom Academy, 2008). The tool is considered better than standardized test scores, grade point averages, and course test scores in assessing students' learning outcomes (AAC and U, 2005; AASCU, 2006), as well as effective in promoting a culture of evidence-based assessment in higher education (AAC and U, 2005; Arum et al., 2008). Unlike traditional assessment instruments that rely on multiple-choice items to measure the responses of study participants, the CLAPTDI uses open-ended prompts requiring constructed responses to measure higher order thinking skills such as critical thinking, analytic reasoning, written communication, and problem solving (Arum et al., 2008; Classroom Academy, 2008). However, as with any diagnostic instrument, the utility of the CLAPTDI as a gauge of student learning depends on its validity (both internal and external) in measuring student learning. While the CLA seems quite promising in assessing student learning, some scholars have raised a number of methodological issues about this approach and the CLAPTDI's ability to effectively capture a student's learning (AAC and U, 2005; Banta and Pike, 2007; Klein et al., 2007; Shavelson, 2007). Perhaps the most serious issue involves the validity of the psychometric properties used to measure the major constructs (i.e., critical thinking, analytic reasoning, written communication, and problem solving) of the CLAPTDI.

An extensive review of the literature reveals that, despite its widespread use in colleges and universities across the United States, very few studies have focused on validating the CLAPTDI. To be sure, only one study to date has examined the psychometric properties of the CLAPTDI (Mongkuo et al., 2013). That study was, however, limited to using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) to assess the internal validity, or stability, of the key latent constructs of the CLAPTDI. For the instrument to have broader universal use, the validation process must be extended to establishing the external validity of its psychometric properties. Byrne (2010) suggests that doing so requires testing the factorial equivalence of the CLAPTDI across groups.

The purpose of this study was to address this external validity void by extending the CLAPTDI validation process to a multi-group factorial equivalence test of the instrument. In particular, this study extends the validation work of Mongkuo et al. (2013) by determining the extent to which the CLAPTDI is equivalent across lower class and upper class predominantly black college (PBC) students. To do so, the study addressed the following research question: Is the factorial structure of the CLAPTDI scale equivalent across lower class and upper class predominantly black college students? Providing an empirically grounded answer to this question involves testing hypotheses related to the multi-group invariance of a single measurement scale across two different panels of PBC students.
According to Joreskog (1971) this test for equivalence begins with a global test of the equality of covariance structures across the groups of interest. The null hypothesis (H0) for the test is Σ1 = Σ2 = ⋯ = ΣG, where Σ is the population variance-covariance matrix and G is the number of groups. Rejection of the null hypothesis argues for the nonequivalence of the groups and, thus, for subsequent testing of increasingly restrictive hypotheses in order to identify the source of nonequivalence. On the other hand, if H0 cannot be rejected, the groups are considered to have equivalent measurement and covariance structures and, thus, tests for invariance are not needed.
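A conventional way of carrying out this global omnibus test is Box's M test of homogeneity of covariance matrices, which compares the log-determinant of the pooled covariance matrix with those of the group matrices. The following is a minimal sketch in Python (numpy/scipy), offered purely as an illustration of the logic; the function name and data layout are illustrative assumptions and are not part of the study's AMOS workflow.

```python
# Illustrative sketch (not the study's AMOS procedure): Box's M test of
# H0: Sigma_1 = Sigma_2 = ... = Sigma_G across G groups of cases.
import numpy as np
from scipy import stats

def box_m_test(groups):
    """groups: list of (n_g x p) arrays, one per group; returns M, chi2, df, p."""
    G = len(groups)
    p = groups[0].shape[1]
    n = np.array([g.shape[0] for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]                  # unbiased S_g
    pooled = sum((n_g - 1) * S for n_g, S in zip(n, covs)) / (n.sum() - G)

    M = (n.sum() - G) * np.log(np.linalg.det(pooled)) \
        - sum((n_g - 1) * np.log(np.linalg.det(S)) for n_g, S in zip(n, covs))
    # Small-sample scaling factor for the chi-square approximation
    c = (np.sum(1.0 / (n - 1)) - 1.0 / (n.sum() - G)) \
        * (2 * p ** 2 + 3 * p - 1) / (6.0 * (p + 1) * (G - 1))
    chi2 = (1 - c) * M
    df = p * (p + 1) * (G - 1) / 2.0
    return M, chi2, df, stats.chi2.sf(chi2, df)
```

A statistically significant result would argue for nonequivalence of the group covariance structures and, following Joreskog (1971), for the more fine-grained invariance tests described in Section 2.4.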

2. METHODS

2.1. Research Design

The study used a pre-experimental one-shot case study design (Leedy and Ormrod, 2010). A schematic representation of the design is as follows:

Fig-1. Pre-experimental one-shot case study design

Source: Leedy and Ormrod (2010).

Where X is exposure of a predominantly black college student to high school and/or college core curriculum courses, and O2 is the level of the student's learning abilities (that is, critical thinking/analytic reasoning, problem solving, persuasive writing, and writing mechanics).

2.2. Participants and Procedure

Participants in the study comprised a purposive, convenience sample of students attending a predominantly black college in the southeastern United States. The college has an enrollment of 5,567 students. A breakdown of the population by race/ethnicity shows that approximately 70% are black or African American, 17% are Caucasian, 4% are Hispanic, 1% are Native American, and 4% belong to other racial/ethnic groups. The age distribution of the student population consists of 55% in the 17-25 year range, 31% aged 26-40 years, and 14% over 40 years. Most of the students (68%) are female, while 32% are male. The distribution of the population by academic class shows that 19% are freshmen, 15% sophomores, 18% juniors, 32% seniors, and 11% post-bachelors. Most of the students (66%) attend the university full time, while 34% are part-time. By academic class grouping, 43% are lower class (freshman and sophomore) students and 61% are upper class (junior, senior, and graduate) students.

However, the CLA conducted at this institution does not focus on the level of student learning by demographics beyond academic class. Instead, the institution requires freshman, rising junior, and senior students to take the CLA as an integral part of the overall university strategic plan for determining the level of student learning at each academic level. In particular, all incoming freshmen are required to take the CLA as a baseline measure of learning ability upon entering the university. Those same students are tested again as rising juniors to assess any increase in skill levels and ability. Finally, that same group of students is tested as graduating seniors so that the test scores at all levels can be compared to ensure that program learning outcomes are being met. The data generated from the CLA are used by university administrators to identify areas of learning strengths or deficiencies in order to design effective corrective action plans to improve or maintain acceptable retention and graduation rates. Based on this requirement, the population for this study was delimited to students who had taken the CLA during their freshman, junior, and senior years only. University records show that in the 2013-2014 academic year, a total of 764 students had taken the CLA in all three years. The participants in this study consisted of a random sample of 320 students, obtained from the university's CLA data file, who took the CLA Performance Task during their freshman, sophomore, junior, and senior years. After data screening and deletion of cases with excessive missing values, the actual sample used for the study was 253 students, representing a 79% participation rate.

2.3. CLA Measures

The CLA Performance Task Diagnostic Instrument (CLAPTDI) consisted of eight items aimed at measuring four interrelated higher order thinking abilities or skills: critical thinking/analytic reasoning, problem solving, persuasive writing, and writing mechanics (Classroom Academy, 2008).

Critical thinking/analytic reasoning. Critical thinking/analytic reasoning skill was measured by the following two items, scored on a 4-point scale ranging from 0 = not attempted to 4 = mastery: (a) How well does the student assess the quality and relevance of evidence, including determining what information is or is not pertinent to the task at hand, distinguishing rational claims from emotional ones and facts from unsupported opinion, recognizing the ways in which the evidence might be limited or compromised, spotting deception and holes in the arguments of others, and considering all sources of evidence; and (b) How well does the student analyze and synthesize data and information, including presenting his/her own analysis of the data or information rather than accepting it as is, recognizing and avoiding logical flaws such as mistaking correlation for causation, breaking down the evidence into its component parts, drawing connections between discrete sources of data and information, and attending to contradictory, inadequate, or ambiguous information.

Problem solving. Problem solving skill was measured by two items scored on a 4-point scale ranging from 0 = not attempted to 4 = mastery: (a) How well does the student form a conclusion from his/her analysis, including constructing cogent arguments rooted in data/information rather than speculation/opinion, selecting the strongest and most relevant set of supporting data, avoiding overstated or understated conclusions, and identifying holes in the evidence and suggesting additional information that might resolve the issue; and (b) How well does the student consider other options and acknowledge that his/her answer is not the only perspective, including recognizing that the problem is complex with no clear answer, proposing other options and weighing them in the decision, considering all stakeholders or affected parties in suggesting a course of action, and qualifying responses and acknowledging the need for additional information in making an absolute determination.

Persuasive writing. Persuasive writing was measured by two items scored on a 4-point scale ranging from 0 = not attempted to 4 = mastery: (a) How effective is the writing structure in terms of logical and cohesive organization of the argument, avoidance of extraneous elements in the argument's development, and presentation of evidence in an order that contributes to a persuasive and coherent argument; and (b) How well does the student defend the argument in terms of effective presentation of the evidence in support of the argument, drawing thoroughly and extensively from the available range of evidence, analyzing the evidence rather than simply presenting it, and considering counter-arguments and addressing weaknesses in his/her own argument.

Writing mechanics. Writing mechanics was measured by two items scored on a 4-point scale ranging from 0 = not attempted to 4 = mastery: (a) How clear and concise is the argument in terms of clear articulation of the argument and its context, correct and precise use of evidence to defend the argument, comprehensible and coherent presentation of evidence, and correct and consistent citation of sources; and (b) What is the quality of the student's writing in terms of using vocabulary and punctuation correctly and effectively, demonstrating a strong understanding of grammar, using sentence structure that is basic or more complex and creative, using proper transitions, and structuring paragraphs logically and effectively.

2.4. Data Analysis

The statistical test for factorial and structural invariance, or equivalence, involved a series of hierarchical analyses using AMOS 24.0 (Arbuckle, 2016). Following Joreskog's (1971) guidelines, the test began with determination of the CLAPTDI baseline model (with no between-group constraints) for each group of PBC students separately. The baseline model is the one that best fits the data in terms of parsimony and substantive meaningfulness (Byrne, 2010). This best-fitting model was generated by performing a first-order confirmatory factor analysis (CFA) of the four-factor CLAPTDI. Following completion of this preliminary task, tests for the equivalence of parameters were conducted across the two groups of students at each of several increasingly stringent levels, beginning with scrutiny of the measurement model. In particular, the pattern of factor loadings for each observed measure was tested for its equivalence across the groups. Once it was known which measures were group-invariant, these parameters were constrained equal while subsequent tests of the structural parameters were conducted. As each new set of parameters was tested, those known to be group-invariant were cumulatively constrained equal. Thus, the process of determining nonequivalence of the measurement and structural parameters of the CLAPTDI across groups involved testing a series of increasingly restrictive hypotheses.
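To illustrate the baseline step, the sketch below fits the same two-factor CFA specification (AR = analytic reasoning/critical thinking, PS = problem solving) separately in each group of students and reports overall fit. It is a hedged sketch only: it assumes the open-source semopy package rather than the AMOS 24.0 program actually used in the study, and the variable names (item1 through item5, class_level) and file name are hypothetical placeholders for the five observed CLAPTDI scores and the academic-class indicator.

```python
# Sketch: baseline two-factor CFA fitted separately for lower and upper class
# students. Assumes the open-source `semopy` package; item1..item5 and
# class_level are placeholder column names, not the study's actual variables.
import pandas as pd
import semopy

CLAPTDI_SPEC = """
AR =~ item1 + item2 + item3
PS =~ item4 + item5
AR ~~ PS
"""

def fit_baseline(df: pd.DataFrame) -> pd.DataFrame:
    model = semopy.Model(CLAPTDI_SPEC)
    model.fit(df)                      # maximum likelihood estimation
    return semopy.calc_stats(model)    # chi-square, df, CFI, TLI, RMSEA, ...

# data = pd.read_csv("cla_scores.csv")            # hypothetical file
# lower = data[data["class_level"] == "lower"]
# upper = data[data["class_level"] == "upper"]
# print(fit_baseline(lower)[["chi2", "DoF", "CFI", "RMSEA"]])
```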

If the baseline model fitted the data well for both groups of students, it would be maintained as the hypothesized model in the test for equivalence across the two groups. If the model exhibited a poor fit to the data for either group, it would be modified accordingly and become the hypothesized multi-group model under test. Because estimation of the baseline model involves no between-group constraints, the data were analyzed separately for each group. In testing for invariance, however, equality constraints were imposed on particular parameters, allowing the data for the two groups to be analyzed simultaneously to obtain efficient estimates. In essence, the model tested here, commonly termed the configural model (Horn et al., 1983), is a multi-group representation of the baseline models because it contains the baseline models of lower class and upper class students within the same file. Hence, we tested for configural invariance. Because no equality constraints were imposed on any parameters in this model, no determination of group differences related to either the items or the factor covariances could be made; such claims were derived from subsequent tests for invariance. In testing for invariance, the fit of the configural model provided the baseline value against which all subsequently specified invariance models were compared.

Given that this model comprised the final best-fitting baseline model for each group, it was expected that the results would be indicative of a well-fitting model. However, Byrne (2010) notes that despite evidence of good fit to the multi-sample data, the only information available at this point of the test is that the factor structure is similar, but not necessarily equivalent, across groups. Because no equality constraints are imposed on any parameters in the model, no determination of group differences related to either the items or the factor covariances can be made. Despite the multi-group structure of this and subsequent models, the analyses yield only one set of fit statistics for overall model fit. Under ML estimation, the χ2 statistics are summative; thus, the overall χ2 value for the multi-group model equals the sum of the χ2 values obtained when the baseline model is tested separately for each group of students (Byrne, 2010).

A number of indices were used to evaluate the goodness of fit of the two-factor CLAPTDI configural model. The model's absolute fit was assessed using the chi-square (χ2) statistic, with a low χ2 indicating good fit (Joreskog, 1971). Approximate fit was evaluated using the Root Mean Square Error of Approximation (RMSEA), with a value less than .06 indicating a relatively good fit, along with the Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI), with values of .95 or greater considered desirable (Joreskog, 1971; Hu and Bentler, 1999; Brown, 2006; Blunch, 2010; Hair et al., 2016). Assessing multi-group invariance involved comparing the goodness of fit of the configural model to that of the constrained measurement and structural models, with evidence of non-invariance claimed if the χ2 difference (Δχ2) value was statistically significant (Joreskog, 1971; Byrne, 2010) and/or the CFI difference (ΔCFI) exceeded .01 (Cheung and Rensvold, 2002).
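The decision rule just described can be expressed compactly. The sketch below computes the chi-square difference test and the change in CFI between a constrained model and the configural model using scipy; the function name is illustrative, and the commented example simply plugs in the measurement-model comparison reported later in Table 1.

```python
# Sketch of the invariance decision rule: flag non-invariance when the
# chi-square difference is significant or the CFI drops by more than .01.
from scipy import stats

def invariance_check(chi2_constrained, df_constrained,
                     chi2_configural, df_configural,
                     cfi_constrained, cfi_configural,
                     alpha=0.01, cfi_cutoff=0.01):
    d_chi2 = chi2_constrained - chi2_configural
    d_df = df_constrained - df_configural
    p_value = stats.chi2.sf(d_chi2, d_df)          # upper-tail chi-square
    d_cfi = cfi_configural - cfi_constrained       # positive value = CFI drop
    non_invariant = (p_value < alpha) or (d_cfi > cfi_cutoff)
    return {"d_chi2": d_chi2, "d_df": d_df, "p": p_value,
            "d_cfi": d_cfi, "non_invariant": non_invariant}

# Example with the measurement-model comparison from Table 1:
# invariance_check(22.610, 11, 19.899, 8,
#                  cfi_constrained=0.992, cfi_configural=0.991)
```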

Normality of the distribution of the variables in the model was assessed with Mardia's (1970; 1974) normalized estimate of multivariate kurtosis, with a value of 5 or less taken as reflective of multivariate normality. Multivariate outliers were detected by computing the squared Mahalanobis distance (D2) for each case, with D2 values standing distinctly apart from all other D2 values treated as indicative of an outlier (Tabachnick and Fidell, 2007; Mertler and Vannatta, 2013).
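Both screening checks can be computed directly from the raw item scores. The sketch below is a minimal numpy/scipy illustration of Mardia's normalized multivariate kurtosis and of per-case squared Mahalanobis distances; the study relied on the corresponding AMOS output rather than this code, and the variable names are placeholders.

```python
# Sketch: Mardia's normalized multivariate kurtosis and squared Mahalanobis
# distances for an n x p data matrix of observed item scores.
import numpy as np
from scipy import stats

def mardia_kurtosis_z(X):
    """Normalized (critical-ratio) estimate of Mardia's multivariate kurtosis."""
    n, p = X.shape
    centered = X - X.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    d2 = np.einsum("ij,jk,ik->i", centered, s_inv, centered)   # squared distances
    b2p = np.mean(d2 ** 2)                                      # Mardia's kurtosis
    return (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)

def mahalanobis_d2(X, alpha=0.001):
    """Squared Mahalanobis distance per case and an extreme-value flag."""
    n, p = X.shape
    centered = X - X.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, s_inv, centered)
    return d2, d2 > stats.chi2.ppf(1 - alpha, df=p)

# scores = data[["item1", "item2", "item3", "item4", "item5"]].to_numpy()
# print(mardia_kurtosis_z(scores))        # compared against the cutoff of 5
# d2, flagged = mahalanobis_d2(scores)    # flagged cases inspected as outliers
```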

3. RESULTS

A preliminary first-order CFA of the CLAPTDI identified a hypothesized model with two latent constructs: analytic reasoning/critical thinking with three observed variables, and problem solving with two observed variables (see Figure 2 below).

Figure-2. Hypothesized model of 5-item CLAPTDI structure for Lower Class and Upper Class Students

Table 1 displays the goodness-of-fit results for the CLAPTDI multi-group invariance tests. The multi-group test of configural invariance yielded a χ2 value of 19.899 with 8 degrees of freedom, with CFI and RMSEA values of .991 and .022, respectively. From this information, we conclude that the hypothesized multi-group configural model of the CLAPTDI structure fits well across lower class and upper class PBC students. The goodness-of-fit statistics for the measurement model show a fit fairly consistent with the configural model (CFI = .992; RMSEA = .024). The test for factor loading invariance reveals a non-significant χ2 difference between the configural model and the measurement model (Δχ2(3) = 2.711, p > .01) and a CFI difference of .001. Thus, these results provide evidence of factor loading invariance between lower class and upper class PBC students for the measurement model of the CLAPTDI scale. The test for structural invariance examined the equivalence of the factor covariance between analytic reasoning and problem solving across lower class and upper class PBC students (Δχ2(6) = 21.238, ΔCFI = .01; see Table 1).

Table-1. Summary of Goodness-of-fit Statistics for Tests of CFA Multi-group Invariance

Model description                                                                     Comparative model    χ2        df    Δχ2      Δdf    Sig     CFI     ΔCFI
Phase I: Baseline model fit for each academic class group
Lower class students                                                                  -                    24.836    4     -        -      .001    .984    -
Upper class students                                                                  -                    8.102     4     -        -      .088    .975    -
Phase II: Factorial invariance across student academic class groups
1. Configural model: no constraints imposed                                           -                    19.899    8     -        -      NS      .991    -
2. Measurement model: all factor loadings constrained equal                           2 versus 1           22.610    11    2.711    3      NS      .992    .001
3. Structural model: Model 2 with the covariance between AR and PS constrained equal  3 versus 1           41.137    14    21.238   6      S       .980    .01

Notes: Δχ2 = difference in χ2 values between models; Δdf = difference in degrees of freedom between models; ΔCFI = difference in CFI values between models; NS = not significant; S = significant; AR = analytic reasoning; PS = problem solving.

3.1. Root Mean Square Error of Approximation (RMSEA)

The RMSEA values were .035 for the lower class baseline model, .027 for the upper class baseline model, .022 for the configural model (Model 1), .024 for the measurement model (Model 2), and .023 for the structural model (Model 3).

As reported in Table 1, the multi-group tests of the CLAPTDI yielded evidence of factorial invariance of the measurement model (Δχ2(3) = 2.711, p > .01, ΔCFI = .001); for the structural model, the factor covariance comparison yielded Δχ2(6) = 21.238 and ΔCFI = .01.

4. CONCLUSION

This study was aimed at assessing the factorial invariance of the measurement and structural properties of the Collegiate Learning Assessment Performance Task Diagnostic Instrument (CLAPTDI) across lower class and upper class students attending a predominantly black college (PBC). The study was the second in a series of studies aimed at developing a valid measurement scale for assessing the contribution of the college curriculum to students' analytic reasoning, critical thinking, and problem solving skills. The first step in establishing the external validity of the CLAPTDI involved conducting a multi-group test of the equivalence of the factorial structure of the measurement scale across two panels of students: lower class and upper class students. In testing for invariance across the groups, sets of parameters were tested in a logically ordered and increasingly restrictive manner. The study relied on Meredith's (1993) strict test of equivalence by focusing on tests for invariance across the groups with respect to factor loadings, intercepts, and error variances. The invariance of these parameters across the two groups of students was tested by estimating the chi-square goodness-of-fit statistic and the comparative fit index (CFI). The analyses found that the CLAPTDI scale's factorial measurement structure was invariant across lower class and upper class predominantly black college students, thus confirming the external validity of the scale.

This study had a limitation that should be noted. The study did not cross-validate the CLAPTDI by replicating its factorial structure across independent samples drawn from the same predominantly black college student population. Future studies should extend the factorial invariance test to cross-validation with independent samples of predominantly black college students. With regard to contributions to future research, it is important to note that while this study has established the validity of the CLAPTDI for use in assessing student learning in a predominantly black college setting, preliminary confirmatory factor analysis reduced the number of constructs of the original CLAPTDI and their corresponding observed variables from five latent constructs to two valid latent constructs. We named the first latent construct "analytic reasoning/problem solving," measured by three observed variables (drawing conclusions, evaluating evidence, and persuasive writing), and the second construct "critical thinking," measured by two observed variables (writing mechanics and persuasive writing). All the observed variables or items of the two-factor CLAPTDI were scored on a 4-point scale ranging from 0 = not attempted to 4 = mastery. Hence, we recommend that future assessment of learning among predominantly black college students using the CLAPTDI be delimited to determining the level of critical thinking/analytic reasoning and problem solving.

Funding: This study received no specific financial support.
Competing Interests: The authors declare that they have no competing interests.
Contributors/Acknowledgement: Both authors contributed equally to the conception and design of the study.

REFERENCES

AAC and U, 2005. Liberal education outcomes. Washington, DC: Association of American Colleges and Universities.

AASCU, 2006. Value-added assessment perspectives. Washington, DC: American Association of State Colleges and Universities.

Arbuckle, J.L., 2016. IBM SPSS AMOS 24 user's guide [Computer software and manual]. New York, NY: IBM.

Arum, R., J. Roksa and M. Velez, 2008. Learning to reason and communicate in college: Initial report of findings from the CLA longitudinal study. New York: The Social Science Research Council.

Banta, T.W. and G.R. Pike, 2007. Revisiting the blind alley of value-added. Assessment update. Bloomington, IN: National Survey of Student Engagement, 19(1).

Blunch, N.J., 2010. Introduction to structural equation modeling using SPSS and AMOS. Thousand Oaks, CA: Sage Publications.

Brown, T.A., 2006. Confirmatory factor analysis for applied research. New York: Guilford Press.

Byrne, B.M., 2010. Structural equation modeling with AMOS: Basic concepts, applications, and programming. 2nd Edn., New York, USA: Taylor & Francis Group, Routledge.

Cheung, G.W. and R.B. Rensvold, 2002. Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2): 233-255.

Classroom Academy, 2008. Diagnostic scoring faculty handbook. New York: Collegiate Learning Assessment.

Hair, J.F., W.C. Black, B.J. Babin, R.E. Anderson and R.L. Tatham, 2016. Multivariate data analysis. Upper Saddle River, N.J: Pearson Prentice Hall.

Horn, J.L., J.J. McArdle and R. Mason, 1983. When is invariance not invariant: A practical scientist's look at the ethereal concept of factor invariance. Southern Psychologist, 1(1): 179-188.

Hu, L. and P.M. Bentler, 1999. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1): 1-55.

Joreskog, K.G., 1971. Simultaneous factor analysis in several populations. Psychometrika, 36(4): 409-426.

Klein, S., R. Benjamin, R. Shavelson and R. Bolus, 2007. The collegiate learning assessment: Facts or fantasies. Evaluation Review, 31(5): 415-439.

Leedy, P.D. and J.E. Ormrod, 2010. Practical research: Planning and design. Upper Saddle River: Pearson Publishers.

Mardia, K.V., 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3): 519-530.

Mardia, K.V., 1974. Application of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya, Series B, 36(3): 115-128.
