By Richard Capone, CEO/Co-Founder of Let's Go Learn, Inc.
Here is the short answer:
Not all the time. In the case of benchmark-based tests, validity and reliability are established by how well the test works across large numbers of students. So if the test is consistent when the data is averaged over 10,000+ students, it will be declared valid and/or reliable. But diagnosing at the individual student level requires the test to draw conclusions about one single student! Generally, assessments used in special education, and assessments like DORA, ADAM, and DOMA, can do this, but few others can.
Here is the long answer:
In order to answer this question, I am going to quickly define validity and then move on to explain what diagnostic really means.
First, validity* means whether or not the claims of the assessment can be believed. When a test makes a claim about a student, is it essentially trustworthy? Often, as purchasers of tests, we look to the research, and if the research data says the assessment is valid, we accept this statement. But validity means a lot more.

If I take a middle school English language learner and test him in fluency, and the test comes back saying the student has poor reading skills, can this be believed? In the past, educators across the nation said “yes” when the assessment was a “valid” test like DIBELS. But if we dig deeper, we realize this may not be the case. Fluency was determined to be a skill that predicts reading success, but a student who is not a native English speaker is not the norm. If such a student had studied English for multiple years but simply lacked practice in speaking, he or she would test high on a comprehension assessment but low on a fluency test. So the prediction that this student is a poor reader would be wrong.

I saw this in an entire school of students a few years ago. All the students came back with low “reading” scores based on Scholastic’s SRI assessment. But when these students took our DORA assessment, they were all high-comprehension readers. The reason turned out to be that they were all exchange students from Hong Kong who had studied English for 4 to 6 years; they lacked practical vocabulary mastery and thus could not comprehend grade-level reading materials.

So in conclusion, what is actually being tested also determines whether an assessment is “valid” for a particular testing situation. Just because an assessment is a “reading” assessment doesn’t mean it will work in every situation.
We see claims that assessments are diagnostic all over the K-12 U.S. market. Virtually every benchmark test or test-prep assessment makes the claim that it is formative and diagnostic. This is incorrect and in many cases an unethical statement to make. These publishers state that their assessments are valid and reliable, but the dirty little secret is that their research data is based on examining large numbers of students and drawing conclusions about those large numbers. So when they make a “diagnostic” prediction about an individual student, the claim is only valid if that student fits the mold of the “average” student. Diagnostic results for an individual student cannot assume that the student is the norm; a true diagnostic assessment cannot make assumptions.

Let me give you an example of this incorrect use of testing data. If someone with fair skin went to her dermatologist regularly to test suspect patches of skin, the dermatologist could say, “You know, 90% of the patches of skin I test don’t have cancer. Therefore, from now on I’m going to take samples of your skin but I won’t send them to the lab. I will just conclude that they are negative, since statistically I would have a 90% chance of being correct. How does this sound to you?” You of course would fire the dermatologist and tell him to go somewhere very, very hot.
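The dermatologist analogy can be made concrete with a few lines of code. This is a minimal sketch with made-up numbers: it assumes 10% of biopsied patches are actually cancerous, matching the “90% of patches are negative” figure above, and shows that an “always negative” rule looks 90% accurate overall while detecting zero actual cancers.

```python
import random

random.seed(42)

# Illustrative assumption: 10% of biopsied patches are cancerous.
samples = [random.random() < 0.10 for _ in range(10_000)]  # True = cancer

# The "always negative" dermatologist: predict no cancer for every sample.
predictions = [False] * len(samples)

correct = sum(p == s for p, s in zip(predictions, samples))
missed_cancers = sum(s and not p for p, s in zip(predictions, samples))

print(f"Overall accuracy: {correct / len(samples):.0%}")  # looks impressive
print(f"Cancers detected: 0 of {missed_cancers}")         # misses every one
```

High aggregate accuracy tells you nothing about the one patient, or the one student, sitting in front of you.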
Well, this is what usually happens when you take an assessment that was designed as a benchmark or test-prep test and try to use it to diagnose students. Just because a report shows breakouts in multiple areas of reading or mathematics doesn’t mean the test can reliably draw those conclusions. Benchmark tests may be valid when the data is rolled up at the school level, but when the same test says a student is low in phonics or comprehension, you won’t know if it is accurate.
So what can you do to find out whether an assessment will meet your diagnostic needs? One good suggestion for evaluating an assessment is to administer at least 20 tests to students with a range of skill levels: high, medium, and low. Also, give some students the test two times to see if the results are consistent (“test-retest”). Usually, shorter tests that don’t administer enough items in specific areas will show greater swings: one day the student is very high in skill A, and the next day two years below! Also, use some common sense. For instance, a diagnostic assessment in reading can’t really be shorter than 30 minutes unless the student is at an extreme range of ability, either just starting to read or advanced enough to max out very quickly.
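The test-retest check described above can be sketched in a few lines. The scores below are hypothetical grade-equivalent results for 20 students who took the same assessment twice a few days apart; the one-grade-level threshold is an illustrative assumption, not a published standard.

```python
# Hypothetical test-retest data: grade-equivalent scores for 20 students
# from two administrations of the same assessment (numbers are made up).
first  = [2.1, 3.4, 5.0, 4.2, 6.8, 1.9, 3.3, 7.1, 4.8, 5.5,
          2.7, 6.0, 3.9, 4.4, 5.9, 2.3, 7.4, 3.0, 4.1, 6.2]
second = [2.3, 3.1, 4.8, 4.5, 6.5, 2.2, 3.6, 6.9, 4.6, 5.8,
          2.5, 6.3, 3.7, 4.2, 6.1, 2.0, 7.2, 3.2, 4.4, 5.9]

# Flag any student whose score swings more than one grade level
# between administrations -- a red flag for an unstable test.
swings = [abs(a - b) for a, b in zip(first, second)]
unstable = [i for i, d in enumerate(swings) if d > 1.0]

print(f"Largest swing: {max(swings):.1f} grade levels")
print(f"Students with >1.0 grade-level swing: {len(unstable)}")
```

If many students swing by a grade level or more on retest, the instrument is too noisy to support individual diagnostic claims, whatever its published validity statistics say.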
* Validity relates to the question of whether a test assesses what it claims or intends to assess. It deals with whether or not an assessor's findings correspond to some form of objective reality. The data collected during an assessment must in some way accurately reflect the actions being assessed. To the extent that this is so, the assessment is valid. An article from Illinois State University has a good definition of validity and reliability.