The Standards of Learning (SOL) reform:
Real consequences require real tests

Lawrence H. Cross

    EDITOR'S NOTE: This article was written in response to an article in the fall 1999 issue of VIA.

Mr. Mark Christie, a member of the Virginia Board of Education, wrote an article for the fall 1999 issue of Virginia Issues and Answers ("Standards of Learning: Why Virginia's education reform is working," volume 6, number 2) in response to an earlier article of mine ("Are Virginia's public schools failing? Assessing the assessments," volume 6, number 1) that was critical of the Standards of Learning (SOL) assessments. Rather than address my criticisms of the SOL tests, Mr. Christie contends that what is more important about the SOL tests is not whether they are multiple-choice or some other format, whether they have more items or fewer, but the fact that they count, they have real, not sham, consequences [Emphasis in the original].

I cannot help but wonder if Mr. Christie, a lawyer, would embrace such a cavalier attitude about the Virginia Bar Examination or medical tests that also have real, not sham, consequences. Contrary to Mr. Christie's assertion, the more important the decisions to be made from tests, the more important it is to have confidence in the tests. Make no mistake about it, the SOL tests are being used to make important decisions about public schools and the children they serve, but the reliability and validity evidence cannot support the intended uses of the SOL test scores.

Short tests, sham inferences

The SOL tests are simply too short to instill confidence in using their scores for diagnostic purposes or for holding students and schools accountable for performance on them. The SOL tests range in length from 30 to 63 multiple-choice questions and ostensibly measure proficiency in SOLs spanning one to four years of instruction. They are administered only in grades 3, 5, and 8 and in high school as end-of-course tests in English, math, science, history, and social studies.

Nonetheless, the SOL tests are described as criterion-referenced tests that can inform parents, teachers, and the public about what it is that students know and can do with regard to the highly touted but often criticized SOLs. They cannot. With so few questions available to measure so many, often elaborate, SOLs, it is impossible to reference the scores to specific skills or even to the standards themselves.

Accordingly, it is misleading for the proponents to refer to these tests as criterion-referenced tests that offer greater value for accountability and diagnosis than norm-referenced achievement tests. Unlike norm-referenced tests that report performance in terms of age and grade level norms as well as local, state, and national percentile ranks, performance on the SOL tests is reported as less than proficient, proficient, or advanced. Because the scores are not referenced to the SOLs, these classifications are a sham. All that such classifications indicate is whether student scores are above or below the "bars" arbitrarily set at unrealistically high levels by the Board of Education in its zeal to lead the nation in adopting high and rigorous standards.

Using passing rates on these or any tests for school accountability without considering the challenges faced by the schools and the children they serve is also indefensible. In his 1999 presidential address to the National Council on Measurement in Education, Professor Edward Haertel noted, "Inferences from test scores to quality of schooling are problematic and must depend a great deal on contextual information" (Haertel, E. H. [1999], "Validity arguments for high stakes testing: In search of the evidence," Educational Measurement: Issues and Practice, 18, 4, pp. 5-9). The folly of this invalid use of test scores is evidenced by the fact that the percentage of public schools that achieved the 70 percent passing rates--the level required for accreditation in 2007--increased from 2.2 percent in 1998 to only 6.5 percent in 1999. Faithful implementation of the SOLs over the next six years may help, but many schools and students will require a miracle.

A modest proposal

I had hoped that the technical concerns that I and many others had raised about the SOL tests and their intended uses would have caused our leaders in Richmond to acknowledge the severe limitations of the SOL tests and to abandon this test-driven reform of public schools. I was wrong. In January 2000, Education Secretary Wilbert Bryant told a senate committee that the SOL reform will succeed, despite "well organized opposition" from a "vocal minority" of critics "that will go the way of the dinosaur" ("Education secretary trumpets SOLs' virtues," Roanoke Times, January 14, 2000). In February 2000, Governor Gilmore reaffirmed his commitment to "keeping Virginia's academic standards and accountability a model for the nation" ("Gilmore rejects easing of school accreditation rules," Washington Post, February 9, 2000).

In the spirit of "if you can't beat 'em, join 'em," I offer two suggestions that I believe will improve Virginia's accountability program and make more judicious use of public funds, if not make ours a model for the nation. Specifically, I recommend that accountability be limited to testing the three Rs and that testing be instituted at every grade using one of the well-established national standardized achievement tests such as the Iowa Tests of Basic Skills, the California Achievement Tests, or the Stanford Achievement Tests.

Why only the three Rs?

A major impetus for the SOL reform movement, as explained by Mr. Christie, is the longstanding perception among employers that high school graduates lack basic skills in reading, writing, and arithmetic. No mention is made of earth science, technology, or even history. Mr. Christie also suggests that the ability to read and write the English language "is a test-taking skill that will benefit each and every child immeasurably" (emphasis added). For students who do not have this "test-taking skill," what sense does it make to hold them responsible for achieving high and rigorous standards in challenging subjects? If we must have test-driven education reform in Virginia, we would do well to focus on the three Rs and not get bogged down in debates about whether Thurgood Marshall and Colin Powell should be included in the history and social science standards.

Advantages of a standardized test battery

Standardized achievement tests have been with us for generations, and while they, too, depend largely on multiple-choice items, considerable evidence suggests that they measure the three Rs quite well. These test batteries are available in multiple forms at every grade at far less cost than the custom-made SOL tests that are available in only one form for selected grades. Alternate forms could be used for diagnostic purposes at the beginning of each school year.

The full-length versions of these achievement batteries contain tests of the three Rs that are markedly longer than the SOL tests. For example, the eighth-grade SOL test of arithmetic contains 50 questions to assess SOLs spanning grades 6, 7, and 8. In contrast, the eighth-grade arithmetic test in the Iowa Tests of Basic Skills contains 135 questions just for the eighth grade. And test length does matter! Not only is a longer test able to sample more adequately the skills and knowledge to be tested, but a fundamental fact about test scores is that reliability increases with test length. If absolute performance standards are desired, committees of teachers could be convened to establish benchmarks for proficient and advanced standing on these tests, as was done for the SOL tests. The availability of national and state percentile ranks will allow the public to see just how reasonable, or ridiculous, these benchmarks really are. Moreover, use of national tests will permit comparisons of Virginia students with students in other states.
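The relationship between test length and reliability noted above can be illustrated with the Spearman-Brown formula from classical test theory. The sketch below is illustrative only: the article does not cite this formula or report any reliability coefficients, so the starting reliability of 0.80 is a purely hypothetical value, and the calculation assumes that the added questions are parallel to (of the same quality and content as) the existing ones. Only the item counts, 50 and 135, come from the text.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a test whose length is multiplied by
    `length_factor`, using the Spearman-Brown formula.

    Assumes the added items are parallel to the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


if __name__ == "__main__":
    # Item counts taken from the article: a 50-question test versus a
    # 135-question test.
    short_items = 50
    long_items = 135
    factor = long_items / short_items

    # HYPOTHETICAL starting reliability, chosen only for illustration;
    # the article reports no reliability figures.
    assumed_reliability = 0.80

    projected = spearman_brown(assumed_reliability, factor)
    print(f"Length factor: {factor:.2f}")
    print(f"Assumed reliability at {short_items} items: {assumed_reliability:.2f}")
    print(f"Projected reliability at {long_items} items: {projected:.3f}")
```

Under these assumptions the projected reliability always rises as the length factor grows, though with diminishing returns, which is the sense in which test length matters.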

Implications for accountability

Testing at each grade level will make it possible to track student achievement from one year to the next. This will allow parents, educators, and the public to judge how much progress a child or a school has made. Under the present system, where SOL test scores measure achievement across several school years, it is not clear who is accountable. With testing at every grade level, accountability can be based on an improvement model rather than the current model that requires all students and schools to attain the same uniformly high performance standards that are unrealistic for many.

Waiting until the third grade to impose high-stakes tests is too late to help those having difficulties with the three Rs. And when children with deficiencies are identified in the first grade or before, our leaders in Richmond should be held accountable to do all they can to help these students, their teachers, and their schools, rather than propose policies that blame the teachers, retain the students, and "dis" accredit the schools. The deeply rooted social and economic problems associated with poor test performance cannot be resolved by shaming the victims or the schools.


Lawrence H. Cross is a professor of educational research and evaluation at Virginia Tech and has conducted funded research on assessment, testing, and grading practices, including the first statewide validation and standards-setting study of the National Teachers Examination in Virginia. He has been the technical coordinator for contracts with the National Board of Professional Teaching Standards at the University of North Carolina at Greensboro, a research and evaluation specialist for the National Regional Resource Center of Pennsylvania, and a consultant to a number of school districts and state departments of education. He is a member and former president of the Virginia Educational Research Association, which selected him to receive its Charles E. Clear Award for Outstanding Research in Education and its Certificate for Distinguished Research in Education for past research.

SPRING 00 VIA