For as long as it has been around, standardised testing has served as a strategy for taming the sheer complexity and uncertainty of the process of education. It does so by using test scores as proxies for learning achievement. These tests appeared in the early 20th century in the form of intelligence testing and primarily served the needs of military recruiters. But it is in the last two decades that they have become widespread and pervasive.
This has happened in part due to advancements in information and computing technologies that have made possible the collection and processing of large amounts of data. But more importantly, the tests have gained influence as part of a larger package of reforms usually referred to by education scholars as the Global Managerial Education Reforms (GMERs). These reforms combine testing with a potpourri of market and managerialist policy solutions built on such ideas as choice, competition, incentives, and accountability.
The origins of the GMERs can be loosely located in the US in the early 1980s. In 1981, the Reagan administration appointed the National Commission on Excellence in Education (NCEE) to take stock of the state of education in the US and make recommendations for its improvement. The commission's report, entitled A Nation at Risk and published in 1983, painted a doomsday scenario and warned that the US was fast losing its economic preeminence to its competitors due to the falling quality of American public schools. It described American schools as faltering in their mission and urgently called for a nationwide movement to raise the quality of education in public schools.
Ever since the publication of A Nation at Risk, the talk of standards and accountability has filled the air. Standards are set, and tests are administered frequently to assess whether they have been met. The evidence generated by the tests established what came to be known as ‘achievement gaps’ between different racial and socio-economic groups as a central policy concern, ultimately paving the way for the famous No Child Left Behind (NCLB) Act, passed under the Bush administration in 2001. NCLB took the stakes associated with standardised tests to new heights. According to some US-based scholars, ‘gap gazing’ became the fetish of education research in the US and beyond. While some reformers supported testing and the accountability based on it, others opposed both. In the wake of these debates, a controversy around the GMERs followed in Western countries.
Of course, like many other ‘travelling reform ideas’, the discourse-practice of standardised testing crept surreptitiously into global education discourses as part of the GMERs. But as usually happens with such travelling ideas, the ‘standardised tests’ were stripped of the debate and controversy surrounding them in the US. In the Pakistani context, the meaning of standardised tests also changed, as some began to use the term for any large-scale test irrespective of whether it had been developed through a ‘standard’ procedure or not.
While the GMERs in general need some scrutiny, in this article I restrict myself to a discussion of the perception of standardised and large-scale tests as a means to hold the system accountable and measure progress toward ‘reforms’. This discussion is important because such tests sound immediately attractive to policymakers and other stakeholders. We hear plenty about their advantages, but it is important to recognise their drawbacks as well. In fact, the entire field of psychometrics owes its existence to attempts to find a way out of the formidable problems that arise when standardised tests are used to make fair comparisons between individual children and between groups of children.
To be sure, I am not against testing per se but only worry about the consequences of using it for teacher accountability. The existing evidence suggests that learning deprivation can continue to coexist with the best of standardised testing regimens. When the stakes attached to such tests are raised for accountability purposes, they can have unintended consequences. An evaluative system that ignores these consequences, and ignores how good teaching is actually converted into learning gains, can hardly be an acceptable one.
When the same test is given to a large number of students in the same grade, it is assumed that the test scores will allow for comparisons of various kinds. Does this assumption make sense in the context of Pakistan? I think it does not withstand even preliminary scrutiny, primarily because of the enormous, almost intractable, diversity in Pakistan. Pakistan is afflicted by the worst of inequities in the distribution of capabilities across its population. The contextual factors (both inside and outside the school) that have an effect on learning vary significantly across different groups of children. So the same test, given to people located at different points on any interpersonal comparison of well-being, can potentially produce very different results.
If different test takers taking the same test are at vastly different locations on measures of socio-economic well-being, then it does not seem very defensible to rely on the test scores to make judgments about the accountability and performance of teachers. The level of effort that teachers must put into their instruction will vary for different individuals and groups. A focus on student outcomes alone, despite its usefulness for several purposes, can blind us to the actual work done by teachers. Imagine a teacher who is fairly well-qualified and motivated and is working tirelessly with children from high-poverty groups. Yet the socio-economic status of the households that the children come from, together with other problems associated with poverty, could affect student outcomes negatively. Tying accountability to test scores will end up discouraging good teachers in such circumstances.
Then there is the other, and much larger, problem: that of mortgaging your life to the test. Even in rich countries, test-based accountability has had unintended effects and has led to widespread manipulation of the system. It can push those with the most at stake to game the system in different ways, including cheating or lowering the standards to keep the scores high.
As Diane Ravitch, an eminent American historian of education, puts it: “Test scores are misused, however, when they become blunt instruments to punish teachers or schools. States’ standardised tests are not the equivalent of yardsticks or barometers. They have margins of error. If Johnny takes a test on a Monday, he could take the same test a week later and get a higher or lower score depending on any number of things, including Johnny’s mood, his health, the weather, the testing conditions in the room, or just random variation. The tests also sometimes contain errors or ambiguities. These are weak reeds on which to hang the fate and future of students, teachers, and schools.”
Another scholar of education, Dan C. Lortie, has famously said that education reforms are “long on prescription and short on description”. Though said in the context of the US, it is equally true of Pakistan. The suggestions to use standardised tests for accountability need to be seen against the backdrop of this history of quick fixes and half-thought prescriptions backed by little deliberation and analysis.
I also find it interesting that accountability talk focuses on teachers alone and not on the other cogs of the system. It would be a great idea to also make the political leaders of a constituency accountable for providing the kind of support to schools and teachers that is needed for good results. It would be equally heartening to see the commissioners, or DCOs, or whoever is in charge, held accountable for the performance of schools in their districts.
It is easy to pull teachers out for election and other sorts of duties and then blame them for the low performance of children as well. Let the standardised test scores, then, make life more difficult for every cog in the system, and not just for teachers. To be sure, I am not defending bad teachers, but only raising some critical issues for readers to consider regarding the use of standardised tests for accountability.
Let me end with an example. In 1996, I attended a lecture by a Russian professor of mathematics at Columbia University. He was talking about mathematics education in the erstwhile Soviet Union. The American audience was shocked to hear that in the Soviet Union, parents would get a letter of displeasure and a warning from the party offices if their children did not do well in mathematics. Well, like it or not, this sort of targeting of parents for accountability also worked, presumably by influencing parental attitudes toward their children's learning of a specific school subject.
While doing something like this may be a stretch of the imagination in Pakistan, we should at least consider the idea of raising the stakes of education for everyone in society. Why only teachers? Why should communities, parents, school councils, and departments not share the burden of accountability? As the evidence from other countries shows, the use of standardised testing to hold teachers accountable for children’s learning in the midst of this global epidemic of GMERs can lead to unintended consequences.
Policymakers and others worrying about education in Pakistan would do well to think about diffusing accountability throughout the system and to reconsider the suggestions to raise the stakes of large-scale standardised tests for teachers and students.