Assessments and the subsequent awarding of grades are the most widely employed measure of learning; a way for an instructor to gather information about what has been learnt and make a judgement (Brookhart, 1999). The grades that constitute a degree are relied upon by educators and employers as a valid indicator of a student’s academic performance.
Discussing the application of psychological principles to grading systems has led me to pose a question that, as psychologists, we are always encouraged to consider when using measures to collect and analyse research data.
Are grades a valid and reliable measure of student learning?
Brookhart (1999) stipulates that judgements about the work that students produce must be meaningful and accurate; in other words, the measure of student learning must be valid and reliable.
But is this the case?
There are many definitions of validity in the literature depending on what that definition is being applied to (Winter, 2000). However, a definition of validity that is relatable to assessment in education is:
“An account is valid or true if it represents accurately those features of the phenomena that it is intended to describe, explain or theorise” (Hammersley, 1987).
Put simply, validity is the degree to which a measure measures what it is supposed to (Black & Champion, 1976).
Reliability can be defined as the ability to measure consistently and to maintain the capacity to yield the same measurement, otherwise known as stability (Black & Champion, 1976; Johnson & Pennypacker, 1980).
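Stability in this sense can be quantified. One conventional check is the test-retest correlation: give the same students an equivalent assessment twice and correlate the two sets of marks. A minimal sketch using the Pearson correlation (the marks below are invented purely for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two sets of paired scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical marks for the same five students sitting equivalent exams twice.
first_sitting = [62, 70, 55, 81, 48]
second_sitting = [65, 68, 50, 84, 45]

print(round(pearson_r(first_sitting, second_sitting), 2))  # → 0.98
```

A correlation close to 1 would indicate the stability that Black and Champion (1976) describe; a low correlation would suggest the assessment does not yield the same measurement on repetition.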
Taking these definitions into account, let me refer back to grading systems and evaluate to what extent they are valid and reliable. First, consider validity. When sitting an exam or writing an essay, what is actually being measured in that assessment? Educators claim that assessments measure how well students have learnt information relating to particular learning outcomes. Assessments intend to measure how much relevant knowledge and data have been acquired by a student, in addition to how this knowledge can be applied to problem solving and real world situations (Race, 2009). At least, this is what employers are encouraged to think when they are presented with a student who has been awarded a first class degree and who has managed to get A grades in almost every module. What learning outcomes do not tell you is that exam and coursework assessments actually measure how learners write (Race, 2009).
Think about an exam: you are expected to demonstrate what you have learnt by answering several SAQs and then one essay question. In that work, you are expected to write clearly, concisely and extremely quickly, without noticeable grammatical error, within a two-hour period. Does this method of assessment adequately measure learning? How is it possible to condense months of learning and effort into a two-hour exam? What is measured is not learning, but “neatness, speed and eloquence of learners’ writing” (Race, 2009). Furthermore, think about revision for exams. In an attempt to acquire a good grade, students will try to absorb considerable amounts of knowledge about a particular subject area, revising the relevant information in the hope that they can then regurgitate it in the exam. Some may even attempt past papers to get a feel for the structure and format of that exam. This may be what is necessary to get an A grade, but does it measure learning? It could be argued that this method of assessment measures only the ability to pass an exam, rather than measuring learning. This implies that the validity of grades is not what educators claim, because grades do not represent what they purport to represent. Race, Brown and Smith (2005) argue that assessments must be valid; they should assess what it is that educators genuinely want to measure. Thus, it is misleading for learning outcomes to propose that, for example, problem-solving skills will be measured, when in fact the resulting grade is heavily dependent on the quality and style of writing.
Secondly, consider reliability. Race (2009) argues that for many, reliability is synonymous with consistency and fairness. He proposes that reliability is important because assessing the work of students fairly and reliably is the single most important thing educators can do for learners. Race, Brown and Smith (2005) argue that reliability can be achieved by inter-rater marking, that is, different assessors marking students’ work and coming to a unified decision. Furthermore, they argue that all assignments should be marked to the same standard. Herein lies my problem. When I spoke to Jesse a few weeks ago to discuss grading systems, he told me that when marking, he would focus primarily on the content and theory, and placed less importance on correct APA referencing, grammar and so on. However, he informed me that other members of staff in the faculty placed much more emphasis on correct spelling, grammar and writing style than on the content of a student’s work. Bearing this in mind, what would happen if a student produced a novel and innovative piece of work that unfortunately contained several spelling mistakes and incorrect APA referencing? Would one teacher award the same mark as another, given the differences in their marking preferences? This leads me to question the extent to which grades are a reliable indicator of student learning. If the difference between an A grade and a C grade is a correct APA reference for one educator, yet a paragraph of innovative research ideas for another, how can reliability possibly be present in such a grading system?
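The inter-rater marking that Race, Brown and Smith (2005) recommend can itself be checked quantitatively. Cohen’s kappa is one standard statistic for agreement between two markers, corrected for the agreement expected by chance alone. A minimal sketch (the grades below are hypothetical, invented to illustrate the calculation):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    # Observed proportion of essays on which the two markers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each marker's grade distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical grades awarded by two markers to the same ten essays.
marker_1 = ["A", "B", "B", "C", "A", "B", "C", "A", "B", "C"]
marker_2 = ["A", "B", "C", "C", "B", "B", "C", "A", "B", "B"]

print(round(cohens_kappa(marker_1, marker_2), 2))  # → 0.54
```

A kappa of 1 would mean the two markers agree perfectly; a value near 0 would mean they agree no more often than chance. A middling value, as here, is exactly the discrepancy between Jesse and his colleagues that I describe above.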
These discrepancies in the validity and reliability of grading systems lead me to question their place in learning environments. If grades are not valid or reliable, then they are of little use to the students who receive them, let alone the future employers who regard them as valid and reliable indicators of student ability.
Black, J. A., & Champion, D. J. (1976). Methods and issues in social research. New York, NY: Wiley.
Brookhart, S. (1999). The art and science of classroom assessment: The missing part of pedagogy. Washington, DC: ERIC Clearinghouse on Higher Education.
Hammersley, M. (1987). Some notes on the terms ‘validity’ and ‘reliability’. British Educational Research Journal, 13(1), 73-81.
Johnson, J. M., & Pennypacker, H. S. (1980). Strategies and tactics of human behavioural research. Hillsdale, NJ: Lawrence Erlbaum Associates.
Race, P. (2009). Designing assessment to improve physical sciences learning: A physical sciences practical guide. Retrieved from http://www.heacademy.ac.uk/assets/ps/documents/practice_guides/practice_guides/ps0069_designing_assessment_to_improve_physical_sciences_learning_march_2009.pdf
Race, P., Brown, S., & Smith, B. (2005). Tips on assessment (2nd ed.). London, England: Routledge.
Winter, G. (2000). A comparative discussion on the notion of ‘validity’ in quantitative and qualitative research. The Qualitative Report, 4. Retrieved from http://www.nova.edu/ssss/QR/QR4-3/winter.html