
Grades: Are grades valid or reliable?


Assessments and the subsequent awarding of grades are the most widely employed measure of learning; a way for an instructor to gather information about what has been learnt and make a judgement (Brookhart, 1999). The grades that constitute a degree are relied upon by educators and employers as a valid indicator of a student’s academic performance.

Discussing the application of psychological principles to grading systems has led me to pose a question that, as psychologists, we are always encouraged to consider when using measures to collect and analyse research data.

Are grades a valid and reliable measure of student learning?

Brookhart (1999) stipulates that judgements about the work that students produce must be meaningful and accurate; in other words, the measure of student learning must be valid and reliable.

But is this the case?



There are many definitions of validity in the literature depending on what that definition is being applied to (Winter, 2000). However, a definition of validity that is relatable to assessment in education is:

“An account is valid or true if it represents accurately those features of the phenomena that it is intended to describe, explain or theorise” (Hammersley, 1987).

Put simply, validity is the degree to which a measure measures what it is supposed to (Black & Champion, 1976).


Reliability can be defined as the ability to measure consistently and to maintain the capacity to yield the same measurement, otherwise known as stability (Black & Champion, 1976; Johnson & Pennypacker, 1980).


Taking these definitions into account, let me refer back to grading systems and evaluate to what extent they are valid and reliable. First, consider validity. When sitting an exam or writing an essay, what is actually being measured in that assessment? Educators claim that assessments measure how well students have learnt information relating to particular learning outcomes. Assessments intend to measure how much relevant knowledge has been acquired by a student, in addition to how this knowledge can be applied to problem solving and real-world situations (Race, 2009). At least, this is what employers are encouraged to think when they are presented with a student who has been awarded a first-class degree and who has managed to get A grades in almost every module. What learning outcomes don’t tell you is that, in practice, exam and coursework assessments measure how learners write (Race, 2009).

Think about an exam: you are expected to produce work that demonstrates what you have learnt by answering several SAQs and then one essay question. In that work, you are expected to write clearly, concisely and extremely quickly, without noticeable grammatical error, within a two-hour period. Does this method of assessment adequately measure learning? How is it possible to condense months of learning and effort into a two-hour exam? What is measured is not learning, but the “neatness, speed and eloquence of learners’ writing” (Race, 2009). Furthermore, think about revision for exams. In an attempt to acquire a good grade, students will try to absorb considerable amounts of knowledge about a particular subject area. They will revise the relevant information in the hope that they can then regurgitate it in the exam. Some may even attempt past papers to get a feel for the structure and format of that exam. This may be what is necessary to get an A grade, but does it measure learning? It could be argued that this method of assessment measures only the ability to pass an exam, rather than learning itself. This implies that the validity of grades is not what educators claim, because grades do not represent what they purport to represent. Race, Brown and Smith (2005) argue that assessments must be valid; they should assess what it is that educators genuinely want to measure. Thus, it is misleading for learning outcomes to propose that, for example, problem-solving skills will be measured, when in fact the resulting grade is heavily dependent on the quality and style of writing.

Secondly, consider reliability. Race (2009) argues that for many, reliability is synonymous with consistency and fairness. He proposes that reliability is important because assessing the work of students fairly and reliably is the single most important thing educators can do for learners. Race, Brown and Smith (2005) argue that reliability can be achieved by inter-rater marking, that is, different assessors marking students’ work and coming to a unified decision. Furthermore, they argue that all assignments should be marked to the same standard. Herein lies my problem. When I spoke to Jesse a few weeks ago to discuss grading systems, he told me that when marking, he would focus very much on the content and theory, and did not place as much importance on correct APA referencing, grammar and so on. However, he informed me that other members of staff in the faculty placed much more emphasis on correct spelling, grammar and writing style than on the substantive content of a student’s work. Bearing this in mind, what would happen if a student produced an informative, novel and innovative piece of work that unfortunately contained several spelling mistakes and incorrect APA referencing at the end? Would one teacher award the same mark as another, considering the differences in their marking preferences? This leads me to question the extent to which grades are a reliable indicator of student learning. If the difference between an A grade and a C grade is a correct APA reference for one educator, yet a paragraph of innovative research ideas for another, how can reliability possibly be present in such a grading system?
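Incidentally, the inter-rater marking that Race, Brown and Smith recommend can be quantified rather than debated in the abstract. As a minimal sketch (the two markers and their grades below are entirely hypothetical, and Cohen’s kappa is my choice of agreement statistic, not one mentioned in this post), chance-corrected agreement between two markers grading the same essays can be computed like this:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items both raters graded identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of each
    # rater's marginal frequency for that category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical letter grades awarded to the same ten essays by two markers.
marker_1 = ["A", "A", "B", "C", "B", "A", "C", "B", "A", "B"]
marker_2 = ["A", "B", "B", "C", "A", "A", "C", "C", "A", "B"]
print(round(cohens_kappa(marker_1, marker_2), 2))  # → 0.55
```

A kappa of 1 would mean perfect agreement; here the two markers agree on 7 of 10 essays, yet after correcting for chance the agreement is only moderate — which is exactly the kind of gap between apparent and real consistency the paragraph above describes.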

These discrepancies in the validity and reliability of grading systems lead me to question their place in learning environments. If grades are not valid or reliable, then they are of little use to the students who receive them, let alone the future employers who regard them as valid and reliable indicators of student ability.



Black, J. A., & Champion, D. J. (1976). Methods and issues in social research. New York, NY: Wiley.

Brookhart, S. (1999). The art and science of classroom assessment: The missing part of pedagogy. Washington, DC: ERIC Clearinghouse on Higher Education.

Hammersley, M. (1987). Some notes on the terms ‘validity’ and ‘reliability’. British Educational Research Journal, 13(1), 73-81.

Johnson, J. M., & Pennypacker, H. S. (1980). Strategies and tactics of human behavioural research. Hillsdale, NJ: Lawrence Erlbaum Associates.

Race, P. (2009). Designing assessment to improve physical sciences learning: A physical sciences practical guide. Retrieved from http://www.heacademy.ac.uk/assets/ps/documents/practice_guides/practice_guides/ps0069_designing_assessment_to_improve_physical_sciences_learning_march_2009.pdf

Race, P., Brown, S., & Smith, B. (2005). Tips on assessment (2nd ed.). London, England: Routledge.

Winter, G. (2000). A comparative discussion on the notion of ‘validity’ in quantitative and qualitative research. The Qualitative Report, 4. Retrieved from http://www.nova.edu/ssss/QR/QR4-3/winter.html


6 thoughts on “Grades: Are grades valid or reliable?”

  1. Hi Emma! I think you’ve put together a very compelling argument, and I just want to add to your point about the validity of grades, because I agree with you that they are not always (or even usually) valid in measuring what students have truly learnt (as you point out, they often measure how students write). I also believe that the marking process itself isn’t necessarily valid. That is, when teachers come to mark essays they should be marking who has objectively produced “the best” work (which is not without problems in itself), but how do teachers mark that reliably and objectively? When it comes to essays, there is a huge amount of information. Factors that may influence marking include conciseness, synthesis, originality of ideas, organisation, writing style and formatting; how do you decide which measure is more valid? It is often left to the individual teacher. This review (https://orderline.education.gov.uk/gempdf/1849625344/QCDA104983_review_of_the_literature_on_marking_reliability.pdf) points out that factors such as how good the mark scheme is, and even gender, affect the reliability of marking (and you can’t have validity without reliability). There is often low inter-rater reliability. Even more worryingly, Spear (http://www.tandfonline.com/doi/abs/10.1080/0013188970390209) found that contrast effects have a big influence on marking: the quality of the work marked just before your essay or exam has a direct effect on your mark. Therefore, even if the exam or essay were completely valid in measuring what students had learnt (which, as you point out, is questionable), the grade could still be influenced by the order in which the teacher marked the work, meaning the overall grade still isn’t truly valid.

    • Hi Becca, thanks for your comment! I completely agree with you; how is it possible for teachers to mark reliably and objectively considering the different factors that influence the final grade they award? Think of how much a teacher has to remember when marking an essay; if you look at a marking scheme there is so much for the teacher to take into account. Furthermore, if you look at our cover sheets, there are 18 possible letter grades that the marker has to consider, ranging from A* to F3. Whilst it may be easier to distinguish an A grade paper from a D grade paper, it is much harder for a marker to differentiate between an A and a B+. How can they possibly provide a reliable grade considering how many choices there are and how much content there is to read through and assess? Most of us are aware of Miller’s law (Miller, 1956), which stipulates that the number of objects the average human holds in working memory is 7 ± 2. Bearing this in mind, consider how much a teacher has to hold in working memory at one time to mark an essay. Not only should they be continuously considering what letter grade the piece of work amounts to; they have to remember all the marking guidelines and criteria, in addition to evaluating spelling, grammar and, in our case, APA referencing. As Miller’s law suggests, this is not actually humanly possible. Therefore, it is highly improbable that the resulting grade is reliable, even taking into account inter-rater reliability, because the same problem occurs when the next marker attempts to evaluate the same piece of work. The question is, what can be done to overcome this lack of reliability in grading?


      Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

  2. I think you’ve made a brilliant argument about the validity of grades, but I want to question your argument about reliability. Even with the evidence you have brought in from Jesse, I think the reliability of marking papers is very subject specific. With subjects that require long essay answers, or subjects with a very small divide between right and wrong, the marking has to be, to an extent, subjective. If the subject has a definitive right answer, such as maths, the reliability of the marking will increase. This was suggested by Newton (1996), who then questioned whether “examination boards are failing in their assessment of English”. Although there are standardised procedures for marking, the schemes provided can easily be misinterpreted; but isn’t this the same of the examination paper itself within English?

    Newton, P. E. (1996). Retrieved from http://www.jstor.org/stable/10.2307/1501723

    • Thanks for your comment, Trudi. You present a really interesting point when you argue that reliability in grading is largely dependent on the subject. When there is a right/wrong answer, for example in a maths test, reliability is high because each paper will be marked to the same standard, making the grading trustworthy, consistent and fair (Race, 2009). However, when it comes to essay marking it is a different matter altogether. As I said in response to Becca’s comment, the average human is only able to hold 7 ± 2 objects in working memory at one time (Miller, 1956). In light of this theory, when marking a maths test, or even an MCQ test with only four possible answers, reliability is high. In contrast, if you take into consideration the amount of information a teacher has to remember when marking an essay, the number of “objects” that must be held in working memory at one time is considerably more than 7 ± 2. Marking essays is not nearly as consistent, trustworthy or fair as marking assessments that have simple right/wrong answers. Considering this, I can only echo what Becca said in her comment: without reliability you cannot have validity, either.


      Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

      Race, P. (2009). Designing assessment to improve physical sciences learning: A physical sciences practical guide. Retrieved from http://www.heacademy.ac.uk/assets/ps/documents/practice_guides/practice_guides/ps0069_designing_assessment_to_improve_physical_sciences_learning_march_2009.pdf

  3. I saw your talk today and really enjoyed it. It got me thinking about how grades are taken into the wider world and generalised to other aspects of life.
    When applying for a job, grades are used to indicate the competency of an individual; however, I believe that grades can be very detrimental in job applications. Just because someone’s grades are not up to scratch academically does not mean they cannot be given the opportunity for the job; they may be very hard working. Someone may have graduated from university with a first and 100% in every module, but that does not mean they have the skills needed for specific jobs. It does not necessarily mean that they are good at other aspects, such as integrating with others and working as a team, which many job vacancies may be looking for. So I believe that grades are not the most important thing in life, as they cannot be generalised to all skills and do not show the competency of an individual as a whole.

    • Thanks for your comment. I must say I agree with you. Getting good grades does not guarantee that an individual is suitable for a job. You would hope that during their degree, a student would have participated in assignments that depended on group or team work, but it is not a certainty. Something even more disheartening is that some candidates who are perfectly suitable for a job may be overlooked or turned down because of their university grade. This is where I come back to my argument that in a society where grades are everything, they must be a valid and reliable measure of student learning and ability, which (as I have explained in my main blog) is certainly not the case. For example, a student who spends every waking hour of the day (unlikely, but possible!) studying and learning, but never seems to do well in an exam, perhaps because of an undetected learning disability such as dyslexia, will consequently receive low grades in assessments. The result is not only that their degree classification is not a valid or reliable indicator of their learning ability, but also that they will have a hard time even reaching the interview stage when applying for jobs, because employers will take one look at their grades and dismiss them as incompetent. It is unfortunate, but sadly, it does happen!
