This afternoon, while taking a long bath before the return to the treadmill tomorrow, I found myself reading Ofqual’s myth busting page. The fact that Ofqual felt the need to produce a myth busting page suggests that they’re still losing the PR battle and that the chief executive, Glenys Stacey, is battling for survival. That’s not what this blog is about, though, because Ofqual have practically admitted that this summer’s exam series was unfair.
The following section jumped out at me:
“ofqual should have acted in january, when the unit awards were too generous”
Our report sets out how we reviewed January awarding. Our conclusion is that the right processes were followed and the outcomes were reasonable given the data available at the time.
There was not sufficient evidence when the January awards were made for Ofqual or the exam boards to conclude that the unit awards were generous. Indeed, as set out in our report, the data at the time suggested that the boundaries on the AQA foundation tier paper should have been even lower than they were.
This demonstrates one of the challenges with awarding graded modular qualifications. We will be moving away from a modular system at GCSE in England after the 2012/13 school year.
Flexible grade boundaries exist to ensure fairness within the exam system. Due to the nature of the English exam(s), some papers are more difficult than others. This is generally due to the material used in the exam rather than the questions themselves. Sometimes a more difficult text is used (with fewer presentational devices, a higher register that lower-ability students may find more difficult to understand, etc.), and therefore students will perform less well and the grade boundaries need to be lowered. This isn’t because the students are less clever, so the results need to be normalised (I think this is technically known in maths as quantile normalisation – but I may be wrong!).
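To illustrate the idea – and this is only a toy sketch with invented marks, not the exam boards’ actual method, which draws on examiner judgement and much richer evidence – a quantile-style adjustment would pick the boundary on the harder paper that passes the same proportion of the cohort as the boundary on the easier paper:

```python
# Toy sketch of quantile-style normalisation between two exam papers.
# All marks here are invented for illustration; real awarding uses far
# more evidence (examiner judgement, archive scripts, statistics).

def equivalent_boundary(reference_marks, harder_marks, reference_boundary):
    """Find the boundary on the harder paper that passes the same
    proportion of candidates as reference_boundary does on the
    reference paper."""
    passing = sum(m >= reference_boundary for m in reference_marks) / len(reference_marks)
    ranked = sorted(harder_marks)
    # The mark sitting at the same quantile of the harder paper's distribution.
    cutoff = round(len(ranked) * (1 - passing))
    return ranked[min(cutoff, len(ranked) - 1)]

easier_paper = [38, 41, 44, 45, 47, 50, 52, 55, 58, 60]  # hypothetical marks
harder_paper = [30, 33, 35, 36, 39, 42, 43, 46, 49, 52]  # same cohort, harder text

print(equivalent_boundary(easier_paper, harder_paper, 47))  # 39
```

Here a C boundary of 47 on the easier paper becomes 39 on the harder one, so the same share of the cohort gets a C – the students aren’t less clever, the text was just harder.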
This makes sense year-on-year, but doesn’t necessarily make sense with GCSE modules. During the first run-through of a specification an exam board has little data to go on, making grade boundary setting difficult. As the new specifications were modular for the first time in GCSE English history, this was even more difficult – the number of students entered in January 2011, June 2011 and January 2012 was tiny. I’ve tried finding figures for the individual exams, but 2011 figures don’t seem to be available. For AQA, 54,000 candidates sat the foundation exam in January 2012; in June 2012, 141,000 candidates were entered. As noted in the Ofqual report, the examiners’ report for the June paper stated that “the overall demands of the paper were very similar to previous series”.
This doesn’t make a lot of sense. If the grade boundary system exists to maintain standards, and the overall demands of the paper were considered to be similar to those of the January 2012 paper, then, logically, the grade boundaries should have stayed the same. They didn’t.
The report goes on to say that:
“The tier F agreed C boundary, initially 52, was moved up to 53 on revisiting the [evidence] in the light of further statistical information. Although this mark was significantly higher than the mark for the reference year [June 2011] it was felt that this mark was a truer reflection of the quality of candidates’ work in relation to the C grade criteria”.
I’ve emboldened the two elements I think are particularly important here. AQA had initially set the grade boundary for a C at 52 – 9 marks higher than in January 2012, despite the fact that the exam presented similar demands on the students. The report then states that this was moved up to 53 due to further statistical information. It makes no attempt, however, to explain HOW the boundary of 52 had been decided or WHAT statistical information had been used to generate it.
So even without any extra information it already seems unfair – a paper of similar difficulty, but a much higher mark required for a C. (Let’s face it, most people consider a D or below a fail.)
It gets worse though.
Because this is the first time students were certificating in this GCSE course, students who sat the exam in January 2011 were all in Year 10. They’d studied GCSE English for 3 months (some schools may have started their GCSE courses in Year 9). In June 2011 they had been studying for 10 months. In January 2012, 13 months. In June 2012, 20 months. Students are allowed to resit the exam twice. Therefore, logically, the students in June 2012 will have performed better than any other cohort. They will have had 20 months of teaching and preparation. They may have sat the exam before, and schools will have learned from the previous 3 series and will have got better at preparing their students for the exam. (Our students certainly did: 90+% of those who resat in June 2012 achieved a higher grade than they did in January 2011, despite the grade boundary changes.) The students who take the exams earlier on will have been less mature, less well prepared and will generally have gained lower marks than those who sit later on. These students, and schools, were rewarded.
The myth busting page also addresses the claim that Ofqual are aiming for a return to norm referencing:
“ofqual’s approach is a return to norm referencing”
A norm referencing approach would mean that in each exam board there would be, say, 5% getting A*, 10% getting A, and so on. That takes no account of differences in the cohort between years or between exam boards. Candidates entering with an exam board with a higher ability entry would be less likely to get into the top 5%, or the top 10%.
Our comparable outcomes approach is explained in detail in the following document relating to the summer 2012 awards, http://www.ofqual.gov.uk/files/2012-05-09-maintaining-standards-in-summer-2012.pdf. This approach is based on a number of assumptions, including the assumption that if the cohort is similar, in terms of ability, to the cohorts in previous years then the outcomes should be comparable. But if the cohort is different, then the outcomes will be different. This is a fairer approach than a simple norm referencing approach.
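The distinction can be sketched in code – a toy model with invented grade shares and a crude ability factor, not Ofqual’s actual calculation. Norm referencing fixes the share of each grade regardless of the cohort, while comparable outcomes scales last year’s shares by a measure of the cohort’s prior ability:

```python
# Toy contrast between norm referencing and comparable outcomes.
# Grade shares and the ability adjustment are invented for illustration.

def norm_referenced(n_candidates, fixed_shares):
    """Fixed share of each grade every year, however able the cohort is."""
    return {grade: round(n_candidates * share)
            for grade, share in fixed_shares.items()}

def comparable(n_candidates, last_year_shares, ability_factor):
    """Last year's shares scaled by a crude cohort-ability factor
    (>1 for a stronger cohort, <1 for a weaker one)."""
    return {grade: round(n_candidates * share * ability_factor)
            for grade, share in last_year_shares.items()}

shares = {"A*": 0.05, "A": 0.10}
print(norm_referenced(1000, shares))   # {'A*': 50, 'A': 100}
print(comparable(1000, shares, 1.1))   # {'A*': 55, 'A': 110}
```

Under comparable outcomes a stronger cohort gets more top grades; under pure norm referencing it wouldn’t, which is the unfairness Ofqual describe.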
In order to avoid just the situation I’ve described above, the exam boards use information about the cohort – prior data. Except there’s another problem here – Key Stage 3 tests (the prior data traditionally used by exam boards) were abolished in 2008. This summer’s year group didn’t have Key Stage 3 data. How would they maintain this fairness?
With data from KS3 tests no longer available as the tests stopped after 2008, in preparation for the summer 2011 exam boards sought other data to use to compare the relative ability levels of the 2009 and 2011 cohorts. We expect exam boards to use as wide a range of qualitative and statistical evidence as possible in guiding their awards.
Having considered the issues above, the regulators have agreed with exam boards that emerging results in August 2012 will be reported to the regulators using two measures. All exam boards will report their outcomes compared to the results achieved by common centres from 2011. In addition, the three exam boards based in England will report their outcomes against predictions for the cohort based on prior achievement at Key Stage 2.
I don’t believe Key Stage 2 data is a particularly accurate measure for predicting what students could/should achieve 5 years after they sat the KS2 SATs. As a school, we have to use KS2 data to inform our predictions because we’re judged against 3 and 4 levels of progress. We don’t like it. We often get complaints from parents that their son/daughter has matured/changed/etc. since they were in Year 6 – and they’re right. It’s inaccurate and unfair.
So, to summarise:
- In an attempt to produce comparable outcomes, the exam boards used statistical data to ensure that this year’s GCSE results were similar to last year’s.
- Due to a lack of data and a small cohort, the grade boundaries in the first three modular GCSE English foundation exams were set lower than in June 2012.
- The students who sat these exams weren’t as well prepared or mature as the students who sat the exam in June 2012, and therefore the marks achieved were lower.
- The grade boundaries were therefore set at a level that is now considered generous (a convenient excuse because once a grade has been awarded it can’t be taken away).
- The students were better prepared in June 2012, therefore they achieved a higher number of marks and so the grade boundaries were increased.
So the students who performed at the highest level were penalised for doing so and had to achieve a higher number of marks to achieve a C grade. The students who were entered early performed less well, but achieved a higher grade. We punished the wrong students. We should be supporting them and applauding their achievements.
(I still believe that some schools which achieved lower numbers of C grades in the summer could have done more than they did to support their students, and that they have to bear some level of responsibility in this whole saga, but it’s fair to say that schools and students have been treated unfairly by the system when they were led to believe that the grade boundaries were fairly secure and that exam boards and Ofqual knew what they were doing.)
Edit: To clarify the final paragraph, I’m not blaming schools but merely making the point that if we are to blame and criticise others, we must also look inwards at ourselves and our schools to ensure we are doing the best for our students. After all, they’re the most important people in this whole thing.