Can we spot GPT-written content?
Cesare is a Teaching Fellow, soon to be Lecturer, in the Department of Mathematics and a Senior Fellow of the Higher Education Academy. Here, he outlines some common features of Large Language Model (LLM)-created writing with Esme Davies, Higher Education Intern in the ITL, and discusses the implications of this technology for how we might assess students’ knowledge and skills in future.
(GPT stands for Generative Pre-trained Transformer – a type of Large Language Model (LLM) artificial intelligence (AI) that generates written content after being trained on massive online datasets.)
There has been a lot of talk about the tell-tale signs of GPT-written work. Could you give us an overview of these features, and their reliability in detecting GPT-written content?
There is an important distinction between work that is entirely written by GPT and work that is co-produced with AI. Co-produced content is much harder to detect, as it contains human elements which vary from person to person.
- One of the common characteristics of GPT-generated text is references that do not exist, such as names of people or quotations. These look real at first glance and can sometimes require an expert in a particular topic to identify.
- Sometimes the text can contain expressions such as “As an AI language model, I am unable to…” that have been inadvertently left there.
- Another current trait is the language we typically find in AI-written work. In general, due to the way they were trained, GPT and other language models tend to structure their answers in a precise way, often following a top-down approach where they start by defining basic concepts and outlining the structure of what they are about to produce. For instance, a colleague was able to deduce that a student had used GPT to write an essay because it started by giving definitions of concepts that were very basic for the level of the course and looked out of place.
However, this last feature does not necessarily always mean content is GPT-written. Weaker students are known to rely on re-defining basic concepts, sometimes to increase their word count. AI will continue to get better so we need to change how we assess students; in the meantime, we can take advantage of the current pitfalls of AI.
How do AI detection tools fit into this discussion?
Personally, I am against the use of AI detection tools. Companies train these detectors on specific datasets, but the actual uses of AI are so rich and varied that once you go beyond the training data, detectors are not 100% reliable. For example, there is evidence that texts written by non-native English speakers are particularly vulnerable to being incorrectly flagged as AI-generated, probably because their sentences have a more predictable structure.
Even if AI detectors were to reach a level of reliability that eliminated most false positives and negatives, we are still moving in a direction where we want content to be co-created with AI. This could be for many reasons: to write better, to save time, or for accessibility. The word processor that students use may soon have AI already built in. Employing detectors would disadvantage people who use AI for legitimate reasons.
The guiding principle of many institutions, including the government, is that we should aim to train people in using AI responsibly and skilfully. It will be interesting to see how assessments might develop to be able to incorporate effective and appropriate use of AI since we know that assessments are one of the main motivators of learning for students.
For example, we need to develop more authentic assessments – possibly some that integrate AI where appropriate, and others that explicitly exclude it. One analogy is non-calculator exam papers: when we do not want students to make use of a calculator, we create controlled conditions where this can be ensured, but outside the exam room we acknowledge the existence of calculators and teach students to use them critically and properly.
Machine-based detection should not be part of the process at all; instead, the lecturer’s insight and knowledge of their students and what they are capable of is key.
How would you address concerns from colleagues over an increase in academic malpractice with GPT-written content?
Students cheat for a variety of reasons, and there have always been ways for this to happen: plagiarism through word scramblers, essay mills, or asking a friend. ChatGPT only makes cheating easier and faster, but this is not an entirely new thing. The best thing to aim for is to motivate students not to cheat by changing assessments, and in the long run this will mean moving towards new types of assessment that include the possibility of co-creation with AI.
However, this is a long process. Mitigation in the short-term could mean more invigilated exams, but not just that! For example, effective interim measures might include:
- Asking students to submit drafts to their supervisors at regular intervals so the supervisor can see the work evolve and whether the student implements feedback.
- Reverting to older models of assessment where oral presentations held more weight than they do now.
- Giving more weight to reflective elements in which we discuss with the student what they have done and track how they have done it.
None of these suggestions is exempt from caveats, and there is no one-size-fits-all answer.
What else can colleagues do in the short and longer term to prevent submission of GPT-written work?
Colleagues should explicitly tell their students their approach to Artificial Intelligence and the associated assessment rubric, making it clear what constitutes good and bad use of AI and setting out their expectations.
Students need to be taught the best uses for AI, such as spellchecking or summarising, while also being aware of AI’s pitfalls. This will require an honest conversation with students and, depending on the course, this conversation can vary widely. There are things I would say to my fourth-year Algebra course that I would not repeat to my first-year maths students, largely because AI is currently pretty good at first-year maths and terrible at fourth-year maths.
Colleagues could put their own assignments into ChatGPT, either to adjust the assignment accordingly or simply to get some idea of what ChatGPT-written work would look like for that essay question.
What kind of support do you think staff would need in creating more authentic assessments that reduce academic malpractice involving GPT-written work?
Colleagues need to be trained on AI: how models work, their common pitfalls, and what they can and cannot do. There are damaging misconceptions, such as overconfidence in one’s ability to identify AI-written content. Every peer-reviewed study I have seen on the topic shows that humans consistently fail to identify AI-written content and often misclassify it as human-produced, especially with more advanced models.
Support with workload will also be needed. It is undeniable that more authentic assessment takes more time. For instance, monitoring drafts and sending regular feedback takes longer than marking the whole essay once. In the end, authenticity comes down to a human element: taking the time to speak to other humans.
A separate suggestion – providing a VLE (Virtual Learning Environment) such as Cadmus and requiring students to work only within its bounds – could be effective at the moment. However, this approach comes with drawbacks related to privacy and is vulnerable to AI learning to mimic human behaviour.
A fundamental element in the coming years will be enhancing scholarship and networking: we need to encourage everyone to share with the wider teaching community what they have tried on their courses, what has worked and what has not.
In the end, we need to see this from the student’s perspective. Right now, every staff member is a person who grew up with traditional teaching and traditional assessments. Students have experienced blended learning, and now they have AI. We perceive AI as a new thing, but students will soon perceive AI as a tool that was always there, because they will have grown up with it. It is fundamental to involve students in the conversation.
Cesare G. Ardito – @CesareGArdito , https://cesaregardito.eu .
GPT-4 was used for feedback and improvement of an early draft of this article.
Further reading/resources:
- Michael Webb, “A generative AI primer”, JISC.
- The use of Artificial Intelligence (AI) – University statement for student handbooks (in the TLE Bulletin, July/August 2023).
- The University of Manchester’s Guidance on the use of Calculators in Exams.
- The University of Manchester’s Guidance on the Use of Dictionaries in Exams.