Sami Karamalla Gaiballa: Can AI Help Analyse Student Diversity Open Data?

3 Sep 2025 | Access and Participation, Data and Research

Disclaimer: the work and ideas in this article are the author’s own; however, AI was used to edit the article.

Introduction

Accessing and understanding data in higher education (HE) can sometimes be a challenge: knowing whom to ask, where it’s stored, or who is responsible. Thankfully, the HE sector benefits from several open data sources that anyone can use. With September as Student Data Month on our diversity calendar, let’s explore how AI might help a non-expert gain insights into student diversity trends using these open resources.

Open Student Data

Open data, information freely available to all, has the power to shed light on the experiences of students from diverse backgrounds. For example:

However, making sense of all this information can be overwhelming. That’s why I decided to test an AI engine—Microsoft Copilot 365 with GPT-5—to see if it could help extract useful insights from some of these datasets. Let’s start with a short definition of AI.

What is AI?

The University of Manchester AI Guidelines describe AI as technology that allows machines to perform tasks typically requiring human intelligence, such as recognising speech or solving problems. Generative AI, like ChatGPT or Microsoft Copilot, can generate new content after being trained on large datasets. Yet humans must oversee AI’s work because of potential biases in the underlying data and the risk of “hallucinations”, where AI fabricates information.

Using AI with Student Data

For this example, I downloaded HESA’s open data table (*) on student enrolments by provider and disability marker. I uploaded it to Copilot and prompted:

I am a university employee interested in the challenges that disabled students might face at university, but I do not have any data analytics background or experience. I would like to know:

-Number of disabled students in the Higher Education sector

-Proportion of disabled students in the Higher Education sector

Could you provide me with that information based on this data, please?

Copilot returned values and explained its calculations. I double-checked the answer to make sure that it was correct, something you should always do when working with AI:

Thanks for sharing the file—here’s the headline picture for the UK Higher Education (HE) sector from your dataset (Academic year 2023/24):

Your two answers (based on the file dt051-table-4.csv)

-Number of disabled students (sector‑wide): 514,310 students with a known disability (HESA “disability marker”).

-Proportion of disabled students (sector‑wide): 17.7% of all students. How calculated: 514,310 ÷ 2,904,425 (total students) = 0.1771 → 17.7%.

Notes for interpretation (plain English):

-Known disability is HESA’s marker based on students’ self‑reported information; it’s the standard way HESA publishes disability statistics.

-HESA rounds headcounts to the nearest 5 to protect confidentiality, so you may see tiny “doesn’t‑quite‑add‑up” differences between parts and totals—this is expected.

If you’d like, I can also break this down by nation, provider, or show the trend since 2014/15—all of that is available in the same HESA table and easy to visualise.
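For readers who want to repeat the double-check themselves, the arithmetic in Copilot's answer can be reproduced in a few lines. This is only a minimal sketch: the two headcounts are the ones quoted in the response above, while the function name `proportion` is an illustrative choice of mine, not something from HESA or Copilot.

```python
# Headcounts quoted in Copilot's answer (HESA rounds them to the nearest 5).
disabled_students = 514_310
total_students = 2_904_425


def proportion(part: int, whole: int) -> float:
    """Return part / whole, refusing a zero or negative total."""
    if whole <= 0:
        raise ValueError("total must be positive")
    return part / whole


share = proportion(disabled_students, total_students)
print(f"{share:.4f}")  # 0.1771
print(f"{share:.1%}")  # 17.7%
```

Both printed values match the figures Copilot reported, which is the kind of quick confirmation worth doing before quoting an AI-generated statistic.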

However, Copilot’s claim that the data could be broken down by nation was not correct for this dataset, because that field was not in the file; a national breakdown would require downloading additional data. This highlights the importance of human oversight.
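One way to test an offer like this before relying on it is to look at the column headers of the file itself. The sketch below assumes a simple CSV whose first row holds the headers; the sample text and its column names are hypothetical stand-ins, and the real HESA download may be laid out differently (for example, with descriptive rows above the header).

```python
import csv
import io

# Hypothetical stand-in for the downloaded table; real headers may differ.
sample_csv = """HE provider,Disability marker,Academic year,Number
Provider A,Known disability,2023/24,120
Provider A,No known disability,2023/24,880
"""


def has_field(csv_text: str, field: str) -> bool:
    """Check whether a column name appears in the first row of a CSV."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    known = {name.strip().lower() for name in header}
    return field.strip().lower() in known


print(has_field(sample_csv, "HE provider"))             # True
print(has_field(sample_csv, "Country of HE provider"))  # False
```

If the field an AI tool proposes to use is not among the headers, the breakdown it is offering cannot come from that file alone.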

Next, I asked Copilot: Could you create a bar chart with Top 10 institutions with the highest proportion of disabled students, please? Copilot provided a chart and the methodology it used. In addition, it noted that small institutions could skew the percentages and offered to filter by size of institution if we wished:

"Bar chart titled 'Top 10 HE providers by proportion of students with a known disability (2023/24)'. The chart displays the proportion of students with a known disability for each provider, ranging from 0.0 to 1.0 on the x-axis. The top provider is Chicken Shed Theatre Company with 81.2% (65 out of 80 students), followed by Arts University Plymouth (46.7%), Norland College (45.6%), Luther King House Educational Trust (45.5%), Court Theatre Training Company Ltd (42.9%), Falmouth University (42.2%), The Royal Central School of Speech and Drama (40.9%), National Centre for Circus Arts (40.0%), Futureworks (39.8%), and Cliff College (39.1%)."

However, the AI also suggested filtering by public universities, a field that this dataset did not contain. This again emphasises the need for knowledgeable human guidance.
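The ranking itself, and the size filter Copilot offered, are simple enough to check by hand. The following is a rough sketch rather than Copilot's actual method: the provider names and headcounts are made up for illustration, and `top_by_proportion` and its `min_total` parameter are hypothetical names.

```python
from typing import NamedTuple


class Provider(NamedTuple):
    name: str
    disabled: int  # students with a known disability
    total: int     # all students at the provider


# Made-up figures for illustration only; not taken from the HESA table.
providers = [
    Provider("Provider A", 65, 80),
    Provider("Provider B", 450, 1_000),
    Provider("Provider C", 6_000, 40_000),
    Provider("Provider D", 90, 300),
]


def top_by_proportion(rows, n=10, min_total=1):
    """Rank providers by disabled / total, skipping those below min_total."""
    eligible = [r for r in rows if r.total >= max(min_total, 1)]
    ranked = sorted(eligible, key=lambda r: r.disabled / r.total, reverse=True)
    return ranked[:n]


# Without a size filter, the very small provider tops the list.
for p in top_by_proportion(providers, n=2):
    print(p.name, f"{p.disabled / p.total:.1%}")

# With a minimum of 500 students, small providers no longer dominate.
for p in top_by_proportion(providers, n=2, min_total=500):
    print(p.name, f"{p.disabled / p.total:.1%}")
```

Comparing the two lists shows why the caveat about small institutions matters: a provider with only 80 students can lead the ranking on the strength of a few dozen people.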

You can keep the “conversation” going by further prompting the AI. At some point, you could move on from the data and start asking broader questions, such as what the best practices are for recruiting disabled students. Remember to always request references so that you can verify the information yourself.

Conclusion

AI can be a valuable partner in interpreting open student data, but it still requires human oversight and critical thinking to ensure accurate, meaningful analysis. Remember to always check the AI output and, if you are a student or member of staff at the University of Manchester, to use AI in accordance with the University’s guidance.

Sami Karamalla Gaiballa – EDI Partner and Lead Data Analyst in the Directorate of Equality, Diversity and Inclusion at the University of Manchester

References:

*HESA, Table 4 – HE student enrolments by HE provider and disability marker, Academic years 2014/15 to 2023/24, https://www.hesa.ac.uk/data-and-analysis/students/table-4
