AI Safety
Over the past couple of years, I’ve been involved in several projects at the intersection of AI safety and human–AI interaction research. These projects all share a common theme: understanding how AI systems behave in the real world, identifying potential risks, and exploring ways to make them safer and more aligned with human needs.
Human Risks of Augmented Reality and AI
One strand of my work has focused on the REPHRAIN DAARC (Defence Against Augmented Reality Crimes) project, which examines the social and criminal risks that arise when immersive technologies are combined with artificial intelligence.
My first contribution here was helping with a systematic literature review of the potential harms and criminal uses of AR — everything from invasive data collection to novel types of fraud or harassment. That work gave us a clearer map of the risks already being discussed in research and policy circles.
But we also wanted to explore how ordinary people imagined future threats. To do that, I co-ran focus groups and workshops where participants co-designed possible AR+AI crime scenarios. The process was deliberately interactive: participants sketched out ideas, swapped and remixed each other’s scenarios, and then worked together to assess their likelihood and severity on a risk matrix. Finally, the groups brainstormed possible countermeasures or guidelines that could help mitigate these threats.
Later, we applied pair qualitative coding to all of these responses, looking for recurring themes in how people thought about risk, probability, and safety in an emerging technology space.
This research is currently going through the publication process.
Multimodal AI Safety Evaluation
In parallel, I pursued an independent project to better understand how AI safety research is actually done: the methods, frameworks, and tools that underpin it. Coming from an HCI background, I wanted to explore how human-centred perspectives could connect with technical evaluation.
Using Meta’s Llama 3 model, I developed Python pipelines that handled everything from cleaning multimodal datasets to structuring prompts for consistent testing. Rather than only looking at accuracy, I focused on where things could go wrong: when the model fabricated information (hallucination), when it produced unsafe or biased text (harmful output), and how its responses might mislead or manipulate users (interaction risks).
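To give a concrete sense of what I mean by structuring prompts for consistent testing, here is a minimal sketch of that kind of harness. The PromptCase record, the generate_fn placeholder, and the reference check are illustrative assumptions, not the exact code from the project.

```python
# Illustrative sketch of a prompt-evaluation harness; not the original pipeline.
# `generate_fn` stands in for whatever produces a model response (for example,
# a locally hosted Llama 3 endpoint). Names and checks here are assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PromptCase:
    case_id: str
    category: str             # "hallucination", "harmful_output", or "interaction_risk"
    prompt: str
    reference: Optional[str]  # expected answer, if one exists


def build_prompt(case: PromptCase) -> str:
    # Structure every prompt identically so results are comparable across cases.
    return (
        "Answer the question below concisely and factually.\n"
        f"Question: {case.prompt}"
    )


def run_cases(cases: List[PromptCase],
              generate_fn: Callable[[str], str]) -> List[Dict]:
    # Run each case through the model and record a crude correctness proxy;
    # a real evaluation would rely on proper safety benchmarks instead.
    records = []
    for case in cases:
        response = generate_fn(build_prompt(case))
        records.append({
            "case_id": case.case_id,
            "category": case.category,
            "response": response,
            "matches_reference": (
                case.reference is not None
                and case.reference.lower() in response.lower()
            ),
        })
    return records


# Example usage with a stub in place of a real model call.
cases = [PromptCase("q1", "hallucination", "Who developed Llama 3?", "Meta")]
print(run_cases(cases, generate_fn=lambda p: "Llama 3 was developed by Meta."))
```

Keeping the prompt construction and the scoring in one place like this makes it easy to rerun the same cases whenever the model or the prompt template changes.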
To investigate these, I used industry-standard safety benchmarks and toolkits, applying mixed-effects models and other statistical methods to analyse performance. I also implemented interactive Python simulations to capture responses, compute safety metrics, and visualise the data with pandas and matplotlib.
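As an illustration of the kind of analysis this involved, the sketch below computes per-category unsafe-response rates, plots them, and fits a simple mixed-effects model with statsmodels. The synthetic data, column names, and the choice of prompt identity as the grouping variable are all assumptions for the example, not the actual study design.

```python
# Illustrative analysis sketch on synthetic data; column names, effect sizes,
# and the grouping structure are assumptions, not the original study.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_responses, n_prompts = 300, 30
categories = rng.choice(
    ["hallucination", "harmful_output", "interaction_risk"], size=n_responses
)
prompt_id = rng.integers(0, n_prompts, size=n_responses)

# Simulate a binary "unsafe" flag with a different base rate per category
# plus a small per-prompt effect (the random-effect structure).
base_rate = {"hallucination": 0.20, "harmful_output": 0.10, "interaction_risk": 0.30}
prompt_effect = rng.normal(0.0, 0.05, size=n_prompts)
p_unsafe = np.clip(
    [base_rate[c] + prompt_effect[g] for c, g in zip(categories, prompt_id)],
    0.01, 0.99,
)
df = pd.DataFrame({
    "prompt_id": prompt_id,
    "category": categories,
    "unsafe": rng.binomial(1, p_unsafe),
})

# Unsafe-response rate per category, visualised as a bar chart.
rates = df.groupby("category")["unsafe"].mean()
rates.plot(kind="bar", rot=30)
plt.ylabel("unsafe response rate")
plt.tight_layout()
plt.show()

# Minimal mixed-effects model: category as a fixed effect,
# prompt identity as the grouping (random-intercept) variable.
result = smf.mixedlm("unsafe ~ category", data=df, groups=df["prompt_id"]).fit()
print(result.summary())
```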
The outcome was a comprehensive report that summarised both accuracy and safety vulnerabilities, and outlined mitigation strategies for real-world applications. Just as importantly, the project gave me hands-on experience with the evaluation frameworks used in AI safety research and highlighted how they can (and should) connect to user experience considerations in HCI.
Published on: September 01, 2025