
Exploring the potential of conversational AI in speaking assessment


Audience: Teachers
Category: Insight
Date Published: 15 April 2026

Language assessment must evolve with the times, making use of the latest technology to improve test standards and efficiency while maintaining integrity. In today’s digital world, that means taking advantage of the benefits of artificial intelligence (AI). 

Recent developments in conversational AI have shown promise for speaking assessments. However, discussions often focus on future-facing claims rather than current abilities.

This article will discuss the current state of AI-powered speaking assessments and what role this technology may play in high-stakes summative testing. 

The case for conversational AI in speaking assessment

Language testing has become increasingly digitised in recent years, with AI and automation now routinely used to deliver and score listening, reading and writing tests.

Speaking has seen fewer changes due to its interactive, interpersonal nature. However, predictive machine learning, generative AI and speech recognition are beginning to change that. Modern AI systems can now capture a range of linguistic features, including pronunciation, fluency and discourse management.

These automated approaches to language assessment offer benefits for universities and candidates alike, such as scalability, standardisation and rapid feedback. They can also reduce the variability that can, in theory, arise from differences in examiner behaviour during test delivery.

While technology can improve efficiency and boost trust in testing, innovation alone is no reason to change assessment design, delivery or scoring systems. Institutions must be confident that any change provides a tangible benefit.

Therefore, the question remains: how can we explore the technology's potential without compromising test validity and integrity? 

Why assessing dialogue remains a challenge for computer-delivered speaking assessment

Many widely used speaking tests rely on non-interactive, monologic tasks in the interest of feasibility and scalability. However, such tasks have been criticised for not fully representing real-world communication, raising concerns about test validity.

In everyday situations, spoken communication is a two-way process. It involves responding to others, managing turns, repairing misunderstandings, and building meaning together over time. 

While these skills are central to oral proficiency, most computer-based speaking assessments fall short of replicating and assessing them. They do not require candidates to respond to others or adapt in real time – meaning they do not genuinely reflect human interaction.

What the evidence suggests about the current capabilities of conversational AI

Spoken Dialogue Systems (SDSs) are a key technology that could address this challenge. They allow users to interact with a computer, which listens, interprets meaning and responds – much like a human examiner.

Traditionally, SDSs have been rule-based, meaning they rely on predefined scripts, intents and dialogue pathways. As a result, they are highly controlled and consistent, producing more structured, task-based interactions.
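To make that distinction concrete, here is a minimal sketch in Python – purely illustrative, and not drawn from any real test – of a rule-based pathway, which amounts to a fixed script that every candidate walks through:

    # Minimal sketch of a rule-based dialogue pathway (illustrative only).
    # Every prompt and transition is predefined, so each candidate
    # receives an identical, scripted flow.
    SCRIPT = [
        {"id": "opening", "prompt": "Let's talk about technology. How often do you use a computer?"},
        {"id": "follow_up", "prompt": "Why do you think some people avoid new technology?"},
        {"id": "closing", "prompt": "Thank you. That is the end of this part of the test."},
    ]

    def run_scripted_task():
        """Walk the fixed pathway, recording responses for later human scoring."""
        transcript = []
        for turn in SCRIPT:
            print(f"Examiner: {turn['prompt']}")
            response = input("Candidate: ")  # stands in for speech recognition
            transcript.append((turn["id"], response))
        return transcript

Because nothing in the pathway depends on what the candidate says, delivery is perfectly consistent – which is exactly why such systems are easier to standardise.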

More recently, AI-powered SDSs have emerged, using large language models to generate responses in real time. These systems can produce natural-sounding replies that fit the context of the conversation, opening up real possibilities for speaking assessment (Karatay & Xu, 2025).
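By contrast, an LLM-driven system generates each examiner turn on the fly. The sketch below assumes the OpenAI Python client and GPT-4o (the model used in the study described next); the system prompt and function are hypothetical, not the study’s actual configuration:

    # Sketch of an LLM-driven dialogue turn, assuming the OpenAI Python
    # client. The system prompt is illustrative; it is not the
    # configuration used by Karatay and Xu (2025).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    history = [
        {"role": "system", "content": (
            "You are moderating a discussion task modelled on Part 3 of a "
            "speaking test. Ask one open follow-up question at a time, "
            "building on what the candidate has just said."
        )},
    ]

    def next_examiner_turn(candidate_utterance: str) -> str:
        """Generate a context-sensitive follow-up rather than a scripted prompt."""
        history.append({"role": "user", "content": candidate_utterance})
        reply = client.chat.completions.create(model="gpt-4o", messages=history)
        question = reply.choices[0].message.content
        history.append({"role": "assistant", "content": question})
        return question

The trade-off is visible even in this toy example: because each question depends on the model’s output, no two candidates receive exactly the same interaction – the standardisation challenge discussed later in this article.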

Karatay and Xu set out to test whether an AI-powered SDS could prompt more natural dialogue in a controlled setting, with responses evaluated by human examiners. The system, powered by GPT-4o, moderated a discussion task modelled on Part 3 of the IELTS Speaking test. Their study was unique in that it examined the output of both the AI and the test takers.

The findings were encouraging. The system successfully elicited a wide range of interactive language, particularly in topic development and turn-taking. It also did so consistently across proficiency levels, giving all test takers comparable opportunities to demonstrate their abilities and suggesting potential for greater standardisation. 

Crucially, the SDS made it possible to differentiate between higher- and lower-proficiency candidates. This discriminating power is fundamental to any valid assessment, and it reinforces earlier evidence that topic development is a reliable indicator of oral proficiency (Seedhouse, 2019). The AI-powered SDS also led to more consistent measurement of interaction, possibly because separating test delivery from scoring reduced examiners’ cognitive load and allowed them to focus fully on evaluation.

Test-taker perceptions were broadly positive: many participants were comfortable with the system's ability to understand their responses and pose appropriate follow-up questions. 

Where conversational AI still shows limitations in speaking assessment

The study also uncovered some limitations of conversational AI. In particular, participant feedback highlighted several issues with how the AI-powered SDS managed conversational flow and demonstrated active listening:

  • Limited verbal acknowledgement, in particular phrases that indicate active listening such as ‘yeah’ and ‘right’
  • Lack of nonverbal communication, for example, the absence of eye contact, facial expressions and head movements that normally signal understanding
  • Unintended interruptions from rigid turn-taking that disrupted the flow of conversation

All this meant the interaction felt structured and reliable, but not entirely authentic or natural – closer to a standardised interview than a genuine conversation.

The findings echo those of a 2024 study on conversation openings and closings, which identified more relational talk and facework in human–human than in human–SDS interactions (Dombi et al., 2024).

More broadly, conversational AI shows considerable promise in how it responds to clarification requests, adapts to unexpected input and mimics dialogue. However, because these systems can vary in their responses and sometimes “help” the user too much, they are harder to standardise and are therefore not yet suited to the role of a fair examiner.

What questions the evidence raises about the role of AI in speaking assessment

Current evidence suggests a need to re-evaluate what conversational AI can realistically deliver in the context of speaking assessment.

Leading technologies like AI-powered SDSs could bring benefits such as consistency, reduced social pressure, flexibility over timing and improved accessibility. They could also enable tests to assess a wider range of speaking abilities without putting more pressure on human examiners.

However, speaking is not just about producing language. Test takers must draw on a range of abilities – listening, cognitive and interpersonal skills – in real time. Interaction also requires managing turn-taking, topic shifts and clarification, among other competencies (Goh & Aryadoust, 2025). Because these skills emerge through conversation, the conditions under which that conversation takes place could significantly influence how candidates perform.

Conversational AI for speaking assessments: Technology with future potential 

Conversational AI is already influencing how test providers design and deliver speaking assessments, offering new possibilities for making computer-delivered tests more authentic. It not only allows for more dialogic interactions but also behaves more like a human examiner in how it responds to requests for clarification and simulates role plays. While there are some limitations, the technology shows a lot of promise.

For high-stakes assessment contexts, traditional rule-based SDSs, as opposed to their AI-powered counterparts, therefore seem to be a more practical choice in the short term, given their greater consistency, scalability, and control over test delivery.