DON'S COLUMN

Taming The Machine

Managing GenAI’s Hallucination Risk In Audit
BY KELVIN K.F. LAW


Generative artificial intelligence (GenAI) is no longer waiting for an invitation to the audit room; it has taken its seat at the audit table. The conversation has shifted from whether to use this transformative technology to how to deploy it in a manner that is testable, defensible, and carefully aligned with professional standards. The challenge is not merely technological but a fundamental question of governance. For audit professionals, harnessing the power of AI while mitigating its inherent risks (chief among them, the phenomenon of “hallucination”) is the new frontier of professional responsibility.

What does an AI hallucination look like in an audit context? It manifests as an output that appears plausible and is often rich with specific details that are, in fact, non-existent. For instance, when an auditor prompts GenAI about a client’s debt covenants, the model may return a specific capital expenditure cap and cite a section number, neither of which exists in the actual loan agreement. Even the most advanced GenAI produces such fabrications with a non-zero frequency, particularly when tasked with complex, fact-based requests. This is a reality auditors cannot resolve with improved prompts alone.

To effectively manage this risk, we must first understand its origin. Hallucination is not a bug, but an intrinsic feature of how current large language models (LLMs) operate. At their core, these GenAI models are probabilistic engines designed to predict the next most likely word, optimising for grammatical form and coherence, not factual truth. They are masters of plausible language. A hallucination, or what might be more accurately termed a “confabulation”, occurs when the model fills gaps in its knowledge by generating statistically probable but factually incorrect content. It doesn’t “know” it is lying; it is simply completing a pattern. This architectural trait can be constrained by controls, but it cannot be eliminated outright.

THE RISK IS REAL, AND IT HAS LEGAL PRECEDENTS

The abstract nature of AI risk became starkly tangible in two recent incidents, elevating hallucination from a technical novelty to a critical governance issue. In the 2024 case of Moffatt versus Air Canada, a tribunal held the airline accountable for misinformation provided by its customer service chatbot, and awarded damages to the affected passenger. The tribunal’s reasoning was clear – the chatbot is an extension of the company’s website, and therefore, the company is liable for its statements. This principle extends deep into professional services. In Mata versus Avianca (2023), a US federal court fined lawyers US$5,000 for submitting fabricated case citations generated by an AI tool. This was not an isolated event. This year, attorneys from the prominent firm Morgan & Morgan were similarly sanctioned for AI-driven mis-citations. These cases send an unambiguous signal: professionals remain unequivocally accountable for the outputs of the AI tools they use.

This wave of adoption is palpable in Singapore, which sits at the forefront of technological integration. A 2024 EY survey revealed a dramatic surge in workplace GenAI use in Singapore, jumping from just 24% in 2023 to 79%. Further highlighting this trend, Microsoft’s 2024 Work Trend Index for Singapore reported that an astonishing 88% of knowledge workers already use GenAI, saving an average of 30 minutes per day. Although this rapid adoption heralds a new era of efficiency, it simultaneously amplifies a core audit risk – the propensity of AI to deliver answers that are confident, precise, and entirely wrong.

FIVE CONTROLS TO MAKE AI AUDITABLE

To navigate this complex terrain, audit firms must embed a robust set of controls into their AI-assisted procedures. The five controls below provide a framework for making AI-assisted work not just efficient, but auditable and defensible.

1) Grounded retrieval

First is the implementation of open-book prompts, a technique commonly known as retrieval-augmented generation (RAG). This procedural-control approach grounds the AI model in a curated corpus of trusted documents, such as signed PDFs, prior-year workpapers, and client policies. The approach requires the GenAI to extract or quote directly from these approved sources only. To ensure traceability, every prompt, the retrieved passages, and the final answer should be archived as a single, unified artefact in the engagement file. RAG significantly reduces the risk of fabricated content by constraining the model to a defined universe of information. However, firms must recognise that RAG itself is not foolproof. The retrieval step can be fragile, potentially missing key context or pulling irrelevant passages, which is why it must be paired with rigorous verification.
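To make the control concrete, the sketch below wires these pieces together in Python: a naive keyword retriever over a small curated corpus, a prompt that confines the model to the retrieved passages, and an archive record bundling prompt, passages, and answer into one artefact. The corpus contents, the function names, and the llm callable are illustrative assumptions for this article, not any particular vendor’s API.

```python
# Minimal sketch of an "open-book" RAG step for an audit query, with the
# prompt, retrieved passages and answer archived as a single artefact.
# The corpus, field names and the `llm` callable are illustrative assumptions.
import json
import re
from datetime import datetime, timezone

# Curated corpus of trusted engagement documents (signed agreements,
# prior-year workpapers, client policies), already extracted to text passages.
CORPUS = {
    "loan_agreement_2024.pdf#s5.2": "Section 5.2: The Borrower shall maintain a "
                                    "minimum interest coverage ratio of 3.0x.",
    "loan_agreement_2024.pdf#s5.3": "Section 5.3: Dividends are restricted while "
                                    "net leverage exceeds 2.5x.",
    "inventory_policy_2024.pdf#p4": "Obsolete stock is provided for in full after "
                                    "12 months without movement.",
}

def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, corpus: dict, k: int = 2) -> dict:
    """Naive keyword-overlap retrieval; a production system would use embeddings."""
    q = _tokens(question)
    ranked = sorted(corpus.items(),
                    key=lambda kv: len(q & _tokens(kv[1])),
                    reverse=True)
    return dict(ranked[:k])

def answer_open_book(question: str, llm, corpus: dict = CORPUS) -> dict:
    """Ground the model in retrieved passages only, then archive everything."""
    passages = retrieve(question, corpus)
    prompt = (
        "Answer the question using ONLY the passages below. Quote the relevant "
        "text and cite its source ID in square brackets. If the passages do not "
        "contain the answer, reply 'Not found in provided sources.'\n\n"
        + "\n".join(f"[{src}] {text}" for src, text in passages.items())
        + f"\n\nQuestion: {question}"
    )
    answer = llm(prompt)  # `llm` is any callable wrapping the firm's approved model
    # Prompt, retrieved passages and answer archived as one workpaper artefact.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "retrieved_passages": passages,
        "prompt": prompt,
        "answer": answer,
    }

if __name__ == "__main__":
    record = answer_open_book(
        "Which debt covenants restrict dividends?",
        llm=lambda p: "Not found in provided sources.",  # stand-in for the real model
    )
    print(json.dumps(record, indent=2))
```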

2) Verifiable outputs

Second, every claim generated by AI must be traceable through “show-your-work” verification. This detection control mandates that the AI provides source-backed answers by default, embedding citations directly into its output. A necessary subsequent step is a verification pass that cross-checks these citations and flags any assertion that lacks a source. Crucially, this verification must go beyond checking atomic facts. An LLM can correctly cite two separate facts but then draw a logically invalid conclusion from them. The human reviewer’s role is therefore not just to check the citations, but to scrutinise the reasoning between the facts.
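A sketch of such a verification pass follows; it flags any sentence without a citation, or with a citation that does not resolve to a retrieved passage, for the human reviewer. The bracketed citation convention matches the retrieval sketch above and is an assumption for illustration, not a standard.

```python
# Minimal sketch of a "show-your-work" verification pass: every sentence in the
# AI's answer must carry a citation that resolves to a retrieved passage.
# The [source_id] citation format is an illustrative convention.
import re

def verify_citations(answer: str, retrieved_passages: dict) -> list:
    """Return flags for the human reviewer; an empty list means every sentence cites a known source."""
    flags = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        cited = re.findall(r"\[([^\]]+)\]", sentence)
        if not cited:
            flags.append(f"UNSUPPORTED (no citation): {sentence}")
        for source_id in cited:
            if source_id not in retrieved_passages:
                flags.append(f"UNKNOWN SOURCE '{source_id}': {sentence}")
    return flags

# Example: one supported claim, one unsupported claim the reviewer must chase.
passages = {"loan_agreement_2024.pdf#s5.3": "Section 5.3: Dividends are restricted ..."}
draft = ("Dividends are restricted while net leverage exceeds 2.5x "
         "[loan_agreement_2024.pdf#s5.3]. Capital expenditure is capped at $10m.")
for flag in verify_citations(draft, passages):
    print(flag)
```

A checker of this kind catches missing or dangling citations, but, as noted above, it cannot judge whether the reasoning connecting two correctly cited facts is valid; that remains the reviewer’s job.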

3) Structured data

The third control is the enforcement of structured outputs. To eliminate ambiguity and accelerate the review process, AI responses should be constrained to predefined fields and machine-readable formats such as JSON or structured tables. For example, during an inventory observation, the AI’s output could be required to populate specific fields for location, exception type, and an evidence snippet. This structured approach is not merely about neatness; it creates a consistent, testable schema for workpapers.
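As an illustration, the sketch below defines a minimal schema for a single inventory-observation exception and rejects any model output that does not conform to it. The field names and the validation logic are assumptions for this example, not a prescribed workpaper format.

```python
# Minimal sketch of a structured-output contract for an inventory observation,
# assuming the model is instructed to return JSON matching this schema.
# Field names are illustrative, not a prescribed standard.
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"location", "exception_type", "evidence_snippet", "source"}

@dataclass
class InventoryException:
    location: str          # e.g. "Warehouse B, Aisle 7"
    exception_type: str    # e.g. "count variance", "damaged stock"
    evidence_snippet: str  # verbatim quote from the count sheet or photo log
    source: str            # document ID the snippet was taken from

def parse_model_output(raw_json: str) -> InventoryException:
    """Reject free-form answers: anything outside the schema fails the review gate."""
    data = json.loads(raw_json)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"Model output missing required fields: {sorted(missing)}")
    return InventoryException(**{k: data[k] for k in REQUIRED_FIELDS})

raw = ('{"location": "Warehouse B, Aisle 7", "exception_type": "count variance", '
       '"evidence_snippet": "Counted 480 units against 512 per listing", '
       '"source": "count_sheet_031.pdf"}')
print(parse_model_output(raw))
```

Because every exception arrives in the same shape, reviewers can compare, filter, and test AI-generated workpaper entries the same way they would any other structured audit evidence.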

4) Meticulous governance log

Fourth, a comprehensive AI governance log is indispensable for documenting intent, scope, and accountability. For every AI-assisted audit step, this log should capture the objective, data boundaries, retrieval sources, specific model and version used, and any operational parameters. This practice aligns with the documentation duties outlined in ISA 230 and maps directly to the expectations of frameworks like the National Institute of Standards and Technology (NIST) AI Risk Management Framework and ISO/IEC 42001.
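One lightweight way to implement such a log, sketched below, is an append-only JSON Lines file with one record per AI-assisted audit step. The field names and sample values are illustrative assumptions that mirror the documentation points above, not a mandated format.

```python
# Minimal sketch of one AI governance log entry, written as append-only
# JSON Lines. Field names and values are illustrative assumptions covering the
# objective, data boundaries, sources, model/version and parameters of the step.
import json
from datetime import datetime, timezone

log_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "engagement_id": "ENG-2025-0042",            # hypothetical engagement reference
    "audit_step": "Debt covenant extraction",
    "objective": "Identify covenants restricting dividends and capital expenditure",
    "data_boundaries": "Client loan agreements only; no data leaves the firm tenant",
    "retrieval_sources": ["loan_agreement_2024.pdf", "prior_year_workpaper_D12.pdf"],
    "model": "firm-approved-llm",                # placeholder for the approved model
    "model_version": "2025-06-01",               # pinned version used for this step
    "parameters": {"temperature": 0.0, "max_output_tokens": 1024},
    "preparer": "A. Tan",
    "reviewer": "K. Lim",
}

with open("ai_governance_log.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(log_entry) + "\n")
```

An append-only record of this kind gives reviewers and inspectors a single place to see what the AI was asked to do, on which data, with which model, and who signed off.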

5) Human scepticism

Finally, the principle of the human-as-supervisor must be upheld to preserve professional scepticism. An effective “Architect-Supervisor-Judge” workflow ensures that human oversight is integrated at every critical juncture. The Architect designs the procedure, the Supervisor monitors retrieval quality and constraints, and the Judge critically challenges the conclusions. Looking ahead, this model will evolve from “human in the loop” to “human supervising agents”. The goal is to build specialised audit agents that execute tasks with these controls as built-in guardrails, shifting the human role from performing AI-assisted work to directing and validating the outcomes of a team of AI agents.

MEASURE TO IMPROVE: THE HALLUCINATION SCORECARD

A framework is incomplete without measurement. This mirrors an important discipline emerging from the core of AI development itself: the creation of standardised benchmarks.

Researchers now rigorously test systems against curated, human-verified datasets specifically designed to provoke and detect confabulations. A firm’s “Hallucination Scorecard” is the internal adoption of this same discipline. It should monitor retrieval accuracy (Did RAG pull the right context?), citation precision (Did the citation link to the correct evidence?), and factual error rate (How often did a human judge have to intervene?). This data is not just for compliance but is essential for fine-tuning systems, demonstrating reliability, and proving the return on investment (ROI) of governance.
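As a sketch of how the scorecard might be computed, the example below derives the three metrics from a handful of human review records; the sample data and metric definitions are illustrative assumptions rather than prescribed thresholds.

```python
# Minimal sketch of the three scorecard metrics described above, computed from
# human review results. The review records are illustrative assumptions.
reviews = [
    # Each record: did retrieval pull the right context, did every citation
    # resolve to the correct evidence, and did the reviewer find a factual error?
    {"retrieval_correct": True,  "citations_correct": True,  "factual_error": False},
    {"retrieval_correct": True,  "citations_correct": False, "factual_error": False},
    {"retrieval_correct": False, "citations_correct": True,  "factual_error": True},
    {"retrieval_correct": True,  "citations_correct": True,  "factual_error": False},
]

n = len(reviews)
scorecard = {
    "retrieval_accuracy": sum(r["retrieval_correct"] for r in reviews) / n,
    "citation_precision": sum(r["citations_correct"] for r in reviews) / n,
    "factual_error_rate": sum(r["factual_error"] for r in reviews) / n,
}
for metric, value in scorecard.items():
    print(f"{metric}: {value:.0%}")
```

Tracked over time and by engagement type, these rates show where the controls are working and where retrieval, prompting, or review procedures need tightening.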

ALIGNING WITH THE REGULATORY LANDSCAPE IN SINGAPORE

For firms in Singapore, these controls should be implemented in alignment with a dynamic regulatory and professional environment. Although the Accounting and Corporate Regulatory Authority (ACRA) has not yet issued AI-specific audit rules, its 2024 Audit Regulatory Report highlights recurring quality themes, underscoring the need for disciplined documentation wherever AI contributes to a finding. The Institute of Singapore Chartered Accountants (ISCA) is actively laying the groundwork for professional adoption through its S$2 million “AI for AI” programme. Furthermore, the Monetary Authority of Singapore’s Fairness, Ethics, Accountability and Transparency (FEAT) principles provide a robust benchmark for regulated clients.

GOING FORWARD

Ultimately, the risk of AI hallucination is not a reason to shun the technology, but a mandate to manage it with discipline. By combining grounded retrieval, verifiable outputs, structured data, a meticulous governance log, and unwavering human scepticism, audit teams can build the necessary evidence trail. The journey should begin where the stakes are highest, in areas such as complex estimates and legal text extraction. These external controls represent today’s state-of-the-art yardsticks for using AI responsibly, and they are the essential bridge to a future where models are built with auditable reasoning processes from the inside out. Only with such discipline can the profession ensure that the machine’s statistical truths are always tethered to the auditor’s verifiable facts.


Kelvin K.F. Law is Associate Professor of Accounting, Nanyang Business School, Nanyang Technological University.
