An AI that invents court rulings, misquotes statutes, or cites non-existent legal commentary – that may sound like a cautionary tale from a technology conference. But it is documented reality. In 2023, a US law firm submitted court filings containing hallucinated ChatGPT citations. The outcome: sanctions, reputational damage, and disciplinary proceedings.
For Swiss law firms and in-house legal teams using or evaluating AI-assisted tools, AI hallucinations in legal work are not a theoretical concern. They are an operational risk question.
What AI hallucinations are – and why they are so dangerous in law
In this context, "hallucination" refers to when an AI model generates information that is inaccurate, misleading, or simply fabricated – but sounds plausible. The problem: correct and incorrect answers look identical. There is no visual warning, no different font colour, no flag.
In general business contexts, hallucination rates vary between 0.7% and 29.9% depending on the model (Vectara). In legal contexts, specialist sources put that figure at up to 88%. That is not a marginal issue.
Why legal work is particularly vulnerable
Legal texts demand precision at the word level. Whether a clause says "may" or "must", whether a deadline is three or thirty days, whether a ruling dates from 2019 or 2021 – these are not stylistic questions. In the worst case, such details determine liability, contract validity, or litigation outcome.
AI models are pattern matchers. They were not trained to tell the truth; they were trained to generate probable next tokens. In a legal context, that means the model writes with confidence even when it has no reliable source. Invented Federal Court rulings, incorrect statutory articles, non-existent commentary references – all of this has already been documented.
The professional risks of unverified AI outputs
A lawyer who passes AI-generated content directly into pleadings or opinions without checking it risks more than a bad day. The risks include:
Professional misconduct from submitting false or unverifiable sources
Liability towards clients arising from incorrect legal advice based on hallucinated content
Reputational damage that, in a small market like Switzerland, can have long-lasting effects
German courts have already acted on unverified AI outputs, and Swiss courts apply comparable standards. AI-generated opinions that have not been verified are legally worthless.
Causes of AI hallucinations
Understanding how to avoid AI hallucinations in legal work starts with understanding the causes:
Incomplete or biased training data. When a model is trained on incomplete or flawed sources, it reproduces those flaws.
Lack of contextual understanding. Language models process text statistically, not through semantic or logical reasoning. They do not "understand" legal terms the way a lawyer does.
Excessive generalisation. The model fills gaps with plausible-sounding but freely generated content – particularly when a question falls outside its training knowledge.
Imprecise prompts. Vague or ambiguous inputs increase the likelihood that the model generates in unintended directions.
Strategies to avoid AI hallucinations in legal practice
There are technical, methodological, and organisational measures that significantly reduce the risk.
Retrieval-Augmented Generation (RAG)
RAG is currently the most effective technical approach against hallucinations. Instead of relying on the model's trained knowledge, AI answers are anchored in verified sources. The model does not generate freely; it draws on specific documents or databases.
Legal AI platforms that use RAG are structurally less prone to free invention – provided the source database is current and quality-controlled.
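To make the mechanism tangible, here is a minimal sketch in Python of the retrieve-then-generate pattern. The source entries, the keyword-overlap scoring, and the prompt wording are illustrative assumptions – a production system would use a curated database and embedding-based retrieval – but the structural point is the same: the model only ever sees verified passages and is instructed to cite them.

```python
# Minimal RAG sketch: retrieve verified passages first, then constrain the
# model to answer only from them. Sources and scoring are illustrative.

from dataclasses import dataclass

@dataclass
class Source:
    citation: str   # e.g. a court decision or statutory article
    text: str       # the verified passage itself

# Stand-in for a quality-controlled source database (illustrative content).
SOURCES = [
    Source("Art. 160 CO", "A contractual penalty may be agreed for non-performance or defective performance ..."),
    Source("Art. 163 CO", "The parties may determine the amount of the contractual penalty freely ..."),
]

def retrieve(question: str, sources: list[Source], top_k: int = 2) -> list[Source]:
    """Rank sources by naive keyword overlap with the question
    (placeholder for embedding-based retrieval in a real system)."""
    terms = set(question.lower().split())
    scored = sorted(
        sources,
        key=lambda s: len(terms & set(s.text.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(question: str, passages: list[Source]) -> str:
    """Assemble a prompt that instructs the model to answer only from the
    retrieved passages and to cite them explicitly."""
    context = "\n".join(f"[{s.citation}] {s.text}" for s in passages)
    return (
        "Answer using only the sources below. Cite each source you rely on. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

question = "What requirements apply to enforcing a contractual penalty?"
print(build_grounded_prompt(question, retrieve(question, SOURCES)))
```

The decisive design choice is the final instruction: the model is told to refuse rather than improvise when the retrieved sources do not cover the question.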
Source-based research instead of open generation
A key distinction exists between AI tools that generate openly (like a general-purpose LLM) and those that link answers to concrete sources. In legal work, the latter is not optional.
CASUS, a Swiss legal AI platform, uses a database of over 660,000 cantonal and federal court decisions as well as statutory articles in its Legal Research mode. Answers are linked to the relevant decisions and reasoning sections (Erwägungen), viewable inline without an additional click. That is structurally a different approach from a general language model.
Chain-of-thought prompting and precise inputs
How a question is framed significantly influences what comes back. Vague questions produce vague or incorrect answers. Concrete, scoped prompts with clear context reduce the space for hallucinations.
In everyday legal practice, this means: instead of "What does Swiss law say about contractual penalties?", ask "What requirements must be met under Art. 160 ff. CO to enforce a contractual penalty?" The second formulation is verifiable; the first invites free generation.
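As a sketch of what such scoping can look like when prompts are built programmatically – the field names and wording below are illustrative assumptions, not a prescribed format – a template might combine jurisdiction, the relevant provisions, and an instruction to reason step by step:

```python
# Sketch of a scoped prompt template for legal research questions.
# Structure and wording are illustrative, not a prescribed format.

def scoped_legal_prompt(question: str, provisions: list[str], jurisdiction: str) -> str:
    """Combine the question with an explicit legal scope so the model has
    less room to generate outside the intended provisions."""
    scope = ", ".join(provisions)
    return (
        f"Jurisdiction: {jurisdiction}\n"
        f"Relevant provisions: {scope}\n"
        f"Question: {question}\n"
        "Answer step by step, name the provision each step relies on, "
        "and state explicitly if the provisions above do not settle the question."
    )

print(scoped_legal_prompt(
    "What requirements must be met to enforce a contractual penalty?",
    ["Art. 160 CO", "Art. 161 CO", "Art. 163 CO"],
    "Switzerland",
))
```

Asking for explicit reasoning steps also makes errors easier to spot during review, because a wrong intermediate step stands out more than a wrong conclusion.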
Human review as an indispensable step
No technical measure replaces expert verification. Every AI output that feeds into a pleading, opinion, or contractual clause must be checked by a qualified person. That is not an optional extra step – it is the legally required standard.
In practice: check citations, look up rulings, read statutory articles in the original text. AI can provide the starting point; the human provides the sign-off.
Organisational checklists for daily legal work
Firms introducing AI tools should establish clear internal rules: which outputs can be used without verification? Which ones require mandatory review? A risk classification by document type – email draft versus court submission, for example – helps allocate verification effort where it matters most.
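One way to make such a classification explicit – the categories and rules below are purely illustrative and would be defined by each firm – is to record it as a simple mapping from document type to required review level:

```python
# Sketch of an internal risk classification: map document types to the level
# of human review required before an AI-assisted draft may be used.
# Categories and rules are illustrative and would be set by the firm.

from enum import Enum

class Review(Enum):
    SPOT_CHECK = "spot check by author"
    FULL_REVIEW = "full review of content and sources"
    PARTNER_SIGN_OFF = "full review plus partner sign-off"

REVIEW_RULES = {
    "internal email draft": Review.SPOT_CHECK,
    "client memo": Review.FULL_REVIEW,
    "contract clause": Review.FULL_REVIEW,
    "court submission": Review.PARTNER_SIGN_OFF,
}

def required_review(document_type: str) -> Review:
    """Default to the strictest level when a document type is not listed."""
    return REVIEW_RULES.get(document_type, Review.PARTNER_SIGN_OFF)

print(required_review("court submission").value)
```

Defaulting unlisted document types to the strictest level keeps the rule set safe even when it is incomplete.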
What distinguishes good legal AI tools from general language models
Not all AI tools are equally suitable for legal work. The relevant differences lie in the architecture, not the interface.
Legal AI platforms like CASUS are designed for specific tasks: contract analysis, benchmarking against playbooks, proofreading, data room extraction, and legal research with verified sources. That specialisation reduces the space for hallucinated content because the system works within the context of the specific document – not in a vacuum.
General-purpose LLMs, by contrast, generate from their entire training corpus. In a legal context, that is structurally riskier.
Data protection and security are also not peripheral concerns in the Swiss legal market. CASUS hosts exclusively in Switzerland and the EU, does not transfer data to the US, and operates without permanent data storage or human review of documents. More details are available on the CASUS security page.
Using CASUS to minimise hallucination risk
The Legal Research mode in CASUS delivers structured, source-based assessments. Relevant decisions and statutory articles are linked directly within the answer; reasoning sections can be viewed inline. That enables quick verification – not blind trust.
In the AI Chat with Agent Mode, the AI works on the specific document. Answers are linked to text passages that can be jumped to directly. This substantially reduces free generation because the model is anchored to the document context.
For proofreading contracts before sending, the Proofread module checks linguistic and formal consistency – cross-references, definitions, numbering, placeholders – without stepping into legal judgment.
Use AI carefully – not fearfully
AI hallucinations in legal work cannot be reduced to zero. Anyone claiming otherwise is, in a sense, hallucinating themselves. What is achievable: a substantially reduced risk through the right combination of suitable platform, precise prompts, and consistent human review.
Those working with AI in the Swiss legal sector and relying on source-based, structured, and traceable outputs can try CASUS for free: Get started.
FAQ
What are AI hallucinations in legal work?
AI hallucinations in legal work are instances where an AI model generates legal content that is factually wrong, misleading, or entirely fabricated – for example non-existent rulings, incorrect statutory articles, or invented literature references – while appearing plausible and correct.
How high is the hallucination rate for legal AI tools?
The general hallucination rate for language models ranges from 0.7% to 29.9% depending on the model (Vectara). In legal contexts, specialist sources put the figure at up to 88%. Legal work is considered particularly vulnerable because precise facts are required, while models are trained on plausibility.
What professional consequences can arise from unverified AI use?
Lawyers who pass hallucinated AI outputs directly into court submissions or legal opinions risk professional misconduct findings, client liability claims, and reputational damage. A documented US case from 2023 shows that courts do impose sanctions when hallucinated rulings are submitted.
What is Retrieval-Augmented Generation (RAG) and why does it matter?
RAG is an architecture where AI answers are anchored to a verified source database rather than generated freely. In legal contexts, that means the model draws on specific rulings, statutes, or documents rather than its general training knowledge. This substantially reduces hallucination risk.
How can AI hallucinations be avoided in day-to-day legal practice?
Three measures work together: formulate precise, context-rich prompts; use AI platforms that link answers to verifiable sources; and have every AI output that enters a legal document reviewed by a qualified person. No single measure is sufficient on its own.
Is legal AI actually advisable for Swiss law firms?
Yes – when the right tools are used with the right workflow. The problem is not AI itself, but uncritical acceptance of its outputs. Specialised platforms with source anchoring, Swiss hosting, and clear data protection standards carry a different risk profile than general-purpose language models.
How does CASUS differ from general AI tools like ChatGPT?
CASUS is a specialised legal AI platform that works on specific documents or a verified legal database of over 660,000 decisions. Answers are linked to sources and directly traceable. ChatGPT and similar general-purpose models generate from their entire training corpus without mandatory source references.
Which prompting strategies reduce hallucination risk?
Concrete, scoped questions with clear legal context produce more reliable answers than open-ended ones. Chain-of-thought prompting – asking the model to make its reasoning steps explicit – can surface errors before outputs are reused. Source anchoring in the prompt ("refer only to CO Art. 160 ff.") further limits the space for free generation.







