A legal AI glossary explains the technical vocabulary behind AI-powered legal tools – from large language models and RAG to hallucinations. Anyone evaluating legal AI products or working with them daily needs this vocabulary: it helps interpret vendor claims, name risks accurately, and make informed decisions about deploying AI in a law firm or in-house team.
What is legal AI?
Legal AI refers to AI systems used specifically for legal tasks – contract analysis, risk detection, legal research, document review. They do not replace lawyers, but they handle repetitive cognitive work: reading texts, recognising structures, flagging deviations.
In Switzerland, law firms and in-house teams are increasingly using platforms like CASUS, which analyse documents directly in Microsoft Word or a web app – without data transfer to the US and with zero data retention.
The 25 most important legal AI terms
1. Artificial intelligence (AI)
Machines that simulate human thought processes: learning, pattern recognition, problem-solving. Current legal AI is so-called narrow AI – specialised for specific tasks. General, conscious AI does not exist in any productive form.
2. Machine learning (ML)
ML systems learn rules from data and outcomes rather than being explicitly programmed. In law: contract classification, risk detection, clustering of similar documents.
3. Supervised learning
An ML method where the model is trained on labelled example data – for instance, contracts manually marked as "high risk" or "low risk". The model learns to classify new documents accordingly.
4. Unsupervised learning
ML without predefined labels. The model identifies patterns and groupings in data on its own. Useful for document structuring in large datasets, such as in due diligence processes.
5. Deep learning
A subset of ML that uses multi-layered neural networks and large volumes of data. It enables more complex language processing and semantic understanding of contract text.
6. Large language model (LLM)
A language model trained on vast amounts of text, enabling it to understand and generate human language. A well-known example is GPT-4. LLMs are the foundation of most modern legal AI tools – for summaries, drafts, and research.
7. Natural language processing (NLP)
NLP refers to techniques that allow machines to process and understand natural language. In legal AI: extracting clauses, identifying contract parties, analysing unstructured legal text.
8. Generative AI
AI that creates new content – text, drafts, summaries. Built on LLMs and NLP. In law: contract wording, memos, response emails. Generative AI produces drafts that still require legal review.
9. Prompt
An instruction or question given to an AI system. In a legal AI context, the precise formulation of the prompt significantly influences output quality. Specific, context-rich prompts produce better results than vague requests.
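The difference between a vague and a context-rich prompt can be made concrete with two examples (the wording is illustrative, not a CASUS template):

```python
# Two hypothetical prompts for the same review task (illustrative only).
vague_prompt = "Check this contract."

specific_prompt = (
    "You are reviewing a software licensing agreement governed by Swiss law. "
    "List every clause that limits the licensor's liability, quote each "
    "clause verbatim, and rate it as high, medium, or low risk."
)
# The second prompt supplies role, governing law, task, and output format -
# exactly the context that improves output quality.
```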
10. Context window
The amount of text a model can process at once – its working memory. A small context window means long contracts must be split into parts (chunked). Modern models have increasingly large context windows, which improves handling of lengthy documents.
11. Chunking
Splitting large documents into smaller segments for model processing. Relevant when a contract exceeds the context window. Well-implemented chunking preserves semantic continuity across segment boundaries.
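For readers who want to see the mechanics, chunking can be sketched in a few lines of Python. This is a character-based toy with a fixed overlap; real implementations split at sentence or clause boundaries to preserve meaning:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks. The overlap ensures that a
    sentence crossing a boundary appears in full in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# With the defaults, a 2,500-character contract yields four chunks,
# each sharing 200 characters with its neighbour.
```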
12. Hallucination
An AI system invents information – court decisions, statutory articles, clause content – that does not exist or is factually wrong. Hallucinations are the primary risk in legal AI use. Reliable systems work from sources and make citations transparent.
13. RAG – retrieval-augmented generation
RAG combines a language model with an external knowledge base. Rather than drawing only from training data, the system actively retrieves relevant documents or decisions and incorporates them into its answer. This reduces hallucinations and makes outputs more traceable.
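The retrieve-then-generate pattern can be sketched as follows. The word-overlap scoring is a deliberately naive stand-in for embedding-based retrieval, the LLM call itself is omitted, and the documents are invented placeholders:

```python
def retrieve(question: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by words shared with the question (toy scoring;
    real systems use embedding similarity instead)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question: str, sources: list[str]) -> str:
    """Assemble a prompt with numbered sources so every statement in the
    answer can be traced back to a retrieved document."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    return (
        "Answer using only the numbered sources below and cite them as [n].\n"
        f"{numbered}\nQuestion: {question}"
    )

# Invented placeholder documents, not real clauses:
kb = [
    "clause 7 caps the licensor's liability at the annual contract value",
    "clause 12 designates zurich as the place of jurisdiction",
    "clause 3 defines the delivery schedule and acceptance tests",
]
question = "which clause caps liability"
prompt = build_prompt(question, retrieve(question, kb))
```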
14. Embeddings
Mathematical vector representations of text that make semantic similarity measurable. Embeddings enable semantic search: "find all clauses resembling liability limitations" – even when different terminology is used.
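A minimal sketch of how embedding similarity works: each text becomes a vector, and the cosine of the angle between two vectors measures how close their meanings are. The three-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two vectors: 1.0 means identical direction (meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors standing in for real embeddings:
liability_limit = [0.9, 0.1, 0.2]     # "liability is limited to ..."
damages_cap     = [0.85, 0.15, 0.25]  # "cap on damages ..."
delivery_terms  = [0.1, 0.9, 0.3]     # "delivery shall occur by ..."

# Despite different wording, the two liability clauses point in almost
# the same direction, while the delivery clause does not.
```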
15. Fine-tuning
A pre-trained model is further trained on specific data – for instance, a firm's own contract archive – to deliver more precise results for particular tasks. Resource-intensive and sensitive from a data protection perspective.
16. Agentic AI
AI systems that plan and execute multi-step tasks autonomously, without needing a new instruction for each step. In a contract context: an agent analyses, identifies what needs changing, and inserts a clause at the right location. Agentic AI still requires substantial human oversight.
17. Chain of thought
A technique where the model exposes its reasoning step by step before reaching a conclusion. Important for traceable, defensible outputs – for example in legal assessments or contract risk analyses.
18. Guardrails
Technical or policy-based safeguards that constrain AI system behaviour – such as prohibiting certain output types, setting accuracy thresholds, or requiring escalation when uncertain. Particularly relevant in law for compliance and liability questions.
19. Zero data retention
A data protection principle: input data is not stored after processing, not logged, and not used for model training. Relevant for attorney-client privilege and client confidentiality. CASUS operates with zero data retention.
20. Data residency
The physical location where data is stored and processed. For Swiss law firms, data often must not leave Switzerland or the EEA. CASUS hosts in Switzerland and the EU – no data transfer to the US.
21. Bias / algorithmic bias
When training data is systematically skewed, the model reproduces those biases in its outputs. In law, this is problematic in decision support, risk assessments, or predictive models.
22. Structured output / structured findings
AI outputs in a defined format – for example, a table with risk level, party, clause excerpt, and improvement suggestion. Structured outputs can be directly processed, transferred into reports, or shared with colleagues. The CASUS Risk & Quality Review delivers findings structured by assignment, relevance, and severity.
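Conceptually, a structured finding is just a record with fixed fields. The schema below is a hypothetical illustration, not the CASUS format:

```python
from dataclasses import dataclass, asdict

@dataclass
class Finding:
    clause: str       # verbatim clause excerpt
    party: str        # party affected by the risk
    risk_level: str   # "high", "medium", or "low"
    suggestion: str   # proposed improvement

findings = [
    Finding(
        clause="Liability is unlimited for all damages.",
        party="Licensor",
        risk_level="high",
        suggestion="Cap liability at the annual contract value.",
    ),
]

# Fixed fields make the output machine-processable: each finding converts
# directly into a row for a report, a table, or an export.
rows = [asdict(f) for f in findings]
```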
23. Legal analytics
Analysis of large volumes of legal data – court decisions, litigation outcomes, judicial behaviour – to generate strategic insights. Examples: probability of success in a given court, typical damages in liability cases.
24. Semantic search
Search that accounts for meaning and context, not just exact keywords. This makes it possible to find all clauses on "liability limitations" even if the clause uses the phrase "cap on damages".
25. Vectorisation
The process of converting text into numerical vectors that can be mathematically compared. The basis for semantic search and embedding-based similarity comparisons. In practice: a contract is "vectorised" and then matched against a clause database.
Why these terms matter for lawyers in Switzerland
Anyone buying or internally evaluating legal AI products will encounter these terms in vendor presentations, data protection assessments, and security questionnaires. Knowing the vocabulary makes the difference between a well-negotiated SaaS agreement and one with data protection gaps.
Concretely: understanding zero data retention and data residency allows precise questions about whether a vendor transfers data to the US. Knowing what hallucinations are makes it possible to check whether a system works from sources or generates answers without any basis.
For Swiss law firms and in-house teams considering legal AI, the CASUS security and data protection page documents the technical privacy measures of the platform in detail.
Working with CASUS makes these concepts visible in practice: the AI Chat with Agent Mode uses structured outputs and source references, the Legal Research mode searches more than 660,000 cantonal and federal court decisions and cites its sources, and the AI Data Room extracts clause content from hundreds of documents according to self-defined fields. Teams that want to test these capabilities can try CASUS at app.getcasus.com/signup.
FAQ
What is a legal AI glossary?
A legal AI glossary is a structured collection of definitions for technical terms in AI-powered legal tools – from foundational concepts like machine learning to specific security terms like zero data retention.
What does hallucination mean in a legal AI context?
Hallucination refers to AI-generated content that is factually wrong or entirely fabricated – for example, non-existent court decisions or incorrect statutory references. Reliable legal AI systems work from sources and make citations transparent.
What is RAG and why does it matter for legal AI?
RAG (retrieval-augmented generation) combines a language model with an external knowledge base. The system actively retrieves relevant sources – such as court decisions or statutory articles – before responding. This reduces hallucinations and makes outputs traceable.
What does zero data retention mean for law firms?
Zero data retention means that input data is not stored after processing – neither for logs nor for model training. For law firms, this matters because client data and privileged information do not flow into third-party systems.
What is the difference between fine-tuning and pre-training?
Pre-training is the initial training of a model on large, general datasets. Fine-tuning is further training on specific data – such as legal texts – to improve accuracy for particular tasks. Fine-tuning is resource-intensive and sensitive when client data is involved.
What is agentic AI in a legal context?
Agentic AI refers to systems that can plan and execute multi-step tasks without being re-instructed at each step. Example: an agent reviews a contract, identifies missing clauses, and inserts suitable wording at the right location. Human oversight remains necessary for complex legal questions.
What is the difference between semantic search and keyword search?
Semantic search accounts for meaning and context – it finds relevant passages even when their wording differs from the search term. Keyword search only finds exact matches. In a contract context, semantic search finds "cap on damages" when searching for "liability limitation".
What data protection requirements apply to legal AI in Switzerland?
Key requirements come from the revised Swiss Data Protection Act (nDSG), the GDPR where EU connections exist, and attorney-client privilege obligations. Law firms should ensure that vendors do not transfer data to the US, maintain zero data retention, and host within Switzerland or the EEA.