AI and the evolution of document review and production

10 Minute read 07.04.2026 Jason McQuillen and Sarah McCann

AI is transforming legal document review — from Technology-Assisted Review and CAL to Generative AI — reshaping speed, accuracy and defensibility in discovery and investigations.

Key takeouts

CAL is the gold standard for defensible review. Courts and bodies like the Sedona Conference have endorsed it, making it the preferred methodology where court or regulator approval is required.

Gen AI processes thousands of documents in minutes and delivers real-time strategic insights, but lacks CAL's defensibility precedent and courts are yet to develop standards for assessing its adequacy.

The optimal approach combines both technologies with human oversight: Gen AI for early triage and summarisation, CAL for formal review and production, and senior lawyer review for privilege and edge cases.

Artificial intelligence (AI) is fundamentally reshaping the legal industry. From contract analysis and legal research to due diligence and regulatory compliance, generative AI is transforming how legal professionals work, promising benefits in speed, efficiency and insight. The field of discovery, investigations and regulatory responses is no exception, and the introduction of AI builds on an evolution in the practice of document review and production.

What was once a labour-intensive, linear process dominated by human review has evolved into a sophisticated ecosystem where AI is playing an increasingly integral role. For organisations facing a requirement to collect, review and produce large data sets, understanding the various tools and approaches available is critical to managing timelines, cost, risk and defensibility.

This article examines the evolution of document review and production over the past decade, from the early adoption of Technology-Assisted Review (TAR) through Continuous Active Learning (CAL), to the emergence of AI, now including Generative AI. We analyse how these technologies compare across key metrics including speed, accuracy and defensibility, and provide practical guidance on choosing the optimal approach to investigations, litigation and other document production for your context.

The traditional review challenge

Throughout the 2010s, legal teams faced an escalating challenge in the form of exponential growth in electronically stored information. Matters routinely involved production obligations spanning years of email correspondence, contract repositories, messaging platforms and financial records, often numbering in the millions of documents. Linear review – where lawyers manually examined every document – was becoming economically unsustainable and temporally impossible given the timelines courts and regulators imposed. Offshoring to lower cost jurisdictions became commonplace but was still dependent on humans.

The introduction of Technology-Assisted Review (TAR)

The mid-2010s witnessed the adoption of Technology-Assisted Review, particularly predictive coding and later Simple Active Learning. These technologies promised to reduce review populations dramatically whilst maintaining high levels of accuracy.

Early TAR deployments followed a "seed set" methodology: senior lawyers would review a statistically significant sample of documents, train a predictive model, and then apply that model to categorise the remaining population. Whilst effective in reducing volumes, the approach required substantial upfront investment in training and remained dependent on large-scale human review of predicted-responsive documents.

Continuous Active Learning (CAL): the current standard

Continuous Active Learning represents a refined approach to Technology-Assisted Review. Unlike traditional predictive coding, CAL employs an iterative process where the algorithm continuously learns from reviewer decisions and prioritises the most informative documents for human review.

The workflow typically proceeds as follows:

Initial training: reviewers code a statistical subset of the total review pool (often 500–1,000).
Model prediction: the algorithm surfaces documents likely to be responsive based on similar concepts.
Prioritised review: as reviewers continue coding documents the algorithm prioritises and deprioritises the remaining documents in the population based on themes and concepts in reviewed documents.
Continuous refinement: the model updates continuously based on each coding decision, pushing material towards opposite ends of a bell curve of relevance scores.
Validation: once the model stabilises, a statistically valid sample of predicted non-responsive documents is reviewed to measure 'recall' (the proportion of relevant documents identified).
Production determination: documents meeting responsiveness thresholds are designated for production.
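
For illustration, the workflow above can be sketched as a toy loop. This is a minimal sketch under stated assumptions, not how any review platform implements CAL: the token log-ratio score stands in for a real classifier, and the function name, the document set and the stopping rule (halt after one batch with no responsive documents) are all hypothetical. In practice the decision to stop is confirmed by the validation step, not by a rule like this.

```python
import math
from collections import Counter


def cal_review(docs, reviewer, seed_ids, batch_size=2, max_empty_batches=1):
    """Toy continuous active learning loop.

    docs: dict of doc_id -> text
    reviewer: callable doc_id -> bool (the human coding decision)
    seed_ids: documents coded first as initial training
    Returns a dict of doc_id -> decision in the order documents were coded.
    """
    tokens = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    rel_counts, irr_counts = Counter(), Counter()
    coded = {}

    def code(doc_id):
        # A human decision is recorded and immediately fed back into the model.
        decision = reviewer(doc_id)
        coded[doc_id] = decision
        (rel_counts if decision else irr_counts).update(tokens[doc_id])
        return decision

    def score(doc_id):
        # Toy stand-in for a classifier: smoothed log-ratio of token counts.
        return sum(
            math.log((rel_counts[t] + 1) / (irr_counts[t] + 1))
            for t in tokens[doc_id]
        )

    for doc_id in seed_ids:  # initial training
        code(doc_id)

    empty_batches = 0
    while empty_batches < max_empty_batches:
        remaining = sorted(d for d in docs if d not in coded)
        if not remaining:
            break
        remaining.sort(key=score, reverse=True)  # prioritised review
        hits = [code(d) for d in remaining[:batch_size]]
        # Hypothetical stopping rule: count batches with no responsive hits.
        empty_batches = 0 if any(hits) else empty_batches + 1
    return coded


# Hypothetical eight-document population; three documents are responsive.
docs = {
    "a": "merger pricing agreement with supplier",
    "b": "lunch menu friday",
    "c": "supplier pricing discussion merger",
    "d": "football tipping friday",
    "e": "pricing agreement draft",
    "f": "office lunch roster",
    "g": "car park notice",
    "h": "friday drinks invitation",
}
responsive = {"a", "c", "e"}
coded = cal_review(docs, reviewer=lambda d: d in responsive, seed_ids=["a", "b"])
found = {d for d, decision in coded.items() if decision}
```

In this toy run the responsive documents are surfaced straight after the seed set and review stops with part of the population uncoded, which is the behaviour the prioritisation step is designed to produce.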

CAL brings a number of advantages:

Speed: CAL significantly accelerates review timelines compared to linear review. By prioritising the most relevant documents, reviewers focus their attention where it matters most, often reducing review populations by 20–40%.
Accuracy: CAL is typically accepted as a method of ceasing review early once it reaches recall rates of 80–85%. The continuous learning cycle allows the algorithm to refine its understanding of responsiveness throughout the review.
Defensibility: CAL enjoys widespread acceptance in jurisdictions globally, including Australia. Courts have endorsed the methodology in numerous decisions, and industry standards (such as those published by the Sedona Conference) provide detailed guidance on defensible implementation. The ability to measure and report recall, precision and confidence levels provides quantitative evidence of review quality.
Cost efficiency: by eliminating the need to review 20–40% of the document population, CAL substantially reduces both review time and costs.
Transparency: CAL workflows generate detailed audit trails showing training decisions, model performance metrics and validation results, facilitating defensibility submissions to courts and regulators.
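
As a worked example of the recall measure referred to above, one common way to estimate it is from a sample of the unreviewed, predicted non-responsive documents (sometimes called an elusion sample). The sketch below gives a point estimate only, with no confidence interval, and every figure in it is hypothetical.

```python
def estimate_recall(found_responsive, discard_pile_size, sample_size, sample_responsive):
    """Point estimate of recall from a random sample of the discard pile.

    found_responsive: responsive documents identified during review
    discard_pile_size: predicted non-responsive documents left unreviewed
    sample_size: number of discard-pile documents sampled and reviewed
    sample_responsive: how many sampled documents turned out to be responsive
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    elusion_rate = sample_responsive / sample_size
    estimated_missed = elusion_rate * discard_pile_size
    return found_responsive / (found_responsive + estimated_missed)


# Hypothetical matter: 8,000 responsive documents found, 40,000 left unreviewed,
# and 45 responsive documents in a 1,500-document sample of the discard pile.
recall = estimate_recall(8_000, 40_000, 1_500, 45)
print(f"Estimated recall: {recall:.1%}")
```

On these figures the estimate is roughly 87%, above the 80–85% range mentioned above. A real validation report would also state the sampling method and the margin of error.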

However, there are a number of limitations with CAL:

Richness dependency: CAL efficiency is heavily influenced by relevance richness (the prevalence of responsive documents in the population). When richness exceeds 80–90%, efficiency gains diminish significantly, as most documents require review regardless.
Training requirements: CAL requires consistent, high-quality training from senior reviewers familiar with the matter's issues. Inconsistent coding decisions during training degrade model performance and reduce efficiency, resulting in more documents that appear in the centre of the scoring bell curve as 'unsure' documents.
Technology learning curve: legal teams must develop expertise in CAL workflows, understand statistical validation concepts and learn to interpret performance metrics – a barrier for smaller firms or infrequent users.
Ongoing human review: substantial human review effort remains necessary even with CAL, including all producible documents, associated attachments and validation samples of non-responsive documents to ensure adequate coverage. In addition, the pool of material unsuitable for CAL still requires manual human review.
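
The richness limitation above can be illustrated with simple arithmetic. The sketch below assumes an idealised model that ranks every responsive document ahead of every non-responsive one, so it gives a ceiling on savings, not a prediction. The function name and the figures are hypothetical.

```python
def best_case_cull(richness, target_recall):
    """Upper bound on the share of a population that can go unreviewed.

    Even with perfect ranking, reaching the target recall means reviewing at
    least richness * target_recall of the population (the responsive documents
    that must be found), so no more than the remainder can be culled.
    """
    if not (0 <= richness <= 1 and 0 < target_recall <= 1):
        raise ValueError("richness must be in [0, 1] and target_recall in (0, 1]")
    return 1 - richness * target_recall


for richness in (0.2, 0.5, 0.9):
    cull = best_case_cull(richness, target_recall=0.8)
    print(f"richness {richness:.0%}: at most {cull:.0%} of documents can be culled")
```

At 90% richness and an 80% recall target the ceiling is about 28%, and real models fall short of perfect ranking, which is why efficiency gains diminish sharply in rich populations.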

Generative AI: the emerging frontier

Generative AI tools (based on large language models like GPT-4 or Claude Sonnet) approach document review fundamentally differently from CAL. Rather than learning from user coding decisions to predict document categories, Generative AI can:

Understand context and nuance: large language models comprehend legal concepts, commercial relationships and factual scenarios without extensive training.
Summarise and synthesise: generate concise summaries of lengthy documents, identifying key provisions, obligations and risks.
Answer specific queries: respond to targeted questions about document content (e.g., "Does this contract contain indemnification provisions favouring the supplier?").
Extract structured data: identify and tabulate specific data points across large document sets (parties, dates, monetary amounts, jurisdictions).
Perform multi-document reasoning: analyse relationships between documents, identify inconsistencies and construct chronologies.

Generative AI tools promise a number of advantages:

Speed: generative AI can process documents extraordinarily quickly. Where human review might process 40–70 documents per hour and CAL requires iterative training cycles, Gen AI can analyse thousands of documents in minutes, generating summaries, extracting data or answering queries at scale.
Live results for early decision-making: unlike traditional manual review, where teams must wait days or weeks for consolidated reports, Gen AI provides near-instantaneous results. As documents are processed, extracted data populates the tabular review in real time, enabling:

Immediate strategic insights: legal teams can identify key issues, problematic documents or gaps in documentation within hours rather than weeks.
Dynamic case strategy: early identification of key evidence allows lawyers to adjust scope, identify new avenues for inquiry or refine legal theories whilst review is ongoing.
Client updates: real-time results facilitate prompt client reporting of investigative findings and discovery risk assessment.
Resource allocation: project managers can redirect review resources based on emerging patterns visible in live data.
Settlement discussions: in litigation contexts, early identification of adverse or favourable documents can inform settlement timing and negotiation strategy.

Minimal training requirements: unlike CAL, which requires extensive human coding to train models, Gen AI can be deployed immediately using natural language instructions (prompts). This dramatically reduces the upfront investment required to commence review.
Sophisticated analysis: Gen AI can perform tasks beyond simple classification, including identifying themes, detecting anomalies, constructing privilege logs and generating first drafts of chronologies or fact summaries.
Flexibility: the same Gen AI system can address multiple review objectives without separate training exercises – issue coding, data extraction and summarisation can occur simultaneously.

Limitations and risks of generative AI

Large language models (LLMs) are improving rapidly, with each generation demonstrating enhanced accuracy, reasoning capability and factual reliability. However, their fundamental architecture remains probabilistic – they predict the most likely next token (word or word fragment) based on patterns learned from training data, rather than retrieving and verifying facts from a trusted database.

This probabilistic nature means LLMs can be prone to "hallucinations" – generating plausible-sounding but factually incorrect outputs if the AI is being used to generate content in an open-source environment.

This means their use for generating content will not always be appropriate for discovery. Instead, AI in large-scale document review is typically limited to providing 'yes/no' scoring decisions for each document considered, rather than generating a summary or new content. The scoring can then be used in a similar way to the outputs generated by CAL, ranking documents in a bell curve of responsive material to prioritise for legal review.

For an AI's scoring decisions to stand in the shoes of CAL, and be used to defensibly cease review early, the Gen AI must be applied with robust safeguards that satisfy courts, regulators and other parties that it has been used reliably and appropriately:

Prompt engineering for accuracy: prompts must be carefully curated to favour facts over creativity. This involves:

Simple language queries that do not contain a long list of variables (as this decreases the AI's confidence in whether a document is responsive)
Factual background that does not 'assume' facts or allegations to be true simply because that suits the client's narrative
Providing background context of the key people, organisations and events and how they inter-relate with the categories of relevance
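
By way of illustration, the three prompt principles above might be assembled into a template along the following lines. This is a sketch only: the function, the wording of the template, the one-question guard and the example parties are all hypothetical, and no particular Gen AI product or API is assumed.

```python
def build_relevance_prompt(background, key_entities, category_question, document_text):
    """Assemble a single-question relevance prompt for one document.

    background: neutral factual summary (allegations described as allegations)
    key_entities: dict of name -> role, giving context on people and organisations
    category_question: one plain-language question for one category of relevance
    """
    # Crude, hypothetical guard for "simple queries": one question per prompt.
    if category_question.count("?") != 1:
        raise ValueError("ask exactly one question per prompt")
    entity_lines = "\n".join(f"- {name}: {role}" for name, role in key_entities.items())
    return (
        "You are assisting with a legal document review.\n\n"
        f"Background (allegations are not assumed to be true):\n{background}\n\n"
        f"Key people, organisations and events:\n{entity_lines}\n\n"
        f"Question: {category_question}\n"
        "Answer only YES or NO, based solely on the document below.\n\n"
        f"Document:\n{document_text}"
    )


# Hypothetical example; the parties and facts are invented for illustration.
prompt = build_relevance_prompt(
    background="The applicant alleges that prices were changed without notice in 2023.",
    key_entities={"Acme Pty Ltd": "supplier", "J. Smith": "Acme account manager"},
    category_question="Does this document discuss a change to supply prices?",
    document_text="Hi team, the revised price list takes effect next month.",
)
print(prompt)
```

Keeping each prompt to a single category also makes it easier to record exactly which instruction produced which scoring decision.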

Validation protocols: human oversight remains essential:

Statistical sampling of AI outputs to measure accuracy rates
Mandatory human review of high-stakes determinations (privilege, key document identification)
Audit trails linking AI outputs to source text for verification
Quality control checkpoints at regular intervals throughout the review
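
As one possible sketch of the statistical sampling step above, the code below draws a random sample of AI scoring decisions, compares each with a human reviewer's decision, and reports the agreement rate with a Wilson score interval. The function names, sample sizes and data are hypothetical, and a real protocol would normally sample responsive and non-responsive predictions separately.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def validate_sample(ai_decisions, human_review, sample_size, seed=0):
    """Compare AI yes/no decisions with human review on a random sample.

    ai_decisions: dict of doc_id -> bool (the AI's decision)
    human_review: callable doc_id -> bool (the reviewer's decision)
    Returns (number of agreements, (lower, upper) interval on the agreement rate).
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced later
    sample_ids = rng.sample(sorted(ai_decisions), sample_size)
    agreements = sum(ai_decisions[d] == human_review(d) for d in sample_ids)
    return agreements, wilson_interval(agreements, sample_size)


# Hypothetical result: reviewers agreed with the AI on 90 of 100 sampled documents.
low, high = wilson_interval(90, 100)
print(f"Agreement 90%, 95% interval {low:.1%} to {high:.1%}")
```

Recording the seed alongside the result means the same sample can be regenerated if the methodology is later questioned.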

Transparency and documentation: maintaining detailed records of:

Prompts and model configurations used
Validation sampling methodology and results
Error rates and corrective measures implemented
Human review workflows applied to AI outputs
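
The records listed above could be captured in a simple structured form. The following is a minimal sketch, assuming a flat JSON record per review pass; the class, field names and values are hypothetical and would need to be adapted to the review platform and the matter.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ReviewAuditRecord:
    prompt_text: str            # exact prompt used
    model_name: str             # model identifier as configured
    model_settings: dict        # configuration used for the run
    sampling_method: str        # how the validation sample was drawn
    sample_size: int
    sample_errors: int          # AI decisions overturned on human review
    corrective_action: str = ""
    human_workflow: str = ""    # human review applied to AI outputs

    def to_json(self):
        record = asdict(self)
        # A hash makes it easy to show later that the prompt was not altered.
        record["prompt_sha256"] = hashlib.sha256(
            self.prompt_text.encode("utf-8")
        ).hexdigest()
        record["error_rate"] = (
            self.sample_errors / self.sample_size if self.sample_size else None
        )
        return json.dumps(record, sort_keys=True, indent=2)


# Hypothetical entry for one review pass.
entry = ReviewAuditRecord(
    prompt_text="Does this document discuss a change to supply prices?",
    model_name="example-model",
    model_settings={"temperature": 0},
    sampling_method="simple random sample, seed 0",
    sample_size=100,
    sample_errors=4,
    corrective_action="Prompt clarified to include rebates; affected batch re-scored.",
    human_workflow="Senior lawyer review of all privilege calls.",
)
audit_json = entry.to_json()
print(audit_json)
```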

Other considerations for generative AI use

There are other factors that must be considered in deciding how to use Generative AI in your review process:

Lack of established defensibility: unlike CAL, which benefits from over a decade of judicial acceptance and established validation methodologies, Generative AI in discovery contexts lacks comparable defensibility precedent. Courts have not yet developed standards for assessing the adequacy of Gen AI-assisted review, creating risk for early adopters who apply it without regard for compliance with the fundamentals of discovery obligations. This is where the guidance from an eDiscovery consulting team with experience in AI-enabled review is crucial.
Limited transparency: large language models operate as "black boxes" – it is difficult to explain why a model reached a particular conclusion about a document until documents are independently verified or otherwise human reviewed. Without adequate validation, verification and conflict workflows, this opacity can conflict with discovery obligations, and legal teams may be left without a reasonable explanation for their review methodology.
Validation challenges: establishing appropriate validation protocols for Gen AI review, while attempting to keep human review of the documents proportionately lower than CAL or traditional review, is an ongoing challenge. Whilst Gen AI can assist in prioritising document pools, a high richness of relevant documents will always erode time and cost savings. Conversely, without validation, it is impossible for teams to be certain that their prompts have set the AI up to identify all the situations where key documents may arise.

An optimised approach

The evolution of document review and production over the past decade – from linear review through CAL to Generative AI – reflects the broader transformation of legal practice through artificial intelligence. However, practical deployment requires realistic assessment of current capabilities and limitations; the key is matching technology to matter requirements.

CAL remains the "gold standard" in situations requiring a defensible methodology and court or regulator approval, particularly for firms unfamiliar with the workflows required to deploy AI. MinterEllison's Discovery & Data Intelligence team was one of the first teams to establish an approved AI workflow and has been actively deploying it on matters since 2024. CAL and AI-enabled review deliver maximum efficiency for moderate-richness populations (under 50% responsive rates). In other legal review contexts, where it is not critical to produce a transparent workflow, teams can be more liberal with the use of AI. Typically, Generative AI may be favoured where speed outweighs defensibility formality (such as internal investigations) or where early assessment is required for decision-making (such as before formal discovery).

An optimised approach will often involve deploying both technologies strategically, alongside expert human review:

Gen AI for early triage: rapidly identify high-priority documents, themes and key custodians
Gen AI and CAL for formal review: apply proven CAL methodology to the document population requiring production decisions
Gen AI for enhancement: use Gen AI to summarise production documents, extract data and prepare chronologies
Senior lawyer review for final quality control: human oversight of privileged documents and edge cases

By running CAL on Gen AI results, practitioners can learn to use both tools better. Determining strengths and weaknesses of each will allow better understanding of appropriate use cases. CAL offers a validation layer, making it a keystone for any Gen AI-based review strategy.

The future of high-volume document production is undeniably AI-driven, but not exclusively Gen AI. No single approach suits all matters. The most sophisticated practitioners will develop fluency across multiple methodologies, applying the right tool to the right problem whilst maintaining the transparency, accuracy and defensibility that discovery obligations demand.

As Gen AI continues to mature and courts develop standards for its use in discovery contexts, we can expect the balance to shift. But for now, the prudent approach combines cutting-edge technology with established, defensible methodologies – ensuring that innovation serves, rather than undermines, our fundamental obligations to clients, courts and regulators.

MinterEllison's Discovery & Data Intelligence is a national team of experts with decades of experience deploying market-leading and proprietary technologies, in combination with rigorous process, to deliver accurate, efficient outcomes to support transactions, disputes and investigations.

For specific advice on strategy for your matter, please contact us.

Contact

Jason McQuillen

Partner, MinterEllison Consulting, Sydney
- +61 2 9921 4134
- +61 472 522 468
Sarah McCann

Senior Associate - Discovery and Data Intelligence
- +61 7 3119 6684
- +61 431 073 137
