How MIRA thinks like a lawyer.
A senior lawyer reading a difficult question does not arrive at the answer in a single mental act. She moves, often without noticing, through a sequence of distinct cognitive operations. She separates the relevant facts from the surrounding noise. She identifies the legal issue or issues at stake. She brings to mind the framework of authority that governs the issue. She tests the framework against the facts, attending to the points at which the fit is uneasy. She drafts a tentative position, then turns it over and looks for the strongest contrary case. Only after all of this, and usually after a small interval of staring out of the window, does she commit to a view.
This sequence is not a quirk of the individual lawyer. It is the cognitive shape of legal work. The shape is durable across jurisdictions, areas of practice, and centuries. It has been formalised, in legal education, under the acronym IRAC, which stands for Issue, Rule, Application, Conclusion. The acronym is sometimes derided for its primary-school simplicity, but the simplicity is an asset rather than a flaw. It captures a structural truth that real lawyers do, in fact, use, even when they have long since stopped consciously rehearsing it.
The trouble with most artificial intelligence, in legal applications, is that it produces only the last of these. The conclusion. The output is a sentence, often a confident one, that asserts what the law says about the facts. The cognitive sequence by which the conclusion was reached is invisible. The user cannot inspect it because it does not exist in inspectable form. The model has produced a continuation of the prompt. It has not, in any reproducible sense, reasoned.
This essay is about the architecture by which a legal AI system can be made to reason in the structural sense. It describes what a reasoning chain is, why a chain is required rather than a single step, what the stages of the chain are, what each stage produces, how the stages are made to answer to one another, and why the resulting structure is not a debugging convenience but a feature of the product. The argument throughout is that intelligence in this domain is a process, not a point, and that any system claiming to assist legal work must, at minimum, be capable of showing its work.
Before describing the chain we have implemented, it is worth setting out, in greater detail, the cognitive operations that legal reasoning consists of. These operations are not exotic. They are the everyday practice of law, refined into a teachable form.
The first operation is fact discrimination. Most matters arrive with more facts than are legally relevant, and most relevant facts arrive without being labelled as such. The lawyer's first task is to read the matter and decide which facts will move the analysis. A change-of-control clause may turn on the precise mechanism of the share transfer. A consumer dispute may turn on whether the goods were sold to a private user or for resale. Until the lawyer has separated the facts that matter from the facts that do not, no further reasoning is sound.
The second operation is issue identification. A set of relevant facts gives rise to one or more legal questions. Sometimes the question is presented as such by the client. Often it is not. A client describes a commercial dispute and the lawyer recognises that the live legal issues are not the ones the client identified. Issue identification requires bringing to bear a structured catalogue of the kinds of legal questions that arise in the area, and matching the facts to the appropriate kind. It is essentially a classification operation, but a classification with depth, because the same facts can give rise to multiple issues at different levels of generality.
The third operation is the assembly of the legal framework. Once an issue has been identified, the lawyer brings to mind the rules that govern it. These rules consist of statutes, regulations, judicial holdings, and the principles that synthesise them. They are not stored in the lawyer's mind in unstructured form. They are organised by issue, by jurisdiction, by court, and by date. The assembly of the framework is therefore a directed retrieval. The lawyer is not searching everything she knows. She is selecting from the cluster of authorities that her training and habit have indexed under the issue in question.
The fourth operation is the application of the framework to the facts. This is the point at which retrieval becomes reasoning. The rules, abstract, must be brought into contact with the facts, concrete. The lawyer asks, of each rule, whether it applies on these facts, whether the facts satisfy the conditions of the rule, whether any exception is engaged. This is rarely a mechanical operation. It is often the place where the matter becomes interesting, because the facts will satisfy some conditions, frustrate others, and stand in an ambiguous relation to still others.
The fifth operation is the consideration of the contrary view. A lawyer worth listening to does not stop at her first conclusion. She actively seeks the strongest argument against it, not because she expects to be persuaded by it, but because anticipating it allows her to address it before it is raised. This is sometimes called adversarial reasoning. It is the discipline of treating one's own first instinct as a draft to be tested rather than a result to be defended.
The sixth operation is conclusion. The lawyer, having performed the previous five operations, commits to a view. The view is sometimes a single answer. More often it is a structured answer, in which the conclusion on the central question is qualified by observations on related questions, by indications of the strength of the answer, and by recommendations for what should be done next. The conclusion, properly formed, is the smallest part of the work product. It is also the part the client is most tempted to read first.
This sequence, repeated over a working life, is what legal reasoning is. A system that does not perform some recognisable analogue of the sequence is not a legal reasoning system. It is a generator that has been trained to produce text in the shape of legal reasoning's output.
The transition from a cognitive description of legal reasoning to an architecture that performs it is not a simple one. Cognition is opaque. Architecture must be explicit. Each of the cognitive operations described above must be turned into a stage that produces a defined artefact, takes defined inputs, and is amenable to inspection. The translation requires choices.
The first choice is granularity. Each of the six operations could be a single stage, or each could be decomposed into smaller stages. We have, in our current architecture, settled on a chain of stages that is finer-grained than the IRAC framework but coarser-grained than a per-token analysis. The granularity has been chosen so that each stage produces an artefact that is small enough to be inspected but large enough to be useful in its own right.
The second choice is what each stage produces. An ill-designed stage produces output that the next stage cannot consume. A well-designed stage produces output that is structured enough to be operated on without requiring re-reading from the surface. The output of issue identification, for example, is not a paragraph describing the issue. It is a structured object that names the issue, places it in a taxonomy, lists the questions it raises, and identifies the area of law to which it belongs. The next stage consumes that object, not the paragraph.
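The shape of such an artefact can be sketched concretely. The following is a hypothetical illustration only; the class and field names are invented for this essay and are not MIRA's actual schema.

```python
from dataclasses import dataclass

# Hypothetical shape of the artefact produced by issue identification.
# Field names are illustrative, not MIRA's actual schema.
@dataclass(frozen=True)
class IdentifiedIssue:
    name: str                        # short label for the issue
    taxonomy_path: tuple[str, ...]   # placement in the issue taxonomy
    area_of_law: str                 # area of law the issue belongs to
    questions: tuple[str, ...]       # concrete questions the issue raises

issue = IdentifiedIssue(
    name="change of control",
    taxonomy_path=("contract", "transfer of rights", "change of control"),
    area_of_law="contract",
    questions=(
        "Does the share transfer trigger the change-of-control clause?",
        "Is counterparty consent required before completion?",
    ),
)
```

The point of the structure is that the next stage can operate on `issue.questions` and `issue.area_of_law` directly, without re-parsing a paragraph of prose.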
The third choice is what flows between stages. There is a temptation to allow each stage to consume the original prompt and the prior stage's output. This is the path of least architectural resistance, but it is also the path along which silent errors propagate. The prompt is messy. The prior output may already contain a mistake. A stage that has access to both can rationalise the mistake by reference to the prompt. We have therefore chosen, where possible, to constrain each stage to operate on the structured outputs of its predecessors, with the prompt available only at clearly designated points. This is more difficult to engineer and easier to debug.
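The constraint can be expressed in the stage signatures themselves. The toy implementation below is a sketch under stated assumptions: the artefact types and the rules inside each function are invented for illustration, and only the first stage is granted the raw prompt.

```python
from dataclasses import dataclass

# Illustrative artefact types. The signatures, not the toy logic, are the
# point: each stage consumes structured predecessor output, and the raw
# prompt appears only where it is deliberately admitted.

@dataclass(frozen=True)
class Facts:
    items: tuple[str, ...]

@dataclass(frozen=True)
class Issues:
    names: tuple[str, ...]

def discriminate_facts(prompt: str) -> Facts:
    # The only stage with access to the raw prompt. Toy rule: treat
    # bulleted lines as the legally relevant facts.
    return Facts(tuple(
        line.lstrip("- ").strip()
        for line in prompt.splitlines()
        if line.lstrip().startswith("-")
    ))

def identify_issues(facts: Facts) -> Issues:
    # No prompt parameter: this stage cannot rationalise an upstream
    # omission by re-reading the matter. Toy rule: flag any fact
    # mentioning a transfer.
    return Issues(tuple(
        f"change of control ({f})" for f in facts.items if "transfer" in f
    ))

facts = discriminate_facts(
    "background chatter\n- share transfer to NewCo\n- goods sold for resale"
)
issues = identify_issues(facts)
```

Because `identify_issues` never sees the prompt, a fact the first stage dropped stays dropped, and the error surfaces at the stage that made it rather than being quietly papered over downstream.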
The fourth choice is failure handling. Each stage can fail. The fact discrimination stage may fail to identify a relevant fact. The issue identification stage may fail to spot an issue. The framework assembly stage may fail to retrieve a controlling authority. The application stage may make a logical error. The contrary-view stage may fail to find the strongest opposing case. The conclusion stage may overstate confidence. A system that treats these failures as identical, or that surfaces only the worst of them, will be silently brittle. A system that distinguishes them, surfaces them appropriately, and offers the user a calibrated indication of where the chain is weakest is a system that has learned humility.
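One way to make the distinctions concrete is a per-stage report carrying a status and a calibrated confidence. This is a minimal sketch; the status values, field names, and numbers are illustrative assumptions, not MIRA's internal format.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative per-stage failure reporting.
class StageStatus(Enum):
    OK = "ok"
    DEGRADED = "degraded"   # produced output, but with a known weakness
    FAILED = "failed"       # produced no usable artefact

@dataclass(frozen=True)
class StageReport:
    stage: str
    status: StageStatus
    confidence: float            # calibrated 0..1, not a raw model score
    notes: tuple[str, ...] = ()

def weakest_link(reports: list[StageReport]) -> StageReport:
    # Surface where the chain is weakest instead of averaging it away.
    return min(reports, key=lambda r: r.confidence)

reports = [
    StageReport("fact_discrimination", StageStatus.OK, 0.92),
    StageReport("issue_identification", StageStatus.DEGRADED, 0.61,
                ("possible secondary regulatory issue not classified",)),
    StageReport("framework_assembly", StageStatus.OK, 0.88),
]
```

The design choice is that failures are reported per stage rather than folded into a single score, so the user sees not only that the chain is weak but where.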
With those choices made, we may describe, in turn, the principal stages of the reasoning chain as it currently operates. The description is partial. The chain has more components than fit cleanly in an essay, and the implementation is in continual revision. The principal stages, however, are stable.
Each stage is logged. The complete reasoning chain, including the artefacts produced at each step and the choices made between alternatives, is recorded and available for inspection. This is not a debugging facility. It is part of the deliverable. A senior lawyer who wants to know why the system reached its conclusion is shown the chain. A junior lawyer who wants to learn from the system is shown the chain. A regulator who wants to audit the system's reasoning is shown the chain. The chain is the work product.
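A chain record of this kind can be as simple as a serialisable log of each stage's artefact and the alternatives it set aside. The sketch below assumes invented names and placeholder content; angle-bracketed strings stand in for real authorities.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative chain record: each stage's artefact and the alternatives
# it rejected, in a form that serialises cleanly for inspection.
@dataclass(frozen=True)
class StageLog:
    stage: str
    artefact: dict
    alternatives_considered: tuple[str, ...]

chain = [
    StageLog("issue_identification",
             {"issue": "change of control"},
             ("assignment without consent",)),
    StageLog("framework_assembly",
             {"authorities": ["<controlling case>", "<statute section>"]},
             ()),
]

# The same serialised chain can be shown to a reviewing senior, a
# learning junior, or an auditing regulator.
record = json.dumps([asdict(s) for s in chain], indent=2)
```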
It is possible to build a system that performs all of the operations described above and still presents its output as a single confident sentence. The internal architecture would be sound. The user experience would not. We have come to regard the visibility of the reasoning chain as a non-negotiable feature, for reasons that go beyond transparency.
The first reason is that legal work is supervised work. A junior associate produces a draft. A senior reviews it. The review consists not of reading the conclusion but of tracing the reasoning behind it. The senior asks, of each step, whether it was sound, whether the authority cited is the right one, whether the application is clean, whether the contrary view has been properly addressed. A system that produces only the conclusion forces the supervisor to reconstruct the reasoning in her head, which defeats the entire point of having a system. A system that surfaces the reasoning chain integrates into supervision the way a junior associate does: as a contributor whose work can be reviewed in the form in which it was produced.
The second reason is that legal work is teachable work. The discipline is transmitted between generations through the apprenticeship of supervised practice. A junior who reads only conclusions does not learn to reason. A junior who reads reasoning chains, sees how the steps connect, and observes where the system was uncertain, is being taught. A reasoning-chain interface is therefore not only a productivity tool. It is a pedagogical surface, and we design it as such.
The third reason is that legal work is contestable work. The conclusion produced for a client today may be challenged tomorrow, in a brief, in court, by opposing counsel, by a regulator. The lawyer relying on the system needs to be able to reproduce the reasoning at will, to defend it, to amend it, to abandon it where necessary. A system that has obscured its reasoning has, in effect, given the lawyer a position she cannot defend. A system that has surfaced its reasoning has given the lawyer a position with the warrants attached.
The fourth reason, and the most important, is that legal work is accountable work. The lawyer is responsible for the advice. She cannot delegate the responsibility to the system. She can only delegate the production of the draft. To bear the responsibility, she must be able to see what was done in her name. The reasoning chain is what makes that visibility possible. Without it, the system is a black box, and a lawyer who relies on a black box has not preserved her own accountability. She has merely deferred it to a process she cannot inspect.
The reasoning chain is most strained at exactly the points where legal reasoning itself is most strained. Three of these points deserve to be named.
The first is the joinder of overlapping issues. Many real questions are not questions about a single issue. They are questions whose facts engage multiple issues, often in tension. A tax matter may also raise a regulatory question. A contract dispute may also raise an employment question. A criminal complaint may also raise a constitutional question. The reasoning chain must, in such cases, produce a multi-issue analysis in which each issue is reasoned through separately and the interactions between them are addressed explicitly. This is more difficult than it sounds. The temptation, in any single-issue formulation, is to absorb the other issues into footnotes. The discipline is to refuse the temptation and to give each issue its own unfolded chain.
The second is the resolution of authority conflict. Sometimes the controlling authorities point in different directions. A High Court has held one way. The Supreme Court has held differently in a related but distinguishable matter. A coordinate bench has expressed doubt without overruling. The reasoning chain must, in such cases, do more than retrieve. It must adjudicate, at least provisionally, between the authorities, and must make the basis of the adjudication explicit. The discipline is not to suppress the conflict but to surface it. The lawyer using the system can then choose how to handle it, with full knowledge of the difficulty.
The third is the application of authority to facts that lie at the boundary of the rule. Many of the most consequential legal questions are boundary questions. The facts are close to the line. The rule could plausibly apply. It could plausibly not. The reasoning chain at the application stage must, in such cases, refuse to manufacture certainty. The honest output is a structured statement of the considerations on each side, with an indication of how the leading authorities have treated comparable boundary cases. This is more useful to the practising lawyer than a confident answer that the boundary case happens to land in either direction.
None of these difficulties is unique to AI. Each appears in human legal reasoning as well. The difference is that a human lawyer, on encountering the difficulty, slows down and becomes more careful. A poorly designed system, on encountering the difficulty, often becomes more confident. The architecture of the reasoning chain must include a mechanism for the system to slow down when the chain is most strained, rather than barrel forward into a conclusion that the strain it has just absorbed does not warrant.
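A slow-down mechanism can be sketched as a simple policy: when strain signals are present or application confidence falls below a threshold, the chain escalates to a more deliberate pass instead of concluding. Everything here is illustrative: the signal names, the threshold, and the two actions are assumptions made for this sketch.

```python
# Illustrative "slow down under strain" policy. Signal names and the
# confidence threshold are invented for this sketch.
STRAIN_SIGNALS = ("authority_conflict", "boundary_facts", "overlapping_issues")

def next_action(signals: dict[str, bool], application_confidence: float) -> str:
    strained = any(signals.get(s, False) for s in STRAIN_SIGNALS)
    if strained or application_confidence < 0.7:
        # Expand the chain: extra retrieval, explicit side-by-side
        # treatment of the conflict, a wider adversarial pass.
        return "deliberate"
    return "conclude"
```

Note the asymmetry: a strain signal forces deliberation even when the application stage is, on its own terms, confident. That is the inversion of the failure mode described above, in which difficulty produces confidence.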
Building the reasoning chain is one task. Maintaining it as the system is used is another. Several disciplines have, over time, become part of how we operate.
We treat regression in the chain with greater severity than regression in the conclusion. A new release that produces the same conclusions on a benchmark but with weaker reasoning chains is not, for our purposes, an equivalent release. It is a degraded one. The ability of the system to show its work is a property we protect explicitly, and not a side-effect we hope is preserved.
We separate the evaluation of each stage from the evaluation of the system as a whole. The fact-discrimination stage is evaluated on its own metrics. The issue-identification stage on its own. The framework-assembly stage on its own. The application stage on its own. The end-to-end performance is evaluated as a separate matter. This decomposition is essential because end-to-end metrics, alone, can hide the failure of an internal stage that another stage compensates for. Compensation is brittle. A regression in one stage may be masked by another for a while, until a query arrives that breaks the masking, and the system fails in a way that is suddenly visible to the user.
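In practice the decomposed evaluation reduces to keeping a per-stage scoreboard alongside the end-to-end number and flagging any stage that slips. The metric names and figures below are illustrative placeholders, not real benchmark results.

```python
# Illustrative per-stage scoreboard; metric names and values are invented.
stage_scores = {
    "fact_discrimination": 0.94,   # recall of relevant facts
    "issue_identification": 0.78,  # issue-level agreement with annotations
    "framework_assembly": 0.90,    # retrieval of controlling authorities
    "application": 0.85,           # soundness of rule-to-fact steps
}
end_to_end = 0.91                  # conclusion accuracy, tracked separately

def regressions(old: dict[str, float], new: dict[str, float],
                tol: float = 0.02) -> list[str]:
    # Flag any stage that slipped beyond tolerance, even if the
    # end-to-end number held steady through compensation elsewhere.
    return [s for s in old if new.get(s, 0.0) < old[s] - tol]
```

A release in which `regressions` returns a non-empty list is treated as degraded even when `end_to_end` is unchanged, which is precisely the point of evaluating the stages separately.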
We solicit reasoning-chain feedback from users explicitly. The feedback we ask for is not whether the conclusion was correct. It is whether the chain that produced the conclusion was sound. Users tell us when the issue was misidentified, when an authority was missed, when the application was strained, when the adversarial review was thin. This feedback is, qualitatively, more useful than feedback on the conclusion alone, because it tells us where in the chain the failure occurred and therefore what specifically to fix.
We resist the temptation to automate the chain into invisibility. There is, at every release, a pressure to reduce latency by collapsing stages, to reduce cost by skipping stages, to improve perceived quality by smoothing the output. Each of these pressures is, in isolation, reasonable. In aggregate, they would produce a system that is faster, cheaper, smoother, and worse. We measure the chain. We protect the chain. The chain is, in a sense we mean seriously, what we are building.
The decision to build a reasoning chain rather than a generation pipeline is, in the end, a decision about what one is doing. A generation pipeline produces text. A reasoning chain produces analysis, of which text is the final stage. The two pipelines may share components. They are, in their orientation, opposite. A generation pipeline asks, at each step, what should the model produce next. A reasoning chain asks, at each step, what does the lawyer need to know in order to take the next step responsibly.
This distinction is the deepest one in the design of legal artificial intelligence. A system oriented toward generation will, with every increment of capability, produce more confident, more fluent, more polished text. A system oriented toward reasoning will, with every increment of capability, produce more transparent, more inspectable, more carefully qualified analysis. From a marketing standpoint, the generation orientation looks more impressive. From a practice standpoint, the reasoning orientation is the only one that pays off.
It is sometimes said that artificial intelligence will transform legal work. The phrase is too undifferentiated to be useful. The artificial intelligence that will transform legal work is the artificial intelligence that has earned the right to participate in the discipline. The right is earned through reasoning that can be inspected, justified, and corrected. It is not earned through fluent output. The reasoning chain is the structure through which the earning happens. It is not, for us, an architectural choice. It is the choice that defines the work.
— Engineering