Stanford Researchers Analyze Reliability of Leading AI Legal Research Tools
This article discusses retrieval-augmented generation (RAG) and its potential to address hallucinations in legal applications of language models. RAG is a technique that combines a retrieval step and a generation step to produce more accurate and better-supported responses by incorporating information from retrieved documents. The retrieval step selects relevant documents based on a user query, while the generation step uses those documents, together with the query, to generate a response.
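As an illustration, the two-step RAG pattern can be sketched in a few lines of Python. The TF-IDF retriever, the toy corpus, and the generate() helper below are illustrative assumptions, not the design of any commercial tool; a production system would pass the assembled prompt to a large language model and retrieve from a full legal database.

```python
# A minimal sketch of the retrieve-then-generate pattern described above.
# The corpus, retriever, and generate() helper are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document store standing in for a legal corpus (illustrative only).
documents = [
    "Smith v. Jones (2001): a contract requires offer, acceptance, and consideration.",
    "State statute 12-405 (2019): electronic signatures are valid for most contracts.",
    "Doe v. Roe (1985): oral contracts for the sale of land are unenforceable.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Retrieval step: rank documents by TF-IDF cosine similarity to the query."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(documents + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
    top_indices = scores.argsort()[::-1][:k]
    return [documents[i] for i in top_indices]

def generate(query: str, context: list[str]) -> str:
    """Generation step: in a real system this prompt would go to an LLM;
    here we return the assembled prompt to show its structure."""
    prompt = "Answer using only the sources below.\n\n"
    prompt += "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context))
    prompt += f"\n\nQuestion: {query}\nAnswer (cite sources by number):"
    return prompt

if __name__ == "__main__":
    question = "Is an electronic signature enough to form a contract?"
    print(generate(question, retrieve(question)))
```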
The article highlights the limitations of RAG in the legal domain. Retrieval in law is challenging due to the lack of clear-cut answers and the need to consider multiple sources across time and place. Document relevance is not solely based on text similarity, as different jurisdictions and time periods may have different applicable rules. The generation of meaningful legal text is also complex, requiring the synthesis of facts and rules from various sources while considering the legal context.
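One way to make the jurisdiction-and-time point concrete is to filter candidate documents by metadata before scoring text similarity, so that textually similar but legally inapplicable sources are excluded. The sketch below is a hedged assumption about how such filtering might look; the field names and filtering rule are hypothetical and do not describe any particular product.

```python
# A sketch of metadata-aware retrieval: restrict candidates by jurisdiction and
# date before any text-similarity ranking. Field names and the filtering rule
# are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class LegalDoc:
    text: str
    jurisdiction: str   # e.g. "CA", "NY", "federal"
    year: int           # year the rule was decided or enacted

def filter_candidates(docs: list[LegalDoc], jurisdiction: str, as_of_year: int) -> list[LegalDoc]:
    """Keep only documents that could govern the question: same jurisdiction
    (or federal) and not postdating the relevant point in time."""
    return [
        d for d in docs
        if d.jurisdiction in (jurisdiction, "federal") and d.year <= as_of_year
    ]

corpus = [
    LegalDoc("California rule on non-compete clauses...", "CA", 2008),
    LegalDoc("New York rule on non-compete clauses...", "NY", 2015),
    LegalDoc("Federal guidance on non-compete clauses...", "federal", 2023),
]

# Only the California and pre-2020 federal sources survive the filter; text
# similarity would then be computed over this reduced candidate set.
print(filter_candidates(corpus, jurisdiction="CA", as_of_year=2020))
```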
To assess the quality of RAG systems, the article introduces the concepts of correctness and groundedness. Correctness refers to the factual accuracy and relevance of the response, while groundedness evaluates the relationship between the response and the cited sources. A response can be correct but improperly grounded if it falsely asserts support from an unrelated source. Hallucinations are defined as responses that are either incorrect or misgrounded.
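This taxonomy can be expressed compactly in code. The sketch below follows the article's definitions of correctness, groundedness, and hallucination; the enum and function names are illustrative, not part of any published evaluation framework.

```python
# A minimal sketch of the correctness/groundedness taxonomy described above.
from enum import Enum

class Correctness(Enum):
    CORRECT = "factually accurate and responsive to the query"
    INCORRECT = "contains a false or unresponsive statement"

class Groundedness(Enum):
    GROUNDED = "cited sources actually support the claims made"
    MISGROUNDED = "cites a source that is irrelevant or contradicts the claim"

def is_hallucination(correctness: Correctness, groundedness: Groundedness) -> bool:
    """A response hallucinates if it is incorrect OR improperly grounded."""
    return (
        correctness is Correctness.INCORRECT
        or groundedness is Groundedness.MISGROUNDED
    )

# Example: a factually correct answer that falsely attributes support to an
# unrelated case still counts as a hallucination under this framework.
print(is_hallucination(Correctness.CORRECT, Groundedness.MISGROUNDED))  # True
```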
The article emphasizes the importance of addressing hallucinations in legal AI tools, as they can mislead users and have potentially dangerous consequences. It argues that simply linking to real legal documents does not guarantee the absence of hallucinations if the linked sources are irrelevant or contradict the tool’s claims.
While RAG shows promise in mitigating legal hallucinations, it has limitations in the legal domain. The article provides a framework for evaluating the correctness and groundedness of responses and highlights the need for careful design choices in the retrieval and generation steps. Addressing hallucinations requires domain expertise in both computer science and law to ensure accurate and reliable legal AI tools.
While hallucinations are reduced relative to a general-purpose chatbot (GPT-4), the study finds that the AI research tools made by LexisNexis and Thomson Reuters each hallucinate more than 17% of the time.