What data do we have?
CAP includes all official, book-published United States case law — every volume designated as an official report of decisions by a court within the United States.
Our scope includes all state courts, federal courts, and territorial courts for American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands. Our earliest case is from 1658, and our most recent cases are from 2018.
Each volume has been converted into structured, case-level data broken out by majority and dissenting opinion, with human-checked metadata for party names, docket number, citation, and date.
We also plan to share page images and page-level OCR data for all volumes, though we have not yet published them.
CAP does not include:
- New cases as they are published. We currently include volumes published through June 2018, and may or may not add volumes in the future.
- Cases not designated as officially published, such as most lower court decisions.
- Non-published trial documents such as party filings, orders, and exhibits.
- Parallel versions of cases from regional reporters, unless those cases were designated by a court as official.
- Cases officially published in digital form, such as recent cases from Illinois, Arkansas, New Mexico, and North Carolina.
Cases published after 1922 do not include headnotes.
By the numbers
Here are some TSV-formatted spreadsheets with specific counts from our collection, along with links to view those cases in the API:
We created this data by digitizing roughly 40 million pages of court decisions contained in roughly 40,000 bound volumes owned by the Harvard Law School Library.
Members of our team created metadata for each volume, including a unique barcode, reporter name, title, jurisdiction, publication date and other volume-level information. We then used a high-speed scanner to produce JP2 and TIF images of every page. A vendor then used OCR to extract the text of every case, creating case-level XML files. Key metadata fields, like case name, citation, court and decision date, were corrected for accuracy, while the text of each case was left as raw OCR output. In addition, for cases from volumes not yet in the public domain, our vendor redacted any headnotes.
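The volume-to-case workflow above can be pictured as a small data model. This is a minimal sketch, not CAP's actual schema: the `Volume` and `Case` classes, their field names, and the `build_case` helper are all hypothetical. The public-domain test is also an assumption, taken from the note that cases published after 1922 lack headnotes, so volumes published in 1922 or earlier keep theirs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: volumes published in or before this year are treated as public
# domain, matching the note that cases published after 1922 lack headnotes.
PUBLIC_DOMAIN_LAST_YEAR = 1922


@dataclass
class Volume:
    """Volume-level metadata recorded by the team (hypothetical field names)."""
    barcode: str  # unique identifier for the physical volume
    reporter_name: str
    title: str
    jurisdiction: str
    publication_date: date


@dataclass
class Case:
    """Case-level record extracted from a volume's OCR output."""
    # Key metadata fields, corrected for accuracy by human review.
    name: str
    citation: str
    court: str
    decision_date: date
    # Raw OCR output, not human-reviewed.
    text: str
    # None when headnotes were redacted.
    headnotes: Optional[str]


def build_case(volume: Volume, name: str, citation: str, court: str,
               decision_date: date, text: str, headnotes: str) -> Case:
    """Create a case record, dropping headnotes from non-public-domain volumes."""
    public_domain = volume.publication_date.year <= PUBLIC_DOMAIN_LAST_YEAR
    return Case(name, citation, court, decision_date, text,
                headnotes if public_domain else None)


# Usage with made-up values:
old = Volume("hyp-0001", "Hypothetical Reports", "Vol. 1", "Example", date(1900, 1, 1))
new = Volume("hyp-0002", "Hypothetical Reports", "Vol. 90", "Example", date(1950, 1, 1))
print(build_case(old, "A v. B", "1 Hyp. 1", "Example Court",
                 date(1899, 6, 1), "raw OCR...", "Headnote").headnotes)  # Headnote
print(build_case(new, "C v. D", "90 Hyp. 1", "Example Court",
                 date(1949, 6, 1), "raw OCR...", "Headnote").headnotes)  # None
```

Keeping human-reviewed metadata and raw OCR text as separate fields mirrors the different quality levels the text describes.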
The digitization process inevitably introduced countless errors into our data. The public launch of this project is only the start of discovering errors, and we hope you will help us find and fix them.
Some parts of our data are higher quality than others. Case metadata, such as the party names, docket number, citation, and date, has received human review. Case text and general head matter have been generated by machine OCR and have not received human review.
You can report errors of all kinds at our GitHub issue tracker, where you can also see currently known issues. We particularly welcome metadata corrections, feature requests, and suggestions for large-scale algorithmic changes. We are not currently able to process individual OCR corrections, but we welcome general suggestions on the OCR correction process.
Data made available through the Caselaw Access Project API and bulk download service is citable. View our suggested citation in these standard formats:
APA
Caselaw Access Project. (2018). Retrieved [date], from [url].
MLA
The President and Fellows of Harvard University. "Caselaw Access Project." 2018, [url].
Chicago / Turabian
Caselaw Access Project. "Caselaw Access Project." Last modified [date], [url].
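Filling in the `[date]` and `[url]` placeholders is mechanical, so it can be scripted. This is a minimal sketch that assumes `[date]` means the date you retrieved the data in the APA-style format and the page's last-modified date in the Chicago / Turabian format. `suggested_citations` is a hypothetical helper, and the dates and URL in the usage lines are examples only.

```python
from datetime import date


def _fmt(d: date) -> str:
    # e.g. "June 5, 2019"; %B gives the English month name in the default C locale.
    return f"{d:%B} {d.day}, {d.year}"


def suggested_citations(url: str, retrieved: date, last_modified: date) -> dict:
    """Return each suggested citation with its [date] and [url] placeholders filled."""
    return {
        # [date] here is the date you retrieved the data.
        "APA": f"Caselaw Access Project. (2018). Retrieved {_fmt(retrieved)}, from {url}.",
        "MLA": f'The President and Fellows of Harvard University. "Caselaw Access Project." 2018, {url}.',
        # [date] here is the page's last-modified date.
        "Chicago / Turabian": (
            f'Caselaw Access Project. "Caselaw Access Project." '
            f"Last modified {_fmt(last_modified)}, {url}."
        ),
    }


for style, text in suggested_citations("https://case.law/", date(2019, 6, 5),
                                       date(2018, 10, 31)).items():
    print(f"{style}: {text}")
```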
Have you used Caselaw Access Project data in your research? Tell us about it.
Usage & access
The CAP data is free for the public to use and access.
Case metadata, such as the case name, citation, court, and date, is freely and openly accessible without limitation. Full case text can be freely viewed or downloaded, but you must register for an account to do so, and you may currently view or download no more than 500 cases per day. In addition, research scholars can qualify for bulk data access by agreeing to certain use and redistribution restrictions. You can request a bulk access agreement by creating an account and then visiting your account page.
Access limitations on full text and bulk data are a component of Harvard’s collaboration agreement with Ravel Law, Inc. (now part of LexisNexis). These limitations will end in March 2024 at the latest. In addition, they apply only to cases from jurisdictions that continue to publish their official case law in print form; once a jurisdiction transitions from print-first to digital-first publishing, the limitations cease. Thus far, Illinois, Arkansas, New Mexico, and North Carolina have made this important and positive shift, and as a result all historical cases from these jurisdictions are freely available to the public without restriction. We hope many other jurisdictions will soon follow their example.
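The access rules above amount to a small decision procedure. The sketch below illustrates them under stated assumptions and is not the site's actual enforcement logic. The function names are hypothetical, and jurisdictions are plain names rather than any real API identifiers. Because the text says only that the limitations end "at the latest" in March 2024, the sketch conservatively treats them as lifted from April 1, 2024.

```python
from datetime import date

# Case metadata is open to everyone, so only full text and bulk data are modeled.

# Jurisdictions the text names as having moved to digital-first publishing.
DIGITAL_FIRST = {"Illinois", "Arkansas", "New Mexico", "North Carolina"}

# Assumption: limitations end "at the latest" in March 2024 (no day given),
# so this sketch treats them as lifted from April 1, 2024 onward.
LIMITS_LIFTED = date(2024, 4, 1)

DAILY_CASE_LIMIT = 500  # current cap on full-text cases per registered account


def full_text_restricted(jurisdiction: str, today: date) -> bool:
    """True if full text and bulk data for this jurisdiction are still limited."""
    return today < LIMITS_LIFTED and jurisdiction not in DIGITAL_FIRST


def may_view_full_text(jurisdiction: str, today: date, has_account: bool,
                       cases_today: int) -> bool:
    """cases_today: full-text cases this account has already viewed today."""
    if not full_text_restricted(jurisdiction, today):
        return True  # open to the public, no account needed
    return has_account and cases_today < DAILY_CASE_LIMIT


def may_bulk_download(jurisdiction: str, today: date, has_agreement: bool) -> bool:
    """Restricted bulk data requires a signed bulk access agreement."""
    return not full_text_restricted(jurisdiction, today) or has_agreement


print(may_view_full_text("Ohio", date(2019, 1, 1), has_account=True, cases_today=500))     # False
print(may_view_full_text("Illinois", date(2019, 1, 1), has_account=False, cases_today=0))  # True
```

Checking the jurisdiction and date first means the account and quota checks only come into play for print-first jurisdictions, which is where the text says the limits apply.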
- LawSites June 21, 2019
- ABA Journal June 20, 2019
- LawSites November 12, 2018
- TechDirt October 31, 2018
- Harvard Law Today October 31, 2018
- ABA Journal October 30, 2018
- The New York Times October 28, 2015
Friends & Partners
- The Caselaw Access Project is a Harvard Law School Library Innovation Lab Project.
- LIL is part of the Harvard Law School Library.
- The Berkman Klein Center cooperated with LIL on the Caselaw Access Project.
- Ravel Law has partnered with the Harvard Law School Library and LIL since the beginning of the Caselaw Access Project. Ravel funded the digitization effort and now offers free public access to the entire corpus through its search interface and its non-commercial API.
- Carl Jaeckel of ClassAction.org graciously donated the case.law domain name.
- Cloudflare has generously provided LIL with network services for case.law.
- Anastasia Aizman: Developer & Designer
- Kendra Albert: former Research Associate
- Karen Beck: Manager, Historical & Special Collections
- Zachary Bodnar: Digitization Specialist
- June Casey: Librarian for Open Access Initiatives & Scholarly Communication
- Stephen Chapman: Manager, Digital Strategies for Collections
- Deborah Chase: Digitization Specialist
- Jack Cushman: Senior Developer
- Kim Dulin: former Library Innovation Lab Director
- Lindsay Dumas: former Digital Projects Archivist
- Kate Edrington: Digitization Specialist
- Kelly Fitzpatrick: Research Associate
- Kerri Fleming: Digital Projects Archivist
- Gerard Fowke: Harvard Law School Library Intern
- Jane Kelly: Historical & Special Collections Assistant
- Erica Leeman: former Digitization Specialist
- Dustin Lewis: former Project Manager (FTL, H2O), now Senior Researcher at Harvard Law School's PILAC
- Andrew MacTaggart: Senior Digitization Specialist
- Emily Magagnosc: Digitization Specialist
- Margaret Peachy: Curator of Digital Collections
- Lori Schulsinger: Collection Development Coordinator
- Andy Silva: Developer
- Ben Steinberg: DevOps
- Shailin Thomas: Affiliate, Berkman Center for Internet & Society
- Caroline Walters: Collection Development Librarian for U.S. Law
- Suzanne Wones: former Executive Director, Harvard Law School Library
- Adam Ziegler: Director
- Jonathan Zittrain: Harvard Faculty and Law School Library Director
Getting Legal Help
The Caselaw Access Project team cannot help with personal legal research problems or legal representation. Our data is valuable for scholarship, but it is a work in progress and is not kept up to date. Please do not rely on our data set to solve personal legal problems.
Finding a lawyer: See the list of links on the Harvard Law School Library's page "Where can I get legal advice?"
Alternate databases: If you need to conduct up-to-date research for use in a legal proceeding, consider one of these alternate databases.
Learning to conduct legal research: If you have access to a public law library, its librarians should be able to help you learn legal research skills. The Harvard Law School Library Reference Desk may also be able to offer assistance through their Ask a Librarian service.