CAP API

The Caselaw Access Project API, also known as CAPAPI, serves all official US court cases published in books from 1658 to 2018. The collection includes over six million cases scanned from the Harvard Law Library shelves. Learn more about the project.

Getting Started

Browse the API

CAPAPI includes an in-browser API viewer, but it is primarily intended for software developers who want to access caselaw programmatically, whether to run their own analysis or to build tools for other users. API results are in JSON format, with case text available as structured XML, presentation HTML, or plain text.

To get started with the API, you can explore it in your browser, or reach it from the command line. For example, here is a curl command to request a single case from Illinois:

curl "https://api.case.law/v1/cases/?jurisdiction=ill&page_size=1"

If you haven't used APIs before, you might want to jump down to our Beginner's Introduction to APIs.

Registration

Most API queries don't require registration: check our access limits section for more details.

Register for an API key if you need to access case text from non-whitelisted jurisdictions.

Authentication

Most API requests do not need to be authenticated. However, if requests are not authenticated, you may see this response in results from the case endpoint with full_case=true:

{
  "results": [
    {
      "id": 1021505,
      ...
      "jurisdiction": {
        ...
        "whitelisted": false
      },
      "casebody": {
        "data": null,
        "status": "error_auth_required"
      }
    }
  ]
}

In this example the response included a case from a non-whitelisted jurisdiction, and casebody.data for the case is therefore blank, while casebody.status is "error_auth_required".
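
A client can check for this status before trying to read the case text. The following is a minimal sketch, assuming the response has already been decoded into a Python dictionary shaped like the example above; the helper name cases_needing_auth is hypothetical and not part of CAPAPI.

```python
def cases_needing_auth(response):
    """Return the IDs of cases whose body was withheld pending authentication."""
    blocked = []
    for case in response.get("results", []):
        casebody = case.get("casebody") or {}
        if casebody.get("status") == "error_auth_required":
            blocked.append(case["id"])
    return blocked


# Sample shaped like the response above, with the elided fields left out.
sample = {
    "results": [
        {
            "id": 1021505,
            "jurisdiction": {"whitelisted": False},
            "casebody": {"data": None, "status": "error_auth_required"},
        }
    ]
}

print(cases_needing_auth(sample))  # [1021505]
```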

To authenticate the request from code or the command line, you can provide an Authorization header:

curl -H "Authorization: Token abcd12345" "https://api.case.law/v1/cases/?full_case=true"
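
The same header can be set from Python's standard library. This sketch only builds the request and does not send it; the function name authenticated_request is hypothetical, and "abcd12345" is the placeholder token from the curl example, not a working key.

```python
import urllib.request

API_ROOT = "https://api.case.law/v1"


def authenticated_request(path, token):
    """Build (but do not send) a request carrying the Authorization header."""
    return urllib.request.Request(
        API_ROOT + path,
        headers={"Authorization": "Token " + token},
    )


request = authenticated_request("/cases/?full_case=true", "abcd12345")
print(request.full_url)
print(request.get_header("Authorization"))  # Token abcd12345

# To actually send it: urllib.request.urlopen(request)
```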

While you are logged into this website, all requests through the API browser will be authenticated automatically.

Case Text Formats

CAPAPI returns case text in three formats: text (default), XML, or HTML. Two parameters control the format: body_format returns the case body and metadata in a JSON container, while format returns the case by itself. Specify only one of these parameters at a time.

Both parameters must be used in conjunction with the full_case=true parameter.

The format parameter is the simpler of the two, though it is not the default behavior. It accepts a value of either html or xml. If you specify html, CAPAPI returns a lightly styled, standalone HTML page. If you specify xml, CAPAPI returns the entire original XML document with intact METS, PREMIS, and structural metadata.
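
The rules for these parameters can be enforced when building a URL. This is a sketch under stated assumptions: the helper case_text_url and its argument names are hypothetical, and the trailing slash after the case ID is assumed from the style of the other URLs in this document.

```python
from urllib.parse import urlencode

CASES_URL = "https://api.case.law/v1/cases/"


def case_text_url(case_id, body_format=None, standalone_format=None):
    """Build a single-case URL that requests the case text.

    body_format       -> JSON container with metadata ("text", "html", "xml")
    standalone_format -> the `format` parameter: the bare case ("html", "xml")
    """
    if body_format and standalone_format:
        raise ValueError("specify body_format or format, not both")
    params = {"full_case": "true"}  # required by both parameters
    if body_format:
        params["body_format"] = body_format
    if standalone_format:
        params["format"] = standalone_format
    return f"{CASES_URL}{case_id}/?{urlencode(params)}"


print(case_text_url(1021505, body_format="xml"))
# https://api.case.law/v1/cases/1021505/?full_case=true&body_format=xml
```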

This is what you can expect from different format specifications using the body_format parameter.

Text Format (default)

The default text format is best for natural language processing. Example response data:

"data": {
      "head_matter": "Fifth District\n(No. 70-17;\nThe People of the State of Illinois ...",
      "opinions": [
          {
              "author": "Mr. PRESIDING JUSTICE EBERSPACHER",
              "text": "Mr. PRESIDING JUSTICE EBERSPACHER\ndelivered the opinion of the court: ...",
              "type": "majority"
          }
      ],
      "judges": [],
      "parties": [
          "The People of the State of Illinois, Plaintiff-Appellee, v. Danny Tobin, Defendant-Appellant."
      ],
      "attorneys": [
          "John D. Shulleriberger, Morton Zwick, ...",
          "Robert H. Rice, State’s Attorney, of Belleville, for the Peop ..."
      ]
  }
}

In this example, "head_matter" is a string representing all text printed in the volume before the text prepared by judges. "opinions" is an array containing a dictionary for each opinion in the case. "judges", "parties", and "attorneys" are particular substrings from "head_matter" that we believe to refer to entities involved with the case.
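
For text analysis, the opinions array is usually the part to walk. Below is a minimal sketch that groups opinion text by type, using a trimmed copy of the example data above; the function name opinions_by_type is hypothetical.

```python
def opinions_by_type(data):
    """Group opinion text by opinion type (for example "majority")."""
    grouped = {}
    for opinion in data.get("opinions", []):
        grouped.setdefault(opinion["type"], []).append(opinion["text"])
    return grouped


# Trimmed copy of the example response data; truncated strings kept as shown.
sample_data = {
    "head_matter": "Fifth District\n(No. 70-17;\nThe People of the State of Illinois ...",
    "opinions": [
        {
            "author": "Mr. PRESIDING JUSTICE EBERSPACHER",
            "text": "Mr. PRESIDING JUSTICE EBERSPACHER\ndelivered the opinion of the court: ...",
            "type": "majority",
        }
    ],
    "judges": [],
}

grouped = opinions_by_type(sample_data)
print(sorted(grouped))  # ['majority']
```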

XML Format

The XML format is best if your analysis requires more information about pagination, formatting, or page layout. It contains a superset of the information available from body_format=text, but requires parsing XML data. Example response data:

"data": "<?xml version='1.0' encoding='utf-8'?>\n<casebody ..."

HTML Format

The HTML format is best if you want to show readable, formatted caselaw to humans. It represents a best-effort attempt to transform our XML-formatted data to semantic HTML ready for CSS formatting of your choice. Example response data:

"data": "<section class=\"casebody\" data-firstpage=\"538\" data-lastpage=\"543\"> ..."

Pagination and Counts

Queries by default return 100 results per page, but you may request a smaller number using the page_size parameter:

curl "https://api.case.law/v1/cases/?jurisdiction=ill&page_size=1"

We use cursor-based pagination, meaning we keep track of where you are in the results set on the server, and you can access each page of results by using the links in the "previous" and "next" keys of the response:

{
  "count": 183149,
  "next": "https://api.case.law/v1/cases/?cursor=cD0xODMyLTEyLTAx",
  "previous": "https://api.case.law/v1/cases/?cursor=bz0xMCZyPTEmcD0xODI4LTEyLTAx"
  ...
}

Responses also include a "count" key. Occasionally this may show "count": null, indicating that the total count for a particular query has not yet been calculated.
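
Following the "next" links in a loop retrieves a whole result set. The sketch below makes two assumptions that this document does not state: that the last page carries a null "next" value, and that each page holds its cases under a "results" key as in the authentication example. The names fetch_json and iter_results are hypothetical, and the fetcher is injectable so the loop can be tried without network access.

```python
import json
import urllib.request


def fetch_json(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_results(first_url, fetch=fetch_json, max_pages=None):
    """Yield every result, following "next" links until they run out."""
    url = first_url
    pages = 0
    while url and (max_pages is None or pages < max_pages):
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")  # assumed to be null (None) on the last page
        pages += 1


# Offline demonstration with two fake pages standing in for the server.
fake_pages = {
    "page-1": {"count": 3, "next": "page-2", "previous": None,
               "results": [{"id": 1}, {"id": 2}]},
    "page-2": {"count": 3, "next": None, "previous": "page-1",
               "results": [{"id": 3}]},
}
ids = [case["id"] for case in iter_results("page-1", fetch=fake_pages.__getitem__)]
print(ids)  # [1, 2, 3]
```

Against the live API, the first URL would be an ordinary query such as the page_size example above, and max_pages is a simple guard against downloading more than intended.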

Access Limits

The agreement with our project partner, Ravel, requires us to limit access to the full text of cases to no more than 500 cases per person, per day. This limitation does not apply to researchers who agree to certain restrictions on use and redistribution. Nor does this restriction apply to cases issued in jurisdictions that make their newly issued cases freely available online in an authoritative, citable, machine-readable format. We call these whitelisted jurisdictions. Currently, Illinois and Arkansas are the only whitelisted jurisdictions.

We would love to whitelist more jurisdictions! If you are involved in US case publishing at the state or federal level, we'd love to talk to you about making the transition to digital-first publishing. Please contact us and introduce yourself!

If you qualify for unlimited access as a research scholar, you can request a research agreement by creating an account and then visiting your account page.

In addition, under our agreement with Ravel (now owned by LexisNexis), Ravel must negotiate in good faith to provide bulk access to anyone seeking to make commercial use of this data. Contact Ravel for more information, or contact us and we will put you in touch with Ravel.

Unregistered Users
  • Access all metadata
  • Unlimited API access to all cases from whitelisted jurisdictions
  • Bulk Download all cases from whitelisted jurisdictions
Registered Users
  • Access all metadata
  • Unlimited API access to all cases from whitelisted jurisdictions
  • Access to 500 cases per day from non-whitelisted jurisdictions
  • Bulk Download all cases from whitelisted jurisdictions
Researchers
  • Access all metadata
  • Unlimited API access to all cases
  • Bulk Downloads from all jurisdictions
Commercial Users
Contact Ravel for more information.

Usage Examples

This is a non-exhaustive set of examples intended to orient new users. The endpoints section contains more comprehensive documentation about the URLs and their parameters.

Retrieve a single case by ID

This example uses the single case endpoint, and will retrieve the metadata for a single case.
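
As a sketch, the URL for this request can be built by appending the numeric ID to the cases path, as described under Endpoints. The helper name single_case_url is hypothetical, the trailing slash is an assumption, and the ID is the one shown in the authentication example.

```python
API_ROOT = "https://api.case.law/v1"


def single_case_url(case_id):
    """Return the URL for one case, identified by its numeric ID."""
    return f"{API_ROOT}/cases/{int(case_id)}/"


print(single_case_url(1021505))
# https://api.case.law/v1/cases/1021505/
```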

Retrieve a list of cases using a metadata filter

This example uses the cases endpoint, and will retrieve every case with the citation 1 Ill. 34.

There are many parameters with which you can filter the cases result. Check the cases endpoint documentation for a complete list of the parameters, and what values they accept.
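
A filtered query is just the cases URL plus encoded parameters. In this sketch the filter name citation is copied from the parameter list under Endpoints and should be confirmed against the live API; the helper cases_url is hypothetical.

```python
from urllib.parse import urlencode

CASES_URL = "https://api.case.law/v1/cases/"


def cases_url(**filters):
    """Build a cases-endpoint URL from keyword filters, skipping None values."""
    params = {name: value for name, value in filters.items() if value is not None}
    return CASES_URL + "?" + urlencode(params)


# Spaces in the citation are encoded as "+" so the URL stays valid.
print(cases_url(citation="1 Ill. 34"))
# https://api.case.law/v1/cases/?citation=1+Ill.+34
```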

Simple Full-Text Search

This example performs a simple full-text case search which finds all cases containing the word "insurance."

There are many parameters with which you can filter the cases result. Check the cases endpoint documentation for a complete list of the parameters, and what values they accept.
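
A full-text search can be combined with other filters in the same query string. The sketch below uses the search, jurisdiction, and page_size parameters that appear elsewhere in this document; the helper search_url is hypothetical.

```python
from urllib.parse import urlencode

CASES_URL = "https://api.case.law/v1/cases/"


def search_url(query, **filters):
    """Build a full-text search URL, optionally narrowed by extra filters."""
    params = {"search": query, **filters}
    return CASES_URL + "?" + urlencode(params)


print(search_url("insurance", jurisdiction="ill", page_size=1))
# https://api.case.law/v1/cases/?search=insurance&jurisdiction=ill&page_size=1
```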

Get all reporters in a jurisdiction

This example uses the reporter endpoint, and will retrieve all reporters in Arkansas.

Endpoints

API Base

This is the base endpoint of CAPAPI. It just lists all of the available endpoints.

Case Browse/Search Endpoint

This is the primary endpoint; you use it to browse, search for, and retrieve cases. If you know the numeric ID of your case in our system, you can append it to the path to retrieve a single case.

Endpoint Parameters:
  • name_abbreviation
    • An arbitrary string
    • e.g. People v. Smith
  • decision_date_min
    • YYYY-MM-DD
  • decision_date_max
    • YYYY-MM-DD
  • docket_number
  • citation
    • e.g. 1 Ill. 21
  • reporter
  • court
  • jurisdiction
  • search
    • An arbitrary string
    • A full-text search query
  • cursor
    • A randomly generated string
    • This field contains a value that we generate which will bring you to a specific page of results.
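
The date filters expect YYYY-MM-DD values, which Python's date.isoformat() produces. This is a sketch with a hypothetical helper, date_range_params; the check that the range is in order is a client-side convenience, not documented API behavior.

```python
from datetime import date
from urllib.parse import urlencode


def date_range_params(earliest=None, latest=None):
    """Return decision_date_min/max parameters formatted as YYYY-MM-DD."""
    if earliest and latest and earliest > latest:
        raise ValueError("earliest must not be after latest")
    params = {}
    if earliest:
        params["decision_date_min"] = earliest.isoformat()
    if latest:
        params["decision_date_max"] = latest.isoformat()
    return params


params = {"jurisdiction": "ill",
          **date_range_params(date(1900, 1, 1), date(1900, 12, 31))}
url = "https://api.case.law/v1/cases/?" + urlencode(params)
print(url)
# https://api.case.law/v1/cases/?jurisdiction=ill&decision_date_min=1900-01-01&decision_date_max=1900-12-31
```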

Single Case Endpoint

This is the way to retrieve a single case.

Endpoint Parameters:
  • full_case
    • true or false
    • When set to true, this parameter loads the case body. It is required for setting both body_format and format.
  • body_format
    • text (default), html, or xml
    • This will return a JSON enclosure with metadata, and a field containing the case in the requested format.
  • format
    • html or xml
    • This will return the case in HTML or its original XML with no JSON enclosure or metadata.
  • cursor
    • A randomly generated string
    • This field contains a value that we generate which will bring you to a specific page of results.

Here's what you can expect when you request a single case. Everything under casebody is only returned if full_case=true is set. In the cases endpoint, you'd get a list of these in a JSON object which also included pagination information and result counts.

{
    "id": integer,
    "url": url,
    "name": string,
    "name_abbreviation": string,
    "decision_date": YYYY-MM-DD,
    "docket_number": string,
    "first_page": string (generally a number),
    "last_page": string (generally a number),
    "citations": [
        {
            "type": "official" or "parallel",
            "cite": string
        }
    ],
    "volume": {
        "url": url,
        "volume_number": string (generally a number)
    },
    "reporter": {
        "url": url,
        "full_name": string
    },
    "court": {
        "url": url,
        "id": integer,
        "slug": slug,
        "name": string,
        "name_abbreviation": string
    },
    "jurisdiction": {
        "url": url,
        "id": integer,
        "slug": slug,
        "name": string,
        "name_long": string,
        "whitelisted": true or false
    },

    "casebody": {
        "data": {
            "judges": [],
            "head_matter": string,
            "attorneys": [
              string
            ],
            "opinions": [
                {
                    "type": string,
                    "author": string,
                    "text": string
                }
            ],
            "parties": [
                string
            ]
        },
        "status": should be "ok"
    }
    
}
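
Two small helpers show how a client might read this structure once it is decoded into a Python dictionary. This is a sketch only: the function names are hypothetical and the values in example_case are made up for illustration, not taken from a real record.

```python
def official_citation(case):
    """Return the first citation marked "official", or None if there is none."""
    for citation in case.get("citations", []):
        if citation.get("type") == "official":
            return citation.get("cite")
    return None


def has_body(case):
    """True only when the case body was requested and actually returned."""
    casebody = case.get("casebody")
    return (
        bool(casebody)
        and casebody.get("status") == "ok"
        and casebody.get("data") is not None
    )


# Illustrative values only.
example_case = {
    "id": 1,
    "citations": [
        {"type": "parallel", "cite": "0 Example 0"},
        {"type": "official", "cite": "1 Ill. 34"},
    ],
    "casebody": {"data": None, "status": "error_auth_required"},
}

print(official_citation(example_case))  # 1 Ill. 34
print(has_body(example_case))           # False
```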

Reporters
https://api.case.law/v1/reporters/

This will return a list of reporter series.

Endpoint Parameters:
  • full_name
    • e.g. Illinois Appellate Court Reports
    • the full reporter name
  • short_name
    • e.g. Ill. App.
    • the abbreviated name for the reporter
  • start_year
    • YYYY
    • the earliest year reported on in the series
  • end_year
    • YYYY
    • the latest year reported on in the series
  • volume_count
    • integer
    • filter on the number of volumes in a reporter series
  • cursor
    • A randomly generated string
    • This field contains a value that we generate which will bring you to a specific page of results.

Jurisdictions
https://api.case.law/v1/jurisdictions/

This will return a list of jurisdictions.

Endpoint Parameters:
  • id
    • integer
    • get jurisdiction by ID
  • name
    • e.g. Ill.
    • abbreviated jurisdiction name
  • name_long
    • e.g. Illinois
    • full jurisdiction name
  • whitelisted
  • slug
    • a slug
    • filter on the jurisdiction slug
  • cursor
    • A randomly generated string
    • This field contains a value that we generate which will bring you to a specific page of results.

Courts
https://api.case.law/v1/courts/

This will return a list of courts.

Endpoint Parameters:
  • slug
  • name
    • e.g. Illinois Appellate Court
    • full court name
  • name_abbreviation
    • e.g. Ill. App. Ct.
    • abbreviated court name
  • jurisdiction
  • cursor
    • A randomly generated string
    • This field contains a value that we generate which will bring you to a specific page of results.

Volumes
https://api.case.law/v1/volumes/

This will return a complete list of volumes.

Endpoint Parameters:
  • cursor
    • A randomly generated string
    • This field contains a value that we generate which will bring you to a specific page of results.

Citations
https://api.case.law/v1/citations/

This will return a list of citations.

Endpoint Parameters:
  • cursor
    • A randomly generated string
    • This field contains a value that we generate which will bring you to a specific page of results.

Beginner's Introduction to APIs

Are you a little lost in all the technical jargon? Here's a good place to start. This is by no means a complete introduction to using APIs, but it might be just enough to help situate a technically inclined person who's a bit outside of their comfort zone.

Fundamentally, an API is no different from a regular website: a program on your computer, such as a web browser or curl, sends a bit of data to a server, the server processes that data, and then sends a response. If you know how to read a URL, you can interact with web-based services in ways that aren't limited to clicking on the links and buttons on the screen.

Consider the following URL, which will perform a Google search for the word "CAP."

https://www.google.com/search?q=CAP

Let's break it down into its individual parts:

https://

This first part tells your web browser which protocol to use: this isn't very important for our purposes, so we'll ignore it.

www.google.com

The next part is a list of words separated by periods, sitting between the initial double slash and the next single slash. Many people generically refer to this as the domain, which is only partly true, but the distinction isn't important for our purposes; what matters is that it points to a specific server, which is just another computer on the internet.

/search

The next section, which consists of everything between the slash after the server name and the question mark, is called the path. It's called a path because, in the earlier days of the web, it was a 'path' through folders/directories to find a specific file on the web server. These days, it's more likely that the path will point to a specific endpoint.

You can think of an endpoint as a distinct part of a program, which could require specific inputs, and/or provide different results. For example, the "login" endpoint on a website might accept a valid username and a password for input, and return a message that you've successfully logged in. A "register" endpoint might accept various bits of identifying information, and return a screen that says your account was successfully registered.

Though there is only one part of this particular path, search, developers usually organize paths into hierarchical lists separated by slashes. Hypothetically, if the developers at Google decided that one generalized search endpoint wasn't sufficiently serving people who wanted to search for books or locations, they could implement more specific endpoints such as /search/books and /search/locations.

?q=CAP

The final section of the URL is where you'll find the parameters, and consists of everything after the question mark. Parameters are a way of passing individual, labelled pieces of information to the endpoint to help it perform its job. In this case, the parameter tells the /search endpoint what to search for. Without this parameter, the response wouldn't be particularly useful.

A URL can contain many parameters, separated by ampersands, but in this instance there is only one: the rather cryptically named "q," short for "query," which has a value of "CAP." Parameter names are arbitrary: Google's developers could just as easily have named the parameter "query," as in ?query=CAP, but decided that "q" would sufficiently communicate its purpose.

The Google developers designed their web search endpoint to accept other parameters, too. For example, there is an even more cryptically named parameter, 'tbs' which will limit the age of the documents returned in the search results. The parameters ?q=CAP&tbs=qdr:y will perform a web search for "CAP" and limit the results to documents less than a year old.
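
The same breakdown can be done in code. This short sketch uses Python's standard urllib.parse module to split the example URL into the parts described above.

```python
from urllib.parse import parse_qs, urlsplit

parts = urlsplit("https://www.google.com/search?q=CAP&tbs=qdr:y")

print(parts.scheme)           # https
print(parts.netloc)           # www.google.com
print(parts.path)             # /search
print(parse_qs(parts.query))  # {'q': ['CAP'], 'tbs': ['qdr:y']}
```

Each parameter value comes back as a list because a URL may repeat the same parameter name more than once.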

So when you're working with CAPAPI, the same principles apply. Rather than https://www.google.com, you'll be using https://api.case.law/. Rather than using the /search?q= endpoint and parameter, you'll be using one of our endpoints and the parameters we've defined. One important difference is the purpose of the structured data we're returning, vs. the visual, browser-oriented data that Google is returning with their search engine.

When you perform a query in a web browser using our API, there are some links and buttons, but the data itself is in a text-based format with lots of brackets and commas. This format is called JSON, or JavaScript Object Notation. We use this format because software developers can easily utilize data in that format in their own programs. We do intend to have a more user-friendly case browser at some point soon, but we're not quite there yet.
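
To show why JSON is convenient for programs, this sketch decodes a small JSON string with Python's standard json module. The string is a shortened stand-in built from the pagination example, not a complete API response.

```python
import json

raw = (
    '{"count": 183149, '
    '"next": "https://api.case.law/v1/cases/?cursor=cD0xODMyLTEyLTAx", '
    '"results": []}'
)

page = json.loads(raw)  # JSON objects become dictionaries, arrays become lists
print(page["count"])    # 183149
print(page["results"])  # []

# JSON null becomes Python's None, as with an uncalculated count.
print(json.loads('{"count": null}')["count"])  # None
```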

OK! That about does it for our beginner's introduction to web-based APIs. Please check out our usage examples section to see some of the ways you can put these principles to work in CAPAPI. If you have any suggestions for making this documentation better, we'd appreciate your taking the time to let us know in an issue report in our code repository on github.com.

Thanks, and good luck!

Reporting Problems and Enhancement Requests

We are serving an imperfect, living dataset through an API that will forever be a work-in-progress. We work hard to hunt down and fix problems in both the API and the data, but a robust user base will uncover problems more quickly than our small team could ever hope to. Here's the best way to report common types of errors.

Jumbled or Misspelled Words in Case Text
For now, we're not accepting bug reports for OCR problems. While our data is good quality for OCR'd text, we fully expect these errors in every case. We're working on the best way to tackle this.
Typos or Broken Links in Documentation or Website, API Error Messages or Performance Problems, and Missing Features
First, please check our existing issues to see if someone has already reported the problem. If so, please feel free to comment on the issue to add information. We'll update the issue when there's more information about the issue, so if you'd like notifications, click on the "Subscribe" button on the right-hand side of the screen. If no issue exists, create a new issue, and we'll get back to you as soon as we can.
Incorrect Metadata or Improperly Labelled data in XML
First, check our errata to see if this is a known issue. Then, check our existing issues to see if someone has already reported the problem. If so, please feel free to comment on the issue to add context or additional instances that the issue owner didn't report. We'll update the issue when there's more information about the issue, so if you'd like notifications, click on the "Subscribe" button on the right-hand side of the screen. If no issue exists, create a new issue and we'll get back to you as soon as we can.

Glossary

This is a list of technical or project-specific terms we use in this documentation. These are stub definitions to help you get unstuck, but they should not be considered authoritative or complete. A quick Google search should provide more context for any of these terms.

API
API is an acronym for Application Programming Interface. Broadly, it is a way for one computer program to transfer data to another computer program. CAPAPI is a RESTful API designed to distribute court case data.
Character
A letter, number, space, or piece of punctuation. Multiple characters together make up a string.
Special characters are characters that have programmatic significance to a program. The "specialness" of any given character is determined by the context in which it's used. For example, you can't add a bare question mark to your path because it indicates to the server that everything after it is a parameter.
Command Line
This is the textual interface for interacting with a computer. Rather than interacting with the system through windows and mouse clicks, commands are typed and output is rendered in its textual form. On a Mac, the default Command Line program is Terminal. On Windows, the program is cmd.exe.
curl
curl is a simple command line tool for retrieving data over the internet. It's similar to a web browser in that it will retrieve the contents of a URL, but it will dump the text contents to a terminal, rather than show a rendered version in a graphical browser window.
Endpoint
You can think of an endpoint as a distinct part of a program, which could require specific inputs, and/or provide different results. For example, the "login" endpoint on a website might accept a valid username and a password for input, and return a message that you've successfully logged in. A "register" endpoint might accept various bits of identifying information, and return a screen that says your account was successfully registered.
Jurisdiction
The jurisdiction of a case or volume is the political division it belongs to, such as the United States, a state, a territory, a tribe, or the District of Columbia. Volumes that collect cases from a region have the jurisdiction "Regional." Cases from tribal courts (other than Navajo Nation) temporarily have the jurisdiction "Tribal Jurisdictions" while we verify the correct jurisdiction for each tribal court.
OCR
OCR is a process in which a computer attempts to create text from an image of text. The text in our cases is OCR-derived using scanned case reporter pages as source images.
RESTful
A RESTful API is based on HTTP, and makes use of its built-in verbs (commands), such as GET and POST.
Reporter
In this project, we use the term 'reporter' to refer to a reporter series. We'd consider F.2d a reporter.
Server
A server is just a computer on the internet that is configured to respond to requests from other computers. A web server will respond to requests from a web browser. An email server will respond to requests from email programs, and from other email servers that are sending it messages.
Slug
A string with special characters removed for ease of inclusion in a URL.
String
A string, as a type of data, just means an arbitrary list (or string) of characters. A word is a string. This whole sentence is a string. "h3ll0.!" is a string. This whole document is a string.
Top-Level Domain
The suffix to a domain name, such as .com, .edu or .co.uk.
URL
A URL, or Uniform Resource Locator, is an internet address that generally contains a communication protocol, a server name, a path to a file or endpoint, and possibly parameters to pass to the endpoint.
URL Parameter
For our purposes, a parameter is just a piece of data with a label that can be passed to an endpoint in a web request.
URL Path
The URL path begins with the slash after the top-level domain and ends with the question mark that signals the beginning of the parameters. It was originally intended to point to a file on the server's hard drive, but these days it's just as likely to point to an application endpoint.
Whitelisted
While most cases in the database are subject to a 500-case-per-day access limit, jurisdictions that publish their cases in a citable, machine-readable format are not subject to this limit. See the access limits section for more information on the limits, which types of users aren't subject to them, and how you can eliminate them in your legal jurisdiction.