The Kensho Extract API allows users to transform PDF documents into structured JSON files. The JSON files have a few key characteristics:
- They contain core document data types, including:
  - Headers & Titles
  - Tables & Table Titles
  - Figures & Figure Titles
  - Miscellaneous Text
- The JSON is structured hierarchically, down to the second header level, mirroring the intended hierarchy of the document.
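To make the hierarchy concrete, here is a minimal sketch of walking a nested document tree. The field names `type`, `text`, and `children`, and the sample document itself, are assumptions made for illustration; they are not the documented Kensho Extract schema, so check the API documentation for the real output format.

```python
import json

# Hypothetical output shape: "type", "text", and "children" are assumed
# field names for illustration, not the documented Kensho Extract schema.
SAMPLE = json.loads("""
{
  "type": "document",
  "children": [
    {"type": "title", "text": "Annual Report", "children": [
      {"type": "header", "text": "Results", "children": [
        {"type": "text", "text": "Revenue grew.", "children": []},
        {"type": "table_title", "text": "Table 1: Revenue", "children": []},
        {"type": "table", "text": "", "children": []}
      ]}
    ]}
  ]
}
""")


def outline(node, depth=0):
    """Return one indented line for every node that carries text."""
    lines = []
    if node.get("text"):
        lines.append("  " * depth + f"[{node['type']}] {node['text']}")
    # Recurse into nested content, one indent level deeper per level.
    for child in node.get("children", []):
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
```

A recursive walk like this works regardless of how deep the nesting goes, so it would not need to change if the hierarchy stopped at the second header level or went further.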
The API works as follows:
After authenticating, the user can submit PDF documents to the API, each with a priority code. By default, documents are processed first in, first out, with one exception: any document marked as low priority is handled only after all high-priority documents are complete, regardless of when it was submitted.
The low-priority queue is intended for bulk document processing, so that bulk jobs do not delay high-urgency documents that need a fast turnaround.
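The ordering rule described above can be sketched as a two-tier queue. This is only a local model of the described behavior, not Kensho's implementation; the class name `TwoTierQueue` and its methods are hypothetical.

```python
from collections import deque


class TwoTierQueue:
    """Toy model: FIFO within each tier, high priority always served first."""

    def __init__(self):
        self._high = deque()
        self._low = deque()

    def submit(self, doc_id, low_priority=False):
        # Documents are treated as high priority unless marked low.
        (self._low if low_priority else self._high).append(doc_id)

    def next_document(self):
        # Low-priority work is only picked up once the high tier is empty.
        if self._high:
            return self._high.popleft()
        if self._low:
            return self._low.popleft()
        return None


if __name__ == "__main__":
    queue = TwoTierQueue()
    queue.submit("bulk-1", low_priority=True)  # submitted first
    queue.submit("urgent-1")
    queue.submit("urgent-2")
    # Both urgent documents are served before the earlier bulk one.
    print([queue.next_document() for _ in range(3)])
```

In this model a bulk document submitted first still waits behind every high-priority document, which is the trade-off the low-priority queue is meant to provide.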
After a document is submitted, the API returns a unique request_id key, which can be used in a later query to retrieve the document output.
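The submit-then-retrieve pattern can be illustrated with an in-memory stand-in. Everything here is hypothetical: `FakeExtractService`, its method names, and the `status` and `output` fields are assumptions made for illustration and do not reflect the real Kensho Extract endpoints or response format.

```python
import uuid


class FakeExtractService:
    """In-memory stand-in for illustration; not the real Kensho Extract API."""

    def __init__(self):
        self._requests = {}

    def submit(self, pdf_bytes, priority="high"):
        # Hand back a unique key the caller keeps for later retrieval.
        request_id = str(uuid.uuid4())
        self._requests[request_id] = {
            "status": "pending",
            "priority": priority,
            "output": None,
        }
        return request_id

    def finish(self, request_id, output):
        # Simulates the service completing extraction in the background.
        self._requests[request_id].update(status="done", output=output)

    def retrieve(self, request_id):
        return self._requests[request_id]


if __name__ == "__main__":
    service = FakeExtractService()
    request_id = service.submit(b"%PDF-1.7 ...", priority="low")
    print(service.retrieve(request_id)["status"])  # not finished yet
    service.finish(request_id, {"type": "document", "children": []})
    print(service.retrieve(request_id)["status"])
```

The point of the pattern is that the caller only needs to store the request_id; submission and retrieval can happen in separate sessions or processes.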
You can begin using Kensho Extract in seconds via our REST API.
To sign up, please email firstname.lastname@example.org to set up your API profile.
Then, to start extracting documents with Kensho Extract, visit our authentication guide or consult the full API documentation.