Automated Extraction and Matching for Healthcare Prior Authorization
Problem Statement
Healthcare prior authorization, the process by which insurance companies approve medical treatments before they are delivered, requires extracting clinical information from unstructured documents (physician notes, faxes, scanned PDFs, lab reports) and matching it against complex, insurer-specific rule sets that change frequently. Olive AI raised $902 million and was deployed in over 900 hospitals to automate this process, but investigations revealed that the "AI" required extensive manual intervention because the underlying information extraction was unreliable. The company shut down in October 2023. The problem persists: AMA physician surveys put the prior authorization workload at roughly 12 to 14 hours of physician and staff time per physician per week, and current approaches still depend heavily on human labor.
Why This Matters
Administrative costs account for roughly 30% of US healthcare spending — approximately $1 trillion annually. Prior authorization alone costs the US healthcare system an estimated $35 billion per year in administrative overhead. The American Medical Association reports that 93% of physicians say prior authorization delays patient care, and 34% report that prior authorization has led to a serious adverse event for a patient. With over 2 billion prior authorization requests processed annually in the US, even modest automation improvements could free billions in healthcare spending and reduce treatment delays. The problem is worsening: the number of services requiring prior authorization has increased steadily, with insurers adding more requirements even as the administrative burden grows.
What’s Been Tried
Olive AI's approach used robotic process automation (RPA), essentially screen-scraping and form-filling bots, combined with machine learning classifiers to categorize authorization requests. The system worked for simple, structured cases but failed on complex ones for several reasons: (1) clinical documentation is highly heterogeneous, with physician notes using inconsistent terminology, abbreviations, and formatting across health systems; (2) insurer rule sets are complex, ambiguous, and change quarterly, so the matching logic requires constant updating; (3) many critical clinical details are buried in unstructured free-text notes, scanned faxes, or handwritten annotations that defy reliable extraction; (4) edge cases are common, not rare, and prior authorization decisions often hinge on nuanced clinical judgment that rule-based systems can't capture. R1 RCM, UiPath, and Waystar offer competing products, but all require substantial human-in-the-loop intervention. Epic and Cerner have built some in-house automation, but health system adoption remains low because the accuracy isn't high enough to trust for clinical decisions.
What Would Unlock Progress
Progress requires advances on two fronts: (1) reliable information extraction from messy clinical documents, meaning a system that can parse physician notes, fax images, lab reports, and scanned PDFs with high enough accuracy that a human doesn't need to verify every extraction; and (2) a computable, standardized representation of insurer authorization criteria that can be updated automatically when rules change, rather than requiring manual reprogramming. The FHIR (Fast Healthcare Interoperability Resources) standard and CDS Hooks specification provide some infrastructure for structured clinical data exchange, but most prior authorization still happens outside these standards. Adjacent fields with comparable problems include legal document analysis (extracting structured facts from unstructured legal text), regulatory compliance automation (matching documents against changing rule sets), and claims processing in property/casualty insurance (similar document heterogeneity challenges).
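To make front (2) concrete, the following is a minimal sketch of what a computable, versioned representation of authorization criteria might look like. Everything in it is an assumption for illustration: the class names, the three comparison operators, the field names, the procedure code "EXAMPLE-MRI", and the rule contents are hypothetical and do not come from any insurer, from FHIR, or from CDS Hooks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Criterion:
    field: str      # name of an extracted clinical field (hypothetical)
    op: str         # one of "eq", "gte", "in"
    value: object   # threshold, expected value, or set of allowed values


@dataclass(frozen=True)
class RuleVersion:
    procedure_code: str               # hypothetical identifier, not a real billing code
    effective_from: date              # first day this version applies
    criteria: Tuple[Criterion, ...]   # every criterion must hold


def holds(criterion: Criterion, facts: dict) -> Optional[bool]:
    """Return True/False, or None when the needed fact was not extracted."""
    if criterion.field not in facts:
        return None
    actual = facts[criterion.field]
    if criterion.op == "eq":
        return actual == criterion.value
    if criterion.op == "gte":
        return actual >= criterion.value
    if criterion.op == "in":
        return actual in criterion.value
    raise ValueError(f"unknown operator: {criterion.op}")


def active_version(versions: Sequence[RuleVersion], procedure_code: str,
                   on: date) -> Optional[RuleVersion]:
    """Pick the most recent version already in effect on the given date."""
    candidates = [v for v in versions
                  if v.procedure_code == procedure_code and v.effective_from <= on]
    return max(candidates, key=lambda v: v.effective_from, default=None)


def evaluate(version: RuleVersion, facts: dict) -> str:
    """Missing facts take priority and send the case to a human."""
    results = [holds(c, facts) for c in version.criteria]
    if any(r is None for r in results):
        return "needs_review"
    return "meets_criteria" if all(results) else "does_not_meet"


# Hypothetical rule change: the second version relaxes the therapy
# requirement and adds a diagnosis restriction.
VERSIONS = [
    RuleVersion("EXAMPLE-MRI", date(2025, 1, 1),
                (Criterion("conservative_therapy_weeks", "gte", 6),)),
    RuleVersion("EXAMPLE-MRI", date(2025, 4, 1),
                (Criterion("conservative_therapy_weeks", "gte", 4),
                 Criterion("diagnosis", "in", frozenset({"dx-a", "dx-b"})))),
]

FACTS = {"conservative_therapy_weeks": 5, "diagnosis": "dx-a"}

before = evaluate(active_version(VERSIONS, "EXAMPLE-MRI", date(2025, 2, 1)), FACTS)
after = evaluate(active_version(VERSIONS, "EXAMPLE-MRI", date(2025, 5, 1)), FACTS)
print(before, after)
```

Because rules are data rather than code, a rule change in this sketch is a new `RuleVersion` record with its own effective date, and the same request can be re-evaluated against whichever version applied on the date of service. Real criteria are far more ambiguous than three operators can express, so this illustrates only the shape of the idea.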
Entry Points for Student Teams
A student team could: (1) build a prototype clinical document parser that extracts structured prior-authorization-relevant fields (diagnosis, procedure, clinical justification, supporting evidence) from a corpus of de-identified physician notes, measuring extraction accuracy by document type; (2) design a computable representation of prior authorization criteria for a specific insurer and procedure category, demonstrating how rule changes could be encoded and applied automatically; (3) develop a clinical NLP pipeline that matches extracted document information against authorization criteria and flags cases requiring human review, with the goal of maximizing the percentage of cases that can be auto-adjudicated. Relevant disciplines include natural language processing, health informatics, software engineering, and human-computer interaction.
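As a starting point for entry points (1) and (3), the sketch below shows one possible way to score extraction accuracy per document type and to route low-confidence cases to human review. It rests on several assumptions: exact match after whitespace and case normalization is a deliberately crude metric, the document-type labels, sample values, and the 0.9 confidence threshold are hypothetical, and the per-field confidence scores are assumed to come from whatever extractor a team builds.

```python
from collections import defaultdict

# The four fields named in entry point (1).
FIELDS = ("diagnosis", "procedure", "clinical_justification", "supporting_evidence")


def normalize(value) -> str:
    """Lowercase and collapse whitespace; a missing value becomes ''."""
    return "" if value is None else " ".join(str(value).lower().split())


def accuracy_by_doc_type(examples) -> dict:
    """examples: iterable of (doc_type, gold_fields, predicted_fields).

    Returns field-level exact-match accuracy for each document type.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_type, gold, predicted in examples:
        for field in FIELDS:
            total[doc_type] += 1
            if normalize(gold.get(field)) == normalize(predicted.get(field)):
                correct[doc_type] += 1
    return {doc_type: correct[doc_type] / total[doc_type] for doc_type in total}


def route(confidences: dict, threshold: float = 0.9) -> str:
    """Auto-adjudicate only if every required field is present and confident."""
    if any(confidences.get(field, 0.0) < threshold for field in FIELDS):
        return "human_review"
    return "auto"


def auto_rate(routes) -> float:
    """Share of cases that did not need a human."""
    routes = list(routes)
    return sum(r == "auto" for r in routes) / len(routes) if routes else 0.0


# Tiny made-up evaluation set: (doc_type, gold, predicted).
GOLD = {"diagnosis": "dx-a", "procedure": "proc-1",
        "clinical_justification": "failed therapy", "supporting_evidence": "lab-1"}
EXAMPLES = [
    ("physician_note", GOLD, {**GOLD, "diagnosis": "  DX-A "}),
    ("scanned_fax", GOLD, {"diagnosis": "dx-a", "procedure": "proc-1",
                           "clinical_justification": "unknown"}),
]

scores = accuracy_by_doc_type(EXAMPLES)
routes = [
    route({f: 0.95 for f in FIELDS}),
    route({"diagnosis": 0.99, "procedure": 0.97, "clinical_justification": 0.42}),
]
print(scores, routes, auto_rate(routes))
```

Reporting accuracy separately for each document type matters because an aggregate number can hide a parser that works on typed notes and fails on scanned faxes. The threshold in `route` trades auto-adjudication rate against error risk, and a team would need to calibrate it against held-out labeled data rather than pick it by hand.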
Genome Tags
Source Notes
- Olive AI ($902M raised, $4B peak valuation) is the most expensive failure in healthcare automation. Waystar acquired the clearinghouse business; Humata Health (led by ex-Olive executives) acquired the prior authorization technology, suggesting the underlying IP had some value.
- The failure is best characterized as `failure:ignored-context`: the system was designed for structured, predictable inputs but deployed against messy, heterogeneous real-world documents. Also `failure:disciplinary-silo`: the NLP/AI capabilities were disconnected from deep clinical workflow understanding.
- Related to healthcare administrative cost problems broadly. The CMS Interoperability and Prior Authorization Final Rule (CMS-0057-F, effective 2026) requires electronic prior authorization for some payers, which may create both new infrastructure and new data standards that change the problem landscape.
- CommonSpirit Health (one of the largest US health systems) terminated its Olive contract after poor results, suggesting the deployment challenges were not limited to small institutions.
"Health AI startup Olive to shut down," Healthcare Dive, Oct 2023; "What Happened to Olive & Why Did It Fail?" SunsetHQ, 2024; "Olive AI's Rise, and Fall in Healthcare: What Went Wrong?" OyeLabs, 2023. Access date: 2026-02-11.