Healthcare AI products do not fail because the model is wrong. They stall because legal says no. After building AI features for healthcare SaaS companies across clinical documentation, patient communication, and prior authorization, we have seen the same approval blockers surface repeatedly. This post is a map through them.
The Three Questions Your Legal Team Will Ask
Before any AI feature ships in a healthcare context, you will face three non-negotiable questions:
- Does PHI leave our control?
- Where is the data stored and processed?
- Who has a BAA with whom?
Get an architecture that answers all three cleanly and the rest is paperwork. Architect ambiguously and you will spend months in legal review cycles.
HIPAA and LLMs: The BAA Problem
HIPAA requires a Business Associate Agreement (BAA) with every vendor that touches Protected Health Information (PHI). This is where most teams stumble with off-the-shelf LLM APIs.
The current BAA landscape for major LLM providers:
- Microsoft Azure OpenAI: BAA available via Azure Healthcare APIs and enterprise agreements. This is the most battle-tested path for GPT-4 in healthcare.
- AWS Bedrock (Anthropic Claude, Meta Llama): BAA available under AWS standard enterprise agreements. Bedrock's isolation model is clean for compliance.
- Anthropic Claude API (direct): BAA available for enterprise customers. Contact their sales team -- not available at the standard API tier.
- OpenAI API (direct): BAA available for ChatGPT Enterprise and API enterprise agreements, not standard API accounts.
- Google Vertex AI: BAA available, HIPAA-eligible services clearly documented.
Do not use a consumer-tier API key for anything that touches PHI. The BAA tier is not a premium upsell -- it is the difference between a compliant product and a HIPAA violation.
Data Residency: Where Does the Data Go?
Data residency requirements come in two flavours in healthcare: regulatory (some jurisdictions require data to stay in-country) and contractual (hospital systems often specify data residency in their vendor agreements).
Architectural patterns that handle both:
Region-locked inference
Both Azure OpenAI and AWS Bedrock allow you to specify the compute region for inference. Lock it to US-East or US-West and your inference never touches infrastructure outside the US. Document this in your security questionnaire -- hospital procurement teams will ask.
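Enforcing the region lock in code (not just in docs) makes the guarantee auditable. A minimal sketch in Python -- the region names follow AWS conventions, but `resolve_inference_region` and the allowlist are illustrative, not part of any vendor SDK:

```python
# Approved US regions for PHI-adjacent inference. Keeping this as an explicit
# allowlist means a misconfigured deployment fails loudly instead of silently
# routing traffic through an out-of-boundary region.
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}

def resolve_inference_region(requested: str) -> str:
    """Return the region if it is inside the residency boundary, else raise."""
    if requested not in ALLOWED_REGIONS:
        raise ValueError(
            f"Region {requested!r} is outside the approved US residency boundary"
        )
    return requested
```

With boto3, the resolved value would be passed as `region_name` when constructing the Bedrock runtime client, so every inference call inherits the lock from one place.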
On-premise or VPC-isolated models
For the most stringent data residency requirements, run an open-weight model (Llama 3, Mistral) in your own VPC or on-premise. Quality is lower than GPT-4 or Claude for complex reasoning tasks, but for structured extraction and classification tasks in healthcare workflows, fine-tuned open-weight models are often sufficient and fully within your control.
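One cheap safeguard for a self-hosted deployment is refusing to call any model endpoint that resolves outside the private network. A stdlib-only sketch, with the simplifying assumption that endpoint hostnames are already raw IP addresses (DNS resolution is omitted):

```python
import ipaddress
from urllib.parse import urlparse

def assert_private_endpoint(url: str) -> None:
    """Reject model endpoints outside RFC 1918 private address space.

    Assumes the hostname is a literal IP; a production check would resolve
    DNS first and validate every returned address.
    """
    host = urlparse(url).hostname
    if host is None or not ipaddress.ip_address(host).is_private:
        raise ValueError(f"Endpoint {url!r} is not inside the private network")
```

Run this at client construction time so a copy-pasted public endpoint never makes it into a deployed configuration.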
PHI Minimisation: The Safest Architecture
The cleanest HIPAA posture is one where PHI never reaches the LLM in the first place. This is achievable in more use cases than you might expect.
De-identification before inference
Run a de-identification pass (AWS Comprehend Medical, Azure Text Analytics for Health, or a custom NER model) to strip PHI before sending to the LLM. Store the mapping between original and de-identified tokens securely. Re-inject after inference. This pattern works for summarisation, classification, and ICD code suggestion use cases.
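The shape of that pattern -- strip, map, re-inject -- can be sketched in a few lines. This toy version uses regexes for two PHI types only; a real pipeline would use Comprehend Medical, Azure Text Analytics for Health, or a clinical NER model as noted above, and would store the mapping encrypted:

```python
import re

# Illustrative patterns only -- real PHI detection needs a medical NER service.
PHI_PATTERNS = {
    "MRN": re.compile(r"\bMRN[- ]?\d{6,}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def deidentify(text: str):
    """Replace PHI spans with placeholder tokens; return text plus the mapping."""
    mapping = {}
    counter = 0
    def substitute(kind):
        def repl(match):
            nonlocal counter
            token = f"[{kind}_{counter}]"
            counter += 1
            mapping[token] = match.group(0)
            return token
        return repl
    for kind, pattern in PHI_PATTERNS.items():
        text = pattern.sub(substitute(kind), text)
    return text, mapping

def reidentify(text: str, mapping: dict) -> str:
    """Re-inject the original values into the model's output."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Only the tokenised text crosses the wire to the LLM; the mapping never leaves your infrastructure.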
Synthetic data for development and testing
Never use real patient data in development or staging environments. Generate synthetic datasets using tools like Synthea or a controlled LLM pipeline. This eliminates an entire category of compliance risk and is standard practice in every compliant healthcare AI product we have shipped.
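Synthea is the richer option for realistic cohorts; for quick fixture data, even a trivial seeded generator keeps real PHI out of lower environments. A sketch (field names and code lists are arbitrary examples):

```python
import random
import uuid
from datetime import date, timedelta

def synthetic_patient(rng: random.Random) -> dict:
    """Generate one fully synthetic record -- no field derives from real PHI."""
    birth = date(1940, 1, 1) + timedelta(days=rng.randrange(0, 29000))
    return {
        "patient_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "birth_date": birth.isoformat(),
        "icd10": rng.choice(["E11.9", "I10", "J45.909"]),  # example codes
    }

# Seeded so test fixtures are reproducible across CI runs.
rng = random.Random(42)
cohort = [synthetic_patient(rng) for _ in range(100)]
```

Seeding the generator means a failing test reproduces with the exact same cohort every time.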
Audit Logging: Your Compliance Evidence
HIPAA requires an audit trail for access to PHI. For AI features, this means:
- Log every request that contains or relates to PHI -- timestamp, user, action, data touched.
- Log every LLM call -- inputs (or hashed inputs), outputs, model version, latency.
- Retain logs for a minimum of 6 years per HIPAA requirements.
- Store logs in a write-once, tamper-evident system (AWS CloudTrail, Azure Monitor with immutable storage).
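The requirements above can be sketched as a hash-chained entry builder: inputs are stored as digests rather than raw PHI, and each entry commits to the previous one so deletion or reordering is detectable. This is illustrative structure, not a substitute for the managed immutable stores named above:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_entry(prev_hash: str, user: str, action: str,
                payload: str, model: str) -> dict:
    """Build one tamper-evident audit entry.

    The raw payload is never stored -- only its SHA-256 digest -- and each
    entry chains to the hash of the previous one.
    """
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "input_sha256": hashlib.sha256(payload.encode()).hexdigest(),
        "model_version": model,
        "prev": prev_hash,
    }
    # Hash the entry itself (sorted keys for a canonical serialisation).
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    return entry
```

A verifier can walk the chain and recompute each hash; any edited or dropped entry breaks every link after it.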
Audit logging is not optional and it is not just for breach response. It is the evidence you present during a compliance review that your AI features operate within policy.
Making Legal Say Yes Faster
The teams that move fastest through healthcare AI compliance reviews are the ones who present architecture decisions, not features. A one-page data flow diagram showing exactly where PHI goes, which vendors have BAAs, and how de-identification works will resolve 80% of legal concerns before a meeting is scheduled.
Compliance is not a blocker. It is a design constraint -- and design constraints make products better.