Phantom: Privacy-Preserving Deep Neural Network Model Obfuscation in Heterogeneous TEE and GPU System

1Johns Hopkins University, 2Arizona State University, 3University of Central Florida, 4Sony AI

Motivation

Cost Diagram: The cost of training large-scale machine learning models is doubling every nine months (source: Cottier & Rahman (2024), Epoch AI).
Motivation Diagram: High-stakes domains require fine-tuning ML models on private data for domain-specific tasks.

Abstract

In this work, we present Phantom, a novel privacy-preserving framework for obfuscating deep neural network (DNN) models deployed in heterogeneous TEE/GPU systems. Phantom employs reinforcement learning to add lightweight obfuscation layers, degrading model performance for adversaries while maintaining functionality for authorized users. To reduce off-chip data communication between the TEE and GPU, we propose a Top-K layer-wise obfuscation sensitivity analysis method. Extensive experiments demonstrate Phantom's superiority over state-of-the-art (SoTA) defenses against model stealing and fine-tuning attacks across various architectures and datasets. It reduces unauthorized accuracy to near-random guessing (e.g., 10% for CIFAR-10 tasks, 1% for CIFAR-100 tasks) and limits model stealing to a 6.99% average attack success rate, significantly outperforming SoTA competing methods. Our system implementation on an Intel SGX 2.0 and NVIDIA GPU heterogeneous system achieves a 35% end-to-end latency reduction compared with the most recent SoTA work.
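To make the Top-K sensitivity idea concrete, the sketch below ranks layers by how much a small weight perturbation degrades accuracy and keeps the K most sensitive ones as candidates for TEE-side obfuscation. This is only an illustration under assumed PyTorch helpers (eval_accuracy, val_loader, and model are placeholders), not Phantom's actual RL-based obfuscation procedure.

# Illustrative sketch only: rank layers by the accuracy drop caused by perturbing
# their weights, and return the Top-K most sensitive layer names. The paper's real
# sensitivity metric and RL-driven obfuscation are not reproduced here; `model`,
# `eval_accuracy`, and `val_loader` are hypothetical placeholders.
import copy
import torch

def topk_sensitive_layers(model, val_loader, eval_accuracy, k=3, noise_std=0.05):
    baseline = eval_accuracy(model, val_loader)
    scores = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:          # skip biases and normalization parameters
            continue
        perturbed = copy.deepcopy(model)
        with torch.no_grad():
            p = dict(perturbed.named_parameters())[name]
            p.add_(noise_std * p.abs().mean() * torch.randn_like(p))
        scores[name] = baseline - eval_accuracy(perturbed, val_loader)
    # Layers whose perturbation hurts accuracy most are the Top-K candidates
    # to host (and obfuscate) inside the TEE.
    return sorted(scores, key=scores.get, reverse=True)[:k]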

Threat Model

Threat Model Diagram: The adversary can access the untrusted execution environment while the TEE remains protected.

Following prior work on model privacy protection, we consider a powerful adversary who can access the untrusted execution environment, including the operating system (OS) and GPU. The adversary can observe and steal model knowledge and data outside the TEE, but cannot access anything inside the TEE. In line with industry practice and previous research, we assume the deployed DNN models return only class labels to both adversaries and authorized users; any intermediate results, such as prediction confidence scores, remain protected within the TEE. We assume the adversary can query the victim model a limited number of times to build a transfer dataset, which can be used to train a surrogate model for a model stealing attack. Furthermore, we assume the adversary can obtain at most 10% of the original training dataset for a fine-tuning attack, a stronger adversary than assumed in prior works.
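The sketch below illustrates the assumed label-only model stealing attack under a query budget: the adversary labels its own transfer data with the victim's hard labels and trains a surrogate on them. It is a minimal illustration of the threat model, not an attack used in the paper; victim_predict_labels, surrogate, and public_loader are hypothetical placeholders.

# Illustrative sketch of the assumed label-only model stealing attack: spend a
# limited query budget on the victim (which returns only class labels), build a
# transfer dataset, and train a surrogate model on it.
# `victim_predict_labels`, `surrogate`, and `public_loader` are hypothetical.
import torch
import torch.nn.functional as F

def steal(victim_predict_labels, surrogate, public_loader, query_budget, epochs=10):
    transfer_x, transfer_y = [], []
    queries = 0
    for x, _ in public_loader:                       # attacker's own unlabeled data
        if queries >= query_budget:
            break
        transfer_x.append(x)
        transfer_y.append(victim_predict_labels(x))  # hard labels only, per threat model
        queries += x.size(0)

    xs, ys = torch.cat(transfer_x), torch.cat(transfer_y)
    opt = torch.optim.SGD(surrogate.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):
        for i in range(0, xs.size(0), 128):
            logits = surrogate(xs[i:i + 128])
            loss = F.cross_entropy(logits, ys[i:i + 128])
            opt.zero_grad(); loss.backward(); opt.step()
    return surrogate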

Method

Secure Model Deployment
Trained Obfuscation Model
Vulnerable Model Components: components left outside the TEE remain susceptible to extraction by adversaries.
Private Model Components: the obfuscated model components contain sensitive privacy information requiring protection.
Whole Model Deployment: the trained obfuscation model is deployed across the TEE and GPU to protect sensitive privacy information.
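The sketch below illustrates the deployment split the diagrams describe: the bulk of the network runs on the GPU, the protected obfuscation layers run inside the TEE (modeled here as CPU memory), and only the final class label leaves the trusted side. The module names (gpu_backbone, tee_obfuscation_layers) are hypothetical, and Phantom's real partition is chosen by its sensitivity analysis rather than hard-coded.

# Illustrative sketch of split inference across the untrusted GPU and the TEE.
# Only intermediate activations cross the boundary, and only the class label
# leaves the TEE, matching the threat model. Module names are hypothetical.
import torch

@torch.no_grad()
def split_inference(x, gpu_backbone, tee_obfuscation_layers):
    # Untrusted side: bulk computation on the GPU.
    act = gpu_backbone(x.to("cuda"))

    # Trusted side: activations cross into the TEE (modeled here as CPU memory),
    # where the protected layers de-obfuscate and produce the final prediction.
    act = act.to("cpu")
    logits = tee_obfuscation_layers(act)

    # Only the class label is released to the querier.
    return logits.argmax(dim=1)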

Experiment

Experimental results: Tables 1–4 and Figure 1.

BibTeX

@inproceedings{bai2025phantom,
  title={Phantom: Privacy-Preserving Deep Neural Network Model Obfuscation in Heterogeneous TEE and GPU System},
  author={Bai, Juyang and Chowdhuryy, Md Hafizul Islam and Li, Jingtao and Yao, Fan and Chakrabarti, Chaitali and Fan, Deliang},
  year={2025},
  booktitle={34th USENIX Security Symposium}
}