Volume 16, No. 10

Enabling Secure and Efficient Data Analytics Pipeline Evolution with Trusted Execution Environment

Authors:
Haotian Gao, Cong Yue, Tien Tuan Anh Dinh, Zhiyong Huang, Beng Chin Ooi

Abstract

Modern data analytics pipelines are highly dynamic: data engineers and data scientists constantly monitor and fine-tune them. Recent pipeline management systems make it easier to create and deploy pipelines and to track their evolution. However, privacy concerns arise because many of them are deployed on public clouds with little or no trust. Unfortunately, the unique nature of pipelines prevents the adoption of existing confidential computing techniques, which target different computational patterns and incur large performance overhead. Trusted execution environments (TEEs) are a promising alternative, since they efficiently protect the confidentiality and integrity of data and computation. However, fast-changing pipelines with latency requirements make it necessary to reduce the cold start overhead, which is the main bottleneck in the latest TEE. To support end-to-end private pipeline evolution, we present SecCask, a TEE-based data analytics pipeline management system. SecCask overcomes the problems of a naive design that isolates the complete pipeline execution in one enclave by administering enclaves and runtimes itself. To reduce cold start overhead, it reuses trusted runtimes across different pipeline components and caches them to avoid the cost of initialization. We conduct experiments on representative workloads using the latest Intel SGX. The results demonstrate that SecCask reduces the total execution time by 68.4% compared to not reusing runtimes, is faster than running all components in one enclave, and incurs a modest average performance overhead of 29.9% over insecure baselines.
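
The idea of reusing and caching initialized runtimes can be illustrated with a small sketch. It is not SecCask's implementation and does not touch Intel SGX: the class `WarmRuntimePool`, the environment keys such as `py-pandas`, and the simulated start-up cost are all hypothetical names chosen for illustration. The sketch only shows the general pattern of keeping idle runtimes in an LRU cache so that later components, or later versions of the same pipeline, skip initialization.

```python
import time
from collections import OrderedDict


class WarmRuntimePool:
    """Toy LRU cache of initialized runtimes, keyed by an environment identifier.

    Hypothetical illustration only: a real system would hold enclave-backed
    runtimes here, not plain dictionaries.
    """

    def __init__(self, capacity, init_cost_s=0.0):
        self.capacity = capacity
        self.init_cost_s = init_cost_s   # stand-in for runtime start-up cost
        self._idle = OrderedDict()       # env_key -> idle runtime handle
        self.cold_starts = 0
        self.warm_hits = 0

    def acquire(self, env_key):
        """Return a runtime for env_key, reusing a cached one when available."""
        if env_key in self._idle:
            self.warm_hits += 1
            return self._idle.pop(env_key)
        time.sleep(self.init_cost_s)     # simulated cold start
        self.cold_starts += 1
        return {"env": env_key, "runs": 0}

    def release(self, runtime):
        """Put a runtime back in the cache, evicting the least recently used if full."""
        self._idle[runtime["env"]] = runtime
        self._idle.move_to_end(runtime["env"])
        while len(self._idle) > self.capacity:
            self._idle.popitem(last=False)


def run_pipeline(pool, components):
    """Run (name, env_key) components in order, one runtime at a time."""
    for _name, env_key in components:
        runtime = pool.acquire(env_key)
        runtime["runs"] += 1             # placeholder for executing the component
        pool.release(runtime)


pool = WarmRuntimePool(capacity=2)
pipeline = [("clean", "py-pandas"), ("featurize", "py-pandas"), ("train", "py-sklearn")]
run_pipeline(pool, pipeline)             # first version of the pipeline
run_pipeline(pool, pipeline)             # evolved version using the same environments
print(pool.cold_starts, pool.warm_hits)  # prints "2 4"
```

In this toy run only the first use of each environment pays the start-up cost; the four remaining component executions reuse a cached runtime. How a real system decides which runtimes are safe to share between components is outside the scope of this sketch.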

PVLDB is part of the VLDB Endowment Inc.