ABOUT THE AUTHOR
Vara Bonthu

Data and AI on Kubernetes
Principal Open Source Solutions Architect · AWS
Apache Spark · Apache Flink · Trino · StarRocks · Kubernetes · EKS · Karpenter · Generative AI · Open-Source Data

I'm the person companies call when their data platform is on fire, or when they want to make sure it never is. In more than a decade of working on large-scale data infrastructure across finance, retail, and technology, I've run into every flavor of Spark OOM, Flink checkpoint storm, and Trino memory cascade you can imagine.

At AWS I work with some of the biggest data engineering teams in the world — helping them design, migrate, and operate open-source data frameworks natively on Kubernetes and EKS. My focus is the gap between what the framework documentation says and what production actually demands.

Data Signal is that gap, documented. Real learnings from real incidents, real cost reviews, and real architecture decisions. If a signal is on this site, it's because I've either fixed that problem myself or sat in the war room while someone else fixed it.

Areas of depth
Data & AI on Kubernetes
Running Spark, Flink, Trino, StarRocks and ML workloads natively on EKS — from scheduling to cost to reliability.
Distributed Compute
Petabyte-scale Spark pipelines, shuffle optimization, YuniKorn gang scheduling, Karpenter node provisioning.
Stream Processing
Flink stateful pipelines, RocksDB backends, incremental checkpointing, Kafka-native ingestion at scale.
Federated Query Engines
Trino federation across S3, Iceberg, and RDBMS. Multi-tenant isolation, Alluxio caching, Spot-safe workers.
Lakehouse Engineering
StarRocks shared-data mode, Delta Lake and Apache Iceberg on S3, sub-second analytics at warehouse scale.
Platform Cost Engineering
Spot strategy, right-sizing, Karpenter bin-packing, Celeborn external shuffle — saving real dollars at real scale.
Connect
LinkedIn
linkedin.com/in/varaprofile
Read the Signals
Production learnings from Spark, Flink, Trino & StarRocks on Kubernetes