ABOUT THE AUTHOR

Vara Bonthu

Data and AI on Kubernetes

Principal Open Source Solutions Architect · AWS

Apache Spark · Apache Flink · Trino · StarRocks · Kubernetes · EKS · Karpenter · Generative AI · Open-Source Data

I'm the person companies call when their data platform is on fire, or when they want to make sure it never is. In more than a decade of working on large-scale data infrastructure across finance, retail, and technology, I've run into every flavor of Spark OOM, Flink checkpoint storm, and Trino memory cascade you can imagine.

At AWS, I work with some of the biggest data engineering teams in the world, helping them design, migrate, and operate open-source data frameworks natively on Kubernetes and EKS. My focus is the gap between what the framework documentation says and what production actually demands.

Data Signal is that gap, documented. Real learnings from real incidents, real cost reviews, and real architecture decisions. If a signal is on this site, it's because I've either fixed that problem myself or sat in the war room while someone else fixed it.

Areas of depth

Data & AI on Kubernetes

Running Spark, Flink, Trino, StarRocks and ML workloads natively on EKS — from scheduling to cost to reliability.

Distributed Compute

Petabyte-scale Spark pipelines, shuffle optimization, YuniKorn gang scheduling, Karpenter node provisioning.

Stream Processing

Flink stateful pipelines, RocksDB backends, incremental checkpointing, Kafka-native ingestion at scale.

Federated Query Engines

Trino federation across S3, Iceberg, and RDBMS. Multi-tenant isolation, Alluxio caching, Spot-safe workers.

Lakehouse Engineering

StarRocks shared-data mode, Delta Lake and Apache Iceberg on S3, sub-second analytics at warehouse scale.

Platform Cost Engineering

Spot strategy, right-sizing, Karpenter bin-packing, Celeborn external shuffle — saving real dollars at real scale.

Connect

linkedin.com/in/varaprofile


Read the Signals

Production learnings from Spark, Flink, Trino & StarRocks on Kubernetes

View Signals →