Company Overview
Dexcom, founded in 1999, is a global leader in continuous glucose monitoring (CGM) technology. Headquartered in San Diego, California, the company develops cutting-edge glucose monitoring solutions that empower people with diabetes to manage their condition with confidence and control. Dexcom’s mission is to improve health outcomes and quality of life through innovative sensor technology and seamless digital integration. With operations spanning the U.S., Malaysia, and Ireland, Dexcom remains at the forefront of diabetes care, driving advancements that benefit both patients and healthcare providers.
Objectives
Dexcom’s Cloud Engineering team set the following objectives for 2024:
- Speed up and de-risk GKE and Add-on upgrades to stay ahead of release cycles, preventing forced upgrades and avoiding a 500% GKE Extended Support surcharge.
- Reduce reliance on expert team members, freeing up expert bandwidth for strategic innovation projects.
- Enhance application safety posture by defining guardrails for app teams that assure change safety throughout the software development lifecycle.
- Ensure compliance with stringent regulatory standards by providing documented proof of operational resilience for mission-critical applications that power user data monitoring.
"Each Add-on, especially stateful and datapath add-ons, required weeks of preparation. The risk of breakage and downtime made every change a high-stakes event." — Sean Blog, Sr. Manager, Dexcom Cloud Engineering.
Challenges
Dexcom’s Cloud Engineering team has built a Platform on GKE that supports hundreds of software developers running thousands of applications. These applications are mission-critical because of their medical nature, and the Platform must also comply with stringent healthcare regulations.
GKE mandates at least three Kubernetes upgrades each year, and each Add-on typically requires 4-5 upgrades annually. Between February 2023 and August 2024, GKE released six Kubernetes versions (v1.26, v1.27, … v1.31). This frequent upgrade cycle and the surge in new releases created the following challenges for Dexcom’s Platform:
- Complexity in navigating GKE and Add-on upgrades – Under constant pressure from GKE releases and Add-on upgrades, Dexcom needed extensive validation, testing, and cross-team coordination to prevent breakages and disruptions.
- Change velocity and agility bottlenecks – Dependencies across numerous Add-ons created unknown incompatibilities, making infrastructure changes slow. The Dexcom Cloud Engineering team did not want to choose between safety and agility.
- Risk of forced upgrades – Falling behind on upgrade cycles could have resulted in forced upgrades—an extreme and disruptive event that posed a business continuity risk.
- Ensuring application availability through upgrades – Dexcom’s platform relies on real-time data processing, making continuous uptime non-negotiable for safety and compliance.
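To make the upgrade pressure described above concrete, the following back-of-the-envelope sketch counts the Kubernetes minor versions in the stated range and estimates yearly upgrade events from the quoted cadence (three Kubernetes upgrades and 4-5 upgrades per Add-on). The Add-on count of 200 is purely an illustrative assumption, since the text says only "hundreds", and the function names are hypothetical.

```python
def minor_versions(first: int, last: int) -> list[str]:
    """List Kubernetes minor versions from v1.<first> to v1.<last>, inclusive."""
    return [f"v1.{minor}" for minor in range(first, last + 1)]


def yearly_upgrade_events(addon_count: int, k8s_upgrades: int = 3,
                          per_addon: tuple[int, int] = (4, 5)) -> tuple[int, int]:
    """Return the (low, high) number of upgrade events per year.

    k8s_upgrades and per_addon default to the figures quoted in the text;
    addon_count is supplied by the caller and is an assumption.
    """
    low, high = per_addon
    return (k8s_upgrades + addon_count * low,
            k8s_upgrades + addon_count * high)


if __name__ == "__main__":
    versions = minor_versions(26, 31)
    print(len(versions), versions[0], versions[-1])  # 6 v1.26 v1.31
    # Illustrative assumption: 200 Add-ons.
    print(yearly_upgrade_events(addon_count=200))    # (803, 1003)
```

Even under this rough assumption, the estimate runs to several hundred upgrade events per year, which is consistent with the case for automating research and validation rather than adding headcount.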
Rather than securing additional headcount for this upgrade surge, the Dexcom Cloud Engineering team partnered with Chkk to navigate interdependencies across hundreds of Add-ons and Kubernetes versions, speed up GKE upgrades, and prevent forced upgrades.
Solution
Dexcom implemented Chkk’s Operational Safety Platform to simplify upgrade management and enhance compliance.
- Streamlined Upgrade Process – Chkk automated key upgrade tasks such as dependency analysis, release note processing, and impact assessment across hundreds of Add-ons, cutting down research and planning time by up to 8x.
- Upgrade Copilot & Preverified Plans – Chkk’s Upgrade Copilot automated tedious pre-work and delivered Preverified Upgrade Plans for clusters and Add-ons, tested on a digital twin of Dexcom’s infrastructure, ensuring safe, well-orchestrated upgrades.
- Repeatable Upgrades with Curated Workflows – Chkk standardized workflows and enabled task delegation, reducing reliance on expert knowledge and making complex upgrades repeatable and efficient.
- Avoiding breakages with Safety, Health and Readiness Checks – Chkk’s curated library of preflight, in-flight, and postflight checks covers thousands of Add-on versions. Dexcom used these checks extensively to keep upgrades disruption-free.
- Conformance to Operational Safety Guardrails – Dexcom used Chkk’s Guardrails to update hundreds of Helm charts owned by application teams, ensuring conformance to safety primitives at the source of their software development lifecycle.
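As a rough illustration of what a preflight check and a guardrail might look like, the sketch below inspects rendered Kubernetes manifests (for example, the output of templating a Helm chart) represented as plain dictionaries. It is not Chkk’s implementation or API. The rule set, the `check_manifest` function, and the `MIN_REPLICAS` threshold are all hypothetical, and the source does not say which safety primitives Dexcom enforced. The two removed API versions listed are taken from the upstream Kubernetes deprecation guide for v1.26.

```python
from typing import Any

# Hypothetical guardrail threshold; the source does not list actual rules.
MIN_REPLICAS = 2

# API versions no longer served as of Kubernetes v1.26 (per the upstream
# deprecation guide). Only two entries are shown for illustration.
REMOVED_IN_1_26 = {
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"),
    ("flowcontrol.apiserver.k8s.io/v1beta1", "FlowSchema"),
}


def check_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable findings for one rendered manifest."""
    findings: list[str] = []
    kind = manifest.get("kind", "")
    api_version = manifest.get("apiVersion", "")
    name = manifest.get("metadata", {}).get("name", "<unnamed>")

    # Preflight-style check: would this object survive the upgrade?
    if (api_version, kind) in REMOVED_IN_1_26:
        findings.append(f"{kind}/{name}: {api_version} is not served in v1.26+")

    # Guardrail-style checks: does the workload tolerate node drains?
    if kind == "Deployment":
        spec = manifest.get("spec", {})
        # Kubernetes defaults replicas to 1 when the field is omitted.
        if spec.get("replicas", 1) < MIN_REPLICAS:
            findings.append(f"{kind}/{name}: fewer than {MIN_REPLICAS} replicas")
        pod_spec = spec.get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            if "readinessProbe" not in container:
                findings.append(
                    f"{kind}/{name}: container {container.get('name')} "
                    "has no readinessProbe"
                )
    return findings


if __name__ == "__main__":
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "example-api"},
        "spec": {
            "replicas": 1,
            "template": {"spec": {"containers": [{"name": "app"}]}},
        },
    }
    for finding in check_manifest(deployment):
        print(finding)
```

Running checks like these against chart sources, rather than against live clusters, is one way to catch issues at the start of the software development lifecycle, before an upgrade window opens.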
"Chkk transformed our upgrade process from a high-risk, manual effort into a streamlined, automated workflow. The level of insight and safety nets they provide is unparalleled." — John Rzeszotarski, VP of Infrastructure.
Outcomes
By implementing Chkk, Dexcom achieved significant operational and financial benefits:
- 200% increase in upgrade productivity, ensuring business, regulatory, and compliance goals were met.
- 80% reduction in upgrade preparation time, eliminating weeks of manual research and validation.
- Improved operational efficiency – The Cloud Engineering team could focus on strategic initiatives rather than break-fix efforts.
- Repurposed 2 FTEs by reducing manual upgrade workloads, allowing them to focus on high-value work.
- Higher compliance assurance – Timely upgrades ensured adherence to regulatory standards and mitigated non-compliance and forced-upgrade risks.
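The percentages above and the "x" multipliers used elsewhere in this case study describe the same kind of improvement in different units. The short sketch below converts between them, assuming "increase in productivity" means throughput and "reduction in preparation time" means elapsed time. That reading is an assumption, and the function names are hypothetical.

```python
def factor_from_increase(percent: float) -> float:
    """A P% increase in throughput is (100 + P) / 100 times the original."""
    return (100 + percent) / 100


def factor_from_reduction(percent: float) -> float:
    """A P% reduction in elapsed time is a 100 / (100 - P) times speedup."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return 100 / (100 - percent)


def reduction_from_factor(factor: float) -> float:
    """An N-times speedup corresponds to a 100 * (1 - 1/N) percent reduction."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return 100 * (1 - 1 / factor)


if __name__ == "__main__":
    print(factor_from_increase(200))   # 3.0  (200% increase = 3x)
    print(factor_from_reduction(80))   # 5.0  (80% reduction = 5x)
    print(reduction_from_factor(8))    # 87.5 (8x = 87.5% reduction)
```

Under this reading, a 200% productivity increase is a 3x gain and an 80% reduction in preparation time is a 5x speedup, while "up to 8x" corresponds to a best case of an 87.5% reduction.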
Dexcom’s experience with Chkk demonstrates how automated upgrade management can transform cloud-native operations, reducing risk, saving costs, and enabling teams to focus on innovation.
"With Chkk, we’ve cut our Kubernetes upgrade prep time from weeks to days while ensuring the highest levels of safety and compliance." — Chakri Paladugu, Staff Engineer, Dexcom Cloud Engineering.
Takeaway Lessons
- The real upgrade challenge isn’t Kubernetes—it’s Add-ons – Managing interdependencies across hundreds of Add-ons is the primary source of complexity.
- Forced upgrades pose a serious business continuity risk – Extended Support only delays the inevitable; without proactive planning, companies will face rushed, high-risk upgrades.
- Kubernetes safety tooling is a must – For large-scale, regulated environments, safety and compliance must be baked into upgrade workflows to prevent breakages and failures.
- A proactive approach accelerates upgrades by 3x-5x – Chkk ensures fast, safe, and compliant upgrades.
- Safety and agility are achievable together – The right automation and risk mitigation tools enable velocity without sacrificing stability.