Running applications on Kubernetes often means dealing with tasks triggered by events or batch jobs — like processing queues, scheduled workloads, or interacting frequently with databases and APIs. Traditional Kubernetes scaling, using the Horizontal Pod Autoscaler (HPA), primarily relies on CPU and memory usage metrics. While this works great for compute-intensive workloads, it doesn’t align well with scenarios where your applications spend most of their time waiting on external responses or handling event-driven tasks.
In other words, your pods might stay idle or underused even as queues grow or external tasks pile up, forcing you to manually manage scaling during sudden spikes or increased workloads. You end up either provisioning more resources than necessary to handle unexpected spikes or suffering from poor responsiveness during peak loads.
This blog dives into how KEDA (Kubernetes Event-driven Autoscaling) paired with custom metrics can help you scale your Kubernetes workloads more accurately, efficiently, and cost-effectively based on actual workload demands.
So, What is KEDA?
KEDA is a lightweight component built specifically for Kubernetes to enhance its autoscaling capabilities. Unlike traditional HPA, KEDA scales your applications based on real-time events and custom-defined metrics, giving you the flexibility to scale precisely according to your workload needs.
KEDA has two main components:
KEDA Controller:
- Handles scaling your Kubernetes deployments up and down — even to zero pods if no work is present.
- Integrates smoothly with the existing Kubernetes Horizontal Pod Autoscaler (HPA).
- Detects events from external sources and communicates these signals directly to the HPA.
KEDA Metrics Server (Adapter):
- Collects rich, real-time metrics from external sources and converts them into Kubernetes-compatible metrics.
- Allows HPA to consume these metrics easily and make informed scaling decisions.
- Seamlessly connects with external event sources like message queues, databases, and more.
Together, these components empower your Kubernetes clusters to scale dynamically and precisely based on the exact needs of your workloads.
Exploring KEDA Scalers and Custom Metrics
KEDA comes with several built-in scalers for popular platforms such as AWS SQS, PostgreSQL, Elasticsearch, and more. While these built-in options simplify scaling, there are scenarios where having a custom metrics scaler becomes a huge advantage:
1. Better Security and Credential Management:
- Some built-in scalers, like the PostgreSQL scaler, require direct access to database credentials. With a custom metrics scaler, you can avoid exposing these credentials at the infrastructure level by connecting to the database securely through your application code — using tools like RDS Proxy or service accounts.
2. Less Infrastructure Complexity:
- With custom scalers, you avoid hard-coding specific queries or scaling logic into Terraform or Helm charts, keeping your infrastructure simpler and easier to maintain.
3. Flexible Metric Queries:
- Your team can create tailored, sophisticated queries directly within your application logic, giving you exactly the metrics you need for precise scaling.
Putting it into Practice: KEDA + Custom Metrics Server
Consider implementing a custom metrics server, for instance in Python. This server talks to your databases through secure intermediaries like RDS Proxy and gathers the metrics that matter for scaling, such as pending tasks or queue lengths.
Here’s how the scaling process looks in action:
- You define scaling rules using a ScaledObject or ScaledJob.
- The KEDA Controller picks up these rules and sets up an HPA resource accordingly.
- The Controller activates the KEDA Metrics Adapter specifically for your workload.
- The Metrics Adapter queries your custom metrics server (Python-based, in this case).
- The metrics server returns real-time metrics.
- KEDA Metrics Adapter forwards these metrics directly to Kubernetes HPA.
- The HPA then scales your application pods up or down based on these precise metrics.
Behind the Scenes: How KEDA Triggers Scaling

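Under the hood, KEDA backs each ScaledObject with a standard HPA object that it creates and manages. Once a ScaledObject is applied, you can inspect that wiring with kubectl; the resource names below are illustrative placeholders:
kubectl get scaledobject -n default          # list ScaledObjects and their status
kubectl get hpa -n default                   # the HPA KEDA created for the ScaledObject
kubectl describe hpa <hpa-name> -n default   # shows the external metric and its current value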
Quick Setup: Installing KEDA and Defining a ScaledObject
Getting started with KEDA is simple. Here’s how you can install it and define a scaling rule with a ScaledObject.
Install the KEDA Operator via Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
This installs KEDA into the keda namespace. The chart deploys the KEDA operator (controller) and the metrics API server (adapter) as separate workloads that together provide the autoscaling functionality.
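To verify the install, list what the chart created in the keda namespace (exact workload names can vary slightly between chart versions):
kubectl get pods --namespace keda
kubectl get deployments --namespace keda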
Example ScaledObject YAML
Below is a sample ScaledObject for an email-notification deployment. This uses a custom metrics API to scale based on a count value.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: email-notification            # Name of the ScaledObject (identifies this scaling rule)
  namespace: default
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"  # Lets KEDA take over an existing HPA with the configured name
spec:
  scaleTargetRef:
    name: email-notification          # Name of the Kubernetes Deployment to scale
  minReplicaCount: 1                  # Minimum number of pods
  maxReplicaCount: 5                  # Maximum number of pods
  advanced:
    horizontalPodAutoscalerConfig:
      name: email-notification        # Optional: custom HPA name
  triggers:
    - type: metrics-api
      useCachedMetrics: true          # Cache polled values to reduce load on the metrics server
      metadata:
        targetValue: "10"             # Target 'count' per replica; above this, scale up
        activationTargetValue: "0"    # Don't scale from 0 until the value is greater than 0
        format: "json"                # Expected format of the metrics API response
        url: "http://custom-metrics-server.default.svc.cluster.local/count?service=email-notification"
        valueLocation: "count"        # JSON field name containing the count
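Assuming this is saved as scaledobject.yaml, applying it is a normal kubectl operation; KEDA then creates the backing HPA (named email-notification here, via horizontalPodAutoscalerConfig):
kubectl apply -f scaledobject.yaml
kubectl get hpa email-notification -n default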
About the Metrics Server
You can implement a simple Python-based metrics server using Flask or FastAPI. This server just needs to expose an endpoint that returns a JSON response like:
{ "count": 27 }
KEDA will regularly poll this endpoint to determine whether to scale your pods.
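Here is a minimal sketch of such an endpoint using Flask, matching the /count?service=... URL and the count field referenced by valueLocation in the ScaledObject above. The get_pending_count() helper is a hypothetical placeholder; swap in your real lookup, for example a query routed through RDS Proxy.
# Minimal custom metrics endpoint for KEDA's metrics-api scaler (sketch).
from flask import Flask, jsonify, request

app = Flask(__name__)

def get_pending_count(service: str) -> int:
    # Placeholder value; replace with a real query for the given service,
    # e.g. pending rows in a jobs table or messages waiting in a queue.
    return 0

@app.route("/count")
def count():
    service = request.args.get("service", "")
    # The metrics-api trigger reads the field named in 'valueLocation' ("count").
    return jsonify({"count": get_pending_count(service)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)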
Real-World Benefits and Outcomes
By implementing KEDA with custom metrics, you’ll notice several immediate improvements:
1. Dynamic, Real-time Scaling:
- Your pods scale up and down automatically based on the actual workload — whether it’s a sudden surge in events or a spike in queued jobs. No more waking up to manually tweak pod counts or over-provisioning “just in case.” KEDA ensures the right number of pods are always running, exactly when they’re needed.
2. Significant Cost Savings:
- Imagine a service previously running 5 pods continuously, each using 1 CPU and 2 GB of memory (totaling 5 CPUs and 10 GB memory). After introducing KEDA and HPA, this service typically scales between 1 and 5 pods based on real demand. On average, it might run just 2 pods — meaning resource consumption drops by approximately 60%, directly reducing infrastructure costs.
3. Operational Simplicity and Flexibility:
- Developers can easily adjust scaling logic and metrics directly within the application code, without complicated infrastructure updates.
4. Cloud-Agnostic Scalability:
- Because custom metrics avoid tight coupling with specific cloud services, your setup remains portable across different cloud environments or on-premises deployments.
Wrapping Up
KEDA’s ability to leverage custom metrics for scaling gives your Kubernetes applications the responsiveness, security, and efficiency they need, especially when traditional CPU or memory-based scaling falls short. For anyone running event-driven or batch-processing workloads in Kubernetes, adopting KEDA with custom metrics can deliver substantial improvements in resource management, cost efficiency, and operational flexibility.