Hardware, scaling and storage
Scaling
Real-world applications need to scale to handle varying loads. Kapstan provides two scaling options:
Manual
If you want tight control over the number of replicas your application runs, select manual scaling. To scale the application manually, follow these steps:
- Select "Manual Scaling".
- Enter the number of pods you want to run.
- Click on "Save".
By default, Kapstan provisions one replica per application.
Auto Scaling
You can choose this option to auto-scale based on the CPU or memory utilization of the application. To enable auto-scaling, select "Auto Scaling" and set the following parameters:
- Minimum Pods: The minimum number of pod replicas that the application can scale down to in case of low resource consumption.
- Maximum Pods: The maximum number of pod replicas that the application can scale up to in case of high resource consumption.
- Target CPU Utilization: The average CPU utilization the system should maintain, set to 60% by default. Kapstan scales pods up or down when the current utilization is above or below this target.
- Target Memory Utilization: The average memory utilization the system should maintain, set to 60% by default. Kapstan scales pods up or down when the current utilization is above or below this target (see the sketch after this list).
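Under the hood, utilization-based scaling in Kubernetes typically follows the Horizontal Pod Autoscaler rule: desired replicas = ceil(current replicas × current utilization ÷ target utilization), clamped to the configured pod range. The sketch below is illustrative only; the function and parameter names are hypothetical, and Kapstan's exact internals may differ.

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_pods: int,
                     max_pods: int) -> int:
    """Illustrative replica calculation mirroring the standard Kubernetes
    HPA rule: ceil(current_replicas * current / target), clamped to the
    configured [min_pods, max_pods] range."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_pods, min(max_pods, raw))

# Example: 3 pods averaging 90% CPU against the default 60% target -> 5 pods.
print(desired_replicas(3, 90.0, 60.0, min_pods=1, max_pods=10))  # 5
```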
Auto Scaling Triggers
Scaling based on CPU and Memory utilization is generally sufficient for most applications. However, there are scenarios where scaling based on external metrics is more effective. To address these use cases, Kapstan supports the concept of triggers.
Triggers allow you to scale your applications based on various external metrics, providing more granular control and flexibility in managing application load. Currently, Kapstan supports AWS SQS as a trigger, and we plan to add more triggers in the future to enhance our scaling capabilities.
This setup, powered by KEDA (Kubernetes Event-Driven Autoscaling), ensures that your application can handle varying loads effectively.
AWS SQS
For applications that consume from an AWS SQS queue, auto-scaling can be based on the length of the queue. To enable this feature, select "SQS" as the trigger type and then set the following parameters:
- Queue URL: The URL of the SQS queue.
- Queue Length: The queue length parameter defines the maximum number of messages that a single instance of the application can handle concurrently. For example, if the Queue Length is set to 10:
- If the SQS queue has 9 pending messages, 1 instance of the application will be running.
- If the SQS queue has 15 pending messages, 2 instances of the application will be running.
- If the SQS queue has between 21 and 30 pending messages, 3 instances of the application will be running.
This scaling mechanism ensures that the number of application instances dynamically adjusts based on the queue length, optimizing resource utilization.
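The replica counts in the examples above follow from dividing the number of pending messages by the Queue Length parameter and rounding up, clamped to the configured pod range. A minimal sketch of this arithmetic (the function name and defaults here are hypothetical):

```python
import math

def sqs_replicas(pending_messages: int, queue_length: int,
                 min_pods: int = 1, max_pods: int = 10) -> int:
    """Replicas needed so that each instance handles at most
    `queue_length` messages: ceil(pending / queue_length)."""
    raw = math.ceil(pending_messages / queue_length)
    return max(min_pods, min(max_pods, raw))

# With Queue Length = 10, as in the examples above:
for pending in (9, 15, 25):
    print(pending, "->", sqs_replicas(pending, queue_length=10))
# 9 -> 1, 15 -> 2, 25 -> 3
```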
Kafka
Coming Soon
Hardware
Architecture: You can choose the underlying architecture your application runs on (see the sketch after this list):
- ARM64: Choose this if your application image is built for ARM64 architecture
- AMD64/x86-64: Choose this if your application image is built for AMD64/x86-64 architecture
- Any: Choose this if your application image is built for both ARM64 and AMD64/x86-64 architectures
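If you are unsure which architecture a container is actually running on, a quick runtime check can help when debugging image/architecture mismatches. This is a generic Python sketch, not a Kapstan feature:

```python
import platform

# platform.machine() reports the CPU architecture of the running container:
# typically "x86_64" on AMD64/x86-64 nodes and "aarch64" on ARM64 nodes.
arch = platform.machine().lower()
if arch in ("x86_64", "amd64"):
    print("Running on AMD64/x86-64")
elif arch in ("aarch64", "arm64"):
    print("Running on ARM64")
else:
    print(f"Unrecognized architecture: {arch}")
```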
Run on GPU: Select this option if your application code requires a GPU to run. GPU support is currently available with AMD64/x86-64 only.
GPU-based application example
This example shows how to set up a simple GPU-based application that performs vector addition using a community-built Docker image; a minimal sketch of such a program follows the steps below.
Follow these steps to run this image on Kapstan:
- Add this image to your container registry. Learn more about connecting a container registry here.
- Create a container application.
- Go to the "Config" tab and add a container using the image above.
- Once the container is added, go to the "Hardware" tab, select the "AMD64/x86-64" architecture, and tick the "Run on GPU" option.
- Click on the "Save" button and deploy the application.
- Check the logs by clicking on the "Logs" tab; you should see the application running successfully.
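For reference, the kind of program such an image runs is a simple GPU vector addition. The sketch below uses PyTorch purely as an illustration (the community image itself may use CUDA directly); it assumes a CUDA-capable GPU was attached via the "Run on GPU" option:

```python
import torch

# Fail fast if the "Run on GPU" option did not attach a CUDA device.
assert torch.cuda.is_available(), "No CUDA device visible to the container"

device = torch.device("cuda")
a = torch.rand(1_000_000, device=device)
b = torch.rand(1_000_000, device=device)
c = a + b  # element-wise vector addition executed on the GPU

print("Device:", torch.cuda.get_device_name(0))
print("First few sums:", c[:5].cpu().tolist())
```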
Resources
You can allocate CPU and memory to the container based on the application's requirements. To allocate resources, drag the slider to the desired value, then click "Save" to save the resource configuration.
Learn more about CPU cores.
Storage
You can use this to add ephemeral storage to your container. Ephemeral storage is temporary and does not persist across deployments, pod restarts, or rescheduling. It can be used for data processing, caching, or storing non-critical data.
Adding ephemeral storage
- Volume name: A meaningful name for identification.
- Path: The path at which to mount the storage inside the container, e.g., /data.
- Size (Optional): The size of the storage, e.g., 100 MB.
Behavior based on size input
When size is specified:
- The system allocates exactly the amount of ephemeral storage you requested.
- Your application is guaranteed this amount of storage but cannot exceed it.
- If your application attempts to use more storage than allocated, it may encounter "out of space" errors or be evicted from the node (see the sketch below).
When size is left empty:
- The system uses a default allocation mechanism.
- Your application can use as much ephemeral storage as is available on the node, up to the node's limits.
- This approach provides flexibility but may lead to unpredictable behavior if the node's resources are constrained.
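Because a specified size acts as a hard cap, applications that write to the volume should treat "out of space" as an expected condition. A minimal sketch, assuming the volume is mounted at /data as in the example above (the helper name is hypothetical):

```python
import errno
import os

MOUNT_PATH = "/data"  # the Path configured for the ephemeral volume

def write_cache(name: str, payload: bytes) -> bool:
    """Write a cache entry, treating 'out of space' (ENOSPC) as a soft
    failure, since ephemeral storage is capped at the configured Size."""
    try:
        with open(os.path.join(MOUNT_PATH, name), "wb") as f:
            f.write(payload)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            print("Ephemeral volume full; skipping cache write")
            return False
        raise

write_cache("example.bin", b"\x00" * 1024)
```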