Test Workflows - Matrix and Sharding
Note: This Test Workflows functionality is not available when running the Testkube Agent in Standalone Mode.
Often you want to run a test with multiple scenarios or environments, either to distribute the load or to verify it on different setups.
Test Workflows have a built-in mechanism for all these cases - both static and dynamic.
Configuration File Setup
Test Workflow sharding is configured through YAML files that define TestWorkflow custom resources in your Kubernetes cluster.
Where to Define Configuration
You can create and apply Test Workflow configurations in several ways:
- Create a YAML file (e.g., `my-workflow.yaml`) with your TestWorkflow definition and apply it using kubectl: `kubectl apply -f my-workflow.yaml`
- Use the Testkube CLI: `testkube create testworkflow -f my-workflow.yaml`
- Use the Testkube Dashboard - navigate to Test Workflows and create/edit workflows through the UI
All Test Workflows are stored as custom resources in your Kubernetes cluster under the testworkflows.testkube.io/v1 API version.
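Since they are plain custom resources, you can inspect them directly with kubectl - a minimal sketch, assuming Testkube is installed in the `testkube` namespace:

```sh
# List the TestWorkflow custom resources stored in the cluster
kubectl get testworkflows -n testkube

# Inspect a single workflow as YAML
kubectl get testworkflow my-sharded-workflow -n testkube -o yaml
```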
Basic Configuration Structure
A minimal sharded workflow configuration looks like this:
```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: my-sharded-workflow
spec:
  # Your container and content configuration
  container:
    image: your-test-image:latest
  steps:
    - name: Run tests
      parallel:
        count: 3 # Number of shards to create
        shell: 'run-your-tests.sh'
```
Choosing the Right Shard Number
The number of shards you configure has a direct impact on performance and resource utilization:
Performance Impact
| Shard Count | Execution Time | Resource Usage | Best For |
|---|---|---|---|
| 1 (no sharding) | Baseline | Low | Small test suites, limited resources |
| 2-5 | ~50-80% reduction | Medium | Medium test suites (10-50 tests) |
| 5-10 | ~60-90% reduction | High | Large test suites (50-200 tests) |
| 10+ | ~70-95% reduction | Very High | Very large test suites (200+ tests) |
General Guidelines:
- Small test suites (fewer than 10 tests): Use 1-2 shards. More shards add overhead without benefit.
- Medium test suites (10-50 tests): Use 3-5 shards for optimal balance.
- Large test suites (50-200 tests): Use 5-10 shards based on available cluster resources.
- Very large test suites (200+ tests): Use 10-20 shards, but monitor resource consumption.
The optimal number depends on:
- Test duration: Longer tests benefit more from sharding
- Cluster capacity: Each shard requires a pod with allocated resources
- Test distribution: Shards work best when tests can be evenly distributed
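As a rough starting point (an illustrative heuristic, not an official Testkube formula), you can derive an initial shard count from the total sequential runtime and your target wall-clock time:

```sh
# Heuristic: shards ≈ total sequential runtime / target wall-clock time
# e.g. a suite that takes 60 minutes sequentially, with a 10-minute target:
TOTAL_MINUTES=60
TARGET_MINUTES=10
echo $(( TOTAL_MINUTES / TARGET_MINUTES )) # -> 6 shards to start from
```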
Resource Considerations
Each shard runs in its own pod, so consider:
- CPU and memory: Each shard consumes the resources defined in `container.resources`.
- Cluster capacity: Ensure your cluster can handle `count` × resources simultaneously.
- Cost: More shards = more parallel pods = higher infrastructure costs during execution.
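For example, the capacity math for a 5-shard run looks like this (the numbers are illustrative):

```yaml
parallel:
  count: 5
  container:
    resources:
      requests:
        cpu: 1      # 5 shards × 1 CPU = 5 CPUs requested concurrently
        memory: 1Gi # 5 shards × 1Gi = 5Gi of memory requested concurrently
```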
Step-by-Step Configuration Guide
Step 1: Determine Your Sharding Strategy
Choose between static and dynamic sharding:
Static Sharding (`count` only): Fixed number of shards

```yaml
parallel:
  count: 5 # Always creates exactly 5 shards
```

Dynamic Sharding (`maxCount` + `shards`): Adaptive based on test data

```yaml
parallel:
  maxCount: 5 # Creates up to 5 shards based on available tests
  shards:
    testFiles: 'glob("tests/**/*.spec.js")'
```
Step 2: Define Resource Limits
Specify resources for each shard to prevent resource contention:
```yaml
parallel:
  count: 3
  container:
    resources:
      requests:
        cpu: 1      # Each shard gets 1 CPU
        memory: 1Gi # Each shard gets 1GB RAM
      limits:
        cpu: 2
        memory: 2Gi
```
Step 3: Configure Data Distribution
For dynamic sharding, define how to split your test data:
```yaml
parallel:
  maxCount: 5
  shards:
    testFiles: 'glob("cypress/e2e/**/*.cy.js")' # Discover test files
  shell: |
    # Access distributed test files via shard.testFiles
    npx cypress run --spec '{{ join(shard.testFiles, ",") }}'
```
Step 4: Apply and Verify
```sh
# Apply your workflow
kubectl apply -f my-sharded-workflow.yaml

# Run the workflow
testkube run testworkflow my-sharded-workflow -f

# Monitor execution
kubectl get pods -l testworkflow=my-sharded-workflow
```
Common Use Cases
Use Case 1: Sharding Cypress Tests
Distribute Cypress E2E tests across multiple shards:
```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: cypress-sharded
spec:
  content:
    git:
      uri: https://github.com/your-org/your-repo
      paths: [cypress]
  container:
    image: cypress/included:13.6.4
    workingDir: /data/repo/cypress
  steps:
    - name: Install dependencies
      shell: npm ci
    - name: Run tests in parallel
      parallel:
        maxCount: 5 # Up to 5 shards for optimal distribution
        shards:
          testFiles: 'glob("cypress/e2e/**/*.cy.js")'
        description: 'Shard {{ index + 1 }}/{{ count }}: {{ join(shard.testFiles, ", ") }}'
        transfer:
          - from: /data/repo
        container:
          resources:
            requests:
              cpu: 1
              memory: 1Gi
        run:
          args: [--spec, '{{ join(shard.testFiles, ",") }}']
```
Use Case 2: Load Testing with K6
Generate load from multiple nodes:
```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-load-test
spec:
  container:
    image: grafana/k6:latest
  steps:
    - name: Run distributed load test
      parallel:
        count: 10 # 10 shards generating concurrent load
        description: 'Load generator {{ index + 1 }}/{{ count }}'
        container:
          resources:
            requests:
              cpu: 2
              memory: 2Gi
        shell: |
          k6 run --vus 50 --duration 5m \
            --tag shard={{ index }} script.js
```
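Note the multiplication here: 10 shards running 50 VUs each generate 500 concurrent virtual users in total, so size the target system and the cluster accordingly.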
Use Case 3: Multi-Browser Testing with Playwright
Test across different browsers with sharding:
```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: playwright-multi-browser
spec:
  container:
    image: mcr.microsoft.com/playwright:latest
  steps:
    - name: Run tests
      parallel:
        matrix:
          browser: [chromium, firefox, webkit] # Test on each browser
        count: 3 # Shard each browser's tests into 3 parts
        description: '{{ matrix.browser }} - shard {{ shardIndex + 1 }}/{{ shardCount }}'
        shell: |
          npx playwright test \
            --project={{ matrix.browser }} \
            --shard={{ shardIndex + 1 }}/{{ shardCount }}
```
Usage
Matrix and sharding features are supported in Services (`services`), Test Suite (`execute`), and Parallel Steps (`parallel`) operations. The examples below show each in turn.

Services (`services`):
```yaml
kind: TestWorkflow
apiVersion: testworkflows.testkube.io/v1
metadata:
  name: example-matrix-services
spec:
  services:
    remote:
      matrix:
        browser:
          - driver: chrome
            image: selenium/standalone-chrome:4.21.0-20240517
          - driver: edge
            image: selenium/standalone-edge:4.21.0-20240517
          - driver: firefox
            image: selenium/standalone-firefox:4.21.0-20240517
      image: "{{ matrix.browser.image }}"
      description: "{{ matrix.browser.driver }}"
      readinessProbe:
        httpGet:
          path: /wd/hub/status
          port: 4444
        periodSeconds: 1
  steps:
    - shell: 'echo {{ shellquote(join(map(services.remote, "tojson(_.value)"), "\n")) }}'
```
Test Suite (`execute`):

```yaml
kind: TestWorkflow
apiVersion: testworkflows.testkube.io/v1
metadata:
  name: example-matrix-test-suite
spec:
  steps:
    - execute:
        workflows:
          - name: k6-workflow-smoke
            matrix:
              target:
                - https://testkube.io
                - https://docs.testkube.io
            config:
              target: "{{ matrix.target }}"
```
Parallel Steps (`parallel`):

```yaml
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: example-sharded-playwright
spec:
  content:
    git:
      uri: https://github.com/kubeshop/testkube
      paths:
        - test/playwright/playwright-project
  container:
    image: mcr.microsoft.com/playwright:v1.32.3-focal
    workingDir: /data/repo/test/playwright/playwright-project
  steps:
    - name: Install dependencies
      shell: 'npm ci'
    - name: Run tests
      parallel:
        count: 2
        transfer:
          - from: /data/repo
        shell: 'npx playwright test --shard {{ index + 1 }}/{{ count }}'
```
Syntax
This feature allows you to provide a few properties:
- `matrix` to run the operation for different combinations
- `count`/`maxCount` to replicate or distribute the operation
- `shards` to provide the dataset to distribute among replicas
Both `matrix` and `shards` can be used together - all the sharding (`shards` + `count`/`maxCount`) will be replicated for each matrix combination.
Matrix
Matrix allows you to run the operation for multiple combinations. The values for each instance are accessible via `matrix.<key>`. For example:
```yaml
parallel:
  matrix:
    image: ['node:20', 'node:21', 'node:22']
    memory: ['1Gi', '2Gi']
  container:
    resources:
      requests:
        memory: '{{ matrix.memory }}'
  run:
    image: '{{ matrix.image }}'
```
Will instantiate 6 copies:
| index | matrixIndex | matrix.image | matrix.memory | shardIndex |
|---|---|---|---|---|
| 0 | 0 | "node:20" | "1Gi" | 0 |
| 1 | 1 | "node:20" | "2Gi" | 0 |
| 2 | 2 | "node:21" | "1Gi" | 0 |
| 3 | 3 | "node:21" | "2Gi" | 0 |
| 4 | 4 | "node:22" | "1Gi" | 0 |
| 5 | 5 | "node:22" | "2Gi" | 0 |
The matrix properties can be a static list of values, like:

```yaml
matrix:
  browser: [ 'chrome', 'firefox', '{{ config.another }}' ]
```

or a dynamic one, using Test Workflow expressions:

```yaml
matrix:
  files: 'glob("/data/repo/**/*.test.js")'
```
Sharding
Often you may want to distribute the load to speed up the execution. To do so, you can use the `shards` and `count`/`maxCount` properties.
- `shards` is a map of data to split across the different instances
- `count`/`maxCount` describe the number of instances to start:
  - `count` defines a static number of instances (always)
  - `maxCount` defines the maximum number of instances (it will be lower if there is not enough data in `shards` to split)
The three variants look as follows.

Replicas (`count` only):
```yaml
parallel:
  count: 5
  description: "{{ index + 1 }} instance of {{ count }}"
  run:
    image: grafana/k6:latest
```
Static sharding (`count` + `shards`):

```yaml
parallel:
  count: 2
  description: "{{ index + 1 }} instance of {{ count }}"
  shards:
    url: ["https://testkube.io", "https://docs.testkube.io", "https://app.testkube.io"]
  run:
    # shard.url for 1st instance == ["https://testkube.io", "https://docs.testkube.io"]
    # shard.url for 2nd instance == ["https://app.testkube.io"]
    shell: 'echo {{ shellquote(join(shard.url, "\n")) }}'
```
Dynamic sharding (`maxCount` + `shards`):

```yaml
parallel:
  maxCount: 5
  shards:
    # when there are fewer than 5 tests found - there will be 1 instance per test
    # when there are more than 5 tests found - they will be distributed similarly to static sharding
    testFiles: 'glob("cypress/e2e/**/*.js")'
  description: '{{ join(map(shard.testFiles, "relpath(_.value, \"cypress/e2e\")"), ", ") }}'
```
Similarly to matrix, the shards may contain a static list or a Test Workflow expression.
Counters
Besides `matrix.<key>` and `shard.<key>`, there are some counter variables available in Test Workflow expressions:
- `index` and `count` - counters for total instances
- `matrixIndex` and `matrixCount` - counters for the combinations
- `shardIndex` and `shardCount` - counters for the shards
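A minimal sketch that surfaces all of these counters in the instance description (the matrix values are illustrative):

```yaml
parallel:
  matrix:
    browser: ['chromium', 'firefox'] # 2 combinations
  count: 2                           # 2 shards per combination -> 4 instances in total
  description: 'instance {{ index + 1 }}/{{ count }}: combination {{ matrixIndex + 1 }}/{{ matrixCount }}, shard {{ shardIndex + 1 }}/{{ shardCount }}'
  shell: 'echo {{ matrix.browser }}'
```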
Matrix and sharding together
Sharding can be combined with matrix. In that case, the configured replicas/shards are created for every matrix combination. For example:
```yaml
matrix:
  browser: ["chrome", "firefox"]
  memory: ["1Gi", "2Gi"]
count: 2
shards:
  url: ["https://testkube.io", "https://docs.testkube.io", "https://app.testkube.io"]
```
Will start 8 instances:
| index | matrixIndex | matrix.browser | matrix.memory | shardIndex | shard.url |
|---|---|---|---|---|---|
| 0 | 0 | "chrome" | "1Gi" | 0 | ["https://testkube.io", "https://docs.testkube.io"] |
| 1 | 0 | "chrome" | "1Gi" | 1 | ["https://app.testkube.io"] |
| 2 | 1 | "chrome" | "2Gi" | 0 | ["https://testkube.io", "https://docs.testkube.io"] |
| 3 | 1 | "chrome" | "2Gi" | 1 | ["https://app.testkube.io"] |
| 4 | 2 | "firefox" | "1Gi" | 0 | ["https://testkube.io", "https://docs.testkube.io"] |
| 5 | 2 | "firefox" | "1Gi" | 1 | ["https://app.testkube.io"] |
| 6 | 3 | "firefox" | "2Gi" | 0 | ["https://testkube.io", "https://docs.testkube.io"] |
| 7 | 3 | "firefox" | "2Gi" | 1 | ["https://app.testkube.io"] |
Troubleshooting and Best Practices
Common Issues
Issue: Shards Not Starting
Symptoms: Some or all shards remain in pending state
Solutions:
- Check cluster resources: Ensure your cluster has enough capacity for all shards.
  ```sh
  kubectl describe nodes       # Check available resources
  kubectl get pods -n testkube # Check pod status
  ```
- Review resource requests: Each shard needs allocated resources.
  ```yaml
  container:
    resources:
      requests:
        cpu: 500m # Reduce if resources are limited
        memory: 512Mi
  ```
- Reduce shard count: If resources are constrained, use fewer shards.
  ```yaml
  parallel:
    count: 3 # Reduced from 10
  ```
Issue: Uneven Test Distribution
Symptoms: Some shards finish much faster than others
Solutions:
- Use dynamic sharding with `maxCount` instead of `count`:
  ```yaml
  parallel:
    maxCount: 5 # Adapts to available tests
    shards:
      testFiles: 'glob("tests/**/*.test.js")'
  ```
- Ensure test files are similar in size/duration: Group fast and slow tests evenly.
- Monitor execution times:
  ```sh
  testkube get twe EXECUTION_ID # Check individual shard durations
  ```
Issue: Out of Memory Errors
Symptoms: Pods crash with OOM (Out of Memory) errors
Solutions:
- Increase memory limits:
  ```yaml
  container:
    resources:
      limits:
        memory: 4Gi # Increased from 2Gi
  ```
- Reduce tests per shard: Increase shard count to distribute load.
  ```yaml
  parallel:
    count: 10 # More shards = fewer tests per shard
  ```
Best Practices
1. Start Conservative and Scale Up
Begin with a small shard count and increase based on results:
```yaml
# Week 1: Baseline
parallel:
  count: 2

# Week 2: If successful, increase
parallel:
  count: 5

# Week 3: Optimize based on metrics
parallel:
  count: 8 # Sweet spot for your test suite
```
2. Monitor Resource Usage
Track resource consumption to optimize shard configuration:
```sh
# Watch resource usage during execution
kubectl top pods -n testkube -l testworkflow=my-workflow

# Review completed execution metrics
testkube get twe EXECUTION_ID
```
3. Use Descriptive Names
Make debugging easier with clear descriptions:
```yaml
parallel:
  maxCount: 5
  shards:
    testFiles: 'glob("tests/**/*.test.js")' # shards are needed for shard.testFiles to resolve
  description: 'Shard {{ index + 1 }}/{{ count }} - {{ len(shard.testFiles) }} tests'
```
4. Implement Retry Logic
Account for transient failures in sharded tests:
```yaml
steps:
  - name: Run tests with retry
    parallel:
      count: 3
      retry:
        count: 2 # Retry failed shards up to 2 times
      shell: 'run-tests.sh'
```
5. Consider Cost vs. Speed Tradeoffs
More shards = faster execution but higher cost:
- Development: Use fewer shards (2-3) to save resources
- CI/CD: Use optimal shards (5-8) for speed
- Production validation: Use maximum shards (10+) for critical releases
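A possible way to implement these tiers without duplicating workflows is to parameterize the shard count - a minimal sketch, assuming you expose it via the workflow's `config` parameters and pass a value at run time:

```yaml
spec:
  config:
    shards:
      type: integer
      default: 3 # conservative default, e.g. for development
  steps:
    - name: Run tests
      parallel:
        count: '{{ config.shards }}' # override per environment
        shell: 'run-tests.sh'
```

Then, for example, CI/CD could run it with `testkube run testworkflow my-workflow --config shards=8`.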
6. Balance Matrix and Sharding
When combining matrix and sharding, avoid excessive parallelism:
```yaml
# This creates 3 browsers × 5 shards = 15 pods
parallel:
  matrix:
    browser: [chrome, firefox, safari] # 3 combinations
  count: 5 # 5 shards per combination
# Total: 15 concurrent pods - ensure cluster can handle this!
```