Shard the output of any process for distributed processing
.screens | ||
LICENSE | ||
README.md | ||
shardz | ||
shardz.c |
Shardz
Shardz is a lightweight C utility that shards (splits) the output of any process for distributed processing. It allows you to easily distribute workloads across multiple processes or machines by splitting input streams into evenly distributed chunks.
Use Cases
- Distributing large datasets across multiple workers
- Parallel processing of log files
- Load balancing input streams
- Splitting any line-based input for distributed processing
Building
gcc -o shardz shardz.c
Usage
some_command | shardz INDEX/TOTAL
Where:
INDEX
is the shard number (starting from 1)TOTAL
is the total number of shards
Examples
- Machine number 1 would run:
curl https://example.com/large_file.txt | shardz 1/3
- Machine number 2 would run:
curl https://example.com/large_file.txt | shardz 2/3
- Machine number 3 would run:
curl https://example.com/large_file.txt | shardz 3/3
How It Works
Shardz uses a modulo operation to determine which lines should be processed by each shard. For example, with 3
total shards:
- Shard 1 processes lines 1, 4, 7, 10, ...
- Shard 2 processes lines 2, 5, 8, 11, ...
- Shard 3 processes lines 3, 6, 9, 12, ...
This ensures an even distribution of the workload across all shards.