Shard the output of any process for distributed processing

Shardz

Shardz is a lightweight C utility that shards (splits) the output of any process for distributed processing. It lets you distribute a workload across multiple processes or machines by dealing the lines of an input stream round-robin into evenly sized shards.

Use Cases

  • Distributing large datasets across multiple workers
  • Parallel processing of log files
  • Load balancing input streams
  • Splitting any line-based input for distributed processing

Building & Installation

Quick Build

gcc -o shardz shardz.c

Using Make

# Build only
make

# Build and install system-wide (requires root/sudo)
sudo make install

# To uninstall
sudo make uninstall

Usage

some_command | shardz INDEX/TOTAL

Where:

  • INDEX is the shard number (starting from 1)
  • TOTAL is the total number of shards
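The INDEX/TOTAL argument has to be validated before any input is read. The following is a minimal sketch assuming the argument is parsed with `strtol`; `parse_shard_spec` is a hypothetical name, and shardz.c may parse the argument and report errors differently.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from shardz.c): parses an "INDEX/TOTAL"
 * argument such as "2/3". On success it stores both numbers and
 * returns 0; it returns -1 if the text is malformed, too large for a
 * long, or if INDEX is not in the range 1..TOTAL. */
int parse_shard_spec(const char *spec, long *shard, long *total)
{
    char *end;
    long i, t;

    assert(spec != NULL && shard != NULL && total != NULL);

    /* INDEX: digits only (strtol alone would also accept spaces and signs) */
    if (!isdigit((unsigned char)spec[0]))
        return -1;
    errno = 0;
    i = strtol(spec, &end, 10);
    if (errno == ERANGE || *end != '/')
        return -1;

    /* TOTAL: digits only, and nothing may follow it */
    spec = end + 1;
    if (!isdigit((unsigned char)spec[0]))
        return -1;
    errno = 0;
    t = strtol(spec, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;

    /* 1 <= INDEX <= TOTAL also guarantees TOTAL >= 1 */
    if (i < 1 || i > t)
        return -1;

    *shard = i;
    *total = t;
    return 0;
}
```

A `main` would call it as `parse_shard_spec(argv[1], &shard, &total)` and print a usage message when it returns -1.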

Examples

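To split one file across three machines, run the same command on each machine with a different INDEX. Each machine downloads the whole file but keeps only its own lines.

  • Machine 1 runs:
curl https://example.com/large_file.txt | shardz 1/3
  • Machine 2 runs:
curl https://example.com/large_file.txt | shardz 2/3
  • Machine 3 runs:
curl https://example.com/large_file.txt | shardz 3/3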

How It Works

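Shardz numbers input lines starting from 1 and uses a modulo operation to decide which shard keeps each line: line n belongs to shard ((n - 1) mod TOTAL) + 1, so lines are dealt out round-robin. For example, with 3 total shards: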

  • Shard 1 processes lines 1, 4, 7, 10, ...
  • Shard 2 processes lines 2, 5, 8, 11, ...
  • Shard 3 processes lines 3, 6, 9, 12, ...

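This spreads the workload evenly by line count: with N input lines, every shard receives N / TOTAL lines (rounded down), and the first N mod TOTAL shards receive one extra line. Shards are balanced by number of lines, not by line size, so lines that differ greatly in processing cost can still produce uneven work.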
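The filtering rule described above can be sketched in portable C. This is a minimal illustration assuming shardz reads its input one line at a time; `shard_of`, `shard_filter`, and the 4096-byte buffer are hypothetical choices, not taken from shardz.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the 1-based shard that owns 1-based line number `line` when
 * lines are dealt round-robin across `total` shards. */
long shard_of(unsigned long long line, long total)
{
    assert(line >= 1 && total >= 1);
    return (long)((line - 1) % (unsigned long long)total) + 1;
}

/* Copies every line of `in` owned by shard `shard` of `total` to `out`.
 * fgets() may return a long line in several chunks, so the line counter
 * only advances when a chunk ends in '\n'; a long line is therefore
 * never split between shards. Assumes text input (no NUL bytes).
 * Returns the number of lines written, or -1 on a read/write error. */
long long shard_filter(FILE *in, FILE *out, long shard, long total)
{
    char buf[4096];
    unsigned long long line = 1; /* number of the line being read */
    long long written = 0;
    int mid_line = 0;            /* 1 if the last chunk ended mid-line */

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        int mine = shard_of(line, total) == shard;

        if (mine && fwrite(buf, 1, len, out) != len)
            return -1;

        mid_line = !(len > 0 && buf[len - 1] == '\n');
        if (!mid_line) {
            written += mine;
            line++;
        }
    }
    if (ferror(in))
        return -1;

    /* A final line without a trailing newline still counts. */
    if (mid_line && shard_of(line, total) == shard)
        written++;
    return written;
}
```

In a complete program, `main` would validate `argv[1]` and then exit with `shard_filter(stdin, stdout, shard, total) < 0 ? 1 : 0`. Because each shard only counts lines and never buffers them, memory use stays constant regardless of input size.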


Mirrors: acid.vegas, SuperNETs, GitHub, GitLab, Codeberg