Linear Congruential Generator for IP Sharding

Go to file

acidvegas 0f2a944556 Updated usage		2024-11-26 16:18:57 -05:00
pylcg	Initial commit	2024-11-26 15:57:28 -05:00
.gitignore	Initial commit	2024-11-26 15:57:28 -05:00
LICENSE	Initial commit	2024-11-26 15:57:28 -05:00
pyproject.toml	Initial commit	2024-11-26 15:57:28 -05:00
README.md	Updated usage	2024-11-26 16:18:57 -05:00
setup.py	Initial commit	2024-11-26 15:57:28 -05:00
unit_test.py	Initial commit	2024-11-26 15:57:28 -05:00

README.md

PyLCG

Ultra-fast Linear Congruential Generator for IP Sharding

PyLCG is a high-performance Python implementation of a memory-efficient IP address sharding system using Linear Congruential Generators (LCG) for deterministic random number generation. This tool enables distributed scanning & network reconnaissance by efficiently dividing IP ranges across multiple machines while maintaining pseudo-random ordering.

Features

Memory-efficient IP range processing
Deterministic pseudo-random IP generation
High-performance LCG implementation
Support for sharding across multiple machines
Zero dependencies beyond Python standard library
Simple command-line interface

Installation

pip install pylcg

Usage

Command Line

pylcg 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345

As a Library

from pylcg import ip_stream

# Generate IPs for the first shard of 4 total shards
for ip in ip_stream('192.168.0.0/16', shard_num=1, total_shards=4, seed=12345):
    print(ip)

How It Works

Linear Congruential Generator

PyLCG uses an optimized LCG implementation with carefully chosen parameters:

Name	Variable	Value
Multiplier	`a`	`1664525`
Increment	`c`	`1013904223`
Modulus	`m`	`2^32`

This generates a deterministic sequence of pseudo-random numbers using the formula:

next = (a * current + c) mod m

Memory-Efficient IP Processing

Instead of loading entire IP ranges into memory, PyLCG:

Converts CIDR ranges to start/end integers
Uses generator functions for lazy evaluation
Calculates IPs on-demand using index mapping
Maintains constant memory usage regardless of range size

Sharding Algorithm

The sharding system uses an interleaved approach:

Each shard is assigned a subset of indices based on modulo arithmetic
The LCG randomizes the order within each shard
Work is distributed evenly across shards
No sequential scanning patterns

Performance

PyLCG is designed for maximum performance:

Generates millions of IPs per second
Constant memory usage (~100KB)
Minimal CPU overhead
No disk I/O required

Benchmark results on a typical system:

IP Generation: ~5-10 million IPs/second
Memory Usage: < 1MB for any range size
LCG Operations: < 1 microsecond per number

Contributing

Performance Optimization

We welcome contributions that improve PyLCG's performance. When submitting optimizations:

Run the included benchmark suite:

python3 unit_test.py

Include before/after benchmark results for:

IP generation speed
Memory usage
LCG sequence generation
Shard distribution metrics

Consider optimizing:

Number generation algorithms
Memory access patterns
CPU cache utilization
Python-specific optimizations

Document any tradeoffs between:

Speed vs memory usage
Randomness vs performance
Complexity vs maintainability

Benchmark Guidelines

When running benchmarks:

Use consistent hardware/environment
Run multiple iterations
Test with various CIDR ranges
Measure both average and worst-case performance
Profile memory usage patterns
Test shard distribution uniformity

Roadmap

IPv6 support
Custom LCG parameters
Configurable chunk sizes
State persistence
Resume capability
S3/URL input support
Extended benchmark suite