# PyLCG
> Linear Congruential Generator for IP Sharding

PyLCG is a Python implementation of a memory-efficient IP address sharding system that uses Linear Congruential Generators *(LCG)* for deterministic pseudo-random number generation. This tool aids in distributed scanning & network reconnaissance by efficiently dividing IP ranges across multiple machines while visiting the addresses in a pseudo-random order.

___

## Table of Contents

- [Overview](#overview)
- [How It Works](#how-it-works)
    - [Understanding IP Addresses](#understanding-ip-addresses)
    - [The Magic of Linear Congruential Generators](#the-magic-of-linear-congruential-generators)
    - [Sharding: Dividing the Work](#sharding-dividing-the-work)
    - [Memory-Efficient Processing](#memory-efficient-processing)
- [Real-World Applications](#real-world-applications)
    - [Network Security Testing](#network-security-testing)
    - [Cloud-Based Scanning](#cloud-based-scanning)

___

## Overview

When performing network reconnaissance or scanning large IP ranges, it's often necessary to split the work across multiple machines. However, this presents several challenges:

1. You want to ensure each machine works on a different part of the network *(no overlap)*
2. You want to avoid scanning IPs in sequence *(which can trigger security alerts)*
3. You need a way to resume scans if a machine fails
4. You can't load millions of IPs into memory at once

PyLCG solves these challenges through clever mathematics & efficient algorithms.

___

## How It Works

### Understanding IP Addresses

First, let's understand how IP addresses work in our system:

- An IP address like `192.168.1.1` is really just a 32-bit number equal to `3232235777` in decimal or `0xC0A80101` in hexadecimal
- A CIDR range like `192.168.0.0/16` represents a contiguous range of these numbers
- For example, `192.168.0.0/16` includes all IPs from `192.168.0.0` to `192.168.255.255` *(65,536 addresses)*
- As 32-bit numbers, that range runs from `0xC0A80000` to `0xC0A8FFFF` in hexadecimal, or from `3232235520` to `3232301055` in decimal

### The Magic of Linear Congruential Generators

At the heart of PyLCG is something called a Linear Congruential Generator *(LCG)*. Think of it as a mathematical recipe that generates a sequence of numbers that appear random but are actually predictable if you know the starting point *(seed)*.

Here's how it works:

1. Start with a number *(called the seed, which can be random)*
2. Multiply it by `1664525` & add `1013904223`
3. Take the remainder when divided by `2^32` *(the modulo operation)*
4. Repeat the process to continue the sequence

###### Mathematical notation:

```math
X_{n+1} = (1664525 \cdot X_n + 1013904223) \bmod 2^{32}
```

Here `X_n` is the current number and `X_{n+1}` is the next number in the sequence.

###### Why these specific numbers?

The numbers `1664525` and `1013904223` are the multiplier and increment of the LCG. This specific combination was featured in "Numerical Recipes in C" and has been widely reused since. With a modulus of `2^32` it also gives the generator its full period: the increment is odd and the multiplier minus one is divisible by 4, so the sequence visits every 32-bit value exactly once before it repeats.

### Sharding: Dividing the Work

PyLCG uses an interleaved sharding approach so that every shard covers the whole range rather than one contiguous block. Here's how it works:

1. **Interleaved Distribution**: Instead of dividing the IP range into sequential blocks, PyLCG distributes IPs across shards using an offset pattern:
   - For 4 shards scanning a network:
     - Shard 0 handles IPs at indices: 0, 4, 8, 12, ...
     - Shard 1 handles IPs at indices: 1, 5, 9, 13, ...
     - Shard 2 handles IPs at indices: 2, 6, 10, 14, ...
     - Shard 3 handles IPs at indices: 3, 7, 11, 15, ...

2. **Randomization**: Within each shard, the LCG randomizes the order of IPs:
   - Each index is fed through the LCG to generate a pseudo-random value
   - IPs are scanned in order of these pseudo-random values
   - The same seed ensures consistent ordering across runs

This approach ensures:

- Even distribution across the entire IP space
- No sequential scanning patterns that could trigger alerts
- An even split of work across shards *(shard sizes differ by at most one address)*
- Deterministic results that can be reproduced

### Memory-Efficient Processing

To handle large IP ranges without consuming too much memory, PyLCG uses several techniques:

1. **Chunked Processing**
   - Instead of loading all IPs at once, it processes them in chunks

2. **Lazy Generation**
   - IPs are generated only when needed using Python's async generators
   - The system yields one IP at a time rather than creating huge lists
   - This keeps memory usage constant regardless of IP range size

3. **Direct Calculation**
   - The LCG can jump directly to any position in its sequence
   - No need to generate all previous numbers
   - Enables efficient random access to any part of the sequence

___

## Roadmap

- [ ] Add support for IPv6
- [ ] Add support for custom LCG parameters like adding port numbers
- [ ] Add support for custom chunk sizes & auto-tuning based on available system resources
- [ ] Add support for resuming from a specific point in the sequence
- [ ] Add support for saving the state of the LCG to a file so you can resume later
- [ ] Add support for sharding line-based input files locally, from an S3 bucket, or from a URL by reading them in chunks
- [ ] Update the unit tests to include benchmarks & better coverage for future efficiency improvements & validation

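###### Example sketch: interleaved sharding with an LCG

The ideas described under *How It Works* can be illustrated with a small, self-contained sketch. This is not PyLCG's actual code or API: the names `lcg_next`, `lcg_jump` and `shard_ips` are hypothetical, and the way the seed is mixed into each index is an assumption made for illustration. For simplicity it sorts one shard's indices in memory, which is only reasonable for small ranges.

```python
import ipaddress
from typing import Iterator

MULTIPLIER = 1664525
INCREMENT = 1013904223
MODULUS = 2**32


def lcg_next(state: int) -> int:
    """Apply one LCG step to a 32-bit state."""
    return (MULTIPLIER * state + INCREMENT) % MODULUS


def lcg_jump(state: int, steps: int) -> int:
    """Advance `steps` LCG steps in O(log steps) time.

    Each step is the affine map x -> a*x + c, so repeated steps can be
    combined by composing that map with itself (square-and-multiply).
    """
    mult, inc = 1, 0                       # identity map: x -> 1*x + 0
    cur_mult, cur_inc = MULTIPLIER, INCREMENT
    while steps > 0:
        if steps & 1:
            # Compose the accumulated map with the current power of the step.
            mult = (mult * cur_mult) % MODULUS
            inc = (inc * cur_mult + cur_inc) % MODULUS
        # Square the current map: a*(a*x + c) + c = a^2*x + (a + 1)*c
        cur_inc = ((cur_mult + 1) * cur_inc) % MODULUS
        cur_mult = (cur_mult * cur_mult) % MODULUS
        steps >>= 1
    return (mult * state + inc) % MODULUS


def shard_ips(cidr: str, shard: int, total_shards: int, seed: int) -> Iterator[str]:
    """Yield one shard's addresses in a seed-dependent pseudo-random order."""
    network = ipaddress.ip_network(cidr)
    first = int(network.network_address)
    # Interleaved distribution: shard k owns indices k, k + N, k + 2N, ...
    indices = range(shard, network.num_addresses, total_shards)
    # Assumption: the seed is added to the index before the LCG step.
    # Sorting materializes the shard's indices, so this is a small-range demo.
    ordered = sorted(indices, key=lambda i: lcg_next((i + seed) % MODULUS))
    for index in ordered:
        yield str(ipaddress.ip_address(first + index))
```

For example, `list(shard_ips("192.168.0.0/28", 0, 4, seed=12345))` yields the four addresses at indices 0, 4, 8 and 12 of the range, in an order that depends on the seed. Running it again with the same seed gives the same order, and the four shards together cover all 16 addresses with no overlap.
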
___

###### Mirrors for this repository: [acid.vegas](https://git.acid.vegas/pylcg) • [SuperNETs](https://git.supernets.org/acidvegas/pylcg) • [GitHub](https://github.com/acidvegas/pylcg) • [GitLab](https://gitlab.com/acidvegas/pylcg) • [Codeberg](https://codeberg.org/acidvegas/pylcg)