Updated README (commit 7b41c207b7, parent 9eadeb54b3)

# PyLCG

> Ultra-fast Linear Congruential Generator for IP Sharding

PyLCG is a high-performance Python implementation of a memory-efficient IP address sharding system that uses Linear Congruential Generators (LCG) for deterministic pseudo-random number generation. It enables distributed scanning & network reconnaissance by dividing IP ranges across multiple machines while visiting addresses in a pseudo-random order.
## Overview

When performing network reconnaissance or scanning large IP ranges, it's often necessary to split the work across multiple machines. However, this presents several challenges:

1. You want to ensure each machine works on a different part of the network *(no overlap)*
2. You want to avoid scanning IPs in sequence *(which can trigger security alerts)*
3. You need a way to resume scans if a machine fails
4. You can't load millions of IPs into memory at once

PyLCG tackles these challenges with deterministic pseudo-random ordering & on-demand IP calculation.

## Features

- Memory-efficient IP range processing
- Deterministic pseudo-random IP generation
- High-performance LCG implementation
- Support for sharding across multiple machines
- Zero dependencies beyond the Python standard library
- Simple command-line interface

## Installation

```bash
git clone https://github.com/acidvegas/pylcg
cd pylcg
chmod +x pylcg.py
```

## Usage

### Command Line

```bash
./pylcg.py 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345
```

### As a Library

```python
from pylcg import ip_stream

# Generate IPs for the first shard of 4 total shards
for ip in ip_stream('192.168.0.0/16', shard_num=1, total_shards=4, seed=12345):
    print(ip)
```
## How It Works

### Understanding IP Addresses

First, it helps to see how IP addresses are represented:

- An IP address like `192.168.1.1` is really just a 32-bit number: `3232235777` in decimal, or `0xC0A80101` in hexadecimal
- A CIDR range like `192.168.0.0/16` represents a contiguous range of these numbers
- For example, `192.168.0.0/16` includes all IPs from `192.168.0.0` to `192.168.255.255` *(65,536 addresses)*
- As 32-bit numbers, that range runs from `0xC0A80000` to `0xC0A8FFFF` in hexadecimal, or from `3232235520` to `3232301055` in decimal

### Linear Congruential Generator

At the heart of PyLCG is a Linear Congruential Generator *(LCG)*. It is a mathematical recipe that produces a sequence of numbers that appear random but are fully predictable if you know the starting point *(seed)*. PyLCG uses the following parameters:

- Multiplier (a): 1664525
- Increment (c): 1013904223
- Modulus (m): 2^32

Here's how it works:

1. Start with a number *(called the seed, which can be random)*
2. Multiply it by `1664525` & add `1013904223`
3. Take the remainder when divided by `2^32` *(the modulo operation)*
4. Repeat the process to continue the sequence

###### Mathematical notation:

```math
X_{n+1} = (a \cdot X_n + c) \bmod m = (1664525 \cdot X_n + 1013904223) \bmod 2^{32}
```
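The recipe above translates directly into Python. This is a sketch rather than PyLCG's own code, and the helper names `lcg_next` and `lcg_skip` are hypothetical. `lcg_skip` illustrates why an LCG can jump to any position without generating every intermediate value. Each step is the affine map x -> a*x + c (mod m), so k steps compose into a single affine map, which can be built by repeated squaring in O(log k) time.

```python
# Sketch only: lcg_next and lcg_skip are hypothetical helper names, not PyLCG's API.
A = 1664525      # multiplier (a)
C = 1013904223   # increment (c)
M = 2 ** 32      # modulus (m)


def lcg_next(x: int) -> int:
    """Return the next number in the sequence: (a * x + c) mod m."""
    return (A * x + C) % M


def lcg_skip(x: int, k: int) -> int:
    """Advance x by k steps in O(log k) time without computing the intermediate values."""
    # (acc_mul, acc_add) describes the map x -> acc_mul * x + acc_add; start with the identity.
    acc_mul, acc_add = 1, 0
    # (cur_mul, cur_add) starts as one LCG step and is squared on every loop iteration.
    cur_mul, cur_add = A, C
    while k > 0:
        if k & 1:
            # Compose the current power of the step map into the accumulated map.
            acc_mul, acc_add = (cur_mul * acc_mul) % M, (cur_mul * acc_add + cur_add) % M
        # Applying the map twice: y -> cur_mul * (cur_mul * y + cur_add) + cur_add.
        cur_mul, cur_add = (cur_mul * cur_mul) % M, (cur_mul * cur_add + cur_add) % M
        k >>= 1
    return (acc_mul * x + acc_add) % M
```

For example, `lcg_skip(12345, 1000)` returns the same value as calling `lcg_next` 1,000 times starting from `12345`, but it needs only about ten loop iterations.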
###### Why these specific numbers?

The multiplier `1664525` & increment `1013904223` are the LCG parameters given in "Numerical Recipes in C". With a modulus of `2^32` they also satisfy the Hull-Dobell conditions for a full period: the increment is odd, so it shares no factor with `2^32`, and `a - 1 = 1664524` is divisible by 4. As a result, the generator cycles through all 2^32 possible values before the sequence repeats.

### Memory-Efficient IP Processing

Instead of loading entire IP ranges into memory, PyLCG:

1. Converts CIDR ranges to start/end integers
2. Uses generator functions for lazy evaluation
3. Calculates IPs on-demand using index mapping
4. Maintains constant memory usage regardless of range size
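The first three steps above can be sketched with the standard library `ipaddress` module. The helper names (`cidr_bounds`, `index_to_ip`, `stream_ips`) are hypothetical. For clarity the walk here is sequential, whereas PyLCG additionally shards and reorders the indices.

```python
import ipaddress
from typing import Iterator, Tuple


def cidr_bounds(cidr: str) -> Tuple[int, int]:
    """Convert a CIDR range to its first & last address as 32-bit integers."""
    network = ipaddress.IPv4Network(cidr, strict=False)
    return int(network.network_address), int(network.broadcast_address)


def index_to_ip(start: int, index: int) -> str:
    """Map a 0-based index inside the range back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(start + index))


def stream_ips(cidr: str) -> Iterator[str]:
    """Yield addresses one at a time; memory use does not grow with the range size."""
    start, end = cidr_bounds(cidr)
    for index in range(end - start + 1):  # range() is lazy: no list of IPs is ever built
        yield index_to_ip(start, index)
```

For example, `cidr_bounds('192.168.0.0/16')` returns `(3232235520, 3232301055)`, and `index_to_ip(3232235520, 257)` returns `'192.168.1.1'`.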
### Sharding Algorithm

PyLCG uses an interleaved sharding approach so that each machine's work is spread across the entire IP range:

1. Each shard is assigned a subset of indices based on modulo arithmetic
2. The LCG randomizes the order within each shard
3. Work is distributed evenly across shards
4. No sequential scanning patterns

**Interleaved Distribution**: Instead of dividing the IP range into sequential blocks, PyLCG distributes IPs across shards using an offset pattern. For 4 shards scanning a network:

- Shard 1 handles IPs at indices: 0, 4, 8, 12, ...
- Shard 2 handles IPs at indices: 1, 5, 9, 13, ...
- Shard 3 handles IPs at indices: 2, 6, 10, 14, ...
- Shard 4 handles IPs at indices: 3, 7, 11, 15, ...
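The interleaved assignment and per-shard LCG ordering can be sketched as follows. This is an illustration under stated assumptions, not PyLCG's implementation. Shards are numbered from 1, matching `--shard-num 1` in the usage examples. The sort key for each index is assumed to be the next output of an LCG seeded with `seed`. Sorting materializes the whole shard, so this version only suits small ranges.

```python
from typing import List

A, C, M = 1664525, 1013904223, 2 ** 32  # LCG parameters from the section above


def shard_indices(range_size: int, shard_num: int, total_shards: int) -> range:
    """Indices owned by a 1-based shard: shard k gets k-1, k-1+N, k-1+2N, ..."""
    if not 1 <= shard_num <= total_shards:
        raise ValueError('shard_num must be between 1 and total_shards')
    return range(shard_num - 1, range_size, total_shards)


def shard_order(range_size: int, shard_num: int, total_shards: int, seed: int) -> List[int]:
    """Return the shard's indices in a seed-dependent pseudo-random order (assumed keying scheme)."""
    state = seed % M
    keyed = []
    for index in shard_indices(range_size, shard_num, total_shards):
        state = (A * state + C) % M  # the next LCG output becomes this index's sort key
        keyed.append((state, index))
    keyed.sort()  # the full-period LCG does not repeat within 2**32 steps, so keys are unique
    return [index for _, index in keyed]
```

Every index belongs to exactly one shard. With four shards over a /16 (65,536 indices), each shard therefore receives 16,384 disjoint indices, and the same seed always reproduces the same order.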
**Randomization**: Within each shard, the LCG randomizes the order of IPs:

- Each index is fed through the LCG to generate a pseudo-random value
- IPs are scanned in the order of these values
- The same seed produces the same ordering on every run

This approach provides:

- Even distribution across the entire IP space
- No sequential scanning patterns that could trigger alerts
- An even split of work: shards never overlap and their sizes differ by at most one IP
- Deterministic, reproducible results

## Performance

PyLCG is designed for maximum performance:

- Generates millions of IPs per second
- Constant memory usage (~100KB)
- Minimal CPU overhead
- No disk I/O required
Benchmark results on a typical system:

- IP Generation: ~5-10 million IPs/second
- Memory Usage: < 1MB for any range size
- LCG Operations: < 1 microsecond per number
To handle large IP ranges without consuming too much memory, PyLCG uses several techniques:

1. **Chunked Processing**: IPs are processed in chunks instead of being loaded all at once.
2. **Lazy Generation**:
   - IPs are generated only when needed, using Python generators
   - The system yields one IP at a time rather than creating huge lists
   - This keeps memory usage constant regardless of IP range size
3. **Direct Calculation**:
   - The LCG can jump directly to any position in its sequence
   - There is no need to generate all previous numbers
   - This enables efficient random access to any part of the sequence

## Contributing

### Performance Optimization

We welcome contributions that improve PyLCG's performance. When submitting optimizations:

1. Run the included benchmark suite:

   ```bash
   python3 unit_test.py
   ```
2. Include before/after benchmark results for:
   - IP generation speed
   - Memory usage
   - LCG sequence generation
   - Shard distribution metrics
3. Consider optimizing:
   - Number generation algorithms
   - Memory access patterns
   - CPU cache utilization
   - Python-specific optimizations
4. Document any tradeoffs between:
   - Speed vs memory usage
   - Randomness vs performance
   - Complexity vs maintainability

### Benchmark Guidelines

When running benchmarks:

1. Use consistent hardware/environment
2. Run multiple iterations
3. Test with various CIDR ranges
4. Measure both average and worst-case performance
5. Profile memory usage patterns
6. Test shard distribution uniformity
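The following minimal harness follows several of these guidelines: multiple iterations, average and worst-case throughput, and peak memory. It uses only the standard library. `sample_stream` is a hypothetical stand-in. To measure PyLCG itself, pass a callable that returns the generator from `ip_stream(...)` as shown in the usage section. Timing and memory are measured in separate passes because `tracemalloc` slows down allocation-heavy code.

```python
import ipaddress
import time
import tracemalloc
from statistics import mean
from typing import Callable, Dict, Iterable


def sample_stream(cidr: str) -> Iterable[str]:
    """Hypothetical stand-in generator that walks a CIDR range sequentially."""
    network = ipaddress.IPv4Network(cidr, strict=False)
    start = int(network.network_address)
    for index in range(network.num_addresses):
        yield str(ipaddress.IPv4Address(start + index))


def benchmark(make_stream: Callable[[], Iterable[str]], iterations: int = 5) -> Dict[str, float]:
    """Consume the stream several times; report mean & worst throughput plus peak traced memory."""
    rates = []
    count = 0
    for _ in range(iterations):
        begin = time.perf_counter()
        count = sum(1 for _ in make_stream())
        elapsed = time.perf_counter() - begin
        rates.append(count / max(elapsed, 1e-9))  # guard against a zero-length timer reading

    # Separate pass for memory, because tracemalloc would distort the timings above.
    tracemalloc.start()
    for _ in make_stream():
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        'ips_per_run': count,
        'mean_ips_per_sec': mean(rates),
        'worst_ips_per_sec': min(rates),
        'peak_bytes': peak,
    }
```

Example usage: `benchmark(lambda: sample_stream('10.0.0.0/16'))`. With PyLCG installed, you could instead use `benchmark(lambda: ip_stream('10.0.0.0/16', shard_num=1, total_shards=4, seed=12345))`. Repeat with several CIDR sizes to cover the range guideline.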
## Roadmap

- [ ] IPv6 support
- [ ] Custom LCG parameters *(e.g. adding port numbers)*
- [ ] Configurable chunk sizes & auto-tuning based on available system resources
- [ ] State persistence: save the LCG state to a file so a scan can be resumed later
- [ ] Resume capability: resume from a specific point in the sequence
- [ ] S3/URL input support: shard line-based input files read in chunks, whether local, from an S3 bucket, or from a URL
- [ ] Extended benchmark suite: update the unit tests with benchmarks & better coverage for future efficiency improvements & validation
## License

This project is released under the MIT License.

---

###### Mirrors: [acid.vegas](https://git.acid.vegas/pylcg) • [SuperNETs](https://git.supernets.org/acidvegas/pylcg) • [GitHub](https://github.com/acidvegas/pylcg) • [GitLab](https://gitlab.com/acidvegas/pylcg) • [Codeberg](https://codeberg.org/acidvegas/pylcg)