Initial commit

This commit is contained in:
Dionysus 2024-11-25 22:28:06 -05:00
commit 080d46ea3d
Signed by: acidvegas
GPG Key ID: EF4B922DB85DC9DE
3 changed files with 467 additions and 0 deletions

161
README.md Normal file
View File

@ -0,0 +1,161 @@
# PyLCG
> Linear Congruential Generator for IP Sharding
PyLCG is a Python implementation of a memory-efficient IP address sharding system using Linear Congruential Generators *(LCG)* for deterministic random number generation. This tool enables distributed scanning and network reconnaissance by efficiently dividing IP ranges across multiple machines.
___
## Table of Contents
- [Project Origins & Purpose](#project-origins-and-purpose)
- [Overview](#overview)
- [How It Works](#how-it-works)
- [Understanding IP Addresses](#understanding-ip-addresses)
- [The Magic of Linear Congruential Generators](#the-magic-of-linear-congruential-generators)
- [Sharding: Dividing the Work](#sharding-dividing-the-work)
- [Memory-Efficient Processing](#memory-efficient-processing)
- [Real-World Applications](#real-world-applications)
- [Network Security Testing](#network-security-testing)
- [Cloud-Based Scanning](#cloud-based-scanning)
___
## Project Origins & Purpose
PyLCG was inspired by the elegant IP distribution system used in [masscan](https://github.com/robertdavidgraham/masscan), the popular mass IP port scanner. While masscan implements this logic as part of its larger codebase, I wanted to isolate and implement this specific component as a standalone Python library that developers can easily integrate into their own projects.
The goal was to create a clean, well-documented implementation that:
- Can be used as a drop-in solution for any project needing IP distribution capabilities
- Provides the same reliable mathematical foundation as masscan's approach
- Is easy to understand and modify for specific needs
- Works well with modern Python async patterns
By extracting this functionality into its own library, developers can add sophisticated IP distribution capabilities to their network tools without having to reinvent the wheel or extract code from larger projects.
___
## Overview
When performing network reconnaissance or scanning large IP ranges, it's often necessary to split the work across multiple machines. However, this presents several challenges:
1. You want to ensure each machine works on a different part of the network *(no overlap)*
2. You want to avoid scanning IPs in sequence *(which can trigger security alerts)*
3. You need a way to resume scans if a machine fails
4. You can't load millions of IPs into memory at once
PyLCG solves these challenges through clever mathematics and efficient algorithms.
___
## How It Works
### Understanding IP Addresses
First, let's understand how IP addresses work in our system:
- An IP address like `192.168.1.1` is really just a 32-bit number
- A CIDR range like `192.168.0.0/16` represents a continuous range of these numbers
- For example, `192.168.0.0/16` includes all IPs from `192.168.0.0` to `192.168.255.255` *(65,536 addresses)*
### The Magic of Linear Congruential Generators
At the heart of PyLCG is something called a Linear Congruential Generator *(LCG)*. Think of it as a mathematical recipe that generates a sequence of numbers that appear random but are actually predictable if you know the starting point *(seed)*.
Here's how it works:
1. Start with a number *(called the seed)*
2. Multiply it by a carefully chosen constant *(1597 in our case)*
3. Add another carefully chosen constant *(51749)*
4. Take the remainder when divided by 2^32
5. That's your next number! Repeat the process to get more numbers
In mathematical notation:
```
Next_Number = (1597 * Current_Number + 51749) mod 2^32
```
Why these specific numbers?
- `1597` and `51749` were chosen because they create a sequence that:
- Visits every possible number before repeating *(maximum period)*
- Spreads numbers evenly across the range
- Can be calculated quickly on computers
- `2^32` *(4,294,967,296)* is used because it:
- Matches the size of a 32-bit integer
- Is large enough to handle any IP range
- Makes calculations efficient on modern CPUs
### Sharding: Dividing the Work
Let's say you want to scan a /16 network *(65,536 IPs)* using 4 machines. Here's how PyLCG handles it:
1. **Division**: First, it divides the total IPs evenly:
- 65,536 ÷ 4 = 16,384 IPs per shard
- Machine 1: IPs 0-16,383
- Machine 2: IPs 16,384-32,767
- Machine 3: IPs 32,768-49,151
- Machine 4: IPs 49,152-65,535
2. **Randomization**: Within each shard, IPs are randomized using the LCG:
- Each IP index *(0 to 65,535)* is fed through the LCG
- The resulting numbers determine the scan order
- Because we use the same seed, this order is consistent across runs
Example of how IPs might be ordered in Shard 1:
```
Original order: 0, 1, 2, 3, 4, 5...
LCG values: 51749, 134238, 297019, 12983...
Final order: 3, 5, 1, 4, 2, 0... (sorted by LCG values)
```
### Memory-Efficient Processing
To handle large IP ranges without consuming too much memory, PyLCG uses several techniques:
1. **Chunked Processing**
Instead of loading all IPs at once, it processes them in chunks:
```python
# Example with chunk_size = 1000
Chunk 1: Process IPs 0-999
Chunk 2: Process IPs 1000-1999
...and so on
```
2. **Lazy Generation**
- IPs are generated only when needed using Python's async generators
- The system yields one IP at a time rather than creating huge lists
- This keeps memory usage constant regardless of IP range size
3. **Direct Calculation**
- The LCG can jump directly to any position in its sequence
- No need to generate all previous numbers
- Enables efficient random access to any part of the sequence
___
## Real-World Applications
### Network Security Testing
Imagine you're testing the security of a large corporate network:
- You have 5 scanning machines
- You need to scan 1 million IPs
- You want to avoid triggering IDS/IPS systems
PyLCG helps by:
1. Dividing the IPs evenly across your 5 machines
2. Randomizing the scan order to avoid detection
3. Allowing you to pause/resume scans from any point
4. Using minimal memory on each machine
### Cloud-Based Scanning
In cloud environments, PyLCG is particularly useful:
- Easily scale up/down the number of scanning instances
- Each instance knows exactly which IPs to scan
- Consistent results across multiple runs
- Efficient resource usage keeps costs down
___
###### Mirrors for this repository: [acid.vegas](https://git.acid.vegas/pylcg) • [SuperNETs](https://git.supernets.org/acidvegas/pylcg) • [GitHub](https://github.com/acidvegas/pylcg) • [GitLab](https://gitlab.com/acidvegas/pylcg) • [Codeberg](https://codeberg.org/acidvegas/pylcg)

134
pylcg.py Normal file
View File

@ -0,0 +1,134 @@
#!/usr/bin/env python3
# Python implementation of a Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://git.acid.vegas/pylcg)
# pylcg.py
import argparse
import asyncio
import ipaddress
from math import ceil
class LCG:
'''Linear Congruential Generator for deterministic random number generation'''
def __init__(self, seed: int, m: int = 2**32):
self.m = m
self.a = 1597
self.c = 51749
self.seed = seed
self.current = seed
def get_nth(self, n: int) -> int:
'''
Get the nth number in the sequence without generating previous numbers.
:param n: The index of the number to get
'''
# For large n, use the standard next() method to avoid modular arithmetic issues
if n > 1000:
self.current = self.seed
for _ in range(n):
self.next()
return self.current
# For smaller n, use direct calculation
result = self.seed
for _ in range(n):
result = (self.a * result + self.c) % self.m
return result
def next(self) -> int:
'''Generate next random number'''
self.current = (self.a * self.current + self.c) % self.m
return self.current
class IPRange:
'''Memory-efficient IP range iterator'''
def __init__(self, cidr: str):
network = ipaddress.ip_network(cidr)
self.start = int(network.network_address)
self.end = int(network.broadcast_address)
self.total = self.end - self.start + 1
def get_ip_at_index(self, index: int) -> str:
'''
Get IP at specific index without generating previous IPs
:param index: The index of the IP to get
'''
if not 0 <= index < self.total:
raise IndexError('IP index out of range')
return str(ipaddress.ip_address(self.start + index))
async def get_shard_ips(cidr: str, shard_num: int, total_shards: int, seed: int, chunk_size: int = 1000):
'''
Asynchronously generate IPs for the specified shard.
:param cidr: The CIDR range to shard
:param shard_num: The number of the shard to generate
:param total_shards: The total number of shards
:param seed: The seed for the random number generator
:param chunk_size: The size of the chunks to process
'''
# Initialize the IP range and LCG
ip_range = IPRange(cidr)
lcg = LCG(seed)
total_ips = ip_range.total
# Calculate which indices belong to this shard
shard_size = ceil(total_ips / total_shards)
start_idx = shard_num * shard_size
end_idx = min(start_idx + shard_size, total_ips)
# Process in chunks to maintain memory efficiency
for chunk_start in range(start_idx, end_idx, chunk_size):
chunk_end = min(chunk_start + chunk_size, end_idx)
chunk_indices = list(range(chunk_start, chunk_end))
# Generate random values for this chunk
chunk_random_values = [(i, lcg.get_nth(i)) for i in chunk_indices]
chunk_random_values.sort(key=lambda x: x[1])
# Yield IPs in randomized order
for idx, _ in chunk_random_values:
yield ip_range.get_ip_at_index(idx)
# Allow other tasks to run (do we need this?)
await asyncio.sleep(0)
async def main():
parser = argparse.ArgumentParser(description='Async IP address sharding tool')
parser.add_argument('cidr', help='Target IP range in CIDR format')
parser.add_argument('shard_num', type=int, help='Shard number (0-based)')
parser.add_argument('total_shards', type=int, help='Total number of shards')
parser.add_argument('--seed', type=int, default=12345, help='Random seed for LCG')
parser.add_argument('--chunk-size', type=int, default=1000, help='Processing chunk size')
args = parser.parse_args()
if args.shard_num >= args.total_shards:
raise ValueError('Shard number must be less than total shards')
if args.shard_num < 0 or args.total_shards < 1:
raise ValueError('Invalid shard configuration')
async for ip in get_shard_ips(args.cidr, args.shard_num, args.total_shards, args.seed, args.chunk_size):
print(ip)
if __name__ == '__main__':
asyncio.run(main())

172
unit_test.py Normal file
View File

@ -0,0 +1,172 @@
#!/usr/bin/env python3
# Python implementation of a Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://git.acid.vegas/pylcg)
# pylcg.py
import unittest
import asyncio
import ipaddress
import sys
import time
from pylcg import IPRange, get_shard_ips, LCG
# ANSI color codes
class Colors:
BLUE = '\033[94m'
GREEN = '\033[92m'
YELLOW = '\033[93m'
CYAN = '\033[96m'
RED = '\033[91m'
ENDC = '\033[0m'
def progress_bar(iteration: int, total: int, prefix: str = '', length: int = 50) -> None:
'''Simple progress bar using standard Python'''
percent = f"{100 * (iteration / float(total)):.1f}"
filled_length = int(length * iteration // total)
bar = '' * filled_length + '-' * (length - filled_length)
sys.stdout.write(f'\r{Colors.CYAN}{prefix} |{bar}| {percent}%{Colors.ENDC} ')
if iteration == total:
sys.stdout.write('\n')
sys.stdout.flush()
def print_header(message: str) -> None:
'''Print formatted header'''
print(f'\n{Colors.BLUE}{"="*80}')
print(f'TEST: {message}')
print(f'{"="*80}{Colors.ENDC}\n')
def print_success(message: str) -> None:
'''Print success message'''
print(f'{Colors.GREEN}{message}{Colors.ENDC}')
def print_progress(message: str) -> None:
'''Print progress message'''
print(f"{Colors.YELLOW}{message}{Colors.ENDC}")
class TestIPSharder(unittest.TestCase):
@classmethod
def setUpClass(cls):
'''Set up test parameters'''
print_header('Setting up test environment')
cls.test_cidr = '192.0.0.0/16' # 65,536 IPs
cls.test_seed = 12345
cls.total_shards = 4
cls.chunk_size = 1000
# Calculate expected IPs
network = ipaddress.ip_network(cls.test_cidr)
cls.all_ips = {str(ip) for ip in network}
print_success(f"Initialized test environment with {len(cls.all_ips):,} IPs")
def setUp(self):
'''Create event loop for each test'''
self.loop = asyncio.new_event_loop()
asyncio.set_event_loop(self.loop)
def tearDown(self):
'''Clean up event loop'''
self.loop.close()
async def collect_shard_ips(self, shard_num: int):
'''Helper to collect IPs from a shard'''
return {ip async for ip in get_shard_ips(self.test_cidr, shard_num, self.total_shards, self.test_seed, self.chunk_size)}
def test_ip_range_initialization(self):
'''Test IPRange class initialization and calculations'''
print_header('Testing IPRange initialization')
ip_range = IPRange(self.test_cidr)
self.assertEqual(ip_range.total, 65536)
print_success('IP range size correctly calculated')
first_ip = ip_range.get_ip_at_index(0)
last_ip = ip_range.get_ip_at_index(ip_range.total - 1)
print_success(f'IP range spans from {first_ip} to {last_ip}')
def test_shard_completeness(self):
'''Test that all IPs are covered exactly once across all shards'''
print_header('Testing shard completeness')
async def check_completeness():
seen_ips = set()
shard_sizes = []
for shard_num in range(self.total_shards):
progress_bar(shard_num, self.total_shards-1, prefix='Processing shards')
shard_ips = await self.collect_shard_ips(shard_num)
shard_sizes.append(len(shard_ips))
# Check for duplicates and overlap
self.assertEqual(len(shard_ips), len(set(shard_ips)),
f'Duplicates found in shard {shard_num}')
overlap = seen_ips & shard_ips
self.assertEqual(len(overlap), 0,
f'Overlap found with previous shards: {overlap}')
seen_ips.update(shard_ips)
# Verify all IPs are covered
self.assertEqual(seen_ips, self.all_ips,
'Not all IPs were covered by the shards')
print_success(f'All {len(self.all_ips):,} IPs were distributed across shards')
# Print distribution information
for i, size in enumerate(shard_sizes):
print(f"{Colors.CYAN}Shard {i}: {size:,} IPs{Colors.ENDC}")
self.loop.run_until_complete(check_completeness())
def test_lcg_sequence(self):
'''Test LCG sequence generation and performance'''
print_header('Testing LCG sequence generation')
lcg = LCG(seed=self.test_seed)
# Test small sequence
small_n = 100
start_time = time.perf_counter()
small_result = lcg.get_nth(small_n)
small_time = time.perf_counter() - start_time
print_success(f'Small sequence (n={small_n:,}) generated in {small_time:.6f}s')
# Test large sequence
large_n = 1_000_000
start_time = time.perf_counter()
large_result = lcg.get_nth(large_n)
large_time = time.perf_counter() - start_time
print_success(f'Large sequence (n={large_n:,}) generated in {large_time:.6f}s')
# Verify deterministic behavior
lcg2 = LCG(seed=self.test_seed)
print_progress('Verifying sequence determinism...')
for i in range(large_n):
if i % (large_n // 100) == 0: # Update progress every 1%
progress_bar(i, large_n, prefix='Verifying sequence')
lcg2.next()
progress_bar(large_n, large_n, prefix='Verifying sequence')
self.assertEqual(large_result, lcg2.current, 'LCG sequence is not deterministic')
print_success('LCG produces consistent results')
if __name__ == '__main__':
print(f"\n{Colors.CYAN}{'='*80}")
print(f"Starting IP Sharder Tests - Testing with {65536:,} IPs (/16 network)")
print(f"{'='*80}{Colors.ENDC}\n")
unittest.main(verbosity=2)