Compare commits


No commits in common. "v1.0.1" and "main" have entirely different histories.
v1.0.1 ... main

9 changed files with 420 additions and 381 deletions


@@ -1,6 +1,6 @@
ISC License ISC License
Copyright (c) 2024, acidvegas <acid.vegas@acid.vegas> Copyright (c) 2025, acidvegas <acid.vegas@acid.vegas>
Permission to use, copy, modify, and/or distribute this software for any Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above purpose with or without fee is hereby granted, provided that the above

174
README.md

@@ -3,6 +3,8 @@
PyLCG is a high-performance Python implementation of a memory-efficient IP address sharding system using Linear Congruential Generators (LCG) for deterministic random number generation. This tool enables distributed scanning & network reconnaissance by efficiently dividing IP ranges across multiple machines while maintaining pseudo-random ordering. PyLCG is a high-performance Python implementation of a memory-efficient IP address sharding system using Linear Congruential Generators (LCG) for deterministic random number generation. This tool enables distributed scanning & network reconnaissance by efficiently dividing IP ranges across multiple machines while maintaining pseudo-random ordering.
###### A GoLang version of this library is also available [here](https://github.com/acidvegas/golcg)
## Features ## Features
- Memory-efficient IP range processing - Memory-efficient IP range processing
@@ -10,28 +12,35 @@ PyLCG is a high-performance Python implementation of a memory-efficient IP addre
- High-performance LCG implementation - High-performance LCG implementation
- Support for sharding across multiple machines - Support for sharding across multiple machines
- Zero dependencies beyond Python standard library - Zero dependencies beyond Python standard library
- Simple command-line interface - Simple command-line interface and library usage
## Installation ## Installation
### From PyPI
```bash ```bash
pip install pylcg pip install pylcg
``` ```
### From Source
```bash
git clone https://github.com/acidvegas/pylcg
cd pylcg
chmod +x pylcg.py
```
## Usage ## Usage
### Command Line ### Command Line
```bash ```bash
./pylcg.py 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345 pylcg 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345
# Resume from previous state
pylcg 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345 --state 987654321
# Pipe to dig for PTR record lookups
pylcg 192.168.0.0/16 --seed 12345 | while read -r ip; do
echo -n "$ip -> "
dig +short -x "$ip"
done
# One-liner for PTR lookups (--seed is required by the CLI)
pylcg 198.150.0.0/16 --seed 12345 | xargs -I {} dig +short -x {}
# Parallel PTR lookups
pylcg 198.150.0.0/16 --seed 12345 | parallel "dig +short -x {} | sed 's/^/{} -> /'"
``` ```
### As a Library ### As a Library
@@ -39,55 +48,103 @@ chmod +x pylcg.py
```python ```python
from pylcg import ip_stream from pylcg import ip_stream
# Generate IPs for the first shard of 4 total shards # Basic usage
for ip in ip_stream('192.168.0.0/16', shard_num=1, total_shards=4, seed=12345): for ip in ip_stream('192.168.0.0/16', shard_num=1, total_shards=4, seed=12345):
print(ip) print(ip)
# Resume from previous state
for ip in ip_stream('192.168.0.0/16', shard_num=1, total_shards=4, seed=12345, state=987654321):
print(ip)
``` ```
## State Management & Resume Capability
PyLCG automatically saves its state every 1000 IPs processed to enable resume functionality in case of interruption. The state is saved to a temporary file in your system's temp directory (usually `/tmp` on Unix systems or `%TEMP%` on Windows).
The state file follows the naming pattern:
```
pylcg_[seed]_[cidr]_[shard]_[total].state
```
For example:
```
pylcg_12345_192.168.0.0_16_1_4.state
```
The state file is plain text containing only the current LCG value, so each save is a single short write to the temp directory. To resume from a previous state:
1. Locate your state file in the temp directory
2. Read the state value from the file
3. Use the same parameters (CIDR, seed, shard settings) with the `--state` parameter
Example of resuming:
```bash
# Read the last state
state=$(cat /tmp/pylcg_12345_192.168.0.0_16_1_4.state)
# Resume processing
pylcg 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345 --state "$state"
```
Note: When using the `--state` parameter, you must provide the same `--seed` that was used in the original run.
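The resume steps above can also be scripted. The following is a minimal sketch that assumes the naming pattern documented above; `state_path` and `read_state` are hypothetical helpers for illustration and are not part of PyLCG's API.

```python
import os
import tempfile


def state_path(seed: int, cidr: str, shard: int, total: int) -> str:
    '''Build the state file path from the documented naming pattern (hypothetical helper)'''
    file_name = f'pylcg_{seed}_{cidr.replace("/", "_")}_{shard}_{total}.state'
    return os.path.join(tempfile.gettempdir(), file_name)


def read_state(seed: int, cidr: str, shard: int, total: int):
    '''Return the saved LCG state as an int, or None if no state file exists'''
    path = state_path(seed, cidr, shard, total)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())
```

The value returned by `read_state` can then be passed as the `state` argument of `ip_stream`, together with the same CIDR, seed, and shard settings as the original run.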
## How It Works ## How It Works
### IP Address Integer Representation
Every IPv4 address is fundamentally a 32-bit number. For example, the IP address "192.168.1.1" can be broken down into its octets (192, 168, 1, 1) and converted to a single integer:
```
192.168.1.1 = (192 × 256³) + (168 × 256²) + (1 × 256¹) + (1 × 256⁰)
= 3232235777
```
This integer representation allows us to treat IP ranges as simple number sequences. A CIDR block like "192.168.0.0/16" becomes a continuous range of integers:
- Start: 192.168.0.0 → 3232235520
- End: 192.168.255.255 → 3232301055
By working with these integer representations, we can perform efficient mathematical operations on IP addresses without the overhead of string manipulation or complex data structures. This is where the Linear Congruential Generator comes into play.
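The conversion described above can be reproduced with a few lines of Python. This is a small illustrative sketch; `ip_to_int` is a hypothetical helper written for this example, while the range bounds come from the standard library's `ipaddress` module.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    '''Combine the four octets into one 32-bit integer'''
    a, b, c, d = (int(octet) for octet in ip.split('.'))
    return (a << 24) | (b << 16) | (c << 8) | d


network = ipaddress.ip_network('192.168.0.0/16')
start = int(network.network_address)
end = int(network.broadcast_address)

print(ip_to_int('192.168.1.1'))  # 3232235777
print(start, end)                # 3232235520 3232301055
print(end - start + 1)           # 65536
```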
### Linear Congruential Generator ### Linear Congruential Generator
PyLCG uses an optimized LCG implementation with carefully chosen parameters: PyLCG uses an optimized LCG implementation with three carefully chosen parameters that work together to generate high-quality pseudo-random sequences:
| Name | Variable | Value | | Name | Variable | Value |
|------------|----------|--------------| |------------|----------|--------------|
| Multiplier | `a` | `1664525` | | Multiplier | `a` | `1664525` |
| Increment | `c` | `1013904223` | | Increment | `c` | `1013904223` |
| Modulus | `m` | `2^32` | | Modulus | `m` | `2^32` |
This generates a deterministic sequence of pseudo-random numbers using the formula: ###### Modulus
``` The modulus value of `2^32` serves as both a mathematical and performance optimization choice. Because it is a power of two, reducing modulo `m` is equivalent to a bitwise AND with `2^32 - 1`, and every state fits in a 32-bit word. It also equals the size of the IPv4 address space, so the period is large enough for even the biggest IP ranges we might encounter.
next = (a * current + c) mod m
```
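To illustrate the power-of-two modulus, here is a minimal sketch of a single LCG step written two ways. The function names are illustrative only; PyLCG's own `LCG.next()` uses the `%` form.

```python
A = 1664525
C = 1013904223
M = 2**32
MASK = M - 1  # 0xFFFFFFFF


def step_mod(x: int) -> int:
    '''One LCG step using the modulo operator'''
    return (A * x + C) % M


def step_mask(x: int) -> int:
    '''The same step using a bitwise AND, valid because M is a power of two'''
    return (A * x + C) & MASK


# Both forms agree along the sequence
x = 12345
for _ in range(5):
    assert step_mod(x) == step_mask(x)
    x = step_mod(x)
```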
### Memory-Efficient IP Processing ###### Multiplier
The multiplier value of `1664525` is the one popularized by the Numerical Recipes library. Together with an odd increment, it satisfies the Hull-Dobell theorem's requirements for maximum period length in power-of-2 modulus LCGs: `a - 1` must be a multiple of 4, and `1664525 - 1 = 4 × 416131`. This specific value also performs well in spectral tests, giving good distribution properties across the entire range. Because Python integers have arbitrary precision, the intermediate product `a * X_n` cannot overflow before the modulus is applied.
Instead of loading entire IP ranges into memory, PyLCG: ###### Increment
1. Converts CIDR ranges to start/end integers The increment value of `1013904223` is a carefully selected prime number that completes our parameter trio. When combined with our chosen multiplier and modulus, it ensures optimal bit mixing throughout the sequence and helps eliminate common LCG issues like short cycles or poor distribution. This specific value was selected after extensive testing showed it produced excellent statistical properties and passed rigorous spectral tests for dimensional distribution.
2. Uses generator functions for lazy evaluation
3. Calculates IPs on-demand using index mapping ### Applying LCG to IP Addresses
4. Maintains constant memory usage regardless of range size
Once we have our IP addresses as integers, the LCG is used to generate a pseudo-random sequence that permutes through all possible values in our IP range:
1. For a given IP range *(start_ip, end_ip)*, we calculate the range size: `range_size = end_ip - start_ip + 1`
2. The LCG generates a sequence using the formula: `X_{n+1} = (a * X_n + c) mod m`
3. To map this sequence back to valid IPs in our range:
- Generate the next LCG value
- Take modulo of the value with range_size to get an offset: `offset = lcg_value % range_size`
- Add this offset to start_ip: `ip = start_ip + offset`
This process ensures that:
- Every IP in the range is visited exactly once
- The sequence appears random but is deterministic
- We maintain constant memory usage regardless of range size
- The same seed always produces the same sequence
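The mapping steps above can be sketched as follows. This is a simplified, self-contained illustration rather than PyLCG's actual code, and `permuted_ips` is a hypothetical name. It relies on the fact that a CIDR block always contains a power-of-two number of addresses, so reducing a full-period LCG with modulus `2^32` modulo `range_size` still cycles through every offset once per `range_size` steps.

```python
import ipaddress

A, C, M = 1664525, 1013904223, 2**32


def permuted_ips(cidr: str, seed: int):
    '''Yield every IP in the CIDR block once, in LCG order (illustrative sketch)'''
    network = ipaddress.ip_network(cidr)
    start_ip = int(network.network_address)
    range_size = network.num_addresses
    x = seed
    for _ in range(range_size):
        x = (A * x + C) % M          # next LCG value
        offset = x % range_size      # map the value into the range
        yield str(ipaddress.ip_address(start_ip + offset))


ips = list(permuted_ips('192.168.1.0/24', seed=12345))
print(len(ips), len(set(ips)))  # 256 256
```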
### Sharding Algorithm ### Sharding Algorithm
The sharding system uses an interleaved approach: The sharding system employs an interleaved approach that ensures even distribution of work across multiple machines while maintaining randomness. Each shard operates independently using a deterministic sequence derived from the base seed plus the shard index. The system distributes IPs across shards using modulo arithmetic, ensuring that each IP is assigned to exactly one shard. This approach prevents sequential scanning patterns while guaranteeing complete coverage of the IP range. The result is a system that can efficiently parallelize work across any number of machines while maintaining the pseudo-random ordering that's crucial for network scanning applications.
1. Each shard is assigned a subset of indices based on modulo arithmetic
2. The LCG randomizes the order within each shard
3. Work is distributed evenly across shards
4. No sequential scanning patterns
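A simplified sketch of the interleaved scheme: each shard walks its own LCG sequence, seeded with the base seed plus the shard index, and keeps only the indices congruent to its shard index modulo the shard count. `shard_indices` is a hypothetical name used for illustration, and the sketch works on indices rather than IP strings.

```python
A, C, M = 1664525, 1013904223, 2**32


def shard_indices(total: int, shard_num: int, total_shards: int, seed: int):
    '''Yield the indices owned by one shard (1-based shard_num), in LCG order'''
    shard_index = shard_num - 1
    # Number of indices i in [0, total) with i % total_shards == shard_index
    remaining = total // total_shards
    if shard_index < total % total_shards:
        remaining += 1
    x = seed + shard_index
    while remaining > 0:
        x = (A * x + C) % M
        index = x % total
        if index % total_shards == shard_index:
            yield index
            remaining -= 1


total = 256  # size of a /24; CIDR block sizes are always powers of two
shards = [set(shard_indices(total, n, 4, seed=12345)) for n in (1, 2, 3, 4)]
print([len(s) for s in shards])                    # [64, 64, 64, 64]
print(set().union(*shards) == set(range(total)))   # True
```

Because every shard filters on a different residue, no index can appear in two shards, and together the shards cover the whole range.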
## Performance
PyLCG is designed for maximum performance:
- Generates millions of IPs per second
- Constant memory usage (~100KB)
- Minimal CPU overhead
- No disk I/O required
Benchmark results on a typical system:
- IP Generation: ~5-10 million IPs/second
- Memory Usage: < 1MB for any range size
- LCG Operations: < 1 microsecond per number
## Contributing ## Contributing
@@ -100,43 +157,6 @@ We welcome contributions that improve PyLCG's performance. When submitting optim
python3 unit_test.py python3 unit_test.py
``` ```
2. Include before/after benchmark results for:
- IP generation speed
- Memory usage
- LCG sequence generation
- Shard distribution metrics
3. Consider optimizing:
- Number generation algorithms
- Memory access patterns
- CPU cache utilization
- Python-specific optimizations
4. Document any tradeoffs between:
- Speed vs memory usage
- Randomness vs performance
- Complexity vs maintainability
### Benchmark Guidelines
When running benchmarks:
1. Use consistent hardware/environment
2. Run multiple iterations
3. Test with various CIDR ranges
4. Measure both average and worst-case performance
5. Profile memory usage patterns
6. Test shard distribution uniformity
## Roadmap
- [ ] IPv6 support
- [ ] Custom LCG parameters
- [ ] Configurable chunk sizes
- [ ] State persistence
- [ ] Resume capability
- [ ] S3/URL input support
- [ ] Extended benchmark suite
--- ---
###### Mirrors: [acid.vegas](https://git.acid.vegas/pylcg) • [SuperNETs](https://git.supernets.org/acidvegas/pylcg) • [GitHub](https://github.com/acidvegas/pylcg) • [GitLab](https://gitlab.com/acidvegas/pylcg) • [Codeberg](https://codeberg.org/acidvegas/pylcg) ###### Mirrors: [acid.vegas](https://git.acid.vegas/pylcg) • [SuperNETs](https://git.supernets.org/acidvegas/pylcg) • [GitHub](https://github.com/acidvegas/pylcg) • [GitLab](https://gitlab.com/acidvegas/pylcg) • [Codeberg](https://codeberg.org/acidvegas/pylcg)

114
pylcg.py

@@ -1,114 +0,0 @@
#!/usr/bin/env python3
# Python implementation of a Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://git.acid.vegas/pylcg)
# pylcg.py
import argparse
import ipaddress
import random
class LCG:
'''Linear Congruential Generator for deterministic random number generation'''
def __init__(self, seed: int, m: int = 2**32):
self.m = m
self.a = 1664525
self.c = 1013904223
self.current = seed
def next(self) -> int:
'''Generate next random number'''
self.current = (self.a * self.current + self.c) % self.m
return self.current
class IPRange:
'''Memory-efficient IP range iterator'''
def __init__(self, cidr: str):
network = ipaddress.ip_network(cidr)
self.start = int(network.network_address)
self.total = int(network.broadcast_address) - self.start + 1
def get_ip_at_index(self, index: int) -> str:
'''
Get IP at specific index without generating previous IPs
:param index: The index of the IP to get
'''
if not 0 <= index < self.total:
raise IndexError('IP index out of range')
return str(ipaddress.ip_address(self.start + index))
def ip_stream(cidr: str, shard_num: int = 1, total_shards: int = 1, seed: int = 0):
'''
Stream random IPs from the CIDR range. Optionally supports sharding.
Each IP in the range will be yielded exactly once in a pseudo-random order.
:param cidr: Target IP range in CIDR format
:param shard_num: Shard number (1-based), defaults to 1
:param total_shards: Total number of shards, defaults to 1 (no sharding)
:param seed: Random seed for LCG (default: random)
'''
# Convert to 0-based indexing internally
shard_index = shard_num - 1
# Initialize IP range and LCG
ip_range = IPRange(cidr)
# Use random seed if none provided
if not seed:
seed = random.randint(0, 2**32-1)
# Initialize LCG
lcg = LCG(seed + shard_index)
# Calculate how many IPs this shard should generate
shard_size = ip_range.total // total_shards
# Distribute remainder
if shard_index < (ip_range.total % total_shards):
shard_size += 1
# Remaining IPs to yield
remaining = shard_size
while remaining > 0:
index = lcg.next() % ip_range.total
if total_shards == 1 or index % total_shards == shard_index:
yield ip_range.get_ip_at_index(index)
remaining -= 1
def main():
parser = argparse.ArgumentParser(description='Ultra-fast random IP address generator with optional sharding')
parser.add_argument('cidr', help='Target IP range in CIDR format')
parser.add_argument('--shard-num', type=int, default=1, help='Shard number (1-based)')
parser.add_argument('--total-shards', type=int, default=1, help='Total number of shards (default: 1, no sharding)')
parser.add_argument('--seed', type=int, default=0, help='Random seed for LCG')
args = parser.parse_args()
if args.total_shards < 1:
raise ValueError('Total shards must be at least 1')
if args.shard_num > args.total_shards:
raise ValueError('Shard number must be less than or equal to total shards')
if args.shard_num < 1:
raise ValueError('Shard number must be at least 1')
for ip in ip_stream(args.cidr, args.shard_num, args.total_shards, args.seed):
print(ip)
if __name__ == '__main__':
main()


@@ -1,5 +1,9 @@
#!/usr/bin/env python
# PyLCG - Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://github.com/acidvegas/pylcg)
# pylcg/__init__.py
from .core import LCG, IPRange, ip_stream from .core import LCG, IPRange, ip_stream
__version__ = "1.0.0" __version__ = "1.0.3"
__author__ = "acidvegas" __author__ = "acidvegas"
__all__ = ["LCG", "IPRange", "ip_stream"] __all__ = ["LCG", "IPRange", "ip_stream"]


@@ -1,26 +1,38 @@
#!/usr/bin/env python
# PyLCG - Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://github.com/acidvegas/pylcg)
# pylcg/cli.py
import argparse import argparse
from .core import ip_stream from .core import ip_stream
def main(): def main():
parser = argparse.ArgumentParser(description='Ultra-fast random IP address generator with optional sharding') parser = argparse.ArgumentParser(description='Ultra-fast random IP address generator with optional sharding')
parser.add_argument('cidr', help='Target IP range in CIDR format') parser.add_argument('cidr', help='Target IP range in CIDR format')
parser.add_argument('--shard-num', type=int, default=1, help='Shard number (1-based)') parser.add_argument('--shard-num', type=int, default=1, help='Shard number (1-based)')
parser.add_argument('--total-shards', type=int, default=1, help='Total number of shards (default: 1, no sharding)') parser.add_argument('--total-shards', type=int, default=1, help='Total number of shards (default: 1, no sharding)')
parser.add_argument('--seed', type=int, default=0, help='Random seed for LCG') parser.add_argument('--seed', type=int, required=True, help='Random seed for LCG (required)')
parser.add_argument('--state', type=int, help='Resume from specific LCG state (must be used with same seed)')
args = parser.parse_args() args = parser.parse_args()
if args.total_shards < 1: if args.total_shards < 1:
raise ValueError('Total shards must be at least 1') raise ValueError('Total shards must be at least 1')
if args.shard_num > args.total_shards: if args.shard_num > args.total_shards:
raise ValueError('Shard number must be less than or equal to total shards') raise ValueError('Shard number must be less than or equal to total shards')
if args.shard_num < 1:
raise ValueError('Shard number must be at least 1')
if args.state is not None and not args.seed:
raise ValueError('When using --state, you must provide the same --seed that was used originally')
for ip in ip_stream(args.cidr, args.shard_num, args.total_shards, args.seed, args.state):
print(ip)
if args.shard_num < 1:
raise ValueError('Shard number must be at least 1')
for ip in ip_stream(args.cidr, args.shard_num, args.total_shards, args.seed):
print(ip)
if __name__ == '__main__': if __name__ == '__main__':
main() main()


@@ -1,19 +1,93 @@
#!/usr/bin/env python
# PyLCG - Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://github.com/acidvegas/pylcg)
# pylcg/core.py
import ipaddress import ipaddress
import random import random
class LCG: class LCG:
'''Linear Congruential Generator for deterministic random number generation''' '''Linear Congruential Generator for deterministic random number generation'''
def __init__(self, seed: int, m: int = 2**32): def __init__(self, seed: int, m: int = 2**32):
self.m = m self.m = m
self.a = 1664525 self.a = 1664525
self.c = 1013904223 self.c = 1013904223
self.current = seed self.current = seed
def next(self) -> int: def next(self) -> int:
'''Generate next random number''' '''Generate next random number'''
self.current = (self.a * self.current + self.c) % self.m
return self.current
# Rest of the code from pylcg.py goes here... self.current = (self.a * self.current + self.c) % self.m
# (IPRange class and ip_stream function) return self.current
class IPRange:
'''Memory-efficient IP range iterator'''
def __init__(self, cidr: str):
network = ipaddress.ip_network(cidr)
self.start = int(network.network_address)
self.total = int(network.broadcast_address) - self.start + 1
def get_ip_at_index(self, index: int) -> str:
'''
Get IP at specific index without generating previous IPs
:param index: The index of the IP to get
'''
if not 0 <= index < self.total:
raise IndexError('IP index out of range')
return str(ipaddress.ip_address(self.start + index))
def ip_stream(cidr: str, shard_num: int = 1, total_shards: int = 1, seed: int = 0, state: int = None):
'''
Stream random IPs from the CIDR range. Optionally supports sharding.
Each IP in the range will be yielded exactly once in a pseudo-random order.
:param cidr: Target IP range in CIDR format
:param shard_num: Shard number (1-based), defaults to 1
:param total_shards: Total number of shards, defaults to 1 (no sharding)
:param seed: Random seed for LCG (default: random)
:param state: Resume from specific LCG state (default: None)
'''
# Convert to 0-based indexing internally
shard_index = shard_num - 1
# Initialize IP range and LCG
ip_range = IPRange(cidr)
# Use random seed if none provided
if not seed:
seed = random.randint(0, 2**32-1)
# Initialize LCG
lcg = LCG(seed + shard_index)
# Set LCG state if provided
if state is not None:
lcg.current = state
# Calculate how many IPs this shard should generate
shard_size = ip_range.total // total_shards
# Distribute remainder
if shard_index < (ip_range.total % total_shards):
shard_size += 1
# Remaining IPs to yield
remaining = shard_size
while remaining > 0:
index = lcg.next() % ip_range.total
if total_shards == 1 or index % total_shards == shard_index:
yield ip_range.get_ip_at_index(index)
remaining -= 1
# Save state every 1000 IPs
if remaining % 1000 == 0:
from .state import save_state
save_state(seed, cidr, shard_num, total_shards, lcg.current)

24
pylcg/state.py Normal file

@@ -0,0 +1,24 @@
#!/usr/bin/env python
# PyLCG - Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://github.com/acidvegas/pylcg)
# pylcg/state.py
import os
import tempfile
def save_state(seed: int, cidr: str, shard: int, total: int, lcg_current: int):
'''
Save LCG state to temp file
:param seed: Random seed for LCG
:param cidr: Target IP range in CIDR format
:param shard: Shard number (1-based)
:param total: Total number of shards
:param lcg_current: Current LCG state
'''
file_name = f'pylcg_{seed}_{cidr.replace("/", "_")}_{shard}_{total}.state'
state_file = os.path.join(tempfile.gettempdir(), file_name)
with open(state_file, 'w') as f:
f.write(str(lcg_current))


@@ -1,43 +1,47 @@
#!/usr/bin/env python
# PyLCG - Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://github.com/acidvegas/pylcg)
# setup.py
from setuptools import setup, find_packages from setuptools import setup, find_packages
with open("README.md", "r", encoding="utf-8") as fh: with open('README.md', 'r', encoding='utf-8') as fh:
long_description = fh.read() long_description = fh.read()
setup( setup(
name="pylcg", name='pylcg',
version="1.0.0", version='1.0.3',
author="acidvegas", author='acidvegas',
author_email="acid.vegas@acid.vegas", author_email='acid.vegas@acid.vegas',
description="Linear Congruential Generator for IP Sharding", description='Linear Congruential Generator for IP Sharding',
long_description=long_description, long_description=long_description,
long_description_content_type="text/markdown", long_description_content_type='text/markdown',
url="https://github.com/acidvegas/pylcg", url='https://github.com/acidvegas/pylcg',
project_urls={ project_urls={
"Bug Tracker": "https://github.com/acidvegas/pylcg/issues", 'Bug Tracker': 'https://github.com/acidvegas/pylcg/issues',
"Documentation": "https://github.com/acidvegas/pylcg#readme", 'Documentation': 'https://github.com/acidvegas/pylcg#readme',
"Source Code": "https://github.com/acidvegas/pylcg", 'Source Code': 'https://github.com/acidvegas/pylcg',
}, },
classifiers=[ classifiers=[
"Development Status :: 5 - Production/Stable", 'Development Status :: 5 - Production/Stable',
"Intended Audience :: Developers", 'Intended Audience :: Developers',
"License :: OSI Approved :: ISC License (ISCL)", 'License :: OSI Approved :: ISC License (ISCL)',
"Operating System :: OS Independent", 'Operating System :: OS Independent',
"Programming Language :: Python :: 3", 'Programming Language :: Python :: 3',
"Programming Language :: Python :: 3.6", 'Programming Language :: Python :: 3.6',
"Programming Language :: Python :: 3.7", 'Programming Language :: Python :: 3.7',
"Programming Language :: Python :: 3.8", 'Programming Language :: Python :: 3.8',
"Programming Language :: Python :: 3.9", 'Programming Language :: Python :: 3.9',
"Programming Language :: Python :: 3.10", 'Programming Language :: Python :: 3.10',
"Programming Language :: Python :: 3.11", 'Programming Language :: Python :: 3.11',
"Topic :: Internet", 'Topic :: Internet',
"Topic :: Security", 'Topic :: Security',
"Topic :: Software Development :: Libraries :: Python Modules", 'Topic :: Software Development :: Libraries :: Python Modules',
], ],
packages=find_packages(), packages=find_packages(),
python_requires=">=3.6", python_requires='>=3.6',
entry_points={ entry_points={
'console_scripts': [ 'console_scripts': [
'pylcg=pylcg.cli:main', 'pylcg=pylcg.cli:main',
], ],
}, },
) )


@@ -1,135 +1,150 @@
#!/usr/bin/env python3 #!/usr/bin/env python
import unittest # PyLCG - Linear Congruential Generator for IP Sharding - Developed by acidvegas in Python (https://github.com/acidvegas/pylcg)
# unit_test.py
import ipaddress import ipaddress
import time import time
import unittest
from pylcg import IPRange, ip_stream, LCG from pylcg import IPRange, ip_stream, LCG
class Colors: class Colors:
BLUE = '\033[94m' BLUE = '\033[94m'
GREEN = '\033[92m' GREEN = '\033[92m'
YELLOW = '\033[93m' YELLOW = '\033[93m'
CYAN = '\033[96m' CYAN = '\033[96m'
RED = '\033[91m' RED = '\033[91m'
ENDC = '\033[0m' ENDC = '\033[0m'
def print_header(message: str) -> None: def print_header(message: str) -> None:
print(f'\n\n{Colors.BLUE}{"="*80}') print(f'\n\n{Colors.BLUE}{"="*80}')
print(f'TEST: {message}') print(f'TEST: {message}')
print(f'{"="*80}{Colors.ENDC}\n') print(f'{"="*80}{Colors.ENDC}\n')
def print_success(message: str) -> None: def print_success(message: str) -> None:
print(f'{Colors.GREEN}{message}{Colors.ENDC}') print(f'{Colors.GREEN}{message}{Colors.ENDC}')
def print_info(message: str) -> None: def print_info(message: str) -> None:
print(f"{Colors.CYAN} {message}{Colors.ENDC}") print(f"{Colors.CYAN} {message}{Colors.ENDC}")
def print_warning(message: str) -> None: def print_warning(message: str) -> None:
print(f"{Colors.YELLOW}! {message}{Colors.ENDC}") print(f"{Colors.YELLOW}! {message}{Colors.ENDC}")
class TestIPSharder(unittest.TestCase): class TestIPSharder(unittest.TestCase):
@classmethod @classmethod
def setUpClass(cls): def setUpClass(cls):
print_header('Setting up test environment') print_header('Setting up test environment')
cls.test_cidr = '192.0.0.0/16' # 65,536 IPs cls.test_cidr = '192.0.0.0/16' # 65,536 IPs
cls.test_seed = 12345 cls.test_seed = 12345
cls.total_shards = 4 cls.total_shards = 4
# Calculate expected IPs # Calculate expected IPs
network = ipaddress.ip_network(cls.test_cidr) network = ipaddress.ip_network(cls.test_cidr)
cls.all_ips = {str(ip) for ip in network} cls.all_ips = {str(ip) for ip in network}
print_success(f"Initialized test environment with {len(cls.all_ips):,} IPs") print_success(f"Initialized test environment with {len(cls.all_ips):,} IPs")
def test_ip_range_initialization(self):
print_header('Testing IPRange initialization')
start_time = time.perf_counter()
ip_range = IPRange(self.test_cidr) def test_ip_range_initialization(self):
self.assertEqual(ip_range.total, 65536) print_header('Testing IPRange initialization')
start_time = time.perf_counter()
first_ip = ip_range.get_ip_at_index(0) ip_range = IPRange(self.test_cidr)
last_ip = ip_range.get_ip_at_index(ip_range.total - 1) self.assertEqual(ip_range.total, 65536)
elapsed = time.perf_counter() - start_time first_ip = ip_range.get_ip_at_index(0)
print_success(f'IP range initialization completed in {elapsed:.6f}s') last_ip = ip_range.get_ip_at_index(ip_range.total - 1)
print_info(f'IP range spans from {first_ip} to {last_ip}')
print_info(f'Total IPs in range: {ip_range.total:,}')
def test_lcg_sequence(self): elapsed = time.perf_counter() - start_time
print_header('Testing LCG sequence generation') print_success(f'IP range initialization completed in {elapsed:.6f}s')
print_info(f'IP range spans from {first_ip} to {last_ip}')
print_info(f'Total IPs in range: {ip_range.total:,}')
# Test sequence generation speed
lcg = LCG(seed=self.test_seed)
iterations = 1_000_000
start_time = time.perf_counter() def test_lcg_sequence(self):
for _ in range(iterations): print_header('Testing LCG sequence generation')
lcg.next()
elapsed = time.perf_counter() - start_time
print_success(f'Generated {iterations:,} random numbers in {elapsed:.6f}s') # Test sequence generation speed
print_info(f'Average time per number: {(elapsed/iterations)*1000000:.2f} microseconds') lcg = LCG(seed=self.test_seed)
iterations = 1_000_000
# Test deterministic behavior start_time = time.perf_counter()
lcg1 = LCG(seed=self.test_seed) for _ in range(iterations):
lcg2 = LCG(seed=self.test_seed) lcg.next()
elapsed = time.perf_counter() - start_time
start_time = time.perf_counter() print_success(f'Generated {iterations:,} random numbers in {elapsed:.6f}s')
for _ in range(1000): print_info(f'Average time per number: {(elapsed/iterations)*1000000:.2f} microseconds')
self.assertEqual(lcg1.next(), lcg2.next())
elapsed = time.perf_counter() - start_time
print_success(f'Verified LCG determinism in {elapsed:.6f}s') # Test deterministic behavior
lcg1 = LCG(seed=self.test_seed)
lcg2 = LCG(seed=self.test_seed)
def test_shard_distribution(self): start_time = time.perf_counter()
print_header('Testing shard distribution and randomness') for _ in range(1000):
self.assertEqual(lcg1.next(), lcg2.next())
elapsed = time.perf_counter() - start_time
# Test distribution across shards print_success(f'Verified LCG determinism in {elapsed:.6f}s')
sample_size = 65_536 # Full size for /16
shard_counts = {i: 0 for i in range(1, self.total_shards + 1)} # 1-based sharding
unique_ips = set()
duplicate_count = 0
start_time = time.perf_counter()
# Collect IPs from each shard def test_shard_distribution(self):
for shard in range(1, self.total_shards + 1): # 1-based sharding print_header('Testing shard distribution and randomness')
ip_gen = ip_stream(self.test_cidr, shard, self.total_shards, self.test_seed)
shard_unique = set()
# Get all IPs from this shard # Test distribution across shards
for ip in ip_gen: sample_size = 65_536 # Full size for /16
if ip in unique_ips: shard_counts = {i: 0 for i in range(1, self.total_shards + 1)} # 1-based sharding
duplicate_count += 1 unique_ips = set()
else: duplicate_count = 0
unique_ips.add(ip)
shard_unique.add(ip)
shard_counts[shard] = len(shard_unique) start_time = time.perf_counter()
elapsed = time.perf_counter() - start_time # Collect IPs from each shard
for shard in range(1, self.total_shards + 1): # 1-based sharding
ip_gen = ip_stream(self.test_cidr, shard, self.total_shards, self.test_seed)
shard_unique = set()
# Print distribution statistics # Get all IPs from this shard
print_success(f'Generated {len(unique_ips):,} IPs in {elapsed:.6f}s') for ip in ip_gen:
print_info(f'Average time per IP: {(elapsed/len(unique_ips))*1000000:.2f} microseconds') if ip in unique_ips:
print_info(f'Unique IPs generated: {len(unique_ips):,}') duplicate_count += 1
else:
unique_ips.add(ip)
shard_unique.add(ip)
if duplicate_count > 0: shard_counts[shard] = len(shard_unique)
print_warning(f'Duplicates found: {duplicate_count:,} ({(duplicate_count/len(unique_ips))*100:.2f}%)')
expected_per_shard = sample_size // self.total_shards elapsed = time.perf_counter() - start_time
for shard, count in shard_counts.items():
deviation = abs(count - expected_per_shard) / expected_per_shard * 100 # Print distribution statistics
print_info(f'Shard {shard}: {count:,} unique IPs ({deviation:.2f}% deviation from expected)') print_success(f'Generated {len(unique_ips):,} IPs in {elapsed:.6f}s')
print_info(f'Average time per IP: {(elapsed/len(unique_ips))*1000000:.2f} microseconds')
print_info(f'Unique IPs generated: {len(unique_ips):,}')
if duplicate_count > 0:
print_warning(f'Duplicates found: {duplicate_count:,} ({(duplicate_count/len(unique_ips))*100:.2f}%)')
expected_per_shard = sample_size // self.total_shards
for shard, count in shard_counts.items():
deviation = abs(count - expected_per_shard) / expected_per_shard * 100
print_info(f'Shard {shard}: {count:,} unique IPs ({deviation:.2f}% deviation from expected)')
# Test randomness by checking sequential patterns
ips_list = sorted([int(ipaddress.ip_address(ip)) for ip in list(unique_ips)[:1000]])
sequential_count = sum(1 for i in range(len(ips_list)-1) if ips_list[i] + 1 == ips_list[i+1])
sequential_percentage = (sequential_count / (len(ips_list)-1)) * 100
print_info(f'Sequential IP pairs in first 1000: {sequential_percentage:.2f}% (lower is more random)')
# Test randomness by checking sequential patterns
ips_list = sorted([int(ipaddress.ip_address(ip)) for ip in list(unique_ips)[:1000]])
sequential_count = sum(1 for i in range(len(ips_list)-1) if ips_list[i] + 1 == ips_list[i+1])
sequential_percentage = (sequential_count / (len(ips_list)-1)) * 100
print_info(f'Sequential IP pairs in first 1000: {sequential_percentage:.2f}% (lower is more random)')
if __name__ == '__main__': if __name__ == '__main__':
print(f"\n{Colors.CYAN}{'='*80}") print(f"\n{Colors.CYAN}{'='*80}")
print(f"Starting IP Sharder Tests - Testing with 65,536 IPs (/16 network)") print(f"Starting IP Sharder Tests - Testing with 65,536 IPs (/16 network)")
print(f"{'='*80}{Colors.ENDC}\n") print(f"{'='*80}{Colors.ENDC}\n")
unittest.main(verbosity=2) unittest.main(verbosity=2)