Masscan exclude.conf output added, v4/v6 only mode added. Added searching features, improved README, updater improved, etc

This commit is contained in:
Dionysus 2023-11-08 13:29:19 -05:00
parent ba1d418b25
commit 7df3f15bd1
Signed by: acidvegas
GPG Key ID: EF4B922DB85DC9DE
9 changed files with 19992 additions and 4617 deletions

Binary file not shown *(size before: 105 KiB, after: 363 KiB)*


@@ -1,31 +1,31 @@
# avoidr
> masscan with exclusive exclusions
Avoidr is a Python utility that augments scanning tools like masscan by generating exclusion lists of IP addresses. It searches an ASN *(Autonomous System Number)* database for user-defined strings and collects every IP range belonging to the matching ASNs.
This targeted approach is useful when you want to avoid scanning IP ranges that belong to government entities or other sensitive organizations. By streamlining the process of filtering out specific ranges, Avoidr helps make scanning safer and more responsible.
## Usage
To use Avoidr, simply run the script with the desired arguments:
| Argument | Description |
| ----------------- | ---------------------------------------------------- |
| `-4`, `--ipv4` | Process IPv4 addresses only |
| `-6`, `--ipv6` | Process IPv6 addresses only |
| `-x`, `--exclude` | Create exclusions for masscan instead of JSON output |
| `-s`, `--search` | Comma-separated strings to search *(no output file)* |
| `-u`, `--update` | Update the ASN database |
The script can generate a JSON file with the results or a .conf file with the exclusions ready to be used by masscan.
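For example, the flags above might be combined like so *(hypothetical invocations, using the script's default paths)*:

```
python avoidr.py -u                    # update the ASN database
python avoidr.py -4 -x                 # IPv4 only, write output/exclude.conf for masscan
python avoidr.py -s "facebook,github"  # search ad-hoc terms, no output file written
```

The generated `output/exclude.conf` can then be handed to masscan via its `--excludefile` option.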
## Custom Queries
The `custom.txt` file serves as a configuration input for Avoidr, containing a list of keywords corresponding to organizations whose IP ranges users wish to exclude from network scanning activities. When Avoidr processes this file, it searches the ASN database for these keywords, retrieves the related IP ranges, and generates a list of exclusions to prevent scanning tools like masscan from interacting with potentially sensitive or restricted network spaces.
The predefined list in the `custom.txt` file will yield roughly **358,402,432** IPv4 addresses, which is almost **10%** of the total IPv4 address space.
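A few of the entries bundled in `custom.txt` look like this; each line is matched case-insensitively as a substring against an ASN's `descr` and `org` fields:

```
Central Intelligence Agency
DoD Network Information Center
Federal Aviation Administration
```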
## Preview
![](.screens/preview.png)
## Information
This is still a work in progress.
This is just a little side project I am working on that will search keywords in a database of **Autonomous System Numbers** *(ASN)*. The ASN is then turned into a list of its respective IP ranges that fall under it.
The ranges are all stored in a JSON file for easy parsing. Depending on what you are scanning for, this list can be altered to better suit your needs.
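As a sketch of that parsing, the JSON maps each ASN to its name and per-version ranges *(sample data is inlined below; the real file is written to `output/out.json`, and the ASN and ranges here are hypothetical)*:

```python
import ipaddress
import json

# A tiny sample mimicking avoidr's output structure (hypothetical ASN and ranges)
sample = json.loads('{"64512": {"name": "EXAMPLE-AS / Example Org",'
                    ' "ranges": {"4": ["192.0.2.0/24", "198.51.100.0/24"]}}}')

for asn, info in sample.items():
    v4_ranges = info['ranges'].get('4', [])
    total = sum(ipaddress.ip_network(cidr).num_addresses for cidr in v4_ranges)
    print(f"AS{asn} ({info['name']}): {len(v4_ranges)} IPv4 ranges, {total:,} addresses")
    # → AS64512 (EXAMPLE-AS / Example Org): 2 IPv4 ranges, 512 addresses
```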
As it stands, there are *4,294,967,296* IPv4 addresses. After excluding reserved, private, & government ranges, you can drop that number drastically, thus speeding up your scan times.
```
Total IPv4 Addresses : 4,294,967,296
Total IPv4 After Clean : 3,343,567,221
Total IPv6 Addresses : 340,282,366,920,938,463,463,374,607,431,768,211,456
Total IPv6 After Clean : 336,289,486,288,049,758,211,573,978,091,720,015,870
```
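The address-space totals above come straight from Python's `ipaddress` module; the *After Clean* figures are simply these totals minus whatever your queries matched:

```python
import ipaddress

# Size of the entire IPv4 and IPv6 address spaces
total_v4 = ipaddress.ip_network('0.0.0.0/0').num_addresses
total_v6 = ipaddress.ip_network('::/0').num_addresses

print(f'Total IPv4 Addresses : {total_v4:,}')  # 4,294,967,296
print(f'Total IPv6 Addresses : {total_v6:,}')  # 340,282,366,920,938,463,463,374,607,431,768,211,456
```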
## Todo
- Do we need parsers for Office/Google from their provided JSON or do all those ranges fall under a single ASN?
- distributed masscan using the masscan python library
- masscan exclude.conf output format *(with comments describing the ranges)*
- possibly find a database that contains all the prefixes behind an ASN *(bgpview heavily throttles and can only handle 1 ASN at a time)* *(for now a bad.json is generated to list empty ASN's)*
- Separate queries by sector *(government, social media, financial institutions, schools, etc.)*
___
###### Mirrors

190
avoidr.py Normal file

@@ -0,0 +1,190 @@
#!/usr/bin/env python
# avoidr (masscan with exclusive exclusions) - developed by acidvegas in python (https://git.acid.vegas/avoidr)

import hashlib
import ipaddress
import json
import logging
import os
import sys
import time
import urllib.request

from zipfile import ZipFile

# Globals
grand_total = {'4': 0, '6': 0}
results = dict()

# Setup logger
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')


def calculate_hash(path):
    '''
    Calculate the SHA1 hash of a file.

    :param path: The path to the file to hash.
    '''
    hash_sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            hash_sha1.update(chunk)
    return hash_sha1.hexdigest()


def download_file(url: str, dest_filename: str, chunk_size: int = 1024*1024):
    '''
    Download a file from a given URL in chunks and save to a destination filename.

    :param url: The URL of the file to download
    :param dest_filename: The destination filename to save the downloaded file
    :param chunk_size: Size of chunks to download. Default is set to 1MB.
    '''
    with urllib.request.urlopen(url) as response:
        total_size = int(response.getheader('Content-Length').strip())
        downloaded_size = 0
        with open(dest_filename, 'wb') as out_file:
            while True:
                start_time = time.time()
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                downloaded_size += len(chunk)
                out_file.write(chunk)
                end_time = time.time()
                speed = len(chunk) / (end_time - start_time)
                progress = (downloaded_size / total_size) * 100
                sys.stdout.write(f'\rDownloaded {downloaded_size} of {total_size} bytes ({progress:.2f}%) at {speed/1024:.2f} KB/s\r')
                sys.stdout.flush()
    print()


def get_url(url) -> str:
    '''
    Get the contents of a URL.

    :param url: The URL to get the contents of.
    '''
    data = {'Accept': 'application/vnd.github.v3+json', 'User-Agent': 'Avoidr/1.0 (https://git.acid.vegas/avoidr)'}
    req = urllib.request.Request(url, headers=data)
    return urllib.request.urlopen(req, timeout=10).read().decode()


def update_database():
    '''Update the ASN database.'''
    logging.info('Checking for database updates...')
    DB = 'databases/fullASN.json.zip'
    update = False
    os.makedirs('databases', exist_ok=True)
    if not os.path.exists(DB):
        update = True
    else:
        old_hash = calculate_hash(DB)
        new_hash = json.loads(get_url('https://api.github.com/repos/ipapi-is/ipapi/contents/'+DB))['sha']
        if old_hash != new_hash:
            update = True
    if update:
        logging.info('Updating database...')
        for OLD_DB in (DB, DB[:-4]):
            if os.path.exists(OLD_DB):
                os.remove(OLD_DB)
        download_file('https://github.com/ipapi-is/ipapi/raw/main/'+DB, DB)
        with ZipFile(DB) as zObject:
            zObject.extract(DB[10:-4], 'databases')
    else:
        logging.info('Database is up-to-date!')


def process_asn(data: dict):
    '''
    Process an ASN and add it to the results.

    :param data: The ASN data to process.
    '''
    title = data['descr'] if 'org' not in data else data['descr'] + ' / ' + data['org']
    results[data['asn']] = {'name': title, 'ranges': dict()}
    if 'prefixes' in data and not args.ipv6:
        results[data['asn']]['ranges']['4'] = data['prefixes']
        total = total_ips(data['prefixes'])
        grand_total['4'] += total
        logging.info('Found \033[93mAS{0}\033[0m \033[1;30m({1})\033[0m containing \033[32m{2:,}\033[0m IPv4 ranges with \033[36m{3:,}\033[0m total IP addresses'.format(data['asn'], title, len(data['prefixes']), total))
    if 'prefixesIPv6' in data and not args.ipv4:
        results[data['asn']]['ranges']['6'] = data['prefixesIPv6']
        total = total_ips(data['prefixesIPv6'])
        grand_total['6'] += total
        logging.info('Found \033[93mAS{0}\033[0m \033[1;30m({1})\033[0m containing \033[32m{2:,}\033[0m IPv6 ranges with \033[36m{3:,}\033[0m total IP addresses'.format(data['asn'], title, len(data['prefixesIPv6']), total))


def total_ips(ranges: list) -> int:
    '''
    Calculate the total number of IP addresses in a list of CIDR ranges.

    :param ranges: The list of CIDR ranges to calculate the total number of IP addresses for.
    '''
    return sum(ipaddress.ip_network(cidr).num_addresses for cidr in ranges)


# Main
if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser(description='masscan with exclusive exclusions')
    parser.add_argument('-4', '--ipv4', help='process IPv4 addresses only', action='store_true')
    parser.add_argument('-6', '--ipv6', help='process IPv6 addresses only', action='store_true')
    parser.add_argument('-x', '--exclude', help='create exclusions for masscan instead of a json output', action='store_true')
    parser.add_argument('-s', '--search', help='comma separated strings to search (no output file)', type=str)
    parser.add_argument('-u', '--update', help='update the ASN database', action='store_true')
    args = parser.parse_args()

    if args.update or not os.path.exists('databases/fullASN.json'):
        update_database()

    asn_data = json.loads(open('databases/fullASN.json').read())

    if args.search:
        queries = args.search.split(',')
    else:
        queries = [line.rstrip() for line in open('custom.txt').readlines()]

    logging.debug(f'Searching {len(queries):,} queries against {len(asn_data):,} ASNs...')

    for asn in asn_data:
        for field in [x for x in asn_data[asn] if x in ('descr','org')]:
            if [x for x in queries if x.lower() in asn_data[asn][field].lower()]:
                if asn_data[asn]['asn'] not in results:
                    process_asn(asn_data[asn])
                break

    if not args.search:
        os.makedirs('output', exist_ok=True)
        if args.exclude:
            with open('output/exclude.conf', 'w') as fp:
                for item in results:
                    fp.write(f'# AS{item} - {results[item]["name"]}\n')
                    for version in results[item]['ranges']:
                        for _range in results[item]['ranges'][version]:
                            fp.write(_range+'\n')
                    fp.write('\n')
        else:
            with open('output/out.json', 'w') as fp:
                json.dump(results, fp)
    else:
        logging.info('Add these to your custom.txt file to create output files...')

    total_v4 = ipaddress.ip_network('0.0.0.0/0').num_addresses
    total_v6 = ipaddress.ip_network('::/0').num_addresses
    print('Total IPv4 Addresses : {0:,}'.format(total_v4))
    print('Total IPv4 After Clean : {0:,}'.format(total_v4-grand_total['4']))
    print('Total IPv6 Addresses : {0:,}'.format(total_v6))
    print('Total IPv6 After Clean : {0:,}'.format(total_v6-grand_total['6']))


@ -1,96 +0,0 @@
#/usr/bin/env python
# avoidr (masscan with exclusive exclusions) - developed by acidvegas in python (https://git.acid.vegas/avoidr)

import hashlib
import ipaddress
import json
import os
import urllib.request

from zipfile import ZipFile

# Globals
grand_total = {'4': 0, '6': 0}
results = dict()

def calculate_hash(path):
    ''' Calculate the SHA1 hash of a file. '''
    hash_sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            hash_sha1.update(chunk)
    return hash_sha1.hexdigest()

def get_url(url, git=False) -> str:
    ''' Get the contents of a URL. '''
    data = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
    if git:
        data['Accept'] = 'application/vnd.github.v3+json'
    req = urllib.request.Request(url, headers=data)
    return urllib.request.urlopen(req, timeout=10).read().decode()

def update_database():
    ''' Update the ASN database. '''
    DB = 'databases/fullASN.json.zip'
    try:
        os.mkdir('databases')
    except FileExistsError:
        pass
    if os.path.exists(DB):
        old_hash = calculate_hash(DB)
        new_hash = json.loads(get_url('https://api.github.com/repos/ipapi-is/ipapi/contents/'+DB))['sha']
        if old_hash != new_hash:
            print('[~] New database version available! Downloading...')
            os.remove(DB)
            if os.path.exists(DB[:-4]):
                os.remove(DB[:-4])
            urllib.request.urlretrieve('https://github.com/ipapi-is/ipapi/raw/main/'+DB, DB)
            with ZipFile(DB) as zObject:
                zObject.extract(DB[10:-4], 'databases')
    else:
        print('[~] Downloading missing database...')
        urllib.request.urlretrieve('https://github.com/ipapi-is/ipapi/raw/main/'+DB, DB)
        if os.path.exists(DB[:-4]):
            os.remove(DB[:-4])
        with ZipFile(DB) as zObject:
            zObject.extract(DB[10:-4], 'databases')

def process_asn(data):
    ''' Process an ASN. '''
    if data['asn'] not in results:
        title = data['descr'] if 'org' not in data else data['descr'] + ' / ' + data['org']
        results[data['asn']] = {'name': title, 'ranges': dict()}
        if 'prefixes' in data:
            results[data['asn']]['ranges']['4'] = data['prefixes']
            total = total_ips(data['prefixes'])
            grand_total['4'] += total
            print('Found \033[93mAS{0}\033[0m \033[1;30m({1})\033[0m containing \033[32m{2:,}\033[0m IPv4 ranges with \033[36m{3:,}\033[0m total IP addresses'.format(data['asn'], title, len(data['prefixes']), total))
        if 'prefixesIPv6' in data:
            results[data['asn']]['ranges']['6'] = data['prefixesIPv6']
            total = total_ips(data['prefixesIPv6'])
            grand_total['6'] += total
            print('Found \033[93mAS{0}\033[0m \033[1;30m({1})\033[0m containing \033[32m{2:,}\033[0m IPv6 ranges with \033[36m{3:,}\033[0m total IP addresses'.format(data['asn'], title, len(data['prefixesIPv6']), total))

def total_ips(ranges, total=0):
    for _range in ranges:
        total += ipaddress.ip_network(_range).num_addresses
    return total

# Main
print('[~] Checking for database updates...')
update_database()
data = json.loads(open('databases/fullASN.json').read())
queries = [item.rstrip() for item in open('custom.txt').readlines()]
print('[~] Searching {len(queries):,} queries against {len(data):,} ASNs...')
for item in data:
    for field in [x for x in data[item] if x in ('descr','org')]:
        if [x for x in queries if x.lower() in data[item][field].lower()]:
            process_asn(data[item])
            break
with open('out.json', 'w') as fp:
    json.dump(results, fp)
total_v4 = ipaddress.ip_network('0.0.0.0/0').num_addresses
total_v6 = ipaddress.ip_network('::/0').num_addresses
print('Total IPv4 Addresses : {0:,}'.format(total_v4))
print('Total IPv4 After Clean : {0:,}'.format(total_v4-grand_total['4']))
print('Total IPv6 Addresses : {0:,}'.format(total_v6))
print('Total IPv6 After Clean : {0:,}'.format(total_v6-grand_total['6']))


@@ -2,7 +2,6 @@
Air Force Systems Command
Army & Navy Building
Autonomous nonprofit organisation Computer Incident Response Center
Bank of America
Central Intelligence Agency
Defense Advanced Research Projects Agency
Department of Homeland Security
@@ -13,14 +12,12 @@ Dept. of Information Technology & Cyber Security
DoD Network Information Center
Dod Joint Spectrum Center
FBI Criminal Justice Information Systems
Facebook Inc
Federal Aviation Administration
Federal Aviation Agency
Federal Deposit Insurance Corporation
Federal Emergency Management Agency
Federal Energy Regulatory Commission
Federal Reserve Board
GitHub, Inc
Government Telecommunications and Informatics Services
ICANN
Institute of Nuclear Power Operations

File diff suppressed because it is too large

17254
output/exclude.conf Normal file

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long