A utility for ingesting large-scale reconnaissance data logs into Elasticsearch
This is a suite of tools to aid in the ingestion of recon data from various sources (httpx, masscan, zone files, etc.) into an Elasticsearch cluster. The entire codebase is designed around asynchronous processing and load balances ingestion across all of the nodes in your cluster. Live data ingestion is also supported for many of these sources, so data can be processed and ingested into your Elasticsearch cluster as it is produced. The structure allows for the development of "modules" or "plugins", if you will, to quickly create custom ingestion helpers for anything!
Note: The <input> can be a file or a directory of files, depending on the ingestion script.
Options
General arguments
| Argument     | Description                                    |
|--------------|------------------------------------------------|
| `input_path` | Path to the input file or directory            |
| `--watch`    | Create or watch a FIFO for real-time indexing  |
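To illustrate what "create or watch a FIFO" can look like, here is a minimal sketch, not the suite's actual implementation. The helper names (`ensure_fifo`, `read_records`, `demo`) and the sample records are hypothetical, and the code assumes a POSIX system where `os.mkfifo` is available.

```python
import os
import stat
import tempfile
import threading


def ensure_fifo(path):
    """Create a FIFO at path if nothing is there; refuse to reuse a non-FIFO."""
    if os.path.exists(path):
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise ValueError(f"{path} exists and is not a FIFO")
    else:
        os.mkfifo(path)
    return path


def read_records(path):
    """Yield non-empty lines from the FIFO until the writer closes it."""
    with open(path, "r") as fifo:  # blocks until a writer opens the FIFO
        for line in fifo:
            line = line.strip()
            if line:
                yield line


def demo():
    """Write two sample records from a thread and read them back."""
    fifo_path = ensure_fifo(os.path.join(tempfile.mkdtemp(), "recon.fifo"))

    def writer():
        # Stands in for a scanner writing its output to the FIFO.
        with open(fifo_path, "w") as fifo:
            fifo.write('{"ip": "192.0.2.1", "port": 80}\n')
            fifo.write('{"ip": "192.0.2.2", "port": 443}\n')

    thread = threading.Thread(target=writer)
    thread.start()
    records = list(read_records(fifo_path))
    thread.join()
    os.remove(fifo_path)
    return records
```

In a real run, the writer side would be the recon tool itself, pointed at the FIFO path as its output file.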
Elasticsearch arguments
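| Argument        | Description                                             | Default             |
|-----------------|---------------------------------------------------------|---------------------|
| `--host`        | Elasticsearch host                                      | `http://localhost/` |
| `--port`        | Elasticsearch port                                      | `9200`              |
| `--user`        | Elasticsearch username                                  | `elastic`           |
| `--password`    | Elasticsearch password                                  | `$ES_PASSWORD`      |
| `--api-key`     | Elasticsearch API Key for authentication                | `$ES_APIKEY`        |
| `--self-signed` | Elasticsearch connection with a self-signed certificate |                     |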
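The environment-variable defaults can be read as "use the flag if given, otherwise fall back to the variable". The sketch below shows one way to express that with `argparse`; `build_parser` is a hypothetical helper written for illustration, and treating `--self-signed` as a boolean switch is an assumption, not something the table states.

```python
import argparse
import os


def build_parser():
    """Hypothetical parser mirroring the Elasticsearch arguments listed above."""
    parser = argparse.ArgumentParser(description="Elasticsearch connection options (sketch)")
    parser.add_argument("--host", default="http://localhost/", help="Elasticsearch host")
    parser.add_argument("--port", type=int, default=9200, help="Elasticsearch port")
    parser.add_argument("--user", default="elastic", help="Elasticsearch username")
    # Environment variables are read when the parser is built, so an
    # explicit flag on the command line always takes precedence.
    parser.add_argument("--password", default=os.getenv("ES_PASSWORD"), help="Elasticsearch password")
    parser.add_argument("--api-key", default=os.getenv("ES_APIKEY"), help="Elasticsearch API key")
    # Assumption: the flag is a simple on/off switch.
    parser.add_argument("--self-signed", action="store_true", help="Allow a self-signed certificate")
    return parser
```

With this shape, exporting `ES_PASSWORD` once in the shell keeps the secret out of the command line and shell history.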
Elasticsearch indexing arguments
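| Argument     | Description                          | Default             |
|--------------|--------------------------------------|---------------------|
| `--index`    | Elasticsearch index name             | Depends on ingestor |
| `--pipeline` | Use an ingest pipeline for the index |                     |
| `--replicas` | Number of replicas for the index     | `1`                 |
| `--shards`   | Number of shards for the index       | `1`                 |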
Performance arguments
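| Argument       | Description                                              | Default |
|----------------|----------------------------------------------------------|---------|
| `--chunk-max`  | Maximum size in MB of a chunk                            | `100`   |
| `--chunk-size` | Number of records to index in a chunk                    | `50000` |
| `--retries`    | Number of times to retry indexing a chunk before failing | `100`   |
| `--timeout`    | Number of seconds to wait before retrying a chunk        | `60`    |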
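As a rough illustration of how these four settings might interact, the sketch below batches records by count and by serialized size, then retries a failed chunk after a pause. It is a simplified model under stated assumptions, not the suite's own code: `chunk_records` and `index_with_retries` are hypothetical names, record size is approximated by the length of the JSON encoding, and `send_chunk` stands in for whatever call actually indexes a batch.

```python
import json
import time


def chunk_records(records, chunk_size=50000, chunk_max_mb=100):
    """Yield lists of records, closing a chunk when either limit would be exceeded."""
    max_bytes = chunk_max_mb * 1024 * 1024
    chunk, chunk_bytes = [], 0
    for record in records:
        # Approximate the record's size by its JSON encoding.
        record_bytes = len(json.dumps(record).encode("utf-8"))
        if chunk and (len(chunk) >= chunk_size or chunk_bytes + record_bytes > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(record)
        chunk_bytes += record_bytes
    if chunk:
        yield chunk


def index_with_retries(send_chunk, chunk, retries=100, timeout=60, sleep=time.sleep):
    """Call send_chunk(chunk), waiting `timeout` seconds between failed attempts."""
    for attempt in range(retries + 1):
        try:
            return send_chunk(chunk)
        except Exception:
            if attempt == retries:
                raise  # retries exhausted, surface the last error
            sleep(timeout)
```

Smaller chunks mean more requests but less memory per request; the size cap guards against a batch of unusually large records exceeding what a node will accept in one request.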
Ingestion arguments
| Argument    | Description              |
|-------------|--------------------------|
| `--certs`   | Index Certstream records |
| `--httpx`   | Index HTTPX records      |
| `--masscan` | Index Masscan records    |
| `--massdns` | Index massdns records    |
| `--zone`    | Index zone DNS records   |
This ingestion suite uses the built-in node sniffer, so by connecting to a single node you can load balance across the entire cluster.
It is good to know how many nodes you have in the cluster so you can fine-tune the arguments for the best performance in your environment.
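For readers unfamiliar with node sniffing, the sketch below builds the keyword arguments that would enable it in the official Python client. It assumes the 8.x `elasticsearch` package, whose client constructor accepts `sniff_on_start`, `sniff_on_node_failure`, and `min_delay_between_sniffing`; the helper name `build_client_options` and the 60-second delay are illustrative choices, not values taken from this project.

```python
def build_client_options(host="http://localhost/", port=9200, self_signed=False):
    """Return keyword arguments for an Elasticsearch client with sniffing enabled."""
    return {
        # Only one seed node is given; sniffing discovers the rest.
        "hosts": [f"{host.rstrip('/')}:{port}"],
        # Skip certificate verification when the cluster uses a self-signed cert.
        "verify_certs": not self_signed,
        "sniff_on_start": True,            # discover cluster nodes at connect time
        "sniff_on_node_failure": True,     # refresh the node list when a node drops
        "min_delay_between_sniffing": 60,  # seconds; illustrative value
    }


# Usage, assuming the `elasticsearch` package (8.x) is installed:
#   from elasticsearch import AsyncElasticsearch
#   client = AsyncElasticsearch(**build_client_options())
```

Once the node list is populated, the client spreads requests across the discovered nodes, which is what makes a single `--host` sufficient.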
GeoIP Pipeline
Create & add a geoip pipeline and use the following in your index mappings: