Project overview
There are over 1.6 million ham licenses in the FCC ULS database. How many of those hams have websites based on their callsigns? To get a comprehensive list, we can first download the FCC ULS ham database, extract the callsigns, and do an nslookup for each one. The only potential issue with this idea is bandwidth: if we want to check for a .com, .net, and .org for each callsign, that is 1.6 million x 3, or almost 5 million DNS queries.
Luckily, DNS queries don't take up much bandwidth. But because I'm on shared, low-resource hosting, I don't want to get flagged for over-usage. I will run this on a semi-random, intermittent basis, for 5-8 hours a day, with a 60-second break every 10,000 domains. And I will manually trim the callsign (input) file of already-queried callsigns to give myself something to do.
Downloading & extracting ham callsigns
- On the FCC ULS database downloads page, under Amateur Radio Service, click Licenses.
- Save the file to your computer.
- Unzip the file.
- Discard everything except EN.dat (the Entity file, which contains the callsigns).
- Extract all of the callsigns to a new file, callsigns.txt:
awk -F '|' '{print $5}' EN.dat > callsigns.txt
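The callsign is the fifth pipe-delimited field of each EN record. To sanity-check that against your particular download before processing the whole file, print the fifth field of the first few records:
head -n 5 EN.dat | awk -F '|' '{print $5}'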
- Remove all of the duplicate callsigns from callsigns.txt and create a new callsigns-running.txt file:
sort callsigns.txt | uniq > callsigns-running.txt
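Equivalently, sort's -u flag deduplicates in one step:
sort -u callsigns.txt > callsigns-running.txt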
Query for domains with matching callsigns
- Create a new bash script (domain.sh):
#!/bin/bash

# File containing callsigns, one per line
callsign_file="callsigns-running.txt"

# Output file for domains that resolve
resolved_domains_file="resolved_domains.txt"

counter=0

echo "Started script: $(TZ='America/Denver' date '+%a %b %d %I:%M %p') MDT 2025" | tee -a domain.log

# Loop through each callsign in the file
while read -r callsign; do
    ((counter++))

    # Sleep for 60 seconds every 10,000 domains
    # This way we can cancel the program or give the network a pause
    #
    if [ $counter -gt 10000 ]; then
        echo "Pausing script: $(TZ='America/Denver' date '+%a %b %d %I:%M %p') MDT 2025" | tee -a domain.log
        echo "" | tee -a domain.log
        echo "Sleeping for 60 seconds..." | tee -a domain.log
        echo "Next callsign to check is $callsign" | tee -a domain.log
        sleep "60"
        counter=0
        echo "" | tee -a domain.log
        echo "Continuing script: $(TZ='America/Denver' date '+%a %b %d %I:%M %p') MDT 2025" | tee -a domain.log
    fi

    # Generate the possible domain names
    domains=("${callsign}.com" "${callsign}.org" "${callsign}.net")

    # Check each domain by first resolving it using nslookup
    for domain in "${domains[@]}"; do
        # Attempt DNS resolution
        if nslookup "$domain" &>/dev/null; then
            # If the domain resolves, add it to the output file
            echo "$domain" >> "$resolved_domains_file"
            break # Stop after the first successful resolution for this callsign
        fi
    done
done < "$callsign_file"

echo "Resolved domains have been saved to $resolved_domains_file" | tee -a domain.log
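Note: depending on your system, nslookup's exit status can be unreliable (some implementations return success even when the name doesn't resolve). If you want a stricter test, one sketch is to check whether dig +short prints any records at all:

# Stricter alternative: dig +short prints nothing when the name doesn't resolve
if [ -n "$(dig +short "$domain")" ]; then
    echo "$domain" >> "$resolved_domains_file"
    break
fi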
- Create a new bash script (trim-callsigns.sh):
#!/bin/bash
#
# Given a file callsigns-running.txt like this:
#
#   ABC
#   DEF
#   HIG
#   XYZ
#   XXX
#   ZZZZ
#
# After running the command: awk '/XYZ/ {found=1} found' callsigns-running.txt > new_callsign.txt
#
# the content of new_callsign.txt will be:
#
#   XYZ
#   XXX
#   ZZZZ

# Check if an argument (CALLSIGN) is provided
if [ -z "$1" ]; then
    echo "Usage: $0 <CALLSIGN>"
    exit 1
fi

# Run the awk command with the argument passed as CALLSIGN and overwrite
# callsigns-running.txt, deleting all callsigns before our CALLSIGN argument
#
if grep -q "$1" callsigns-running.txt; then
    awk -v pattern="$1" '$0 ~ pattern {found=1} found' callsigns-running.txt > temp_callsigns.txt && mv temp_callsigns.txt callsigns-running.txt
    echo "callsigns-running.txt was trimmed to $1" | tee -a domain.log
else
    echo "No matching callsign found."
fi

Tip: This script makes sure the callsigns-running.txt file only contains callsigns that still need to be queried.
- Create a new bash script (status.sh):
#!/bin/bash
#
# Run as needed to see the status of our queries
#
# Base the totals on the deduplicated callsign count, since
# callsigns.txt still contains duplicates
callsigns_total=$(sort -u callsigns.txt | wc -l)
callsigns_left=$(wc -l < callsigns-running.txt)
callsigns_completed=$(( callsigns_total - callsigns_left ))

echo "Callsigns completed (as of last trim): $callsigns_completed"
echo "Callsigns left to query: $callsigns_left"
echo "Resolved domains: $(wc -l < resolved_domains.txt)"
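If you'd also like a rough progress percentage, bash integer arithmetic (which truncates toward zero) is enough; you could append one more line to status.sh:
echo "Progress: $(( 100 * callsigns_completed / callsigns_total ))%"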
Run the domain.sh script:
chmod +x domain.sh; ./domain.sh
Started script: Mon Feb 17 11:02 AM MDT 2025
Pausing script: Mon Feb 17 11:45 AM MDT 2025

Sleeping for 60 seconds...
Next callsign to check is KE4DMU

Continuing script: Mon Feb 17 11:46 AM MDT 2025
Pausing script: Mon Feb 17 12:22 PM MDT 2025

Sleeping for 60 seconds...
Next callsign to check is KE4TTZ
Tip: When you see "Next callsign to check is...", it means you have a 60-second window to press Control-C to exit the script cleanly.
After you exit the script cleanly, trim the callsigns file of already queried callsigns using the last callsign as your input argument:
chmod +x trim-callsigns.sh; ./trim-callsigns.sh KE4TTZ
Tips:
- You can check domain.log for the running time and the last callsign queried. But it will only show the real last callsign if you exit the script cleanly!
- By trimming the callsigns-running.txt file, we are deleting the callsigns that we already did nslookups for. That way, when we run the script tomorrow, they won't be queried a second time!
- Alternatively, you can just kill the process, run trim-callsigns.sh with the last queried callsign in resolved_domains.txt, and then manually remove that callsign from callsigns-running.txt.
After you've trimmed the callsigns-running.txt file, you're free to run domain.sh again and you can repeat this process until all domains have been queried.
Filtering the resolved domains
You should now have a resolved_domains.txt file with a list of real, working domains. Do not load any of these in your browser! Quite a lot of them appear to be malicious websites, so we need a way to filter the legitimate ham radio sites from the fake, irrelevant, or dangerous ones.
Download the index page for each domain
- Create a new bash script (get-domains.sh):
#!/bin/bash

# Output directory for storing HTML content
output_dir="domain-data"

# Create the output directory if it doesn't exist
mkdir -p "$output_dir"

# Timeout in seconds (for example, 30 seconds)
timeout=30

# Maximum file size limit (in bytes, 5MB here as an example)
max_size=5000000

# Loop through each domain in the file
while read -r domain; do
    # Build a filesystem-safe filename from the domain
    outfile="$output_dir/$(echo "$domain" | sed 's/[^a-zA-Z0-9]/_/g').html"

    # Set a timeout and limit the download size
    if curl -s --max-time "$timeout" --max-filesize "$max_size" "$domain" -o "$outfile"; then
        echo "Content downloaded for $domain and saved as $outfile"
    else
        echo "Error downloading $domain. Either it timed out or was too large."
    fi
done < resolved_domains.txt

echo "Content saved in $output_dir."
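Two optional tweaks, depending on your taste: making the scheme explicit (curl only guesses one when it's missing) and following redirects with -L, since many sites bounce to a www. host. The curl line would then look like:

curl -sL --max-time "$timeout" --max-filesize "$max_size" "http://$domain" -o "$outfile"

Note that --max-filesize may only take effect when the server reports the file size up front, so don't rely on it as a hard guarantee.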
- Now we can perform a simple search for relevant terms (run this from inside the domain-data directory):
grep "radio" * | grep -Pv "\[type=radio\]" > potentially_relevant_sites.txt
Tip: The grep -Pv part excludes lines that match only because of CSS [type=radio] attribute selectors.
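To cast a wider net, you could match several ham-related terms (this particular term list is just an illustration) and keep only the matching filenames with -l:
grep -li -e "radio" -e "antenna" -e "callsign" -e "qth" *.html > potentially_relevant_sites.txt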
Now, from a safe browser, you can explore the results.