Project overview

There are over 1.6 million ham licenses in the FCC ULS database. How many of those hams have websites based on their callsigns? To build a comprehensive list, we can download the FCC ULS ham database, extract the callsigns, and run an nslookup for each one. The only potential issue is bandwidth: checking .com, .net, and .org for each callsign comes to around 5 million DNS queries.

Luckily, DNS queries don't take up much bandwidth. But because I'm on shared, low-resource hosting, I don't want to get flagged for overuse. I will run this on a semi-random intermittent basis, for 5-8 hours a day, with a 60 second break every 10,000 domains. And I will manually trim the callsign (input) file of already-queried callsign domains to give myself something to do.
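As a back-of-the-envelope check on that number (the 1.6 million figure is a rough count, not an exact FCC total):

```shell
# Rough query-count arithmetic: ~1.6 million callsigns, three TLDs each.
# The callsign count is an approximation, not an exact FCC total.
callsigns=1600000
queries=$(( callsigns * 3 ))
echo "$queries total DNS queries"   # prints: 4800000 total DNS queries
```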

Downloading & extracting ham callsigns

  1. Go to the FCC ULS Weekly Databases section.

  2. Under Amateur Radio Service, click Licenses.

  3. Save the file to your computer.

  4. Unzip the file.

  5. Discard everything except EN.dat.

  6. Extract all of the callsigns to a new file, callsigns.txt:

    awk -F '|' '{print $5}' EN.dat > callsigns.txt
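If you want to confirm the callsign really lands in field 5 before processing the whole file, you can feed awk a made-up pipe-delimited record (the line below is illustrative, not a real EN.dat entry):

```shell
# Sample record with the callsign placed in field 5 (illustrative only)
echo 'EN|12345||W1ABC-LIC|W1ABC|L|more|fields' | awk -F '|' '{print $5}'
# prints: W1ABC
```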

  7. Remove all of the duplicate callsigns from callsigns.txt and create a new callsigns-running.txt file:

    sort callsigns.txt | uniq > callsigns-running.txt
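The same result can be had with sort's -u flag; both forms drop the duplicates:

```shell
# sort -u is equivalent to sort | uniq for plain deduplication
printf 'W1ABC\nK2DEF\nW1ABC\n' | sort -u
# prints:
# K2DEF
# W1ABC
```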

Query for domains with matching callsigns

  1. Create a new bash script (domain.sh):

    
    #!/bin/bash
    
    # File containing callsigns, one per line
    callsign_file="callsigns-running.txt"
    
    # Output file for domains that resolve
    resolved_domains_file="resolved_domains.txt"
    
    counter=0
    
    echo "Started script: $(TZ='America/Denver' date '+%a %b %d %I:%M %p %Z %Y')" | tee -a domain.log
    
    # Loop through each callsign in the file
    while read -r callsign; do
    
        ((counter++))
    
        # Sleep for 60 seconds every 10,000 domains
        # This way we can cancel the program or give the network a pause
        #
    if [ "$counter" -ge 10000 ]; then
        echo "Pausing script: $(TZ='America/Denver' date '+%a %b %d %I:%M %p %Z %Y')" | tee -a domain.log
        echo "" | tee -a domain.log
        echo "Sleeping for 60 seconds..." | tee -a domain.log
        echo "Next callsign to check is $callsign" | tee -a domain.log
        sleep 60
        counter=0
        echo "" | tee -a domain.log
        echo "Continuing script: $(TZ='America/Denver' date '+%a %b %d %I:%M %p %Z %Y')" | tee -a domain.log
    fi
    
        # Generate the possible domain names
        domains=("${callsign}.com" "${callsign}.org" "${callsign}.net")
    
        # Check each domain by first resolving it using nslookup
        for domain in "${domains[@]}"; do
            # Attempt DNS resolution
        if nslookup "$domain" &>/dev/null; then
                # If domain resolves, add it to the output file
                echo "$domain" >> "$resolved_domains_file"
                break  # Stop after the first successful resolution for this callsign
            fi
        done
    done < "$callsign_file"
    
    echo "Resolved domains have been saved to $resolved_domains_file" | tee -a domain.log
    
    

  2. Create a new bash script (trim-callsigns.sh):

    #!/bin/bash
    #
    # Given a file callsigns-running.txt like this:
    #
    # ABC
    # DEF
    # HIG
    # XYZ
    # XXX
    # ZZZZ
    #
    # After running the command: awk '/XYZ/ {found=1} found' callsigns-running.txt > new_callsign.txt
    #
    # The content of new_callsign.txt will be:
    #
    # XYZ
    # XXX
    # ZZZZ
    #
    
    # Check if an argument (CALLSIGN) is provided
    if [ -z "$1" ]; then
      echo "Usage: $0 CALLSIGN"
      exit 1
    fi
    
    # Run the awk command with the argument passed as CALLSIGN and overwrite callsigns-running.txt
    # And delete all callsigns up to our CALLSIGN argument
    #
    if grep -qx "$1" callsigns-running.txt; then
            awk -v pattern="$1" '$0 == pattern {found=1} found' callsigns-running.txt > temp_callsigns.txt && mv temp_callsigns.txt callsigns-running.txt
            echo "callsigns-running.txt was trimmed to $1" | tee -a domain.log
    else
            echo "No matching callsign found."
    fi
    
    

    Tip: This script makes sure the callsigns-running.txt file only contains callsigns that still need to be queried.
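To see the trim in action without touching your real data, you can run an exact-match version of the script's awk one-liner on a throwaway file:

```shell
# Demonstrate the trim on sample data (not real callsigns)
printf 'ABC\nDEF\nXYZ\nZZZZ\n' > demo_running.txt
awk -v pattern="XYZ" '$0 == pattern {found=1} found' demo_running.txt
# prints:
# XYZ
# ZZZZ
```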

  3. Create a new bash script (status.sh):

    #!/bin/bash
    #
    # Run as needed to see the status of our queries
    #
    # callsigns.txt still contains duplicates, so dedupe before counting
    callsigns_total=$(sort -u callsigns.txt | wc -l)
    callsigns_left=$(wc -l < callsigns-running.txt)
    callsigns_completed=$(( callsigns_total - callsigns_left ))
    
    echo "Callsigns completed (as of last trim): $callsigns_completed"
    echo "Callsigns left to query: $callsigns_left"
    echo "Resolved domains: $(wc -l < resolved_domains.txt)"
    
    

  4. Run the domain.sh script:

    chmod +x domain.sh; ./domain.sh

    Started script: Mon Feb 17 11:02 AM MST 2025
    Pausing script: Mon Feb 17 11:45 AM MST 2025
    
    Sleeping for 60 seconds...
    Next callsign to check is KE4DMU
    
    Continuing script: Mon Feb 17 11:46 AM MST 2025
    Pausing script: Mon Feb 17 12:22 PM MST 2025
    
    Sleeping for 60 seconds...
    Next callsign to check is KE4TTZ
    
    

    Tip: When you see "Next callsign to check is..." you have a 60-second window in which to press Ctrl-C and exit the script cleanly.

  5. After you exit the script cleanly, trim the callsigns file of already queried callsigns using the last callsign as your input argument:

    chmod +x trim-callsigns.sh; ./trim-callsigns.sh KE4TTZ

    Tips:

    • You can check domain.log for the running time and last-callsign info, but it will only show the true last callsign queried if you exit the script cleanly! Trimming callsigns-running.txt deletes the callsigns we already did nslookups for, so they won't be queried a second time when we run the script tomorrow.

    • Alternatively, you can just kill the process, run trim-callsigns.sh with the last queried callsign in resolved_domains.txt, and then manually remove that callsign from callsigns-running.txt.
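If you would rather not depend on exiting during the sleep window at all, one option (my own suggestion, not part of the original script) is to trap SIGINT in domain.sh so the current callsign is logged even on an unclean exit:

```shell
# Hypothetical addition near the top of domain.sh: log the current
# callsign whenever the script is interrupted with Ctrl-C (SIGINT)
callsign=""
trap 'echo "Interrupted at callsign $callsign" | tee -a domain.log' INT
```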

  6. After you've trimmed the callsigns-running.txt file, you're free to run domain.sh again, repeating the process until every callsign has been queried.

Filtering the resolved domains

You should now have a resolved_domains.txt file with a list of real, working domains. Do not load any of these in your browser! Quite a lot of these appear to be malicious websites, so we need a way to separate the legitimate ham radio sites from the fake, irrelevant, or dangerous ones.

Download the index page for each domain

  1. Create a new bash script (get-domains.sh)

    
    #!/bin/bash
    
    # Output directory for storing HTML content
    output_dir="domain-data"
    
    # Create the output directory if it doesn't exist
    mkdir -p "$output_dir"
    
    # Timeout in seconds (for example, 30 seconds)
    timeout=30
    
    # Maximum file size limit (in bytes, 5MB here as an example)
    max_size=5000000
    
    # Loop through each domain in the file
    while read -r domain; do
        # Build a filesystem-safe output filename once
        outfile="$output_dir/$(echo "$domain" | sed 's/[^a-zA-Z0-9]/_/g').html"
    
        # Set a timeout and limit the download size
        if curl -s --max-time "$timeout" --max-filesize "$max_size" "$domain" -o "$outfile"; then
            echo "Content downloaded for $domain and saved as $outfile"
        else
            echo "Error downloading $domain. Either it timed out or was too large."
        fi
    done < resolved_domains.txt
    
    echo "Content saved in $output_dir."
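The sed expression in the script maps each domain to a filesystem-safe name; for example:

```shell
# Any character that is not a letter or digit becomes an underscore
echo "w1abc.com" | sed 's/[^a-zA-Z0-9]/_/g'
# prints: w1abc_com
```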
    
    

  2. Now we can perform a simple search for relevant terms:

    grep -i "radio" domain-data/* | grep -Pv "\[type=radio\]" > potentially_relevant_sites.txt

    Tip: The grep -Pv part excludes lines that only match the CSS attribute selector [type=radio], which would otherwise be a false positive.
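If there are too many hits to eyeball, one option (my own addition, not part of the original workflow) is to rank pages by how many lines mention "radio" as a rough relevance proxy:

```shell
# Count matching lines per downloaded page and list the top hits first
grep -ci "radio" domain-data/*.html | sort -t: -k2 -nr | head
```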

  3. Now from a safe browser, you can explore the results.