Checking the Domain Name In Email Addresses

By Chris Bartram and many others

In an earlier article, I described using Suprtool to check the syntax of email addresses and delete invalid ones.

I was toying with figuring out how to call MPE CI string functions on the Suprtool defined strings... like WORD() to extract the domain name... then call a ci command/program that did a dns lookup of the domain name. I figured something like that would be pretty slick. Turns out Mike Hornsby had the same idea:

Chris,
To take this to the next level, I like to use the MPE/IX WHOIS freeware utility from this web location:
  http://www.allegro.com/software/hp3000/allegro.html

The following script will validate an email address and look up the domain name registration using the Allegro ported version of the WHOIS program. This can be very useful for validating the URL and it also returns the company name, address, and sometimes a general phone number.

Warning: The script must run the WHOIS program under the Posix shell (sh.hpbin.sys) and the system must be connected to the Internet to route the WHOIS requests. Also, you must set up the security in the Allegro account according to the instructions on their web page.

Emailchk Command File:

PARM EMAIL_NAME = "MIKEH@BEECHGLEN.COM"
SETVAR ATPOS, POS("@","!EMAIL_NAME")
IF ATPOS = 0 THEN
   ECHO BAD EMAIL FORMAT ERROR -1
   SETVAR EMAIL_ERR,-1
   echo !EMAIL_NAME >> BADDOTS
   ESCAPE
ENDIF
SETVAR DOTPOS, POS(".","!EMAIL_NAME")
IF DOTPOS = 0 THEN
   ECHO BAD EMAIL FORMAT ERROR -2
   SETVAR EMAIL_ERR,-2
   echo !EMAIL_NAME >> BADDOTS
   ESCAPE
ENDIF
SETVAR URLPOS,ATPOS+1
SETVAR WHO_URL,RTRIM(STR("!EMAIL_NAME",!URLPOS,80))
ECHO !WHO_URL
PURGE WHOIN
BUILD WHOIN;REC=-80,,F,ASCII
ECHO /ALLEGRO/PUB/WHOIS -C !WHO_URL > WHOIN
RUN SH.HPBIN.SYS <WHOIN

As well, if the email name is found to be missing dots (.), this script appends the name to a BADDOTS error file.

Mike Hornsby (mikeh@beechglen.com)
Co-founder/Chief Technical Officer
Beechglen Development Inc. (beechglen.com)
513-922-0509
MikeH@beechglen.com

Paul Gobes of Robelle Technical Support tested the command file with WHOIS and below are the results for a successful and unsuccessful email address.

Below are the results for a successful and unsuccessful email address.

Successful WHOIS

:emailchk paulgobes@hotmail.com
hotmail.com


Whois Server Version 1.3

Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.

   Domain Name: HOTMAIL.COM
   Registrar: NETWORK SOLUTIONS, INC.
   Whois Server: whois.networksolutions.com
   Referral URL: www.networksolutions.com
   Name Server: NS1.HOTMAIL.COM
   Name Server: NS3.HOTMAIL.COM
   Updated Date: 12-may-2000


>>> Last update of whois database: Mon, 18 Dec 2000 11:02:57 EST <<<

The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and
Registrars.


Found InterNIC referral to whois.networksolutions.com.


Registrant:
Hotmail Corporation (HOTMAIL-DOM)
   1065 La Avenida
   Mtn. View, CA 94043
   US

   Domain Name: HOTMAIL.COM

   Administrative Contact, Technical Contact, Billing Contact:
      Records, Custodian of  (COR58)  enforce_policy@HOTMAIL.COM
      MSN Hotmail
      1065 La Avendia
      Mtn. View , CA 94043
      (650) 693-7066 (FAX) (650) 693-7061

   Record last updated on 12-May-2000.
   Record expires on 28-Mar-2001.
   Record created on 27-Mar-1996.
   Database last updated on 18-Dec-2000 14:23:20 EST.

   Domain servers in listed order:

   NS1.HOTMAIL.COM            216.200.206.140
   NS3.HOTMAIL.COM            209.185.130.68

End Run

WHOIS Failure

:emailchk  paul@jsdlfkjslfj.com
jsdlfkjslfj.com


Whois Server Version 1.3

Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.

No match for "JSDLFKJSLFJ.COM".

>>> Last update of whois database: Mon, 18 Dec 2000 11:02:57 EST <<<

The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and
Registrars.

End Run

Paul couldn't find any error codes returned by WHOIS, nor did the documentation make any reference to them. This makes it difficult to separate the failed domains from the successful domains.

Another restriction is that WHOIS defaults to a server that only finds .COM, .NET, .ORG and .EDU domains, but not national domains such as robelle.ca (Canada) and bbc.co.uk (United Kingdom). You could modify the command file to point to a different server using the '-h server' flag, if your addresses were from a different domain. A final problem is that WHOIS can fail due to heavy network traffic.

So it doesn't really make sense to use WHOIS in batch to check thousands of email addresses.


Linking Suprtool Records to MPE CI Code

Paul Gobes further went on to devise a clever way of invoking the Emailchk command file on every syntactically valid email address, producing a long listing of the WHOIS results. The email addresses with a valid format may still contain spelling errors, which will often by caught by WHOIS (i.e., HTMAIL.COM instead of HOTMAIL.COM).

Here is a job to extract all the valid email addresses from the Ecometry cust-xref dataset and convert them into a series of EMAILCHK command file calls.

   !SUPR
   BA macord,1,DOALL
   get cust-xref
   def email,xref-no[3],47
   def emtype,xref-no[1],2
   if emtype='EM','EU' and email=='?@&@?@.?@'
   extract cust-edp
   extract "emailchk ",email
   output  mydsnschk,temp
   exit
   !
   !mydsnchk

This task inserts the string "emailchk" in front of the EMAIL address in each record, making it into an invocation of the EMAILCHK command file. Then just type MYDNSCHK to invoke EMAILCHK once per entry in the EMAILS file, producing a long listing of WHOIS output that you can review manually. [Note: an improvement on this would be to add a second parameter to EMAILCHK with the customer id number, then add CUST-EDP to the command file invocation: ext ",",cust-edp; this would allow you to print out which customer has which email address in the listing.]


Trying NSLOOKUP Instead of WHOIS

Ok! Here's a command file that DOES work 100% to verify an Internet domain name. It uses the NSLOOKUP utility (public domain; the program ported from named - not the program contributed by Lars Appel though).

I call the command file veremail.xeq. You pass it a domain name (the part of the email address after the "@") and a customer number. It reports whether the domain is valid or not, using the same criteria that Internet mail systems use (looking for either MX or A type DNS records). It can be modified to output only bad addresses in a way that could be imported back into Ecometry (or whatever) so the bad addresses could be deleted, or just create a list of bad addresses that a customer service team could manually investigate.

parm domain,custno=0

setvar hpautocont TRUE

purge NSRES > $NULL
purge NSRES,temp > $NULL
purge NSCOMM > $NULL
purge NSCOMM,temp > $NULL
purge NSINX > $NULL
purge NSINX,temp > $NULL
purge NSOUTX > $NULL
purge NSOUTX,temp > $NULL
purge NSERRX > $NULL
purge NSERRX,temp > $NULL

echo set type=any                 > NSINX
echo !domain                     >> NSINX

save NSINX

echo  /BIND/PUB/bin/nslookup !NSOUTX 2!>NSERRX  >NSCOMM
save NSCOMM

xeq sh.hpbin.sys "./NSCOMM"

grep.hpbin.sys "'Non-existent ' NSERRX" > NSRES
grep.hpbin.sys "'Unrecognized command: ' NSOUTX" >> NSRES

setvar _okdomain TRUE
if finfo("NSRES","exists") then
  if finfo("NSRES","EOF")>0 then
    setvar _okdomain FALSE
  endif
endif

if !_okdomain then
  echo Domain name "!domain" for customer#!custno is ok
 else
  echo Domain name "!domain" for customer#!custno is invalid
endif

purge NSRES > $NULL
purge NSRES,temp > $NULL
purge NSCOMM > $NULL
purge NSCOMM,temp > $NULL
purge NSINX > $NULL
purge NSINX,temp > $NULL
purge NSOUTX > $NULL
purge NSOUTX,temp > $NULL
purge NSERRX > $NULL
purge NSERRX,temp > $NULL
setvar hpautocont TRUE

If you look closely, you will see that this command file also outputs the customer number with each result, making it possible to update the original database.


Scaling Up the NSLOOKUP Task

At Robelle, Paul Gobes tried the NSLOOKUP technique and came up with a timing of about 2-3 seconds per email address to do the lookup. Since Robelle has about 2000 email addresses in its customer database, this took 2 hours. However, the US Mint database has 1.2 million email addresses, so it would take 53 days to check them all.

What to do? Working with Paul Gobes at Robelle we came up with a strategy to save already checked domain names so they don't have to be checked again on the Internet. Once the first big processing was done, the regular checking would be much faster. We also adjusted the job to strip off the user name and only test the domain part of the email address, thereby eliminating many duplicates (this was done with Qedit and regular expressions - thanks to Dave Lo at Robelle for the help on this).

Here is Paul's 7-step plan to do what needs doing. Note: one crucial step requires Qedit, which the US Mint has, but not everyone else will.. so some users will need to buy Qedit ;-)

1) Get emails that need testing:

   SUPR
    BA macord,1,DOALL
    get cust-xref
    def custz9,1,9,display
    def email,xref-no[3],47
    def emtype,xref-no[1],2
    def domain,1,47
    table t1,email,file,emailok.data.sgaii
    if emtype='EM','EU' and email=='?@&@?@.?@' and &
       email >< '?@ ?@' and NOT $lookup(t1,email)
    extract custz9 = cust-edp
    extract domain = email
    extract email
    out    file1,link
    exit

2) Use Qedit to trim the username from the domain. Please note that Qedit is editing a self-describing file; it retains the user labels between the Text and Keep so that the file can still be input into Suprtool and the field attributes will be recognized. Qedit edits a single field by doing a Set Left and Set Right to set margins.

    QEDIT
    set lang text
    set incr .001
    set work labels on                   {keep as link file}
    text file1
    set left 10                          {skip the cust acct#}
    set right 56                         {skip the full email}
    change "(.*@)(.*)" (reg) "\2"  @     {use regular expressions}
    set left
    set right
    keep file2,yes
    exit

3) Use Suprtool to remove duplicates and the customer #

    SUPR
      in file2
      sort domain
      dup none keys
      ext "domcheck ", domain
      out file3
      exit

4) Now create the domcheck command file

     SUPR
     in *;out domcheck,temp;exit
     PARM domain = "robelle.com"

     setvar hpautocont TRUE
     purge NSRES > $NULL
     purge NSCOMM > $NULL
     purge NSINX > $NULL
     purge NSOUTX > $NULL
     purge NSERRX > $NULL
     echo set type=any                 > NSINX
     echo !domain                     >> NSINX

     save NSINX

     echo  /SYS/UTIL/NSLOOKUP !NSOUTX 2!>NSERRX  >NSCOMM
     save NSCOMM

     xeq sh.hpbin.sys "./NSCOMM"

     grep.hpbin.sys "'Non-existent ' NSERRX" > NSRES

     setvar _okdomain TRUE
     if finfo("NSRES","exists") then
        if finfo("NSRES","EOF")>0 then
           setvar _okdomain FALSE
        endif
     endif

     if not !_okdomain then
        echo !domain  >> badadds
     endif

     purge NSRES > $NULL
     purge NSCOMM > $NULL
     purge NSINX > $NULL
     purge NSOUTX > $NULL
     purge NSERRX > $NULL
     setvar hpautocont TRUE
     !eod

5) Invoke the domain checking command file

       !file3

6) Use Suprtool to find the customer #s for the bad addressess

     SUPR
     in file2
     table t1,domain,sorted,badadds
     if $lookup(t1,domain)
     ext  "Customer# ", custz9, "has bad domain: ", email
     out *
     xeq

7) Append the good emails to emailok.data.sgaii

     in file2
     table t2,domain,sorted,badadds
     if not $lookup(t2,domain)
     ext email
     out emailok.data.sgaii,append
     exit


As you can see, this idea is still a work in progress. Please send any further improvements to me and to support@robelle.com - thanks.

Chris.Bartram@usmint.treas.gov
January 2, 2001