In an earlier article, I described using Suprtool to check the syntax of email addresses and delete invalid ones.
I was toying with figuring out how to call MPE CI string functions on the Suprtool defined strings... like WORD() to extract the domain name... then call a ci command/program that did a dns lookup of the domain name. I figured something like that would be pretty slick. Turns out Mike Hornsby had the same idea:
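The core of that idea is just "take everything after the @". As a language-neutral sketch (hypothetical helper, not part of any of the scripts below), it looks like this:

```python
# Sketch: isolate the domain part of an email address, roughly what
# the CI POS()/STR() functions are used for in the scripts below.
def domain_of(email: str) -> str:
    """Return the part after the last '@', lowercased; '' if no '@'."""
    at = email.rfind("@")
    return email[at + 1:].lower() if at != -1 else ""
```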
Chris,
To take this to the next level, I like to use the MPE/iX WHOIS freeware utility from this web location: http://www.allegro.com/software/hp3000/allegro.html

The following script will validate an email address and look up the domain name registration using the Allegro ported version of the WHOIS program. This can be very useful for validating the URL, and it also returns the company name, address, and sometimes a general phone number.
Warning: The script must run the WHOIS program under the Posix shell (sh.hpbin.sys) and the system must be connected to the Internet to route the WHOIS requests. Also, you must set up the security in the Allegro account according to the instructions on their web page.
Emailchk Command File:
PARM EMAIL_NAME = "MIKEH@BEECHGLEN.COM"
SETVAR ATPOS, POS("@","!EMAIL_NAME")
IF ATPOS = 0 THEN
   ECHO BAD EMAIL FORMAT ERROR -1
   SETVAR EMAIL_ERR,-1
   echo !EMAIL_NAME >> BADDOTS
   ESCAPE
ENDIF
SETVAR DOTPOS, POS(".","!EMAIL_NAME")
IF DOTPOS = 0 THEN
   ECHO BAD EMAIL FORMAT ERROR -2
   SETVAR EMAIL_ERR,-2
   echo !EMAIL_NAME >> BADDOTS
   ESCAPE
ENDIF
SETVAR URLPOS,ATPOS+1
SETVAR WHO_URL,RTRIM(STR("!EMAIL_NAME",!URLPOS,80))
ECHO !WHO_URL
PURGE WHOIN
BUILD WHOIN;REC=-80,,F,ASCII
ECHO /ALLEGRO/PUB/WHOIS -C !WHO_URL > WHOIN
RUN SH.HPBIN.SYS <WHOIN

As well, if the email name is found to be missing dots (.), this script appends the name to a BADDOTS error file.
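The validation part of EMAILCHK can be summarized as a small function (a Python sketch of the logic only, using the same -1/-2 error codes as the EMAIL_ERR variable above; the WHOIS call itself is left out):

```python
# Rough Python equivalent of the EMAILCHK checks: returns
# (error_code, domain). -1 = no '@' (ERROR -1), -2 = no '.'
# (ERROR -2), 0 = ok, with the domain part to hand on to WHOIS.
def emailchk(email_name: str) -> tuple[int, str]:
    atpos = email_name.find("@")            # POS("@", ...)
    if atpos == -1:
        return (-1, "")                     # BAD EMAIL FORMAT ERROR -1
    if "." not in email_name:               # POS(".", ...)
        return (-2, "")                     # BAD EMAIL FORMAT ERROR -2
    who_url = email_name[atpos + 1:].rstrip()   # STR(...) then RTRIM(...)
    return (0, who_url)
```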
Mike Hornsby (mikeh@beechglen.com)
Co-founder/Chief Technical Officer
Beechglen Development Inc. (beechglen.com)
513-922-0509
MikeH@beechglen.com
Paul Gobes of Robelle Technical Support tested the command file with WHOIS. Below are the results for a successful and an unsuccessful email address.
:emailchk paulgobes@hotmail.com
hotmail.com

Whois Server Version 1.3

Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.

   Domain Name: HOTMAIL.COM
   Registrar: NETWORK SOLUTIONS, INC.
   Whois Server: whois.networksolutions.com
   Referral URL: www.networksolutions.com
   Name Server: NS1.HOTMAIL.COM
   Name Server: NS3.HOTMAIL.COM
   Updated Date: 12-may-2000

>>> Last update of whois database: Mon, 18 Dec 2000 11:02:57 EST <<<

The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and
Registrars.

Found InterNIC referral to whois.networksolutions.com.

Registrant:
Hotmail Corporation (HOTMAIL-DOM)
   1065 La Avenida
   Mtn. View, CA 94043
   US

   Domain Name: HOTMAIL.COM

   Administrative Contact, Technical Contact, Billing Contact:
      Records, Custodian of (COR58)  enforce_policy@HOTMAIL.COM
      MSN Hotmail
      1065 La Avendia
      Mtn. View , CA 94043
      (650) 693-7066 (FAX) (650) 693-7061

   Record last updated on 12-May-2000.
   Record expires on 28-Mar-2001.
   Record created on 27-Mar-1996.
   Database last updated on 18-Dec-2000 14:23:20 EST.

   Domain servers in listed order:
   NS1.HOTMAIL.COM   216.200.206.140
   NS3.HOTMAIL.COM   209.185.130.68

End Run
:emailchk paul@jsdlfkjslfj.com
jsdlfkjslfj.com

Whois Server Version 1.3

Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.

No match for "JSDLFKJSLFJ.COM".

>>> Last update of whois database: Mon, 18 Dec 2000 11:02:57 EST <<<

The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and
Registrars.

End Run
Paul couldn't find any error codes returned by WHOIS, nor did the documentation make any reference to them. This makes it difficult to separate the failed domains from the successful domains.
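One workaround, since there are no error codes, is to scan the captured WHOIS output for the registry's "No match for" marker shown in the second example (a heuristic sketch; the phrasing is specific to the registry server being queried):

```python
# Heuristic: classify a captured WHOIS reply as found/not-found by
# looking for the "No match for" marker, since WHOIS itself returns
# no usable error code.
def whois_found(output: str) -> bool:
    return "No match for" not in output
```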
Another restriction is that WHOIS defaults to a server that only finds .COM, .NET, .ORG and .EDU domains, but not national domains such as robelle.ca (Canada) and bbc.co.uk (United Kingdom). You could modify the command file to point to a different server using the '-h server' flag, if your addresses were from a different domain. A final problem is that WHOIS can fail due to heavy network traffic.
So it doesn't really make sense to use WHOIS in batch to check thousands of email addresses.
Here is a job to extract all the valid email addresses from the Ecometry cust-xref dataset and convert them into a series of EMAILCHK command file calls.
!SUPR BA macord,1,DOALL
get cust-xref
def email,xref-no[3],47
def emtype,xref-no[1],2
if emtype='EM','EU' and email=='?@&@?@.?@'
extract cust-edp
extract "emailchk ",email
output mydnschk,temp
exit
!
!mydnschk
This task inserts the string "emailchk" in front of the EMAIL address in
each record, turning each record into an invocation of the EMAILCHK command
file. Then just type MYDNSCHK to invoke EMAILCHK once per extracted email
address, producing a long listing of WHOIS output that you can review manually.
[Note: an improvement on this would be to add a second parameter to EMAILCHK
for the customer id number, then add CUST-EDP to the command file invocation
(ext ",",cust-edp); this would allow you to print out which customer has
which email address in the listing.]
I call the command file veremail.xeq.
You pass it a domain name (the part of
the email address after the "@") and a customer number. It reports whether
the domain is valid or not, using the same criteria that Internet mail
systems use (looking for either MX or A type DNS records). It can be
modified to output only bad addresses in a way that could be imported back
into Ecometry (or whatever) so the bad addresses could be deleted, or just
create a list of bad addresses that a customer service team could manually
investigate.
parm domain,custno=0
setvar hpautocont TRUE
purge NSRES > $NULL
purge NSRES,temp > $NULL
purge NSCOMM > $NULL
purge NSCOMM,temp > $NULL
purge NSINX > $NULL
purge NSINX,temp > $NULL
purge NSOUTX > $NULL
purge NSOUTX,temp > $NULL
purge NSERRX > $NULL
purge NSERRX,temp > $NULL
echo set type=any > NSINX
echo !domain >> NSINX
save NSINX
echo /BIND/PUB/bin/nslookup !<NSINX !>NSOUTX 2!>NSERRX >NSCOMM
save NSCOMM
xeq sh.hpbin.sys "./NSCOMM"
grep.hpbin.sys "'Non-existent ' NSERRX" > NSRES
grep.hpbin.sys "'Unrecognized command: ' NSOUTX" >> NSRES
setvar _okdomain TRUE
if finfo("NSRES","exists") then
   if finfo("NSRES","EOF")>0 then
      setvar _okdomain FALSE
   endif
endif
if !_okdomain then
   echo Domain name "!domain" for customer#!custno is ok
else
   echo Domain name "!domain" for customer#!custno is invalid
endif
purge NSRES > $NULL
purge NSRES,temp > $NULL
purge NSCOMM > $NULL
purge NSCOMM,temp > $NULL
purge NSINX > $NULL
purge NSINX,temp > $NULL
purge NSOUTX > $NULL
purge NSOUTX,temp > $NULL
purge NSERRX > $NULL
purge NSERRX,temp > $NULL
setvar hpautocont TRUE
If you look closely, you will see that this command file also outputs the customer number with each result, making it possible to update the original database.
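The pass/fail decision in that command file boils down to two string tests on the captured nslookup output, which can be sketched like this (a Python restatement of the logic, not a replacement for the command file):

```python
# Sketch of the veremail.xeq decision: the command file greps nslookup's
# stderr for "Non-existent " and its stdout for "Unrecognized command: ";
# any hit (a non-empty NSRES file) marks the domain invalid.
def ok_domain(nslookup_stdout: str, nslookup_stderr: str) -> bool:
    return ("Non-existent " not in nslookup_stderr
            and "Unrecognized command: " not in nslookup_stdout)
```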
What to do? Working with Paul Gobes at Robelle we came up with a strategy to save already checked domain names so they don't have to be checked again on the Internet. Once the first big processing was done, the regular checking would be much faster. We also adjusted the job to strip off the user name and only test the domain part of the email address, thereby eliminating many duplicates (this was done with Qedit and regular expressions - thanks to Dave Lo at Robelle for the help on this).
Here is Paul's 7-step plan to do what needs doing. Note: one crucial step requires Qedit, which the US Mint has but not everyone else will, so some users will need to buy Qedit ;-)
1) Get emails that need testing:
SUPR BA macord,1,DOALL
get cust-xref
def custz9,1,9,display
def email,xref-no[3],47
def emtype,xref-no[1],2
def domain,1,47
table t1,email,file,emailok.data.sgaii
if emtype='EM','EU' and email=='?@&@?@.?@' and &
   email >< '?@ ?@' and NOT $lookup(t1,email)
extract custz9 = cust-edp
extract domain = email
extract email
out file1,link
exit
2) Use Qedit to trim the username from the domain. Please note that Qedit is editing a self-describing file; it retains the user labels between the Text and Keep so that the file can still be input into Suprtool and the field attributes will be recognized. Qedit edits a single field by doing a Set Left and Set Right to set margins.
QEDIT
set lang text
set incr .001
set work labels on                 {keep as link file}
text file1
set left 10                        {skip the cust acct#}
set right 56                       {skip the full email}
change "(.*@)(.*)" (reg) "\2" @    {use regular expressions}
set left
set right
keep file2,yes
exit
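The Qedit Change command above is an ordinary regular-expression substitution; in Python terms it is the equivalent of:

```python
import re

# The same edit Qedit performs with change "(.*@)(.*)" (reg) "\2":
# group 1's ".*" is greedy, so it swallows everything up to the last
# "@", and the replacement keeps only group 2 (the domain).
def strip_username(email: str) -> str:
    return re.sub(r"(.*@)(.*)", r"\2", email)
```

A line with no "@" in it is left unchanged, since the pattern cannot match.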
3) Use Suprtool to remove duplicates and the customer #
SUPR
in file2
sort domain
dup none keys
ext "domcheck ", domain
out file3
exit
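What step 3 produces can be described in a few lines (a sketch of the transformation, not the Suprtool task itself): sort, drop duplicate domains, and prefix each survivor with "domcheck " so the output file becomes a series of command invocations.

```python
# Sketch of step 3: de-duplicate the domains and turn each one into a
# "domcheck <domain>" command invocation, as FILE3 will contain.
def build_domcheck_lines(domains):
    return ["domcheck " + d for d in sorted(set(domains))]
```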
4) Now create the domcheck command file
SUPR in *;out domcheck,temp;exit
PARM domain = "robelle.com"
setvar hpautocont TRUE
purge NSRES > $NULL
purge NSCOMM > $NULL
purge NSINX > $NULL
purge NSOUTX > $NULL
purge NSERRX > $NULL
echo set type=any > NSINX
echo !domain >> NSINX
save NSINX
echo /SYS/UTIL/NSLOOKUP !<NSINX !>NSOUTX 2!>NSERRX >NSCOMM
save NSCOMM
xeq sh.hpbin.sys "./NSCOMM"
grep.hpbin.sys "'Non-existent ' NSERRX" > NSRES
setvar _okdomain TRUE
if finfo("NSRES","exists") then
   if finfo("NSRES","EOF")>0 then
      setvar _okdomain FALSE
   endif
endif
if not !_okdomain then
   echo !domain >> badadds
endif
purge NSRES > $NULL
purge NSCOMM > $NULL
purge NSINX > $NULL
purge NSOUTX > $NULL
purge NSERRX > $NULL
setvar hpautocont TRUE
!eod
5) Invoke the domain checking command file
!file3
6) Use Suprtool to find the customer #s for the bad addresses
SUPR
in file2
table t1,domain,sorted,badadds
if $lookup(t1,domain)
ext "Customer# ", custz9, " has bad domain: ", email
out *
xeq
7) Append the good emails to emailok.data.sgaii
in file2
table t2,domain,sorted,badadds
if not $lookup(t2,domain)
ext email
out emailok.data.sgaii,append
exit
As you can see, this idea is still a work in progress. Please send any further improvements to me and to support@robelle.com - thanks.
Chris.Bartram@usmint.treas.gov
January 2, 2001