Cleaning Up Email Addresses in Ecometry

By Chris Bartram

Thought I'd pass this on; a script that scans the email addresses in CUST-XREF, looking for junk (like zeros, NONE, NA, etc) and getting rid of them. It then does a second pass that checks the remaining addresses for valid syntax (using a pattern match in Suprtool) - i.e. something @ something . something ... not perfect, but it will find most badly formed or mistyped addresses.

This script emails the list of CUST-EDP records to a contact list; easy enough to just print it or not as you want (replace the !xeq sendhtml line with a PRINT command or something). The lists are left on the system as the files 'NULLMAIL' and 'BADEMAIL' in Ascii format.

The script is written for the Ecometry application, but the techniques would apply, with revisions, to other applications as well.

      !JOB fixemail,jobs.sgaii
      !continue
      !purge nullmail > $NULL
      !SUPR
      BA macord,1,DOALL
      get cust-xref
      def email,xref-no[3],47
      def emtype,xref-no[1],2
      if emtype='EM' and (email='NONE ','N/A','NA ','0 ','00 ',&
      '000 ','0000 ' OR email=='00000@')
      extract cust-edp
      extract email
      OUTPUT nullmail,ascii
      DELETE
      XEQ
      EXIT
      !IF FINFO("NULLMAIL","EOF")>0 THEN
      !  sendhtml.xeq.threek ^macsl2.data.sys,,,NULLMAIL,&
      !    'Improperly entered Email Addresses - deleted'
      !ENDIF
      !continue
      !purge bademail > $NULL
      !SUPR
      BA macord,1,DOALL
      get cust-xref
      def email,xref-no[3],47
      def emtype,xref-no[1],2
      if emtype='EM' and email><'?@&@?@.?@'
      extract cust-edp
      extract email
      OUTPUT bademail,ascii
      XEQ
      EXIT
      !IF FINFO("BADEMAIL","EOF")>0 THEN
      !  sendhtml.xeq.threek ^macsl2.data.sys,,,BADEMAIL,&
      !    'Badly formatted Email Addresses - need to be corrected'
      !ENDIF
      !EOJ

All this started as an exercise for me. I'd heard all the complaints from sites using kickmail from Ecometry to send out email, so I modified their letter-printing program to use the NetMail/3000 API directly Netmail/3000 is a product of 3k Associates). Now I'm trying to get the Mint's email addresses/database cleaned up sufficiently to actually use it in production.

While what I posted works for 4.x and earlier Ecometry sites, I was informed that it needs enhancements for 5.x versions:

Chris,
You might want to update your script for newer (5.X) versions of Ecometry. The email address is now held in two records of CUST-XREF. The first is as you've shown with a SEARCH-TYPE of "EM". The new record has the email address upshifted, and the SEARCH-TYPE equals "EU".

Randall Davis
Director, Enterprise Operations
KBkids.com
rdavis@kbkids.com
303.226.6210

Here is a simple enhancement to the IF command to also delete the upshifted email addresses of Ecometry 5.x:

      def email,xref-no[3],47
      def emtype,xref-no[1],2
      if emtype='EM','EU' and email='NONE ','N/A','0','00 ','000 ','0000','NA '


Of course with just this change, the job will list each bad email address twice (once upshifted and once in original case). I want to list and delete all the bad entries, but would rather not list both the EM and EU record for the same customer in the report. If there are at most two records per customer, the problem can be solved by adding a SORT CUST-EDP command and a DUP NONE KEYS command. The DELETE is still done during the input phase, so all bad emails are deleted. The DUP command applies after the input, delete and sort, so only one record per CUST-EDP value will be written the to BADEMAIL file.

Here is the revised job stream:

      !JOB fixemail,jobs.sgaii
      !continue
      !purge nullmail > $NULL
      !SUPR
      BA macord,1,DOALL
      get cust-xref
      def email,xref-no[3],47
      def emtype,xref-no[1],2
      if emtype='EM','EU' and (email='NONE ','N/A','NA ','0 ','00 ',&
      '000 ','0000 ' OR email=='00000@')
      extract cust-edp
      extract email
      sort cust-edp
      dup none keys
      OUTPUT nullmail,ascii
      DELETE
      XEQ
      EXIT
      !IF FINFO("NULLMAIL","EOF")>0 THEN
      !  sendhtml.xeq.threek ^macsl2.data.sys,,,NULLMAIL,&
      !    'Improperly entered Email Addresses - deleted'
      !ENDIF
      !continue
      !purge bademail > $NULL
      !SUPR
      BA macord,1,DOALL
      get cust-xref
      def email,xref-no[3],47
      def emtype,xref-no[1],2
      if emtype='EM','EU' and email><'?@&@?@.?@'
      extract cust-edp
      extract email
      OUTPUT bademail,ascii
      XEQ
      EXIT
      !IF FINFO("BADEMAIL","EOF")>0 THEN
      !  sendhtml.xeq.threek ^macsl2.data.sys,,,BADEMAIL,&
      !    'Badly formatted Email Addresses - need to be corrected'
      !ENDIF
      !EOJ
Note: Doing a DUP NONE KEYS and a DELETE in the same pass, as shown above, does work, but may not be best practice. Robelle generally suggests that users don't archive and delete in the same pass, because its not recoverable (i.e., if the job is aborted, records are deleted but you don't have the archive copy). Better practice is to archive in pass one, and once you have the secure copy, go back and delete from the dataset in a separate pass.

I was toying with figuring out how to check the domain names in the email addresses against the Internet to see if they are valid. Stay tuned for part 2 of this article... (published January 2, 2001).


As you can see, this job stream is still a work in progress. Please send any further improvements to me and to support@robelle.com - thanks.

Chris.Bartram@usmint.treas.gov
December 13, 2000