Recently, as a result of our articles on checking email addresses, I received an inquiry from Kurt Sager of SWS, the Robelle distributor in Switzerland, asking how to automatically verify a large number of URLs:
In some databases we have many Internet URLs, mostly links to home pages or to an HTML page inside a web site. It happens that such addresses contain typing errors, or the web site disappears ... we all know the problem!
We can easily and automatically create a flat file, say once a month, containing one URL per line, possibly many thousands of lines.
We need a utility to check the validity of all the addresses, and possibly write a new file with the invalid addresses for further action.
I didn't know of a method to do this, so I posted the question on the HP3000-L mailing list. Mike Hornsby suggested I look at Lars Appel's port of the GNU "wget" utility:
www.editcorp.com/Personal/Lars_Appel/wget/

I downloaded and installed it, and it seems to work just fine. To try it out, I built a small test file with one URL per line, the third one deliberately bogus:
/l testurls
1     http://www.robelle.com
2     http://www.robelle.com/tips/qedit-glue.html
3     http://www.robelle.com/bogus

Then I ran wget against it from the POSIX shell:
xeq sh.hpbin.sys "-L -c ""wget -nv -i /SYS/TESTING/TESTURLS -o /SYS/TESTING/RESULT -O /dev/null"""

Notes: -nv asks wget for non-verbose output, which keeps the log compact; -i names the file of URLs to fetch; -o sends the log messages to /SYS/TESTING/RESULT; and -O writes the retrieved pages to /dev/null, since we only care whether each URL responds, not what it contains. The doubled quotes are needed to pass the embedded quotes through the MPE command interpreter to the shell. Here is the resulting log:
/l result
1     10:47:22 URL:http://www.robelle.com:80/ [12296] -> "/dev/null" [1]
2     10:47:23 URL:http://www.robelle.com:80/tips/qedit-glue.html [4405] -> "/dev/null" [1]
3     http://www.robelle.com:80/bogus:
4     10:47:23 ERROR 404: Not Found.
5
6     FINISHED --10:47:23--
7     Downloaded: 16,701 bytes in 2 files
This should be reasonably easy to massage into a list of failed addresses with Qedit.
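For example, in the log above each failing URL appears on its own line (beginning with "http" and ending in a colon), followed by an ERROR line; successful fetches begin with a timestamp instead. A minimal sketch of extracting the failures with awk from the same POSIX shell might look like this (BADURLS is just an example name for the output file):

# Lines beginning with "http" name the URL being fetched;
# an ERROR line immediately after marks that URL as failed.
awk '/^http/ { url = $0 }
     /ERROR/ { sub(/:$/, "", url); print url }' /SYS/TESTING/RESULT > /SYS/TESTING/BADURLS

Run against the RESULT file shown above, this would write the single failed address, http://www.robelle.com:80/bogus, ready to feed into whatever cleanup step comes next.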
Hans.Hendriks@robelle.com
January 29, 2001