Re: Fwd: Spam help requested

From: Arthur Keller <voting_at_kellers_dot_org>
Date: Tue Apr 12 2005 - 01:36:15 CDT

The approach I suggest doesn't make addresses unreadable to a human
viewer, so there's no problem for items (2) and (3). The problem
then is solving (1).

Best regards,

At 11:41 AM -0400 4/11/05, David Mertz wrote:
>On Apr 11, 2005, at 4:46 AM, Jan Karrman wrote:
>>I did just write this Perl one-liner to do this:
>>printf qw(&#%d; &#x%x;)[int(rand(2))], ord $& while $ARGV[0] =~ /./g;
>The problem isn't the munging algorithm, the simple thing we do for
>headers is fine: e.g. mertz_at_gnosis_dot_cx. We're not trying to
>make it a crypto puzzle, just moderately non-scannable by spam
>It's identifying ALL AND ONLY the email addresses that should be
>munged within message bodies that's the problem.
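
As an illustration of the two munging styles mentioned above, here is a
minimal Python sketch. It is not the archive's actual code: the function
names are hypothetical, the address is a placeholder, and replacing dots
only in the domain part is an assumption. The second function mirrors what
the quoted Perl one-liner appears to do, writing each character as a
decimal or hexadecimal character reference chosen at random.

```python
import random


def munge_header_style(address):
    """Rewrite user@host.tld as user_at_host_dot_tld (assumed convention)."""
    local, _, domain = address.partition("@")
    return local + "_at_" + domain.replace(".", "_dot_")


def munge_as_entities(address, rng=random):
    """Encode every character as &#NNN; or &#xHH;, picked at random."""
    parts = []
    for ch in address:
        if rng.random() < 0.5:
            parts.append("&#%d;" % ord(ch))   # decimal character reference
        else:
            parts.append("&#x%x;" % ord(ch))  # hexadecimal character reference
    return "".join(parts)


if __name__ == "__main__":
    print(munge_header_style("someone@example.org"))
    print(munge_as_entities("someone@example.org"))
```

With these definitions, munge_header_style("someone@example.org") gives
someone_at_example_dot_org. The entity form still displays as the original
address in a browser, so it only deters scanners that do not decode
character references.
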
>(1) Find every genuine email address (grabbing the whole address,
>but none of the surrounding text).
>(2) Don't falsely identify anything that has an @ sign w/o being an
>email address.
>(3) Don't identify anything that IS an email address, but that is
>desirable to leave in its original form (e.g. copies of press
>releases, contact info for politicians, spacing-sensitive table
>layouts, petitions, etc).
>(1) and (2) are tractable, but not trivial. (3) is a real bear.
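
A rough Python sketch of how (1) and (2) might be approached is given
here. Everything in it is an assumption for illustration: the names are
hypothetical, the pattern is deliberately simple rather than a full
address grammar, and HTML is handled only by leaving anything between
"<" and ">" untouched. It makes no attempt at (3), which needs human
judgment about which addresses should stay as written.

```python
import re

# Simplified address pattern (assumption): it will miss some valid
# addresses and may accept some strings that are not real addresses.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Anything between "<" and ">" is treated as an HTML tag and left alone.
TAG = re.compile(r"(<[^>]*>)")


def header_style(address):
    """Rewrite user@host.tld as user_at_host_dot_tld."""
    local, _, domain = address.partition("@")
    return local + "_at_" + domain.replace(".", "_dot_")


def munge_text(html_text, munge=header_style):
    """Munge addresses found in text, but not those inside tags."""
    pieces = TAG.split(html_text)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            out.append(piece)  # odd indices are the captured tags
        else:
            out.append(ADDRESS.sub(lambda m: munge(m.group(0)), piece))
    return "".join(out)


if __name__ == "__main__":
    sample = '<a href="mailto:someone@example.org">write</a> to someone@example.org, meet @ 5pm.'
    print(munge_text(sample))
```

In the sample, the address in the body text is rewritten, the bare
"@ 5pm" is left alone, and the mailto: attribute inside the tag is not
touched. A real script would probably want to handle that attribute too.
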
>If anyone sends me a script to do the right thing, I'd be happy to
>run it over the whole collection of old archive HTML files (be sure
>your script doesn't get confused by HTML and only works on plain
>text). And also don't get messed up with attachments of various
>OVC discuss mailing lists
>Send requests to subscribe or unsubscribe to

Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Sat Apr 30 23:17:05 2005