Fwd: Spam help requested

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Mon Apr 11 2005 - 10:41:11 CDT

On Apr 11, 2005, at 4:46 AM, Jan Karrman wrote:
> I did just write this Perl one-liner to do this:
> printf qw(&#%d; &#x%x;)[int(rand(2))], ord $& while $ARGV[0] =~ /./g;

The problem isn't the munging algorithm; the simple thing we do for
headers is fine, e.g. mertz_at_gnosis_dot_cx. We're not trying to make
it a crypto puzzle, just moderately non-scannable by spam spiders.
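
For concreteness, here is a minimal Python sketch of that header-style munging. The function name munge is made up for illustration, and leaving the local part untouched (only the "@" and the dots in the domain are rewritten) is an assumption read off the single example above, not a description of the actual archive script.

```python
def munge(address: str) -> str:
    """Rewrite user@host.tld as user_at_host_dot_tld.

    Assumption: only the '@' and the dots in the domain are replaced;
    the local part is passed through unchanged.
    """
    local, sep, domain = address.partition("@")
    if not sep:
        # No '@' present, so this is not an address; return it as-is.
        return address
    return local + "_at_" + domain.replace(".", "_dot_")


print(munge("mertz@gnosis.cx"))
```

Applied to mertz@gnosis.cx this produces the form shown above.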

It's identifying ALL AND ONLY the email addresses that should be munged
within message bodies that's the problem.

(1) Find every genuine email address (grabbing the whole address, but
none of the surrounding text).
(2) Don't falsely identify anything that has an @ sign without being an
email address.
(3) Don't identify anything that IS an email address, but that is
desirable to leave in its original form (e.g. copies of press releases,
contact info for politicians, spacing-sensitive table layouts,
petitions, etc).

(1) and (2) are tractable, but not trivial. (3) is a real bear.
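
As a rough illustration of why (1) and (2) are tractable, a conservative regular expression gets most of the way there. The sketch below is in Python; the pattern and the name find_addresses are assumptions for illustration, not a full RFC 5322 parser, and it makes no attempt at (3), which needs judgment about context rather than pattern matching.

```python
import re

# Conservative pattern (an assumption, not a complete address grammar).
EMAIL_RE = re.compile(
    r"(?<![\w.+-])"           # (1) start at the true beginning of the address
    r"[A-Za-z0-9][\w.+-]*"    # local part
    r"@"
    r"(?:[A-Za-z0-9-]+\.)+"   # (2) require at least one dotted domain label
    r"[A-Za-z]{2,}"           # alphabetic top-level domain
    r"(?![\w-])"              # (1) stop before surrounding text
)


def find_addresses(text):
    """Return every substring of text that looks like an email address."""
    return [m.group(0) for m in EMAIL_RE.finditer(text)]


sample = "Write to mertz@gnosis.cx. Meet @ 5pm, 3 @ $2, see foo@bar"
print(find_addresses(sample))
```

The bare "@" uses and the dotless "foo@bar" are skipped, and the trailing period after the real address is not swallowed. Unusual but valid addresses (quoted local parts, for instance) would be missed by this pattern.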

If anyone sends me a script to do the right thing, I'd be happy to run
it over the whole collection of old archive HTML files (be sure your
script doesn't get confused by HTML and only works on plain text). It
also shouldn't get tripped up by attachments of various sorts.
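
One way to keep such a script from being confused by HTML is to run the markup through a real parser and rewrite only the text nodes. The following is a hedged Python sketch using the standard library's html.parser; the class name TextOnlyMunger, the simplified address pattern, and the munging rule are all assumptions for illustration. It does nothing special for attachments and makes no attempt at requirement (3).

```python
import re
from html.parser import HTMLParser

# Simplified address pattern and munging rule (assumptions for this sketch).
EMAIL_RE = re.compile(r"[\w.+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def munge(match):
    local, _, domain = match.group(0).partition("@")
    return local + "_at_" + domain.replace(".", "_dot_")


class TextOnlyMunger(HTMLParser):
    """Re-emit HTML, munging addresses in text nodes only."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Emit the tag exactly as it appeared, attributes untouched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Note: the parser lowercases tag names.
        self.out.append("</" + tag + ">")

    def handle_data(self, data):
        # The only place where addresses are rewritten.
        self.out.append(EMAIL_RE.sub(munge, data))

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")

    def handle_comment(self, data):
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")

    def handle_pi(self, data):
        self.out.append("<?" + data + ">")


def munge_html(html):
    parser = TextOnlyMunger()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(munge_html('<a href="mailto:a@b.org">a@b.org</a>'))
```

Here the visible text is munged while the href attribute is left alone; whether attribute values should also be rewritten is a separate policy decision.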

_______________________________________________
OVC discuss mailing lists
Send requests to subscribe or unsubscribe to arthur@openvotingconsortium.org
==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Sat Apr 30 23:17:05 2005
