Re: Mining the data in the correspondence archives

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Tue Nov 30 2004 - 14:44:42 CST

On Nov 30, 2004, at 3:35 PM, laird popkin wrote:
> The indexing part is well solved by a number of open source text
> indexing engines, such as Apache's Lucene, or Zilverline.
> KnowledgeTree also looks very interesting -- it's a full-fledged
> document management system that includes text indexing. So that might
> be overkill.

Well, yeah. But it's even easier to solve using Google.

That's what the archive site already does; Google spiders our email
archive fairly regularly. The search box there is just a Google search
with a "site:..." restriction. I think I also add a small kludge: the
extra term 'hypermail', the name of the archive-generating program,
which puts a little blurb on every archived page. That excludes other
documents I may host at the same domain.
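
A minimal sketch of how such a search box might build its query, not
the archive's actual code. The function name and the host
"archive.example.org" are hypothetical placeholders (the real "site:"
value is not given above); the only parts taken from the description
are the "site:" restriction and the extra 'hypermail' term.

```python
from urllib.parse import urlencode


def build_archive_search_url(user_terms, site="archive.example.org",
                             marker="hypermail"):
    """Return a Google search URL restricted to one site.

    `site` is a hypothetical placeholder host. `marker` is the extra
    term that appears on every hypermail-generated page, so adding it
    filters out other documents hosted at the same domain.
    """
    query = f"{user_terms} {marker} site:{site}"
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_archive_search_url("text indexing"))
```

The trade-off is that this relies entirely on Google having crawled
the archive recently; very new messages will not show up until the
next crawl.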

= The content of this message, with the exception of any external
= quotations under fair use, is released to the Public Domain
Received on Tue Nov 30 23:17:42 2004

This archive was generated by hypermail 2.1.8 : Tue Nov 30 2004 - 23:17:44 CST