Re: Mining the data in the correspondence archives

From: laird popkin <lairdp_at_gmail_dot_com>
Date: Tue Nov 30 2004 - 14:35:55 CST

The indexing part is well solved by a number of open source text
indexing engines, such as Apache's Lucene, or Zilverline.
KnowledgeTree also looks very interesting -- it's a full-fledged
document management system that includes text indexing, so it might
be overkill.

The trick might be in actually getting a copy of the email archives
(or it might be trivial, depending on how they're set up). Since David
Mertz runs the archive, it's probably possible to throw them into a
text indexing engine...
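
To make the indexing idea concrete, here is a minimal sketch of what a
text indexing engine does at its core: an inverted index mapping each
word to the messages that contain it. This is not how Lucene or
Zilverline are implemented, only an illustration using the Python
standard library. The function names (tokenize, build_index, search)
are hypothetical, and the sketch assumes each archived message is
available as a raw single-part plain-text string.

```python
import re
from collections import defaultdict
from email import message_from_string


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(raw_messages):
    """Map each word to the set of message positions that contain it.

    Assumption: every message is single-part plain text. Multipart
    messages are indexed by subject only in this sketch.
    """
    index = defaultdict(set)
    subjects = []
    for pos, raw in enumerate(raw_messages):
        msg = message_from_string(raw)
        subject = msg.get("Subject", "(no subject)")
        subjects.append(subject)
        body = "" if msg.is_multipart() else msg.get_payload()
        for word in tokenize(subject + " " + body):
            index[word].add(pos)
    return index, subjects


def search(index, subjects, query):
    """Return subjects of messages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [subjects[i] for i in sorted(hits)]
```

A real engine adds ranking, stemming, and on-disk storage on top of
this, which is the main reason to reach for an existing one.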

I'm not too sure about an AI tool to summarize each thread, though --
that sounds pretty tricky. I remember Oracle had a technology that
could take any document and generate a short summary. Perhaps there's
a free tool like that somewhere?
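
For a sense of what the simplest form of automatic summarization looks
like, here is a rough sketch of an extractive summarizer: it scores
each sentence by how frequent its words are in the whole text and
keeps the top few. This is a generic word-frequency heuristic, not the
Oracle technology mentioned above, and it would not produce a real
consensus summary of a thread. The names (STOPWORDS, split_sentences,
summarize) and the stopword list are assumptions made for the example.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "that", "for", "on", "be", "i"}


def content_words(text):
    """Return lowercase words from the text, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (crude)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average word frequency, so long sentences are not favored.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Email threads are full of quoted text and signatures, so any approach
like this would need that material stripped out first.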

- LP

On Mon, 29 Nov 2004 13:16:59 -0800 (PST), Edmund R. Kennedy wrote:
> Hello All:
> Despite the impression I may have given people, I hate to tell people, "Look
> in the correspondence archives!" I keep meaning to go back and mine the
> various conversation threads into consensus summaries in the Wiki, but I just
> don't seem to have the time. While Keith Copenhagen should be thanked
> (thank you, Keith) for keeping up with the current discussions (as well as
> generating some himself), I don't expect him to do this data mining. Is
> there a technology that would at least index the archives? What would be
> ideal would be some sort of AI system that would summarize each thread.
> Regardless, I could use some help on this issue.
> Thanks, Ed Kennedy
> --
> 10777 Bendigo Cove
> San Diego, CA 92126-2510
> "We must all cultivate our gardens." Candide-Voltaire
> _______________________________________________
> OVC discuss mailing lists
> Send requests to subscribe or unsubscribe to

- Laird Popkin, cell: 917/453-0700
= The content of this message, with the exception of any external
= quotations under fair use, is released to the Public Domain
Received on Tue Nov 30 23:17:42 2004