Re: Mining the data in the correspondence archives

From: Keith Copenhagen <K_at_copetech_dot_com>
Date: Tue Nov 30 2004 - 22:17:05 CST

Perhaps we could generate a (limited) list of key terms:

For example:
"Open Source"
"Hardware Platform"
"Operating System"
"Readable Ballot"
"Voter Rolls"

The resulting list of emails could turn into wiki pages (maybe with a grep
+/-1 line kind of intro), and over time we could cull them back by hand to
the ones that capture the considerations and the consensus.
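
A rough sketch of that grep +/-1 line idea, in Python. It assumes the
messages have already been loaded into a dict of message id to body text;
how they get out of the archive is left open, and the names
grep_with_context and build_term_index are made up for illustration.

```python
import re

# Key terms from the list above; extend or trim as needed.
KEY_TERMS = ["Open Source", "Hardware Platform", "Operating System",
             "Readable Ballot", "Voter Rolls"]

def grep_with_context(text, term, context=1):
    """Return each line matching `term`, with `context` lines on either side."""
    lines = text.splitlines()
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    snippets = []
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            snippets.append("\n".join(lines[start:end]))
    return snippets

def build_term_index(messages, terms=KEY_TERMS, context=1):
    """Map each term to a list of (message_id, snippet) pairs.

    `messages` is a hypothetical dict of message id -> message body text.
    """
    index = {term: [] for term in terms}
    for msg_id, text in messages.items():
        for term in terms:
            for snippet in grep_with_context(text, term, context):
                index[term].append((msg_id, snippet))
    return index
```

Each term's list could seed one draft wiki page. Matching is
case-insensitive and line by line, so a term that wraps across a line
break is missed; that is probably acceptable for a first pass that gets
culled by hand anyway.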


On Tue, 30 Nov 2004 13:42:03 -0800 (PST), Edmund R. Kennedy wrote:

> Hello:
> Um. David, I mean preparing a separate index of the documents that
> people could consult to see what's there. I didn't mean 'indexing'
> although that's a reasonable interpretation. That sort of key index is
> effectively invisible to the final user and isn't very useful when
> you don't know what key word to use. When I turn to a paper index and
> don't know the search term I can skim quickly through the index and
> usually find what I'm looking for quickly.
> I'm kind of thinking something like an index of threads, an index of
> authors, resorting those by date, or size, etc. Right now, the
> correspondence archives are effectively a knowledge swamp. The ultimate
> goal is to try to drain the swamp and systematically extract the
> information to end up in the Wiki or even the FAQ. No, I'm not hip deep
> in alligators yet.
> David Mertz wrote:
> On Nov 30, 2004, at 3:35 PM, laird popkin wrote:
>> The indexing part is well solved by a number of open source text
>> indexing engines, such as Apache's Lucene, or Zilverline.
>> KnowledgeTree also looks very interesting -- it's a full fledged
>> document management system that includes text indexing. So that might
>> be overkill.
> Well, yeah. But it's even easier to solve using Google.
> That's what the archive site already does; Google happily spiders our
> email archive with good regularity. The search box there is just a
> Google search with a "site:..." restriction (and I think a little
> kludge where I add the term 'hypermail' which is the archive generating
> program that puts a little blurb on archived pages; just to exclude
> other documents I may host at the same domain).
> _______________________________________________
> OVC discuss mailing lists
> Send requests to subscribe or unsubscribe to

Keith Copenhagen
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Tue Nov 30 23:17:43 2004
