Date: 12 Jun 2001 22:40:24 -0500
From: Chris Green <cmg@uab.edu>
To: htdig-general@lists.sourceforge.net
Subject: [htdig] initial work on indexing mail files

Well, I've done a bit of work on the default_type stuff, trying to
get htdig working with my somewhat nonstandard mail format of one file
per message.  Someone could extend the mailparse.py script to handle
mail spools, but I can't think of a good way to index inside a mail
spool.

The converter ignores any MIME part that is not text/plain.  It
was written with Python 1.5.2.
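
For readers without the script to hand, here is a rough sketch of the
"keep only text/plain" idea.  It is not mailparse.py itself: it
assumes a current Python 3 and its standard email package (neither of
which matches the 1.5.2 original), and the function names are made up
for illustration.

```python
import email
from email import policy
from html import escape


def plain_text_parts(msg):
    """Return the decoded bodies of every text/plain part of msg."""
    bodies = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        if part.get_content_type() != "text/plain":
            continue  # skip HTML, attachments and any other MIME type
        bodies.append(part.get_content())
    return bodies


def message_to_html(msg):
    """Render the subject and text/plain bodies as a minimal HTML page."""
    subject = escape(str(msg.get("Subject", "")))
    body = escape("\n".join(plain_text_parts(msg)))
    return ("<html><head><title>%s</title></head>\n"
            "<body><pre>%s</pre></body></html>\n" % (subject, body))


def convert_file(path):
    """Parse one message file (one message per file) and return HTML."""
    with open(path, "rb") as fh:
        msg = email.message_from_binary_file(fh, policy=policy.default)
    return message_to_html(msg)
```

A converter built this way would print convert_file(path) to standard
output for the file htdig hands it; how the path arrives (presumably
as a command-line argument) is an assumption to check against the
external_parsers documentation.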

Patches are against 3.2.0b2 - there might be an extra cout here or
there. 

I've written some messy patches to add a default_type attribute to
htdig, with support at the htdig/Retriever.cc level and a bit of
support in htnet/HtFile.cc.  I didn't know where else it was
appropriate to put this type of thing, and doing it over and over in
other places seems kludgy.
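
To make the intent of default_type concrete, the fallback can be
sketched in a few lines of Python.  This only illustrates the idea and
is not the C++ in the patches; resolve_content_type is a hypothetical
name, and the standard mimetypes module stands in for whatever lookup
htdig really does.

```python
import mimetypes


def resolve_content_type(path, default_type=None):
    """Guess a MIME type from the file name, else use default_type."""
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed is not None:
        return guessed  # the name was enough to decide
    # Nothing recognizable in the name: fall back to the configured
    # default, which may itself be None if no default was set.
    return default_type
```

With a default of application/nnml, a message file with no extension
(say a hypothetical Mail/42) would be handed to the mail converter,
while notes.txt would still come out as text/plain.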

Please comment on this (I will admit that I'm not even a
non-laughable C++ programmer - too many years since C++ and I last
met).


my htdig.conf:

database_dir:		/home/sprout/tmp/htdig/db
start_url:		http://localhost/files.html
local_urls:		http://localhost/=/home/sprout/Mail/
local_urls_only:	true

# default extension type for
# things we can't figure out any other way
#
# only use this option when you know what you are going to be parsing;
# otherwise you will need a converter that handles anything

# default_type:      application/nnml
default_type: text/plain
external_parsers:  application/nnml->text/html \
/home/sprout/src/python/mailtests/mailparse.py

-- 
Chris Green <cmg@uab.edu>
Let not the sands of time get in your lunch.
