From grdetil@scrc.umanitoba.ca Wed Oct 20 10:46:18 1999
Date: Wed, 20 Oct 1999 12:09:50 -0500 (CDT)
From: Gilles Detillieux
To: htdig3-dev@htdig.org
Subject: [htdig3-dev] external converter docs for htdig

Hi again. Here is a patch to add documentation for yesterday's patch for
external converter support. Again, it applies to 3.1.3.

--- htdig-3.1.3/htdoc/attrs.html.noconv Wed Sep 22 11:18:41 1999
+++ htdig-3.1.3/htdoc/attrs.html Wed Oct 20 11:37:52 1999
@@ -1625,9 +1625,29 @@
 content-type that the parser can handle while the
 second string of each pair is the path to the external
 parsing program. If quoted, it may contain parameters,
- separated by spaces.

+ separated by spaces.
+ External parsing can also be done with external
+ converters, which convert one content-type to
+ another. To do this, instead of just specifying
+ a single content-type as the first string
+ of a pair, you specify two types, in the form
+ type1->type2,
+ as a single string with no spaces. The second
+ string will define an external converter
+ rather than an external parser, to convert
+ the first type to the second. If the second
+ type is user-defined, then
+ it's up to the converter script to put out a
+ "Content-Type: type" header followed
+ by a blank line, to indicate to htdig what type it
+ should expect for the output, much like what a CGI
+ script would do. The resulting content-type must
+ be one that htdig can parse, either internally,
+ or with another external parser or converter.
+ Only one external parser or converter can be
+ specified for any given content-type.

 The parser program takes four command-line
- parameters, not counting parameters and parameters
+ parameters, not counting any parameters
 already given in the command string:
infile content-type URL configuration-file
@@ -1688,7 +1708,10 @@

 The external parser is to write information for
- htdig on its standard output.
+ htdig on its standard output. Unless it is an
+ external converter, which will output a document
+ of a different content-type, then its output must
+ follow the format described here.
 The output consists of records, each record terminated
 with a newline. Each record is a series of (unless
 expressively allowed to be empty) non-empty tab-separated
@@ -1927,7 +1950,9 @@
 text/html /usr/local/bin/htmlparser \
- application/ms-word "/usr/local/bin/mswordparser -w"
+ application/pdf /usr/local/bin/parse_doc.pl \
+ application/msword->text/plain "/usr/local/bin/mswordtotxt -w" \
+ application/x-gunzip->user-defined /usr/local/bin/ungzipper

--
Gilles R. Detillieux              E-mail:
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
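
The user-defined case described in the patch can be illustrated with a small converter script. What follows is only a sketch in Python and is not the actual /usr/local/bin/ungzipper named in the example. It assumes htdig appends the four documented parameters (infile content-type URL configuration-file) after any parameters in the command string. The function names guess_type and convert are hypothetical, and guessing the output type from the URL with its ".gz" suffix removed is an assumption made here, not something the patch specifies. The script writes a "Content-Type: type" header, a blank line, and then the decompressed document, much like a CGI script would.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for an application/x-gunzip->user-defined converter."""
import gzip
import mimetypes
import sys


def guess_type(url, default="text/plain"):
    """Guess the decompressed content-type from the URL minus a .gz suffix.

    Assumption: the URL's remaining extension reflects the real type.
    """
    name = url[:-3] if url.lower().endswith(".gz") else url
    guessed, _encoding = mimetypes.guess_type(name)
    return guessed or default


def convert(infile, url, out):
    """Write the header, a blank line, then the decompressed body to out."""
    with gzip.open(infile, "rb") as src:
        body = src.read()
    header = "Content-Type: %s\n\n" % guess_type(url)
    out.write(header.encode("ascii"))
    out.write(body)


if __name__ == "__main__" and len(sys.argv) >= 5:
    # Assumed argument order: infile content-type URL configuration-file.
    # Taking the last four leaves room for extra parameters before them.
    infile, _content_type, url, _config_file = sys.argv[-4:]
    convert(infile, url, sys.stdout.buffer)
```

With a script like this, the content-type that comes out (for example text/html) must still be one that htdig can parse internally or hand to another external parser or converter, as the patched text says.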