From hans-peter.nilsson@axis.com Mon Jan 4 17:19:39 1999
Date: Tue, 5 Jan 1999 01:27:50 +0100
From: Hans-Peter Nilsson
To: htdig@sdsu.edu
Subject: htdig: Patch for external_parser in attrs.html: Corrected and extended documentation

The current description of external parsers is wrong; they do
*not* take input on stdin; see htdig3/htdig/ExternalParser.cc

Here's an update. I also changed one of the examples to show how
parameters can be passed.

It should also be noted that the "u" field should specify a
complete, non-relative URL. Maybe this is a bug, since the "i"
field can be relative. The safe way to go here IMHO is to update
the documentation, *then* perhaps fix the code; here we go.

No empty fields are allowed. Think strtok ("\t\t","\t") or try it
yourself; you'll get an "external parser error".

There's also a random typo fix for "second string [of] each pair"
on the first line.

htdoc/ChangeLog:
Tue Jan 5 00:47:22 1999 Hans-Peter Nilsson

	* attrs.html: Correct and add more verbose description of
	external parser program parameters and fields.

Index: attrs.html
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdoc/attrs.html,v
retrieving revision 1.9
diff -p -c -r1.9 attrs.html
*** attrs.html 1998/12/13 05:44:54 1.9
--- attrs.html 1999/01/05 00:25:01
***************
*** 1208,1220 ****
  The external parsers are specified as pairs of
  strings. The first string of each pair is the
  content-type that the parser can handle while the
! second string each pair is the path to the external
! parsing program. The parsing program will get the
! document to be parsed on its standard input and it is
! to write information for htdig on its standard
! output.
  The output consists of records, each record terminated
! with a newline. Each record is a series of tab separated
  fields. The first field is a single character
  that specifies the record type. The rest of the fields
  are determined by the record type.
--- 1208,1281 ----
  The external parsers are specified as pairs of
  strings. The first string of each pair is the
  content-type that the parser can handle while the
! second string of each pair is the path to the external
! parsing program. If quoted, it may contain parameters,
! separated by spaces.

! The parser program takes four command-line
! parameters, not counting any parameters already
! given in the command string:
! infile content-type URL configuration-file
! Parameter           Description                                       Example
! infile              A temporary file with the contents to be parsed.  /var/tmp/htdext.14242
! content-type        The MIME-type of the contents.                    text/html
! URL                 The URL of the contents.                          http://www.htdig.org/attrs.html
! configuration-file  The configuration-file in effect.                 /etc/htdig/htdig.conf

! The external parser is to write information for
! htdig on its standard output.
  The output consists of records, each record terminated
! with a newline. Each record is a series of non-empty tab
  separated fields. The first field is a single character
  that specifies the record type. The rest of the fields
  are determined by the record type.
***************
*** 1340,1346 ****
  A hyperlink to another document that is
! referenced by the current document.
--- 1401,1409 ----
  A hyperlink to another document that is
! referenced by the current document. It must be
! complete and non-relative, using the URL parameter to
! resolve any relative references found in the document.
***************
*** 1409,1415 ****

  external_parsers: text/html /usr/local/bin/htmlparser
! application/ms-word /usr/local/bin/mswordparser
--- 1472,1478 ----
  external_parsers: text/html /usr/local/bin/htmlparser
! application/ms-word "/usr/local/bin/mswordparser -w"
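For anyone writing a parser against the corrected description, here is a minimal sketch in Python of what the calling convention and record rules above amount to. It is only an illustration, not code from htdig. The function names are made up, the regex link extraction is a crude stand-in for real HTML parsing, and the latin-1 input encoding is a guess. Two further assumptions: that htdig appends its four parameters after any parameters given in the quoted command string, and that a "u" record carries the URL followed by a description field (see the record table in attrs.html for the authoritative layout).

```python
#!/usr/bin/env python3
"""Sketch of an external parser; names and record layout are assumptions."""
import re
import sys
from urllib.parse import urljoin


def format_record(record_type, *fields):
    """Build one record: a type character plus non-empty, tab-separated fields."""
    if len(record_type) != 1:
        raise ValueError("record type must be a single character")
    cleaned = []
    for field in fields:
        # Tabs and newlines delimit fields and records, so fold any
        # whitespace run inside a field into a single space.
        text = " ".join(str(field).split())
        if not text:
            raise ValueError("empty fields are not allowed")
        cleaned.append(text)
    return "\t".join([record_type] + cleaned)


def link_records(html, base_url):
    """Yield a "u" record per href, resolved against the URL parameter."""
    pattern = r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>'
    for match in re.finditer(pattern, html, re.I | re.S):
        href, label = match.group(1), match.group(2)
        absolute = urljoin(base_url, href)  # "u" must be complete, non-relative
        # Assumed layout: URL, then description. Fall back to the URL
        # itself so the description field is never empty.
        description = re.sub(r"<[^>]+>", " ", label).strip() or absolute
        yield format_record("u", absolute, description)


def main(argv):
    # Assumption: parameters from the command string come first, so the
    # four htdig parameters are the last four.
    infile, content_type, url, config_file = argv[-4:]
    with open(infile, encoding="latin-1") as handle:  # encoding is a guess
        html = handle.read()
    for record in link_records(html, url):
        print(record)  # print() supplies the terminating newline


# Only act as a parser when the four parameters were actually supplied.
if __name__ == "__main__" and len(sys.argv) >= 5:
    main(sys.argv)
```

With the URL parameter set to http://www.htdig.org/attrs.html, a link written as href="howto.html" comes out as a "u" record for http://www.htdig.org/howto.html. A field that would end up empty raises an error in the parser itself rather than reaching htdig as an "external parser error".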
brgds, H-P