From grdetil@scrc.umanitoba.ca Wed Mar 17 17:50:29 1999
Date: Wed, 17 Mar 1999 17:13:05 -0600 (CST)
From: Gilles Detillieux
To: htdig@htdig.org
Subject: Re: [htdig] 3.1.1: Does noindex_start, noindex_stop work?

According to me, back in late February...
> According to Frank Richter:
> > Then I had by mistake an empty noindex_start: value in the conf file, oh
> > dear, no words were indexed at all (my error, but might be dangerous for
> > others too).
>
> Yes, you're right. The code should check for an empty string, and disable
> the feature if that's the case. Right now, it just does a strncmp()
> with a length of 0, which will always match. I think this should also
> use mystrncasecmp() instead, and mystrcasestr() to find the end, so that
> it won't care if the tags are upper or lower case. Objections?

Well, I didn't hear any objections, so here's the patch to make these
fixes to htdig/HTML.cc, as well as fix up the discrepancies in the
documentation. I'll be committing these to CVS shortly.

--- ./htdig/HTML.cc.skipendbug	Wed Mar 17 16:11:52 1999
+++ ./htdig/HTML.cc	Wed Mar 17 17:05:15 1999
@@ -125,9 +125,10 @@
 	    // Filter out section marked to be ignored for indexing.
 	    // This can contain any HTML.
 	    //
-	    if (strncmp((char *)position, skip_start, strlen(skip_start)) == 0)
+	    if (*skip_start &&
+		mystrncasecmp((char *)position, skip_start, strlen(skip_start)) == 0)
 	    {
-		q = (unsigned char*)strstr((char *)position, skip_end);
+		q = (unsigned char*)mystrcasestr((char *)position, skip_end);
 		if (!q)
 		    *position = '\0';	// Rest of document will be skipped...
 		else
--- ./htdoc/attrs.html.skipendbug	Tue Feb 16 23:03:53 1999
+++ ./htdoc/attrs.html	Wed Mar 17 16:21:55 1999
@@ -3433,7 +3433,7 @@
noindex_start,
- noindex_stop
+ noindex_end
@@ -3453,7 +3453,7 @@
default:
- <!--htdig-noindex--> <!--/htdig-noindex-->
+ <!--htdig_noindex--> <!--/htdig_noindex-->
description:
@@ -3468,14 +3468,14 @@
SCRIPT sections in 'uneditable' documents can be skipped; note how
noindex_start does not contain an ending >: this allows for all SCRIPT
tags to be matched regardless of attributes defined (different types or
- languages).
+ languages). Note that the match for this string is case insensitive.
example:
noindex_start: <SCRIPT
- noindex_stop: </SCRIPT>
+ noindex_end: </SCRIPT>
--- ./htdoc/cf_byname.html.skipendbug	Tue Feb 16 23:03:54 1999
+++ ./htdoc/cf_byname.html	Wed Mar 17 16:22:47 1999
@@ -105,8 +105,8 @@
* next_page_text
* no_excerpt_text
* no_excerpt_show_top
+ * noindex_end
* noindex_start
- * noindex_stop
* no_next_page_text
* no_page_list_header
* no_page_number_text
--- ./htdoc/cf_byprog.html.skipendbug	Tue Feb 16 23:03:54 1999
+++ ./htdoc/cf_byprog.html	Wed Mar 17 16:23:10 1999
@@ -56,8 +56,8 @@
* meta_description_factor
* minimum_word_length
* modification_time_is_now
+ * noindex_end
* noindex_start
- * noindex_stop
* pdf_parser
* remove_default_doc
* robotstxt_name
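For anyone who wants to see the effect of the change outside of htdig,
here is a minimal standalone C++ sketch of the patched check. It is not
the HTML.cc code: ci_strncmp() and ci_strstr() are hypothetical stand-ins
for htlib's mystrncasecmp() and mystrcasestr(), and strip_noindex() is a
hypothetical wrapper. The patch doesn't show the else branch, so resuming
just past skip_end is an assumption, and the sketch starts looking for
skip_end just past skip_start (the patch searches from position itself)
so that an empty noindex_end can't stall the loop.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical stand-in for htlib's mystrncasecmp(): compares at most n
// characters, ignoring ASCII case. Like strncmp(), n == 0 always "matches".
static int ci_strncmp(const char *a, const char *b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return ca - cb;
        if (ca == '\0')
            return 0;           // both strings ended at the same point
    }
    return 0;
}

// Hypothetical stand-in for htlib's mystrcasestr(): case-insensitive strstr().
static const char *ci_strstr(const char *haystack, const char *needle)
{
    std::size_t len = std::strlen(needle);
    for (const char *p = haystack; *p; ++p)
        if (ci_strncmp(p, needle, len) == 0)
            return p;
    return len == 0 ? haystack : nullptr;   // strstr() semantics for ""
}

// Hypothetical wrapper: returns doc with every skip_start...skip_end
// section removed, using the same tests as the patched HTML.cc.
std::string strip_noindex(const std::string &doc,
                          const char *skip_start, const char *skip_end)
{
    assert(skip_start != nullptr && skip_end != nullptr);
    const std::size_t start_len = std::strlen(skip_start);
    std::string out;
    const char *position = doc.c_str();
    while (*position) {
        // The fix: an empty skip_start disables the feature, because a
        // zero-length compare would otherwise match at every position.
        if (*skip_start && ci_strncmp(position, skip_start, start_len) == 0) {
            // Assumption: search for the end marker after the start marker.
            const char *q = ci_strstr(position + start_len, skip_end);
            if (!q)
                break;          // no end marker: rest of document is skipped
            position = q + std::strlen(skip_end);   // resume after the end marker
            continue;
        }
        out += *position++;     // ordinary text is kept
    }
    return out;
}
```

With an empty skip_start the *skip_start test fails and the text passes
through untouched. Before the patch, strncmp() with a length of 0 returned
0 at every position, so the skip branch was taken everywhere.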
-- 
Gilles R. Detillieux              E-mail:
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930