Date: Thu, 12 Jul 2001 13:43:11 -0500 (CDT)
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
To: "ht://Dig mailing list" <htdig-general@lists.sourceforge.net>
Subject: Re: [htdig] PATCH - HtFile.cc bug in 3.2.0b* (was: List of numbers
    chokes htdig 3.1.5)

According to me...
> I tried it with the 3.2.0b4-070801 snapshot, using a file:// URL, and
> it took 8 minutes on an 866 MHz Pentium III, but it messed up somehow.
> It seems to have lost all the newlines from the file, so it tried to
> index 1 big number.  I'll need to look into whether this is a problem
> with the HtFile handler or the Plaintext parser.  Once I debug it, I'll
> try profiling it too, although the word indexing is quite different in
> 3.2.

Different indeed!  Once I fixed a glaring error in HtFile.cc, htdig
3.2.0b4 (070801 snapshot) correctly indexed the spp2.txt file in a few
short seconds.
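
For anyone wondering how the newlines vanished: the old read loop (see
the last hunk of the patch below) pulled the file in with operator>>,
which skips all whitespace between tokens.  Here's a minimal standalone
sketch of the difference, not the actual HtFile code.  The function
names are made up for illustration, and I use std::istream::read
instead of fread() so it can be tried on an in-memory string.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Roughly what the old loop did: operator>> discards the whitespace
// between tokens, so "1\n2\n3\n" comes back as "123".
std::string read_by_tokens(std::istream &in)
{
    std::string contents, tmp;
    while (in >> tmp)
        contents.append(tmp);
    return contents;
}

// Roughly what the patched loop does: raw block reads that keep every
// byte, stopping once max_size bytes have been collected.
std::string read_by_blocks(std::istream &in, std::size_t max_size)
{
    std::string contents;
    char buf[8192];
    while (contents.size() < max_size)
    {
        in.read(buf, sizeof(buf));
        std::size_t n = static_cast<std::size_t>(in.gcount());
        if (n == 0)
            break;                      // end of input (or read error)
        // Trim the last block so we never exceed max_size.
        n = std::min(n, max_size - contents.size());
        contents.append(buf, n);
    }
    return contents;
}
```

Feed both functions a std::istringstream holding a list of numbers, one
per line, and the first hands back one giant "number" while the second
returns the text untouched.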

<rant>
So, even though HtFile.cc has been in there since before 3.2.0b1, it's
pretty obvious that NOBODY EVER TESTED THIS CODE BEFORE!!!!!  ARRRGGH!
Why are major revisions and big additions continually being committed
to this source tree without even the most basic testing?
</rant>

Here is my patch to this latest snapshot, which I'll be committing
later today.  It fixes this bug, and also a couple of apparent problems
in the mime.types handling.  As far as I can tell, if htdig is unable
to open the mime.types file, it tries again on every request.  Also,
if the mime.types file is there but empty, it doesn't fall back to the
built-in rules.  With this patch it does, as a side effect (a good one,
I think) of my fix to keep it from repeatedly trying to open the file.

--- htnet/HtFile.cc.readbug	Sun May 20 02:13:53 2001
+++ htnet/HtFile.cc	Thu Jul 12 12:57:28 2001
@@ -88,10 +88,10 @@ HtFile::DocStatus HtFile::Request()
 
    if (!mime_map)
      {
+       mime_map = new Dictionary();
        ifstream in(config->Find("mime_types").get());
        if (in)
          {
-           mime_map = new Dictionary();
            String line;
            while (in >> line)
              {
@@ -170,7 +170,7 @@ HtFile::DocStatus HtFile::Request()
    if (ext == NULL)
      return Transport::Document_not_local;
 
-   if (mime_map)
+   if (mime_map && mime_map->Count())
      {
        String *mime_type = (String *)mime_map->Find(ext + 1);
        if (mime_type)
@@ -190,20 +190,21 @@ HtFile::DocStatus HtFile::Request()
 
    _response._modification_time = new HtDateTime(stat_buf.st_mtime);
 
-   ifstream in((const char *)_url.path());
-   if (!in)
+   FILE *f = fopen((const char *)_url.path(), "r");
+   if (f == NULL)
      return Document_not_found;
 
-   String tmp;
-   while (in >> tmp)
+   char	docBuffer[8192];
+   int		bytesRead;
+   while ((bytesRead = fread(docBuffer, 1, sizeof(docBuffer), f)) > 0)
      {
-       if (_response._contents.length()+tmp.length() > _max_document_size)
-         tmp.chop(_response._contents.length()+tmp.length()
-                    - _max_document_size);
-       _response._contents.append(tmp);
-       if (_response._contents.length() >= _max_document_size)
-         break;
+	if (_response._contents.length() + bytesRead > _max_document_size)
+	    bytesRead = _max_document_size - _response._contents.length();
+	_response._contents.append(docBuffer, bytesRead);
+	if (_response._contents.length() >= _max_document_size)
+	    break;
      }
+   fclose(f);
 
    _response._content_length = stat_buf.st_size;
    _response._document_length = _response._contents.length();
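
To spell out the mime.types part of the patch, here is a rough
standalone sketch of the idea, under some loud assumptions: std::map
stands in for ht://Dig's Dictionary, the names lookup_type,
builtin_type and load_attempts are invented for illustration, and the
built-in rule shown is a placeholder rather than the real defaults in
HtFile.cc.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> MimeMap;  // stand-in for Dictionary

MimeMap *mime_map = 0;
int load_attempts = 0;   // only here so the behaviour can be observed

// Placeholder for the built-in rules (hypothetical, not the real list).
std::string builtin_type(const std::string &ext)
{
    return (ext == "html" || ext == "htm") ? "text/html" : "text/plain";
}

std::string lookup_type(const std::string &mime_types_path,
                        const std::string &ext)
{
    if (!mime_map)
    {
        // Allocate BEFORE trying the file: a non-null map then means
        // "already attempted", so a failed open is not retried.
        mime_map = new MimeMap();
        ++load_attempts;
        std::ifstream in(mime_types_path.c_str());
        std::string line;
        while (std::getline(in, line))
        {
            if (line.empty() || line[0] == '#')
                continue;
            std::istringstream fields(line);
            std::string type, e;
            fields >> type;              // first field: the MIME type
            while (fields >> e)          // remaining fields: extensions
                (*mime_map)[e] = type;
        }
    }

    // An empty map (file missing or empty) falls back to built-in rules.
    if (!mime_map->empty())
    {
        MimeMap::const_iterator it = mime_map->find(ext);
        // Assumption: an unknown extension yields "" in this sketch.
        return it != mime_map->end() ? it->second : std::string();
    }
    return builtin_type(ext);
}
```

Calling lookup_type() twice with a path that doesn't exist should leave
load_attempts at 1 and still return the built-in answer both times.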

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

