
From grdetil@scrc.umanitoba.ca Thu Feb 22 11:46:55 2001
Date: Thu, 22 Feb 2001 13:27:15 -0600 (CST)
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
To: "ht://Dig mailing list" <htdig-general@lists.sourceforge.net>
Subject: Re: [htdig] date range - PATCH for 3.2.0b3-021501 - corrected!

[ This patch fixes a small problem in the one posted about an hour earlier. ]

According to me...
> Really, though, this had been nagging at me for a while, so I finally
> decided to tackle it.  I sorted through the old messages, and the patch
> archive.  Joe had archived one of Mike's early attempts, from April 1,
> 1999, but didn't have the more recent April 6, 1999 patch.  I grabbed
> that one, which I had fortunately saved, and fixed the header that
> BeroList had mangled so I could extract it.  It was a patch for 3.1.1,
> but it applied to 3.1.5.
> 
> Mike had gotten something functional working, and had planned for a lot
> of user input contingencies, but never seemed to get the changes into
> createURL and setVariables to pass on the parameters.  In subsequent
> discussions he got bogged down in how to set up select lists for input
> parameters, and in the end he never posted a patch for propagating the
> start and end date parameters to subsequent search forms.  Setting up
> select lists is easy, now, with the build_select_lists attribute, so I
> just added code to propagate the simple numeric parameters.
> 
> I also added a number of fixes, some of which had been discussed prior
> to Mike's patch, but never implemented, and added a bit of documentation.
> The fixes are: using server's local time zone, initialising the structures
> fully, handling 2 digit years, setting the end time to the last second of
> the given day, and correctly handling Feb. 29.  The code now depends on a
> working mktime() function in your C library.  If you don't have it, C code
> for it exists in htlib/mktime.c, but this isn't compiled by the Makefile.
> 
> The patch can be applied in the main htdig-3.1.5 source directory using
> "patch -p0 < this-file".  I don't have a sample search form, but that
> should be easy to figure out.  I'll try to adapt this for 3.2.0b3 once
> I can get the latest snapshot and build it.  Be aware that in 3.1.x,
> the date range selection will trigger the same slowdown as sorting by
> date or a non-zero backlink_factor or date_factor, when there are lots
> of initial matches to process, as explained in FAQ 5.10.
> 
> I hope many people find this useful.  My thanks to Mike Grommet for
> his early work on this, and to all who prodded me into updating and
> finishing it.
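
As a rough illustration of the 2-digit year and Feb. 29 fixes described
above, here is a minimal standalone sketch.  The helper names
normalize_year and last_day_of_month are hypothetical and do not appear
in the patch, which does the equivalent arithmetic inline on tm_year.
Years of 100 or more are simply left unchanged here, which is a
simplification of what the patch does.

```cpp
// Hypothetical helper (not in the patch): map a user-supplied year
// to a 4-digit year.  00-68 become 2000-2068, 69-99 become 1969-1999,
// and anything else is returned unchanged.
int normalize_year(int year)
{
    if (year >= 0 && year < 69)
        return year + 2000;
    if (year >= 69 && year < 100)
        return year + 1900;
    return year;
}

// Hypothetical helper (not in the patch): last day of a month.
// "month" is zero based, like tm_mon; "year" is the full 4-digit year.
int last_day_of_month(int year, int month)
{
    static const int monthdays[] = {31,28,31,30,31,30,31,31,30,31,30,31};
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return monthdays[month] + ((month == 1 && leap) ? 1 : 0);
}
```

With these, an end date given only as month 2 of year 00 would resolve
to Feb. 29, 2000, since 2000 is divisible by 400.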


OK, I said I'd do it, so here it is.  This patch can be applied in the
main htdig-3.2.0b3-021501 source directory using "patch -p0 < this-file".
It includes doc patches, so maybe it's not too late for the 3.2.0b3
release (Geoff's call).


Thu Feb 22 11:56:56 2001  Gilles Detillieux  <grdetil@scrc.umanitoba.ca>

	* htsearch/htsearch.cc (main), htsearch/Display.cc (setVariables,
	createURL, buildMatchList), htdoc/hts_form.html,
	htdoc/hts_templates.html: Add Mike Grommet's date range search
	feature.


--- htdoc/hts_form.html.nodrange	Thu Feb 15 11:11:20 2001
+++ htdoc/hts_form.html	Thu Feb 22 09:00:41 2001
@@ -147,6 +147,19 @@
 		make this item a drop down menu so the user can select the
 		type of sort at search time.
 	  </dd>
+	  <dt>
+		<strong>startyear</strong>, <strong>startmonth</strong>, <strong>startday</strong>,
+		<strong>endyear</strong>, <strong>endmonth</strong>, <strong>endday</strong>
+	  </dt>
+	  <dd>
+		These values specify the range of document
+		modification dates allowed in the search results.
+		They can be used to restrict the search
+		to particular "ages" of documents, new or old.<br>
+		 The default is the full range of documents in the database.
+		These values can also be specified by configuration attributes
+		of the same names in the configuration file.
+	  </dd>
 	</dl>
 	<hr size="4" noshade>
 
--- htdoc/hts_templates.html.nodrange	Thu Feb 15 11:11:21 2001
+++ htdoc/hts_templates.html	Thu Feb 22 08:58:50 2001
@@ -381,6 +381,14 @@
 		the right.
 	  </dd>
 	  <dt>
+		<strong>STARTYEAR</strong>, <strong>STARTMONTH</strong>, <strong>STARTDAY</strong>,
+		<strong>ENDYEAR</strong>, <strong>ENDMONTH</strong>, <strong>ENDDAY</strong>
+	  </dt>
+	  <dd>
+		The currently specified date range for restricting search
+		results.
+	  </dd>
+	  <dt>
 		<strong>SYNTAXERROR</strong>
 	  </dt>
 	  <dd>
--- htsearch/htsearch.cc.nodrange	Thu Oct 19 22:40:59 2000
+++ htsearch/htsearch.cc	Thu Feb 22 08:55:18 2001
@@ -223,6 +223,27 @@ for (int cInd=0; errorMsg == NULL && cIn
     if (input.exists("sort"))
 	config.Add("sort", input["sort"]);
 
+    // Changes added 3-31-99, by Mike Grommet
+    // Check form entries for starting date, and ending date
+    // Each date consists of a month, day, and year
+
+    if (input.exists("startmonth"))
+	config.Add("startmonth", input["startmonth"]);
+    if (input.exists("startday"))
+	config.Add("startday", input["startday"]);
+    if (input.exists("startyear"))
+	config.Add("startyear", input["startyear"]);
+
+    if (input.exists("endmonth"))
+	config.Add("endmonth", input["endmonth"]);
+    if (input.exists("endday"))
+	config.Add("endday", input["endday"]);
+    if (input.exists("endyear"))
+	config.Add("endyear", input["endyear"]);
+
+    // END OF CHANGES BY MIKE GROMMET    
+
+
     minimum_word_length = config.Value("minimum_word_length", minimum_word_length);
 
     StringList form_vars(config["allow_in_form"], " \t\r\n");
--- htsearch/Display.cc.nodrange	Mon Jan 15 17:49:38 2001
+++ htsearch/Display.cc	Thu Feb 22 09:22:26 2001
@@ -445,6 +445,12 @@ Display::setVariables(int pageNumber, Li
     } else {
       vars.Add("CGI", new String(getenv("SCRIPT_NAME")));
     }
+    vars.Add("STARTYEAR", new String(config["startyear"]));
+    vars.Add("STARTMONTH", new String(config["startmonth"]));
+    vars.Add("STARTDAY", new String(config["startday"]));
+    vars.Add("ENDYEAR", new String(config["endyear"]));
+    vars.Add("ENDMONTH", new String(config["endmonth"]));
+    vars.Add("ENDDAY", new String(config["endday"]));
 	
     String	*str;
     char	*format = input->get("format");
@@ -778,6 +784,18 @@ Display::createURL(String &url, int page
 	url << "keywords=" << encodeInput("keywords") << ';';
     if (input->exists("words"))
 	url << "words=" << encodeInput("words") << ';';
+    if (input->exists("startyear"))
+	url << "startyear=" << encodeInput("startyear") << ';';
+    if (input->exists("startmonth"))
+	url << "startmonth=" << encodeInput("startmonth") << ';';
+    if (input->exists("startday"))
+	url << "startday=" << encodeInput("startday") << ';';
+    if (input->exists("endyear"))
+	url << "endyear=" << encodeInput("endyear") << ';';
+    if (input->exists("endmonth"))
+	url << "endmonth=" << encodeInput("endmonth") << ';';
+    if (input->exists("endday"))
+	url << "endday=" << encodeInput("endday") << ';';
     StringList form_vars(config["allow_in_form"], " \t\r\n");
     for (i= 0; i < form_vars.Count(); i++)
     {
@@ -1136,6 +1154,180 @@ Display::buildMatchList()
     double date_score = 0;
     double base_score = 0;
 
+
+    // Additions made here by Mike Grommet ...
+
+    tm startdate;     // structure to hold the startdate specified by the user
+    tm enddate;       // structure to hold the enddate specified by the user
+
+    time_t eternity = ~(1<<(sizeof(time_t)*8-1));  // will be the largest value holdable by a time_t
+    tm *endoftime;     // the time_t eternity will be converted into a tm, held by this variable
+
+    time_t timet_startdate;
+    time_t timet_enddate;
+    int monthdays[] = {31,28,31,30,31,30,31,31,30,31,30,31};
+
+    // boolean to test to see if we need to build date information or not
+    int dategiven = ((config.Value("startmonth")) ||
+		     (config.Value("startday"))   ||
+		     (config.Value("startyear"))  ||
+		     (config.Value("endmonth"))   ||
+		     (config.Value("endday"))     ||
+		     (config.Value("endyear")));
+
+    // find the end of time
+    endoftime = gmtime(&eternity);
+
+    if(dategiven)    // user specified some sort of date information
+      {
+	time_t now = time((time_t *)0); 	// fill in all fields for mktime
+	tm *lt = localtime(&now); 		//  - Gilles's fix
+	startdate = *lt; 
+	enddate = *lt; 
+
+	// set up the startdate structure
+	// see man mktime for details on the tm structure
+	startdate.tm_sec = 0;
+	startdate.tm_min = 0;
+	startdate.tm_hour = 0;
+	startdate.tm_yday = 0;
+	startdate.tm_wday = 0;
+
+	// The concept here is that if a user did not specify a part of a date,
+	// then we will make assumptions...
+	// For instance, suppose the user specified Feb, 1999 as the start
+	// range, we take steps to make sure that the search range date starts
+	// at Feb 1, 1999,
+	// along these same lines:  (these are in MM-DD-YYYY format)
+	// Startdates:      Date          Becomes
+	//                  01-01         01-01-1970
+	//                  01-1970       01-01-1970
+	//                  04-1970       04-01-1970
+	//                  1970          01-01-1970
+	// These things seem to work fine for start dates, as all months have
+	// the same first day; however, the ending date can't work this way.
+
+	if(config.Value("startmonth"))	// form input specified a start month
+	  {
+	    startdate.tm_mon = config.Value("startmonth") - 1;
+	    // tm months are zero based.  They are passed in as 1 based
+	  }
+	else startdate.tm_mon = 0;	// otherwise, no start month, default to 0
+
+	if(config.Value("startday"))	// form input specified a start day
+	  {
+	    startdate.tm_mday = config.Value("startday");
+	    // tm days are 1 based, they are passed in as 1 based
+	  }
+	else startdate.tm_mday = 1;	// otherwise, no start day, default to 1
+
+	// year is handled a little differently... the tm structure
+	// wants tm_year in a format of year - 1900.
+	// since we are going to convert these dates to a time_t,
+	// a time_t value of zero, the earliest possible date,
+	// occurs Jan 1, 1970.  If we allowed dates < 1970, then we
+	// could get negative time_t values.
+	// (barring minor timezone offsets west of GMT, where Epoch is 12-31-69)
+
+	if(config.Value("startyear"))	// form input specified a start year
+	  {
+	    startdate.tm_year = config.Value("startyear") - 1900;
+	    if (startdate.tm_year < 69-1900)	// correct for 2-digit years 00-68
+		startdate.tm_year += 2000;	//  - Gilles's fix
+	    if (startdate.tm_year < 0)	// correct for 2-digit years 69-99
+		startdate.tm_year += 1900;
+	  }
+	else startdate.tm_year = 1970-1900;
+	     // otherwise, no start year, specify start at 1970
+
+	// set up the enddate structure
+	enddate.tm_sec = 59;		// allow up to last second of end day
+	enddate.tm_min = 59;		//  - Gilles's fix
+	enddate.tm_hour = 23;
+	enddate.tm_yday = 0;
+	enddate.tm_wday = 0;
+
+	if(config.Value("endmonth"))	// form input specified an end month
+	  {
+	    enddate.tm_mon = config.Value("endmonth") - 1;
+	    // tm months are zero based.  They are passed in as 1 based
+	  }
+	else enddate.tm_mon = 11;	// otherwise, no end month, default to 11
+
+	if(config.Value("endyear"))	// form input specified an end year
+	  {
+	    enddate.tm_year = config.Value("endyear") - 1900;
+	    if (enddate.tm_year < 69-1900)	// correct for 2-digit years 00-68
+		enddate.tm_year += 2000;	//  - Gilles's fix
+	    if (enddate.tm_year < 0)	// correct for 2-digit years 69-99
+		enddate.tm_year += 1900;
+	  }
+	else enddate.tm_year = endoftime->tm_year;
+	     // otherwise, no end year, specify end at the end of time allowable
+
+	// Months have different number of days, and this makes things more
+	// complicated than the startdate range.
+	// Following the example above, here is what we want to happen:
+	// Enddates:        Date          Becomes
+	//                  04-30         04-30-endoftime->tm_year
+	//                  05-1999       05-31-1999, May has 31 days... we want to search until the end of May
+	//                  1999          12-31-1999, search until the end of the year
+
+	if(config.Value("endday"))	// form input specified an end day
+	  {
+	    enddate.tm_mday = config.Value("endday");
+	    // tm days are 1 based, they are passed in as 1 based
+	  }
+	else
+	  {
+	    // otherwise, no end day, default to the end of the month
+	    enddate.tm_mday = monthdays[enddate.tm_mon];
+	    if (enddate.tm_mon == 1)	// February, so check for leap year
+		if (((enddate.tm_year+1900) % 4 == 0 &&
+			    (enddate.tm_year+1900) % 100 != 0) ||
+		    (enddate.tm_year+1900) % 400 == 0)
+			enddate.tm_mday += 1;	// Feb. 29  - Gilles's fix
+	  }
+
+	// Convert the tm values into time_t values.
+	// Web servers specify modification times in GMT, but htsearch
+	// displays these modification times in the server's local time zone.
+	// For consistency, we would prefer to select based on this same
+	// local time zone.  - Gilles's fix
+
+	timet_startdate = mktime(&startdate);
+	timet_enddate = mktime(&enddate);
+
+	// I'm not quite sure what behavior I want to happen if
+	// someone reverses the start and end dates, and one of them is invalid.
+	// for now, if there is a completely invalid date on the start or end
+	// date, I will force the start date to time_t 0, and the end date to
+	// the maximum that can be handled by a time_t.
+
+	if(timet_startdate < 0)
+	    timet_startdate = 0;
+	if(timet_enddate < 0)
+	    timet_enddate = eternity;
+
+	// what if the user did something really goofy like choose an end date
+	// that's before the start date
+
+	if(timet_enddate < timet_startdate)  // if so, then swap them so they are in order
+	  {
+	    time_t timet_temp = timet_enddate;
+	    timet_enddate = timet_startdate;
+	    timet_startdate = timet_temp;
+	  }
+      }
+    else   // no date was specified, so plug in some defaults
+      {
+	timet_startdate = 0;
+	timet_enddate = eternity;
+      }
+
+    // ... MG
+
+
     URLSeedScore adjustments(config);
  
     // If we knew where to pass it, this would be a good place to pass
@@ -1181,6 +1373,14 @@ Display::buildMatchList()
 	    continue;
 	}
 	
+	// Code added by Mike Grommet for date search ranges
+	// check for valid date range.  toss it out if it isn't relevant.
+	if ((timet_startdate > 0 || timet_enddate < eternity) &&
+	    (thisRef->DocTime() < timet_startdate || thisRef->DocTime() > timet_enddate))
+	{
+	    delete thisRef;
+	    continue;
+	}
 
 	thisMatch = ResultMatch::create();
 	thisMatch->setID(id);
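

For anyone who wants to experiment with the range logic outside of
htsearch, here is a minimal standalone sketch of the same idea:
convert calendar days to local-time time_t values with mktime(),
put the two ends in order, and test a document time against them.
The names day_to_time, DateRange, make_range and in_range are
hypothetical and are not part of the patch.  Unlike the patch, which
copies the current localtime() into the tm structures first, this
sketch sets tm_isdst to -1 and lets mktime() work out DST itself.

```cpp
#include <time.h>

// Hypothetical helper (not in the patch): local-time time_t for a
// calendar day.  month and day are 1 based, year is the full year.
// end_of_day selects 23:59:59 instead of 00:00:00.
time_t day_to_time(int year, int month, int day, bool end_of_day)
{
    tm t = tm();                    // zero-initialise every field
    t.tm_year = year - 1900;        // tm_year counts from 1900
    t.tm_mon = month - 1;           // tm_mon is zero based
    t.tm_mday = day;                // tm_mday is 1 based
    t.tm_hour = end_of_day ? 23 : 0;
    t.tm_min = end_of_day ? 59 : 0;
    t.tm_sec = end_of_day ? 59 : 0;
    t.tm_isdst = -1;                // assumption: let mktime decide DST
    return mktime(&t);
}

// Hypothetical type (not in the patch): an inclusive time range.
struct DateRange
{
    time_t start;
    time_t end;
};

// Build a range, swapping the ends if they were given in reverse.
DateRange make_range(time_t a, time_t b)
{
    DateRange r;
    if (b < a)
    {
        time_t temp = a;
        a = b;
        b = temp;
    }
    r.start = a;
    r.end = b;
    return r;
}

// True if doc_time falls inside the range, ends included.
bool in_range(const DateRange &r, time_t doc_time)
{
    return doc_time >= r.start && doc_time <= r.end;
}
```

For example, make_range(day_to_time(2001, 2, 28, true),
day_to_time(2001, 2, 1, false)) gives a range covering all of
February 2001 even though the ends were passed in reverse order.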


-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <htdig-general@lists.sourceforge.net>
To unsubscribe, send a message to <htdig-general-request@lists.sourceforge.net> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
