ht://Dig Copyright © 1995-2000 The ht://Dig Group
Please see the file COPYING for
license information.
| add_anchors_to_excerpt: | no |
|
<SELECT NAME="search_algorithm"> <OPTION VALUE="exact:1 prefix:0.6 synonyms:0.5 endings:0.1" SELECTED>fuzzy <OPTION VALUE="exact:1">exact </SELECT> |
| allow_in_form: | search_algorithm search_results_header |
| allow_numbers: | true |
| allow_virtual_hosts: | false |
| authorization: | myusername:mypassword |
| backlink_factor: | 501.1 |
| bad_extensions: | .foo .bar .bad |
| bad_querystr: | forum=private section=topsecret&passwd=required |
| bad_word_list: | ${common_dir}/badwords.txt |
The default value of this attribute is determined at compile time.
| bin_dir: | /usr/local/bin |
| build_select_lists: |
MATCH_LIST matchesperpage matches_per_page_list \ 1 1 1 matches_per_page "Previous Amount" \ RESTRICT_LIST restrict restrict_names 2 1 2 restrict "" \ FORMAT_LIST format template_map 3 2 1 template_name "" |
| case_sensitive: | false |
| common_dir: | /tmp |
| common_url_parts: |
http://www.htdig.org/ml/ \ .html \ http://dev.htdig.org/ \ http://www.htdig.org/ |
| compression_level: | 6 |
The default value of this attribute is determined at compile time.
| config_dir: | /var/htdig/conf |
| create_image_list: | yes |
| create_url_list: | yes |
| database_base: | ${database_dir}/sales |
The default value of this attribute is determined at compile time.
| database_dir: | /var/htdig |
| date_factor: | 0.35 |
| date_format: | %Y-%m-%d |
| description_factor: | 350 |
| doc_db: | ${database_base}documents.db |
| doc_excerpt: | ${database_base}excerpts.db |
| doc_index: | documents.index.db |
| doc_list: | /tmp/documents.text |
| end_ellipses: | ... |
| end_highlight: | </font> |
| endings_affix_file: | /var/htdig/affix_rules |
| endings_dictionary: | /var/htdig/dictionary |
| endings_root2word_db: | /var/htdig/r2w.db |
| endings_word2root_db: | /var/htdig/w2r.bm |
| excerpt_length: | 500 |
| excerpt_show_top: | yes |
| exclude_urls: | students.html cgi-bin |
| Parameter | Description | Example |
|---|---|---|
| protocol | The URL scheme to be used. | https |
| URL | The URL to be retrieved. | https://www.htdig.org:8008/attrs.html |
| configuration-file | The configuration-file in effect. | /etc/htdig/htdig.conf |
The external protocol script must write information for htdig to its standard output, in the form described here. The output consists of a header, followed by a blank line, followed by the contents of the document. Each record in the header is terminated with a newline and consists of tab-separated fields, which must be non-empty unless expressly allowed to be empty. The first field is a single character that specifies the record type. The remaining fields are determined by the record type.
| Record type | Fields | Description |
|---|---|---|
| s | status code |
An HTTP-style status code, e.g. 200, 404. Typical codes include:
|
| r | reason | A text string describing the status code, e.g. "Redirect" or "Not Found". |
| m | modification time | The modification time of this document. While the code is fairly flexible about the time/date formats it accepts, a standard format is recommended, such as RFC 1123 (Sun, 06 Nov 1994 08:49:37 GMT) or ISO 8601 (1994-11-06 08:49:37 GMT). |
| t | content-type | A valid MIME type for the document, like text/html or text/plain. |
| l | content-length | The length of the document on the server, which may not necessarily be the length of the buffer returned. |
| u | url | The URL of the document, or in the case of a redirect, the URL that should be indexed as a result of the redirect. |
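
The header format described above can be sketched as a small handler script. This is a minimal illustration in Python, not code shipped with ht://Dig. The names `format_response` and `fetch` are hypothetical, `fetch` returns a fixed placeholder instead of performing a real retrieval, and the content-length is assumed to be the UTF-8 byte length of the body.

```python
#!/usr/bin/env python3
"""Sketch of an external protocol handler (hypothetical, for illustration)."""
import sys


def format_response(status, reason, content_type, body, modified=None, url=None):
    """Return the header records, a blank line, then the document contents."""
    records = [
        ("s", str(status)),                     # status code
        ("r", reason),                          # reason text
        ("t", content_type),                    # MIME type
        ("l", str(len(body.encode("utf-8")))),  # assumption: length in bytes
    ]
    if modified:
        records.append(("m", modified))         # modification time
    if url:
        records.append(("u", url))              # document or redirect URL
    # Each record is one line of tab-separated fields.
    header = "".join(f"{kind}\t{value}\n" for kind, value in records)
    return header + "\n" + body


def fetch(url):
    """Hypothetical stand-in for a real retrieval; returns a fixed document."""
    return f"Placeholder contents for {url}\n"


# htdig passes three parameters: protocol, URL, configuration-file.
if __name__ == "__main__" and len(sys.argv) == 4:
    protocol, url, config_file = sys.argv[1:4]
    sys.stdout.write(format_response(200, "OK", "text/plain", fetch(url), url=url))
```

A real handler would replace `fetch` with an actual retrieval and map failures to the appropriate status code and reason.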
| external_protocols: |
https /usr/local/bin/handler.pl \ ftp /usr/local/bin/ftp-handler.pl |
The parser program takes four command-line
parameters, not counting any parameters already
given in the command string:
infile content-type URL configuration-file
| Parameter | Description | Example |
|---|---|---|
| infile | A temporary file with the contents to be parsed. | /var/tmp/htdext.14242 |
| content-type | The MIME-type of the contents. | text/html |
| URL | The URL of the contents. | http://www.htdig.org/attrs.html |
| configuration-file | The configuration-file in effect. | /etc/htdig/htdig.conf |
The external parser must write information for
htdig to its standard output. Unless it is an
external converter, which outputs a document
of a different content-type, its output must
follow the format described here.
The output consists of records, each terminated
with a newline. Each record is a series of
tab-separated fields, which must be non-empty unless
expressly allowed to be empty. The first field is a
single character that specifies the record type.
The remaining fields are determined by the record type.
| Record type | Fields | Description |
|---|---|---|
| w | word | A word that was found in the document. |
| | location | A number indicating the normalized location of the word within the document. The number has to fall in the range 0-1000, where 0 means the top of the document. |
| heading level |
A heading level that is used to compute the
weight of the word depending on its context in
the document itself. Levels are in the range
0-10 and are defined as follows:
|
|
| u | document URL | A hyperlink to another document that is referenced by the current document. It must be complete and non-relative, using the URL parameter to resolve any relative references found in the document. |
| | hyperlink description | For HTML documents, this would be the text between the <a href...> and </a> tags. |
| t | title | The title of the document. |
| h | head | The top of the document itself. This is used to build the excerpt. This should only contain normal ASCII text. |
| a | anchor | The label that identifies an anchor that can be used as a target in a URL. This really only makes sense for HTML documents. |
| i | image URL | A URL that points at an image that is part of the document. |
| m | http-equiv | The HTTP-EQUIV attribute of a META tag. May be empty. |
| | name | The NAME attribute of this META tag. May be empty. |
| | contents | The CONTENT attribute of this META tag. May be empty. |
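
The record format described above can be sketched as a minimal plain-text parser. This is an illustration in Python under stated assumptions, not a parser shipped with ht://Dig. The function name `parse_text` is hypothetical, words are lowercased and limited to ASCII letters and digits for simplicity, the heading level is always written as 0 (assumed here to denote ordinary body text), and the `h` record is cut to an arbitrary 100 characters.

```python
#!/usr/bin/env python3
"""Sketch of an external parser for plain text (hypothetical, for illustration)."""
import re
import sys


def parse_text(text, title="Untitled", head_length=100):
    """Return a list of tab-separated records: one t, many w, one h."""
    records = [f"t\t{title}"]
    total = max(len(text), 1)
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        # Normalize the character offset into the range 0-1000.
        location = match.start() * 1000 // total
        # Fields: word, location, heading level (0 assumed for body text).
        records.append(f"w\t{match.group().lower()}\t{location}\t0")
    # Collapse whitespace so the head stays on a single line.
    head = " ".join(text.split())[:head_length]
    if head:
        records.append(f"h\t{head}")
    return records


# htdig passes four parameters: infile content-type URL configuration-file.
if __name__ == "__main__" and len(sys.argv) == 5:
    infile, content_type, url, config_file = sys.argv[1:5]
    with open(infile, encoding="utf-8", errors="replace") as handle:
        for record in parse_text(handle.read(), title=url):
            print(record)
```

The sketch uses the URL as the title because plain text carries no title of its own. A parser for a richer format would also emit `u`, `a`, `i`, and `m` records where applicable.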
| external_parsers: |
text/html /usr/local/bin/htmlparser \ application/pdf /usr/local/bin/parse_doc.pl \ application/msword->text/plain "/usr/local/bin/mswordtotxt -w" \ application/x-gunzip->user-defined /usr/local/bin/ungzipper |
| extra_word_characters: | _ |
| head_before_get: | true |
| heading_factor: | 20 |
| htnotify_sender: | bigboss@yourcompany.com |
| http_proxy: | http://proxy.bigbucks.com:3128 |
| http_proxy_exclude: | http://intranet.foo.com/ |
| image_list: | allimages |
The default value of this attribute is determined at compile time.
| image_url_prefix: | /images/htdig |
| include: | ${config_dir}/htdig.conf |
| iso_8601: | true |
| keywords_factor: | 12 |
| keywords_meta_tag_names: | keywords description |
| limit_normalized: | http://www.mydomain.com |
| limit_urls_to: | .sdsu.edu kpbs [.*\.html] |
| local_default_doc: | default.html default.htm index.html index.htm |
| local_urls: | http://www.foo.com/=/usr/www/htdocs/ |
| local_urls_only: | true |
| local_user_urls: | http://www.my.org/=/home/,/www/ |
| locale: | en_US |
| logging: | true |
| maintainer: | ben.dover@uptight.com |
| match_method: | boolean |
| matches_per_page: | 999 |
| max_connection_requests: | 100 |
| max_description_length: | 40 |
| max_descriptions: | 15 |
| max_doc_size: | 5000000 |
| max_head_length: | 50000 |
| max_hop_count: | 4 |
| max_keywords: | 10 |
| max_meta_description_length: | 1000 |
| max_prefix_matches: | 100 |
| max_retries: | 6 |
| max_stars: | 6 |
| maximum_pages: | 20 |
| maximum_word_length: | 15 |
| meta_description_factor: | 20 |
| metaphone_db: | ${database_base}.mp.db |
| method_names: | or Or and And |
| mime_types: | /etc/mime.types |
| minimum_prefix_length: | 2 |
| minimum_speling_length: | 3 |
| minimum_word_length: | 2 |
| next_page_text: | <img src="/htdig/buttonr.gif"> |
| no_excerpt_show_top: | yes |
| no_excerpt_text: |
| no_next_page_text: |
| no_page_list_header: | <hr noshade size=2>All results on this page.<br> |
| no_page_number_text: |
<strong>1</strong> <strong>2</strong> \ <strong>3</strong> <strong>4</strong> \ <strong>5</strong> <strong>6</strong> \ <strong>7</strong> <strong>8</strong> \ <strong>9</strong> <strong>10</strong> |
| no_prev_page_text: |
| no_title_text: | "No Title Found" |
| noindex_end: | </SCRIPT> |
| noindex_start: | <SCRIPT |
| nothing_found_file: | /www/searching/nothing.html |
| page_list_header: |
| page_number_separator: | "</td> <td>" |
| page_number_text: |
<em>1</em> <em>2</em> \ <em>3</em> <em>4</em> \ <em>5</em> <em>6</em> \ <em>7</em> <em>8</em> \ <em>9</em> <em>10</em> |
The default value of this attribute is determined at compile time, to include the path to the acroread executable.
| pdf_parser: | /usr/local/Acrobat3/bin/acroread -toPostScript -pairs |
| persistent_connections: | false |
| prefix_match_character: | ing |
| prev_page_text: | <img src="/htdig/buttonl.gif"> |
| regex_max_words: | 10 |
| remove_bad_urls: | true |
| remove_default_doc: | default.html default.htm index.html index.htm |
| remove_unretrieved_urls: | true |
| robotstxt_name: | myhtdig |
| script_name: | /search/results.shtml |
| search_algorithm: | exact:1 soundex:0.3 |
| search_results_footer: | /usr/local/etc/ht/end-stuff.html |
| search_results_header: | /usr/local/etc/ht/start-stuff.html |
| search_results_wrapper: | ${common_dir}/wrapper.html |
| server_aliases: |
foo.mydomain.com:80=www.mydomain.com:80 \ bar.mydomain.com:80=www.mydomain.com:80 |
| server_max_docs: | 50 |
| server_wait_time: | 20 |
| sort: | revtime |
| sort_names: |
score 'Best Match' time Newest title A-Z \ revscore 'Worst Match' revtime Oldest revtitle Z-A |
| soundex_db: | ${database_base}.snd.db |
| star_blank: | http://www.somewhere.org/icons/elephant.gif |
| star_image: | http://www.somewhere.org/icons/elephant.gif |
| star_patterns: |
http://www.sdsu.edu /sdsu.gif \ http://www.ucsd.edu /ucsd.gif |
| start_ellipses: | ... |
| start_highlight: | <font color="#FF0000"> |
| start_url: | http://www.somewhere.org/alldata/index.html |
| substring_max_words: | 100 |
| synonym_db: | ${database_base}.syn.db |
| synonym_dictionary: | /usr/dict/synonyms |
| syntax_error_file: | ${common_dir}/synerror.html |
| template_map: |
Short short ${common_dir}/short.html \ Normal normal builtin-long \ Detailed detail ${common_dir}/detail.html |
| template_name: | long |
| template_patterns: |
http://www.sdsu.edu ${common_dir}/sdsu.html \ http://www.ucsd.edu ${common_dir}/ucsd.html |
| text_factor: | 0 |
| timeout: | 42 |
| title_factor: | 12 |
| translate_amp: | true |
| translate_lt_gt: | true |
| translate_quot: | true |
| uncoded_db_compatible: | false |
| url_list: | /tmp/urls |
| url_log: | /tmp/htdig.progress |
| url_part_aliases: |
http://search.example.com/~htdig *site \ http://www.htdig.org/this/ *1 \ .html *2 |
| url_part_aliases: |
http://www.htdig.org/ *site \ http://www.htdig.org/that/ *1 \ .htm *2 |
| use_doc_date: | true |
| use_meta_description: | true |
| use_star_image: | no |
| user_agent: | htdig-digger |
| valid_extensions: | .html .htm .shtml |
| valid_punctuation: | -' |
| No example provided |
| word_db: | ${database_base}.allwords.db |
| word_dump: | /tmp/words.txt |
| word_list: | ${database_base}.allwords.text |
| wordlist_compress: | true |
| wordlist_page_size: | 8192 |
| wordlist_cache_size: | 40000000 |
| wordlist_compress_debug: | 2 |
| No example provided |
| No example provided |
| No example provided |
| wordlist_monitor: | true |
| wordlist_monitor_period: | .1 |
| wordlist_monitor_fields: | put/s nwalks/s |
| wordlist_monitor_output: | file:/home/bosc/trash/wlmonout |