




                       WAIS Interface Protocol
                  Prototype Functional Specification

                             Version 1.5
                            April 23, 1990

 Franklin Davis, Brewster Kahle, Harry Morris, Jim Salem, Tracy Shen
                    Thinking Machines Corporation

                  Rod Wang, John Sui, Mark Grinbaum
                      Dow Jones & Company, Inc.




        Contents

        1. Overview
                1.1 Supported Facilities
                1.2 Unsupported Facilities
                1.3 Conformance with Version 1 of Z39.50
                1.4 Errors in the Standard

        2. Initialization Facility
                2.1 Init APDU
                2.2 Init-Response APDU

        3. Search Facility
                3.1 Search APDU
                3.2 Search-Response APDU

        4. Element-Set-Names supported by DowQuest
                4.1 Document-Header-Request
                4.2 Document-Text-Request
                4.3 Document-Header
                4.4 Document-Text
                4.5 Document-Short-Header
                4.6 Document-Headline
                4.7 Document-Long-Header
                4.8 Document-Codes

        5. Data Element Definitions
                5.1 Tag Values of the Data Element

        Appendix A. Type-3 Query (Relevance Feedback)

        Appendix B. Sample APDUs in WAIS Demonstration System
                B.1 Init APDU
                B.2 Init-Response APDU
                B.3 Search APDU
                B.4 Search-Response APDU
        Appendix C. DowQuest Code Formats

1. Overview

The purpose of this interface is to establish an application level
(ISO 2) protocol for query/retrieval applications.  The initial
implementation will provide a protocol for the DowQuest database
service provided by Dow Jones News Retrieval.  Workstation interfaces
will be implemented on the Macintosh as part of the WAIS project (Wide
Area Information Server).  The intention is to provide a sophisticated
and expandable computer-to-computer interface for future databases.

This protocol is based on the Z39.50-1988 ("the standard") Information
Retrieval Service Definitions and Protocol Specification for Library
Applications.  Each section of this document includes references in
square brackets "[]" to the appropriate section(s) in the Z39.50
specification.

The standard specifies an Open Systems Interconnection application
layer service definition and protocol specification for Information
Retrieval.  The Information Retrieval protocol allows an application
on one computer to query the database of another computer.  The
protocol specifies the procedures and structures for the intersystem
submission of a search request (including the syntax of the query),
request for the transmission of database records located by a search,
the responses to the request, access control, and resource control.

This is the last version of the WAIS protocol to be based on the
Z39.50 standard.  The next version will implement the newer SR-1
standard, which is based on Z39.50, but is written in ASN.1.

The WAIS extensions to the standard are primarily to support
"relevance feedback" queries.  (The standard currently supports a
boolean query syntax.)  The Present facility is not used, so that the
target system can be "stateless" (that is, it always deletes
Result-Sets).  Instead, a Type-1 query is used for text retrieval.  In order
to retrieve document number xxx, a search is performed with a query
specifying that System-Control-Number=xxx.

The WAIS extensions also enable the origin to request a range of
document text.  The Type-1 query is used as described in the previous
paragraph with the addition of Chunk-Code parameters.  The portion of
the document that matches the Chunk-Code values will be returned, e.g.
"System-Control-Number=xxx AND Line>1000 AND Line <= 2000" would
return lines 1001 through 2000 of document xxx.
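As an illustration of how a target might honor such a Line range, the
sketch below returns lines 1001 through 2000 of a document's text.  It
is hypothetical: the function name is invented, lines are assumed to
be numbered from 1, and the line terminator is passed in rather than
fixed (DowQuest reports its terminator in the Newline-Characters
element).

```python
def extract_line_range(text: str, first: int, last: int, newline: str = "\r") -> str:
    """Return lines first..last (1-based, inclusive) of a document.

    Hypothetical target-side helper modelling the Type-1 restriction
    "Line>first-1 AND Line<=last".
    """
    if first < 1 or last < first:
        raise ValueError("invalid line range")
    lines = text.split(newline)
    # Python slices are 0-based and end-exclusive, so lines first..last
    # are the list indices first-1 .. last-1.
    return newline.join(lines[first - 1:last])
```

For the query above, the target would call
extract_line_range(text, 1001, 2000).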

This protocol requires the target system to return unique document IDs
in a Search-Response, labeled as System-Control-Number (see Appendix C
of the standard).  These document IDs are used by the origin (user
interface) to specify documents when requesting display of a document
or in relevance feedback searches.

Retrieval of large documents depends on the ability to specify a
range of a document in a search.  This will be specified with an
extension called "Chunks."  This version of the protocol does not have
a method for the origin and target to negotiate the available chunk
types.  Three chunk types are currently defined for DowQuest: Byte,
Line, and Paragraph.

For efficiency reasons it is useful to refer to a document range with
large "chunks" that have been marked in the text by the target system.
The chunk markers and IDs are not displayed to the user, but are used
by the origin when the user selects a range of a document for a
relevance feedback query.  The Init-Response APDU is extended to
provide "chunk" markers and sizes which may be used to specify
document ranges in relevance feedback queries.

The User Information part of APDUs is used in more complex ways in
this extension than was originally envisioned in the standard.  In the
standard, the User Information part was a single Element of type
"any."  The WAIS protocol extensions instead precede the set of
elements in the user information part of an APDU with a
User-Information-Length element (shown as User-Information-Field in
the examples in Appendix B).  Its value is the length in bytes of all
the following elements, excluding the User-Information-Length element
itself.
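A minimal sketch of how an origin might compute that length, assuming
each element is encoded as a one-octet tag, a one-octet length, and
the value, which is the layout the Appendix B samples show (for
example 0x720110 for Max-Documents-Retrieved=16).  How the
User-Information-Length element is itself framed is left out, since
the samples mark its tag as "??".  The function names are hypothetical.

```python
def encode_element(tag: int, value: bytes) -> bytes:
    """Encode one element as tag octet, length octet, then the value."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in one octet")
    if len(value) > 0xFF:
        raise ValueError("value too long for a one-octet length")
    return bytes([tag, len(value)]) + value


def user_information_length(elements: list[bytes]) -> int:
    """Total bytes of the elements that follow User-Information-Length.

    The User-Information-Length element itself is not counted.
    """
    return sum(len(element) for element in elements)
```

With Seed-Words (tag 106) "Tell me about Thinking Machines" and
Max-Documents-Retrieved (tag 114) of 16, this gives 36, the value
shown in sample B.3.1.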


1.1 Supported Facilities

For the June 1990 target delivery date of the prototype WAIS system,
DowQuest will support only 2 facilities from the Z39.50 specification.

The "Initialization Facility" [3.2.1] includes an "Init APDU"
[4.1.1.1, table A2] and an "Init-Response APDU" [4.1.1.2, Table A3].

The "Search Facility" [3.2.2] includes a "Search APDU" [4.1.1.3, table
A4] and a "Search-Response APDU" [4.1.1.4, table A5].

"APDU" means "Application Protocol Data Unit," which is a unit of data
passed between an origin (user workstation) and target (database
server).  These and other terms are defined in section 2 of the Z39.50
specification.

The Search APDU will be extended to have a new query type: Type-3,
"Relevance Feedback Query."

The Search-Response APDU will be modified to include new elements in
Database-Records, including Document-IDs (used for relevance feedback)
and other fields, specified in section 4 of this document.


1.2 Unsupported Facilities

The remaining 5 facilities from Z39.50 are not supported in the WAIS
prototype.

The "Retrieval Facility" will not be supported in the WAIS prototype.
Document text will be retrieved using a Type-1 query based on
System-Control-Number (document ID).

The "Result-Set-Delete Facility" is not needed because DowQuest will
always delete all Result-Sets after returning a Search-Response APDU.

The "Access Control Facility" will not be supported.  All users will
have access to all data in DowQuest.

The "Accounting/Resource Control Facility" will not be supported.
DowQuest responses have a maximum size.

The "Termination Facility" is not needed because DowQuest will not
store any state about user sessions.  Each request and response will
be a complete transaction, independent of all others.  Either the
origin or the target may abort a session at any time.


1.3 Conformance with Version 1 of Z39.50

1.3.1 Extensibility

As specified in section 4.3 of the standard, WAIS systems will ignore
unknown data elements and options in received Init APDUs.

1.3.2 Static Requirements

The DowQuest system will conform to the Static Requirements specified
in section 4.4.1 of the standard, with extensions noted in this
document, except that it will NOT support general boolean Type-1
queries.  The Type-1 query will be used only for retrieval of
documents based on System-Control-Number and Chunks.

1.3.3 Dynamic Requirements

WAIS systems will conform to the Dynamic Requirements specified in
section 4.4.2 of the standard, subject to the restrictions on the
Type-1 Query described in section 1.3.2.

1.3.4 Statement Requirements

DowQuest will be capable of acting in the role of target.  It supports
version 1 of the standard.

See section 1.2 of this document for unsupported facilities.

Result-Sets will always be unilaterally deleted by DowQuest.  It will
not accept Search APDUs specifying named result sets.  Each input and
response message pair is a complete, independent transaction.  Thus,
multiple users may share a single session, although the order of
responses is not guaranteed to be the same order as the requests.  If
multiple users share a connection, the origin must use Reference-IDs
to identify input/response message pairs.
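The sketch below shows one way an origin sharing a connection could
pair responses with requests when they arrive out of order.  The class
and method names are hypothetical, Reference-IDs are assumed to be
small integers (the samples in Appendix B use 1 and 2), and extracting
the Reference-ID from a received APDU is not shown.

```python
import itertools
from typing import Callable


class ReferenceIdDispatcher:
    """Route responses on a shared connection to the request that caused them."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._pending: dict[int, Callable[[bytes], None]] = {}

    def register(self, on_response: Callable[[bytes], None]) -> int:
        """Allocate a fresh Reference-ID for an outgoing request."""
        ref_id = next(self._next_id)
        self._pending[ref_id] = on_response
        return ref_id

    def dispatch(self, ref_id: int, response: bytes) -> None:
        """Deliver a response to whoever registered its Reference-ID."""
        callback = self._pending.pop(ref_id, None)
        if callback is None:
            raise KeyError(f"no outstanding request with Reference-ID {ref_id}")
        callback(response)
```

Each request is answered exactly once, so the entry is removed when
its response is delivered.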

DowQuest supports element set names in Search APDUs as specified in
section 4 of this document.

The maximum number of database names that may be specified in a Search
APDU will be determined by the implementors.


1.4 Errors in the Standard

Table A7 on p. 43 of the standard is a copy of table A6.  Table A7
should contain the fields defined in 4.1.1.6, p. 23.  Earlier
versions of the WAIS protocol specification contained the same error
in table B.6.

2. Initialization Facility

DowQuest will accept an Init APDU at any time, and will always respond
with an Init-Response APDU.  Since DowQuest is stateless, the
Initialization facility is not required to begin a user session, but
it may be used anytime to get the system parameters.

The Init-Response APDU may specify "chunk" parameters that may be used
to specify a range of a document in a relevance feedback Type-3 Query.
[??? The chunk negotiation needs to be defined more completely.]

The Init-Response APDU may also specify newline characters,
non-displayable field markers, highlight/de-highlight markers, and
fields describing how often and when the target is updated.


2.1 Init APDU

The Init APDU requests information about the database service [3.2.1,
4.1.1.1, and Table A2].  Since DowQuest is stateless, Init is not
required to begin a user session.

The Options field must always have 0="will not use" for the Delete
facility.

See Appendix B.1 of this document for an example Init APDU.


2.2 Init-Response APDU

The Init-Response APDU provides information about the database service
[3.2.1, 4.1.1.2, and Table A3].

The Options field will always have 0="will not support" for the
access-control and resource-control facilities.

Implementation-Name will be "DowQuest", and the Implementation-Version
will be set by the implementors, to be updated as new versions are
released.

Preferred-Message-Size and Maximum-Record-Size will be determined
during the implementation.

See Appendix B.2 of this document for an example Init-Response APDU.


2.2.1 Chunk IDs

The User-Information-Field of the Init-Response APDU will contain
four elements indicating ways the origin may specify a region of a
document to be used in a relevance feedback Type-3 query.  The region
is composed of a range of "chunks" such as bytes or paragraphs.  The
elements are:

        Search-Chunk-Code-Bitmap  O       bitmap
        Present-Chunk-Code-Bitmap [???] O bitmap
        Chunk-ID-Length           C       integer
        Chunk-Marker              C       ASCII

Search-Chunk-Code-Bitmap specifies the chunk codes the target will
accept in Type-1 Queries in Search APDUs requesting display of
document regions.  The bitmap indicates with a "1" in a bit position
that the corresponding code number will be accepted by the target
system.  For example, to indicate that the target accepts Chunk-Codes
1 and 3 in a Search APDU, it would return Search-Chunk-Code-Bitmap
with bits 1 and 3 set to 1 and all other bits 0.
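A sketch of encoding and decoding this bitmap.  It assumes bit 1 is
the most significant bit of the first octet, which is how the
Appendix B samples annotate 0x80 ("bit 1") and 0x40 ("bit 2"); under
that convention Chunk-Code 0 ("Document") has no bit.  The one-octet
default length and the function names are assumptions.

```python
def encode_chunk_bitmap(codes, length: int = 1) -> bytes:
    """Build a bitmap with bit n set for each Chunk-Code n (bit 1 = MSB)."""
    bitmap = bytearray(length)
    for code in codes:
        if not 1 <= code <= 8 * length:
            raise ValueError(f"Chunk-Code {code} does not fit in {length} octet(s)")
        index = code - 1                       # zero-based offset from bit 1
        bitmap[index // 8] |= 0x80 >> (index % 8)
    return bytes(bitmap)


def decode_chunk_bitmap(bitmap: bytes) -> set:
    """Return the set of Chunk-Codes whose bits are set."""
    return {
        octet_index * 8 + bit + 1
        for octet_index, octet in enumerate(bitmap)
        for bit in range(8)
        if octet & (0x80 >> bit)
    }
```

For the example above, encode_chunk_bitmap({1, 3}) yields 0xA0, and
decoding the 0x40 from sample B.2 yields {2} (Line).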

Initially, four Chunk-Codes are defined.  The default is 1 "Byte" (see
section 5 of this document):

        Chunk-Code=0 "Document"
        Chunk-Code=1 "Byte"
        Chunk-Code=2 "Line"
        Chunk-Code=3 "Paragraph"

(In the future this may be extended to include other measures, such as
Word, Page, or Chapter-ID.  Other media such as audio might use chunks
such as Song-ID or Seconds.  Video might use Frame or Scene-ID.)

Chunk-Code=1 "Byte" is the most general case.  With this chunk size,
Chunk-Marker and Chunk-ID-Length are not used.  The origin may
indicate ranges of a document in bytes by setting Chunk-Code=1 and
providing pairs of byte-offsets in a relevance feedback Type-3 query.
If any Chunk-Code > 1 is accepted, the target must also provide
Chunk-ID-Length and Chunk-Marker.

DowQuest will provide Chunk-Code=3 (Paragraph-ID) for relevance
feedback Type-3 Queries, and Chunk-Code=2 (Line) for text retrieval
Type-1 Queries.

[??? Need more general chunk mechanism for both tagged and counted
types, e.g. paragraphs are tagged, but lines are counted (each line is
"tagged" only by the presence of a newline).  This will be addressed
in the next version of the protocol.]


2.2.2 Other Markers

DowQuest will also provide elements in the User-Information field of
the Init-Response APDU indicating various non-displayable marker
fields.  These include:

        Highlight-Marker        O       ASCII
        De-Highlight-Marker     C       ASCII
        Newline-Characters      O       ASCII

If Highlight-Marker is present, De-Highlight-Marker is required.


2.2.3 Other Information Elements

WAIS targets may provide elements describing how often and when the
database is updated:

        Update-Frequency        O       [???]
        Update-Times            O       [???]
        [??? pricing info?]     O       [???]

[The format and tags of these fields are TBD.]

3. Search Facility

3.1 Search APDU

The Search APDU will be implemented as defined in the standard [3.2.2,
4.1.1.3, and Table A4].  However, the Result-Set will always be
deleted by DowQuest immediately after returning a Search-Response
APDU, so the Replace-Indicator field in the Search APDU should be
"on," and Result-Set-Name is not used.  Search APDUs may not refer
to a Result-Set.  This enables DowQuest to be stateless.

The Type-3 Relevance Feedback Query syntax is outside the scope of the
standard.  The syntax used by DowQuest is given in Appendix A.

DowQuest will support the Type-1 Query syntax, but not for general
boolean queries.  Only searches specifying System-Control-Number (and
possibly Chunk ranges) are supported.

See Appendix B.3 of this document for an example Search APDU.


3.2 Search-Response APDU

The Search-Response APDU is almost the same as specified in the
standard [3.2.2, 4.1.1.4, and table A5], with a new type of
Database/Diagnostic-Record.  The elements used in Database-Records
[3.2.2.1.5, A.1.3.1] are specified in section 4 of this document.

The Result-Set will always be deleted by DowQuest immediately
after sending a Search-Response APDU.

The default element set returned in each Database-Record by DowQuest
in a Search-Response APDU is "Document-Header," defined in section 4
of this document.

For records that are beyond the Medium-Set-Present-Number in the
Search APDU, DowQuest will return the "Document-Short-Header" element
set.  This will probably not happen in normal circumstances since
DowQuest returns a maximum of 16 documents.  The origin can request
the Date/Score/Headline/etc. elements by requesting a Document-
Headline element set in subsequent Search APDUs.  [??? Perhaps we
should use message-length or buffer sizes to control this, instead?]

See Appendix B.4 for an example Search-Response APDU.

4. Element Sets supported by DowQuest

The elements supported by a particular target are outside the Z39.50
standard [3.2.2.1.3].  DowQuest will support the following
Element-Set-Names.  These are used in Search and Search-Response
APDUs.  Element-Set-Names is an optional field in Search APDUs [Table
2, Table 3].

Elements marked with a "*" can only appear in the Search-Response APDU
to the original search, since the information is deleted with the
Result-Set and so is no longer available when text is later requested.
Thus the element sets without starred elements (Document-Text,
Document-Headline, and Document-Codes) are the ones to use with Type-1
queries.

The second column notes whether an element is Required, Optional, or
Conditional in a given APDU.

The elements and their tag values are defined in section 5 of this
document.

4.3 Document-Header

A Search-Response APDU contains one variable element:

        Seed-Words-Used         O       ASCII

The rest of this element set is returned by default for each
Database-Record in a Search-Response APDU:

        System-Control-Number   R       ANY
        Version-Number          O       integer
        Score *                 O       integer
        Best-Match *            O       integer
[???]   Lines                   O       integer
        Document-Length         O       integer
        Source                  O       ASCII
        Date                    O       ASCII
        Title                   C       ASCII
        Geographic-Name         O       ASCII


4.4 Document-Text

This element set may be returned for each Database-Record in a
Search-Response APDU in response to a Type-1 query:

        Document-ID             R       ANY
        Version-Number          O       integer
        Document-Text           R       ASCII


4.5 Document-Short-Header

This element set is returned in the Database-Record in a
Search-Response APDU for documents that are beyond the
Medium-Set-Present-Number:

        Document-ID             R       ANY
        Version-Number          O       integer
        Score *                 O       integer
        Best-Match *            O       integer
        Document-Length         R       integer


4.6 Document-Headline

This element set is returned in a Search-Response APDU when requested
in a Type-1 Query in a Search APDU for documents that were previously
returned with Document-Short-Header element sets because of size
restrictions:

        Document-ID             R       ANY
        Version-Number          O       integer
        Source                  O       ASCII
        Date                    O       ASCII
        Headline                R       ASCII
        Origin                  O       ASCII


4.7 Document-Long-Header

This element set may be optionally requested in a Search APDU to be
returned in a Search-Response APDU:

        Document-ID             R       ANY
        Version-Number          O       integer
        Score *                 O       integer
        Best-Match *            O       integer
        Document-Length         R       integer
        Source                  O       ASCII
        Date                    O       ASCII
        Headline                R       ASCII
        Origin                  O       ASCII
        Stock-Codes             O       ASCII
        Company-Codes           O       ASCII
        Industry-Codes          O       ASCII
        [??? what about more general codes, e.g. author, pricing,
        copyright?] 


4.8 Document-Codes

This element set is returned in a Search-Response APDU when requested
in a Search APDU:

        Document-ID             R       ANY
        Version-Number          O       integer
        Stock-Codes             O       ASCII
        Company-Codes           O       ASCII
        Industry-Codes          O       ASCII

5. Data Element Definitions

Begin-Date-Range is the start of the date range used in a query
where Date-Factor is DF_LATER or DF_SPECIFIED_RANGE.  Dates are ASCII,
of the form yyyymmdd.

Best-Match is the approximate byte offset within a document of the
highest-scoring portion of the document.

Chunk-Code specifies the size of chunks used in document regions.  The
default value is 1.  DowQuest supports two Chunk-Codes: Chunk-Code=3
(Paragraph-ID) for relevance feedback Type-3 Queries in a Search APDU,
and Chunk-Code=2 (Line) for text retrieval Type-1 Queries in a Search
APDU.  Chunk-Code=1 (Byte) is the most general case.  With this chunk
size, Chunk-Marker and Chunk-ID-Length are not used.  The origin may
indicate ranges of a document in bytes by setting Chunk-Code=1 and
providing pairs of byte-offsets in a relevance feedback Type-3 query.
Otherwise, the origin indicates chunk ranges by specifying
Chunk-Start-ID and Chunk-End-ID.

Chunk-End-ID -- see Chunk-Start-ID.

Chunk-ID-Length specifies the length of a Chunk-ID in bytes.  In
DowQuest, Chunk-ID-Length for paragraphs is 3 bytes.  The contents of
a Chunk-ID are opaque to the origin system; the value is used
unchanged when specifying a chunk range in a relevance feedback
Type-3 query.

Chunk-Marker specifies an ASCII byte sequence that will occur in the
document text as a delimiter for the start of a chunk (except
Chunk-Code=1 (Byte) which has no markers).  In DowQuest Chunk-IDs for
paragraphs are preceded by "<ESC>l" which is a two-byte Chunk-Marker.
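A sketch of how an origin might locate Chunk-IDs in returned text,
using the DowQuest values above (marker "<ESC>l", 3-byte IDs) as
defaults.  It assumes each Chunk-ID immediately follows its marker;
the function name and the sample IDs in the usage note are
hypothetical.

```python
def find_chunk_ids(text: bytes, marker: bytes = b"\x1bl",
                   id_length: int = 3) -> list[tuple[bytes, int]]:
    """Return (Chunk-ID, offset) pairs for every Chunk-Marker in the text.

    The offset is that of the first byte after the Chunk-ID, i.e. where
    the chunk's displayable text begins.
    """
    found = []
    pos = text.find(marker)
    while pos != -1:
        id_start = pos + len(marker)
        id_end = id_start + id_length
        if id_end > len(text):
            raise ValueError("truncated Chunk-ID at end of text")
        # Chunk-IDs are opaque: keep the raw bytes for reuse in queries.
        found.append((text[id_start:id_end], id_end))
        # Resume after the ID so marker-like bytes inside an ID are skipped.
        pos = text.find(marker, id_end)
    return found
```

A user selection spanning two paragraphs would then be sent as the
Chunk-IDs of the first and last paragraph, unchanged.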

Chunk-Start-ID and Chunk-End-ID are either Chunk-IDs (type ANY) that
were each marked with a Chunk-Marker in the text of a document
returned in a Search-Response APDU; or, if Chunk-Code=1, they are
integers containing byte offsets in the text of the document.  They
delimit the beginning and end of a user-selected relevant region of
the document to be used for a relevance feedback query.

Company-Codes contains ASCII codes describing companies that are
mentioned in a document.

Date is the ASCII date a document was published (yyyymmdd).

Date-Factor is one of: 1 "DF_INDEPENDENT", 2 "DF_LATER", 3
"DF_EARLIER", or 4 "DF_SPECIFIED_RANGE".  The default is
Date-Factor=1, which specifies no special weighting of dates.  The
other 3 values specify bonus scoring for documents with dates greater,
less than, or between specified dates, respectively.  Date-Factor=2
uses Begin-Date-Range, Date-Factor=3 uses End-Date-Range, and
Date-Factor=4 uses both.
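The sketch below decides whether a document's date qualifies for the
bonus under each Date-Factor.  How large the bonus is, and whether the
range endpoints are inclusive, are not specified here; this sketch
assumes strict comparisons for "greater" and "less" and an inclusive
range for "between".  It relies on the fact that equal-length
yyyymmdd strings compare correctly as plain strings.  The function
name is hypothetical.

```python
from typing import Optional

DF_INDEPENDENT, DF_LATER, DF_EARLIER, DF_SPECIFIED_RANGE = 1, 2, 3, 4


def date_bonus_applies(date: str, factor: int = DF_INDEPENDENT,
                       begin: Optional[str] = None,
                       end: Optional[str] = None) -> bool:
    """True if a document dated `date` (yyyymmdd) earns a date bonus."""
    if factor == DF_INDEPENDENT:
        return False                      # no special weighting of dates
    if factor in (DF_LATER, DF_SPECIFIED_RANGE) and begin is None:
        raise ValueError("Begin-Date-Range is required for this Date-Factor")
    if factor in (DF_EARLIER, DF_SPECIFIED_RANGE) and end is None:
        raise ValueError("End-Date-Range is required for this Date-Factor")
    if factor == DF_LATER:
        return date > begin
    if factor == DF_EARLIER:
        return date < end
    if factor == DF_SPECIFIED_RANGE:
        return begin <= date <= end
    raise ValueError(f"unknown Date-Factor {factor}")
```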

De-Highlight-Marker -- see Highlight-Marker.

Document-ID is a field that was previously returned in a
Search-Response APDU.  It is unique in the database being searched.
It must be used in a Search APDU exactly as it was returned in a
Search-Response APDU.  See Document-ID-Chunk.

Document-ID-Chunk is the same as a Document-ID element, except that it
must be followed by two or three chunk elements defining a fragment of
the document: Chunk-Code, Chunk-Start-ID, Chunk-End-ID.  Chunk-Code is
optional; if Chunk-Code is missing, the previous value of Chunk-Code
in the current APDU is used; or if Chunk-Code never appeared in this
APDU, the default value is Chunk-Code=1 (Byte).
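The defaulting rule for Chunk-Code can be sketched as a single pass
over the elements of an APDU.  The input form, a list of
(element-name, value) pairs in transmission order, is a hypothetical
decoded representation chosen for illustration, as is the function
name.

```python
DEFAULT_CHUNK_CODE = 1  # "Byte"


def resolve_chunks(elements):
    """Return (document_id, chunk_code, start_id, end_id) per Document-ID-Chunk."""
    current_code = DEFAULT_CHUNK_CODE
    resolved = []
    i = 0
    while i < len(elements):
        name, value = elements[i]
        if name != "Document-ID-Chunk":
            i += 1
            continue
        doc_id = value
        i += 1
        if i < len(elements) and elements[i][0] == "Chunk-Code":
            # An explicit Chunk-Code stays in effect for later fragments.
            current_code = elements[i][1]
            i += 1
        if (i + 1 >= len(elements) or elements[i][0] != "Chunk-Start-ID"
                or elements[i + 1][0] != "Chunk-End-ID"):
            raise ValueError(f"Document-ID-Chunk {doc_id!r} lacks its chunk range")
        resolved.append((doc_id, current_code, elements[i][1], elements[i + 1][1]))
        i += 2
    return resolved
```

In this sketch the second fragment below inherits Chunk-Code=3 from
the first, while a lone fragment with no Chunk-Code falls back to 1.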

Document-Length is the length of the entire document in bytes.

Document-Text is a portion of a document text.

End-Date-Range is the end of the date range used in a query
where Date-Factor is DF_EARLIER or DF_SPECIFIED_RANGE.  Dates are ASCII,
of the form yyyymmdd.

Headline is a short ASCII description of the document for presentation
to the user.  In DowQuest it is a maximum of 160 bytes [??? is this a
requirement?].

Highlight-Marker and De-Highlight-Marker are character sequences that
precede and follow text that may be displayed with highlighting.  In
DowQuest, every searchable term is preceded by "<DC1>" (0x11) and
followed by "<DC3>" (0x13).
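An origin might split returned text into plain and highlighted runs
as sketched below.  The markers are parameters because an origin
would normally take them from the Init-Response APDU; the defaults
follow this definition (0x11 and 0x13), although the sample
Init-Response in Appendix B shows <DC2> (0x12) as the
De-Highlight-Marker.  The function name is hypothetical.

```python
def split_highlights(text: str, start: str = "\x11",
                     stop: str = "\x13") -> list[tuple[str, bool]]:
    """Split text into (segment, highlighted) pairs, removing the markers.

    Empty segments are dropped; an unterminated highlight runs to the end.
    """
    segments = []
    pos = 0
    highlighted = False
    while True:
        marker = stop if highlighted else start
        idx = text.find(marker, pos)
        end = len(text) if idx == -1 else idx
        if end > pos:
            segments.append((text[pos:end], highlighted))
        if idx == -1:
            return segments
        pos = idx + len(marker)
        highlighted = not highlighted
```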

Industry-Codes contains ASCII codes describing industries that are
mentioned in a document.

Max-Documents-Retrieved is the maximum number of documents requested
by the origin in a Search APDU to be returned in a Search-Response
APDU.  In DowQuest the default value is 16 [??? probably should not
have a default value?].  The target may return fewer than
Max-Documents-Retrieved documents.

Newline-Characters indicates what characters are used at the end of
lines.  In DowQuest this is "<CR>" (0x0D).

Origin-City is an ASCII name of the city and/or country where a
document originated.

Present-Chunk-Code-Bitmap is a bitmap indicating what Chunk-Codes may
be used in a Present APDU to specify a text range of a document to be
returned.  See Search-Chunk-Code-Bitmap for its definition.  [??? This
is obsolete.  Chunk-Codes must be worked out more completely.]

Score is a measure of how well the document matched the query.  It may
be any integer value.  [??? We may need to define a valid score range
to be used by all targets, or add a field in the Init-Response APDU to
specify the range for the current target.]

Search-Chunk-Code-Bitmap is a bitmap indicating what Chunk-Codes may
be used in a Search APDU query to specify a range of a document.  The
bitmap indicates with a "1" in a bit position that the corresponding
code number will be accepted by the target system.  For example, to
indicate that the target accepts Chunk-Codes 1 and 3 in a Search
APDU, it would return Search-Chunk-Code-Bitmap with bits 1 and 3 set
to 1 and all other bits 0.

Seed-Words is a text string containing the initial seed words in a
relevance feedback Type-3 query.

Seed-Words-Used is the same format as Seed-Words except it contains
only words that actually matched some documents in the database.  This
allows the user interface to give the user feedback about which seed
words were effective in a query.

Source is an ASCII string identifying the original source of a
document (e.g. newspaper name, journal title, etc.)

Stock-Codes contains ASCII stock ticker codes for companies that are
mentioned in a document.

Text-List is a list of text strings that are provided by the user.
They are document fragments that come from outside the DowQuest
database which the user wants to use in a search.  They are processed
in the same manner as seed words except they are not given seed word
weight bonuses.  This would be a new feature of a query within
DowQuest, and would require changes to the Query Server as well as the
User Server portion of DowQuest.  It will not be implemented for the
June '90 prototype.

User-Information-Length is the length of the entire user information
part of an APDU when it consists of more than one element.
User-Information-Length does not include itself in the length.

Version-Number is used to validate a local copy of a document's text.
If a document is modified in the target server, its Version-Number
must be incremented.  If a document may not be cached, Version-Number
is set to 0.  The default value is 0.
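The caching rule can be sketched as follows, assuming the origin
learns a document's current Version-Number from a Document-Header
returned in a Search-Response APDU.  The class, function names, and
in-memory dictionary are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedDocument:
    version: int
    text: bytes


def store(cache: dict, document_id: bytes, version: int, text: bytes) -> None:
    """Cache a document unless its Version-Number forbids caching."""
    if version != 0:                     # 0 means "may not be cached"
        cache[document_id] = CachedDocument(version, text)


def cached_text(cache: dict, document_id: bytes, current_version: int):
    """Return the cached text if it is still valid, else None."""
    entry = cache.get(document_id)
    if current_version == 0 or entry is None or entry.version != current_version:
        return None                      # must re-fetch with a Type-1 query
    return entry.text
```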

5.1 Tag Values of the Data Element

This table extends Table 19 in section 4.1.3 of the standard.

Element                 Tag     PDU                     R/O/C
_____________________________________________________________

User-Information-Length[???] 99 Init-Response           C
                                Search                  C
                                Search-Response         C
Chunk-Code              100     Search                  O
Chunk-ID-Length         101     Init-Response           C
Chunk-Marker            102     Init-Response           C
Highlight-Marker        103     Init-Response           O
De-Highlight-Marker     104     Init-Response           C
Newline-Characters      105     Init-Response           O
Seed-Words              106     Search                  C
Document-ID-Chunk       107     Search                  O
Chunk-Start-ID          108     Search                  O
Chunk-End-ID            109     Search                  C
Text-List               110     Search                  O
Date-Factor             111     Search                  O
Begin-Date-Range        112     Search                  O
End-Date-Range          113     Search                  C
Max-Documents-Retrieved 114     Search                  R
Seed-Words-Used         115     Search-Response         O
Document-ID             116     Search                  O
                                Search-Response         R
Version-Number          117     Search-Response         O
Score                   118     Search-Response         O
Best-Match              119     Search-Response         O
Document-Length         120     Search-Response         R
Source                  121     Search-Response         O
Date                    122     Search-Response         O
Headline                123     Search-Response         C
Origin-City             124     Search-Response         O
Search-Chunk-Code-Bitmap 125    Init-Response           O
Present-Chunk-Code-Bitmap [???] 126 Init-Response       O
Document-Text           127     Search-Response         R
Stock-Codes             128     Search-Response         O
Company-Codes           129     Search-Response         O
Industry-Codes          130     Search-Response         O

Appendix A. Type-3 Query (Relevance Feedback)

Query syntax is not part of the Z39.50 specification, but a Type-1
query is suggested in Appendix B of the standard for Boolean queries.
This is a similar suggestion for relevance feedback queries.

The Type-3 Query supports the relevance feedback style of database
query (as provided by DowQuest).  The Type-3 query includes the
following elements:

        Seed-Words              R       ASCII

        Document-ID             O       ANY     (see Note 1 below)
        Document-ID-Chunk       O       ANY     (see Note 2 below)
          Chunk-Code            O       binary
          Chunk-Start-ID        C       if Chunk-Code=1, binary
                                        else ANY
          Chunk-End-ID          C       if Chunk-Code=1, binary
                                        else ANY

        (may repeat Document-ID and Document-ID-Chunk elements)

        Text-List               O       ASCII   (Not in DowQuest)
        Date-Factor             O       integer
        Begin-Date-Range        C       ASCII
        End-Date-Range          C       ASCII
        Max-Documents-Retrieved R       integer

Note 1: There may be any number of Document-ID and Document-ID-Chunk
elements in a Type-3 Query, intermixed.

Note 2: Each occurrence of a Document-ID-Chunk element must be
followed by two or three chunk elements, defining a fragment of the
document.
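A sketch of serializing a Type-3 query as elements, using the tag
values from section 5.1 and the one-octet tag and length layout of the
Appendix B samples.  Assumptions: integers are sent big-endian in the
fewest octets (16 becomes 0x10, as in sample B.3.1); Chunk-IDs and
byte offsets are passed in as already-encoded bytes; Document-IDs are
emitted before Document-ID-Chunks, which Note 1 permits; and the
optional Text-List and date elements are omitted.  All function names
are hypothetical.

```python
SEED_WORDS, DOCUMENT_ID, DOCUMENT_ID_CHUNK = 106, 116, 107
CHUNK_CODE, CHUNK_START_ID, CHUNK_END_ID = 100, 108, 109
MAX_DOCUMENTS_RETRIEVED = 114


def tlv(tag: int, value: bytes) -> bytes:
    """One-octet tag, one-octet length, value."""
    if len(value) > 0xFF:
        raise ValueError("value too long for a one-octet length")
    return bytes([tag, len(value)]) + value


def int_octets(n: int) -> bytes:
    """Big-endian integer in the fewest octets (assumed encoding)."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")


def build_type3_query(seed_words: str, max_docs: int,
                      document_ids=(), chunks=()) -> bytes:
    """chunks: (document_id, chunk_code or None, start_id, end_id) tuples."""
    out = tlv(SEED_WORDS, seed_words.encode("ascii"))
    for doc_id in document_ids:
        out += tlv(DOCUMENT_ID, doc_id)
    for doc_id, code, start, end in chunks:
        out += tlv(DOCUMENT_ID_CHUNK, doc_id)
        if code is not None:             # omitted: previous or default code
            out += tlv(CHUNK_CODE, int_octets(code))
        out += tlv(CHUNK_START_ID, start) + tlv(CHUNK_END_ID, end)
    out += tlv(MAX_DOCUMENTS_RETRIEVED, int_octets(max_docs))
    return out
```

With only seed words and a limit of 16, the output matches the 36
bytes that follow User-Information-Field in sample B.3.1.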

Appendix B. Sample APDUs in WAIS Demonstration System

In the following, binary values are shown in hexadecimal preceded by
0x.  Variable fields include a tag and length [see A.1.2.1, A.1.2.2,
and Table 19].  See section 5.1 of this document for tag values for
WAIS elements.


B.1 Init APDU

[see Table 7, Table A2]

ITEM                            BYTE POS.       VALUE           NOTE
______________________________________________________________________
Header-Length-Indicator         1-2             0x0015          21
Header:
  Fixed portion:
    PDU-Type                    3               0x14            20
  Variable Portion:
    Protocol-Version            4-6             0x030101        1
    Options                     7-9             0x0401C0        bit 1,2
    Preferred-Message-Size      10-13           0x05020400      1024
    Maximum-Record-Size         14-17           0x06020800      2048
    Reference-ID                18-23           0x020400000001  1
User information part:
    (none)
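The sample above can be reproduced with the sketch below, which also
shows that Header-Length-Indicator counts only the header bytes
(positions 3 through 23, i.e. 21).  The field tags and the Options
value are copied from the table; the function name is hypothetical,
and Protocol-Version and Options are fixed to the sample values.

```python
def build_init_apdu(pref_msg_size: int, max_record_size: int,
                    reference_id: int) -> bytes:
    """Encode an Init APDU laid out like sample B.1."""
    header = bytes([0x14])                                  # PDU-Type 20
    header += bytes.fromhex("030101")                       # Protocol-Version 1
    header += bytes.fromhex("0401C0")                       # Options: bits 1,2
    header += bytes([0x05, 0x02]) + pref_msg_size.to_bytes(2, "big")
    header += bytes([0x06, 0x02]) + max_record_size.to_bytes(2, "big")
    header += bytes([0x02, 0x04]) + reference_id.to_bytes(4, "big")
    # Two-octet Header-Length-Indicator, then the header; no user information.
    return len(header).to_bytes(2, "big") + header
```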


B.2 Init-Response APDU

[see Table 8, Table A3]

ITEM                            BYTE POS.       VALUE           NOTE
______________________________________________________________________
Header-Length-Indicator         1-2             0x0025          37
Header:
  Fixed portion:
    PDU-Type                    3               0x15            21
    Result                      4               0x01            1="accept"
  Variable Portion:
    Protocol-Version            5-7             0x030101        1
    Options                     8-10            0x0401C0        bit 1,2
    Preferred-Message-Size      11-14           0x05020400      1024
    Maximum-Record-Size         15-18           0x06020400      1024
    Implementation-Name         19-28           0x0908"DowQuest"
    Implementation-Version      29-33           0x1003"1.0"
    Reference-ID                34-39           0x020400000001  1
User-Information-Field          40-42           0x??0217        23
    Search-Chunk-Code-Bitmap    43-45           0x7D0140        bit 2
    Present-Chunk-Code-Bitmap?? 46-48           0x7E0180        bit 1
    Chunk-ID-Length             49-51           0x650103        3
    Chunk-Marker                52-55           0x66021B6C      "<ESC>l"
    Highlight-Marker            56-58           0x670111        "<DC1>"
    De-Highlight-Marker         59-61           0x680112        "<DC2>"
    Newline-Characters          62-65           0x69020D0A      "<CR><LF>"
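The bitmap and marker values in the Init-Response user information can
be decoded as below.  This is a minimal sketch: it assumes one-byte tags
and lengths, no repeated tags, and that bit 1 is the most significant
bit of the first byte, which matches the NOTE column (0xC0 is "bit 1,2",
0x40 is "bit 2").  The helper names are hypothetical.

```python
def set_bits(bitmap: bytes) -> list[int]:
    """Return the 1-based numbers of the bits set in a bitmap value,
    counting bit 1 as the high-order bit of the first byte."""
    bits = []
    for byte_index, byte in enumerate(bitmap):
        for offset in range(8):
            if byte & (0x80 >> offset):
                bits.append(byte_index * 8 + offset + 1)
    return bits


def parse_fields(data: bytes) -> dict[int, bytes]:
    """Split a run of tag/length/value fields into {tag: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated field header")
        tag, length = data[pos], data[pos + 1]
        end = pos + 2 + length
        if end > len(data):
            raise ValueError(f"field 0x{tag:02X} runs past end of data")
        fields[tag] = data[pos + 2:end]
        pos = end
    return fields


# User information of B.2 (bytes 43-65), transcribed from the table
user_info = bytes.fromhex("7D0140" "7E0180" "650103" "66021B6C"
                          "670111" "680112" "69020D0A")
fields = parse_fields(user_info)
print(set_bits(fields[0x7D]))  # [2]  Search-Chunk-Code-Bitmap
print(set_bits(fields[0x7E]))  # [1]  Present-Chunk-Code-Bitmap
print(fields[0x66])            # b'\x1bl'  Chunk-Marker <ESC>l
print(fields[0x69])            # b'\r\n'  Newline-Characters <CR><LF>
```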


B.3 Search APDU

[see Table 9, Table A4]

B.3.1 Example query containing only a Seed-Words element (no
      Document-ID):

ITEM                            BYTE POS.       VALUE           NOTE
______________________________________________________________________
Header-Length-Indicator         1-2             0x0018          24
Header:
  Fixed portion:
    PDU-Type                    3               0x16            22
    Small-Set-Upper-Bound       4-6             0x000400        1024
    Large-Set-Lower-Bound       7-9             0x000800        2048
    Medium-Set-Present-Number   10-12           0x000800        2048
    Replace-Indicator           13              0x01            1="on"
  Variable Portion:
    Result-Set-Name             14-15           0x1100          ""
    Database-Names              16-17           0x1200          ""
    Query-Type                  18-20           0x130133        "3"
    Reference-ID                21-26           0x020400000002  2
User-Information-Field          27-29           0x??0224        36
  Type-3 Query:
    Seed-Words                  30-62           0x6A1F"Tell me about
                                                Thinking Machines"
    Max-Documents-Retrieved     63-65           0x720110        16
[??? remove this field; use Small-Set-Upper-Bound or something...]
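The header and Type-3 query of B.3.1 can be reproduced as follows.  In
the fixed portion, the three set-size fields are 3-byte big-endian
integers.  This minimal sketch omits the 3-byte User-Information-Field
prefix because its first byte is not given (shown as "??"); it checks
instead that the query body is 36 (0x24) bytes long.  The helper names
are hypothetical, and encoding Max-Documents-Retrieved in a single byte
is an assumption taken from this one example.

```python
def field(tag: int, value: bytes) -> bytes:
    # Variable field: one-byte tag, one-byte length, value (as in the examples).
    return bytes([tag, len(value)]) + value


def search_header(reference_id: int) -> bytes:
    """Header of the sample Search APDU, preceded by its length."""
    header = bytes([0x16])                      # PDU-Type 22
    header += (1024).to_bytes(3, "big")         # Small-Set-Upper-Bound
    header += (2048).to_bytes(3, "big")         # Large-Set-Lower-Bound
    header += (2048).to_bytes(3, "big")         # Medium-Set-Present-Number
    header += bytes([0x01])                     # Replace-Indicator "on"
    header += field(0x11, b"")                  # Result-Set-Name (empty)
    header += field(0x12, b"")                  # Database-Names (empty)
    header += field(0x13, b"3")                 # Query-Type "3"
    header += field(0x02, reference_id.to_bytes(4, "big"))  # Reference-ID
    return len(header).to_bytes(2, "big") + header


def type3_query(seed_words: str, max_docs: int) -> bytes:
    """Seed-Words followed by Max-Documents-Retrieved (one-byte value)."""
    return (field(0x6A, seed_words.encode("ascii"))
            + field(0x72, bytes([max_docs])))


header = search_header(reference_id=2)
query = type3_query("Tell me about Thinking Machines", 16)
print(header[:2].hex(), len(query))  # 0018 36
```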


B.3.2 Example query containing Seed-Words, one Document-ID element,
      and one Document-ID-Chunk element.  This query includes the seed
      word "Apple" and specifies that the search use all of document
      00000001WJ and the paragraphs with IDs 005 through 007 of
      document 00000023WJ:

ITEM                            BYTE POS.       VALUE           NOTE
______________________________________________________________________
Header-Length-Indicator         1-2             0x0018          24
Header:
  Fixed portion:
    PDU-Type                    3               0x16            22
    Small-Set-Upper-Bound       4-6             0x000400        1024
    Large-Set-Lower-Bound       7-9             0x000800        2048
    Medium-Set-Present-Number   10-12           0x000800        2048
    Replace-Indicator           13              0x01            1="on"
  Variable Portion:
    Result-Set-Name             14-15           0x1100          ""
    Database-Names              16-17           0x1200          ""
    Query-Type                  18-20           0x130133        "3"
    Reference-ID                21-26           0x020400000003  3
User-Information-Field          27-29           0x??022F        47
  Type-3 Query:
    Seed-Words                  30-36           0x6A05"Apple"
    Max-Documents-Retrieved     37-39           0x720110        16
[??? remove this field; use Small-Set-Upper-Bound or something...]
    Document-ID                 40-51           0x740A"00000001WJ"
    Document-ID-Chunk           52-63           0x740A"00000023WJ"
    Chunk-Code                  64-66           0x640102        paragraph
    Chunk-Start-ID              67-71           0x6C03"005"     par ID=005
    Chunk-End-ID                72-76           0x6D03"007"     par ID=007
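The chunk portion of this query can be generated as below.  This is a
minimal sketch under stated assumptions: chunk_range() is a hypothetical
helper; it follows the table in using tag 0x74 for the document named in
Document-ID-Chunk; and it writes chunk IDs as zero-padded decimal strings
of the width announced in Chunk-Id-Length (3 in B.2), an assumption
inferred only from the sample values "005" and "007".

```python
def field(tag: int, value: bytes) -> bytes:
    # Variable field: one-byte tag, one-byte length, value (as in the examples).
    return bytes([tag, len(value)]) + value


PARAGRAPH = 0x02  # Chunk-Code value labelled "paragraph" in this example


def chunk_range(doc_id: str, start: int, end: int,
                chunk_code: int = PARAGRAPH, chunk_id_length: int = 3) -> bytes:
    """Document ID plus Chunk-Code, Chunk-Start-ID and Chunk-End-ID."""
    start_id = str(start).zfill(chunk_id_length).encode("ascii")
    end_id = str(end).zfill(chunk_id_length).encode("ascii")
    if len(start_id) != chunk_id_length or len(end_id) != chunk_id_length:
        raise ValueError("chunk number does not fit in Chunk-Id-Length")
    return (field(0x74, doc_id.encode("ascii"))  # document holding the chunks
            + field(0x64, bytes([chunk_code]))   # Chunk-Code
            + field(0x6C, start_id)              # Chunk-Start-ID
            + field(0x6D, end_id))               # Chunk-End-ID


encoded = chunk_range("00000023WJ", 5, 7)
print(len(encoded))  # 25
```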


B.4 Search-Response APDU

[see Table 10, Table A5]

ITEM                            BYTE POS.       VALUE           NOTE
______________________________________________________________________
Header-Length-Indicator         1-2             0x0014          20
Header:
  Fixed portion:
    PDU-Type                    3               0x17            23
    Search-Status               4               0x00            0="success"
    Result-Count                5-7             0x000002        2
    Number-of-Records-Returned  8-10            0x000002        2
    Next-Result-Set-Position    11-13           0x000000        0
  Variable Portion:
    Present-Status              14-16           0x1B0100        0="success"
    Reference-ID                17-22           0x020400000002  2
User-Information-Field          23-25           0x??01DD        221
    Seed-Words-Used             26-44           0x7311"Thinking Machines"
  Database records:
    Document-Header element set:
      Document-ID               45-58           0x740C"0000000001WJ"
      Version-Number            59-61           0x750100        0
      Score                     62-67           0x760400000022  34
      Best-Match                68-77           0x77080000000000000001
      Document-Length           78-87           0x78080000000000000033
      Source                    88-92           0x7903"WSJ"
      Date                      93-100          0x7A06"900601"  yymmdd *
      Headline                  101-119         0x7B11"TMC Releases WAIS"
      Origin-City               120-134         0x7C0D"Cambridge, MA"

      Document-ID               135-148         0x740C"0000000123ZF"
      Version-Number            149-151         0x750100        0
      Score                     152-157         0x760400000015  21
      Best-Match                158-167         0x7708000000000000006E
      Document-Length           168-177         0x78080000000000000121
      Source                    178-192         0x790D"Business Week"
      Date                      193-200         0x7A06"900603"
      Headline                  201-221         0x7B13"Apple Releases WAIS"
      Origin-City               222-236         0x7C0D"Cupertino, CA"

(*) The Date element should actually be in yyyymmdd format, e.g.
0x7A08"19900601".
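A client reading the Document-Header element set could split the
returned fields into records as sketched below.  This is a minimal
sketch, not the DowQuest implementation: it assumes one-byte tags and
lengths, starts a new record at each Document-ID (0x74) tag, expects its
input to begin at the first Document-ID (after Seed-Words-Used), and
reads Version-Number, Score, Best-Match and Document-Length as big-endian
unsigned integers, consistent with the NOTE column (0x00000022 = 34).
The names iter_fields, parse_headers and NAMES are hypothetical.

```python
INTEGER_TAGS = {0x75, 0x76, 0x77, 0x78}
NAMES = {0x74: "Document-ID", 0x75: "Version-Number", 0x76: "Score",
         0x77: "Best-Match", 0x78: "Document-Length", 0x79: "Source",
         0x7A: "Date", 0x7B: "Headline", 0x7C: "Origin-City"}


def iter_fields(data: bytes):
    """Yield (tag, value) pairs; one-byte tag and length assumed."""
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated field header")
        tag, length = data[pos], data[pos + 1]
        end = pos + 2 + length
        if end > len(data):
            raise ValueError(f"field 0x{tag:02X} runs past end of data")
        yield tag, data[pos + 2:end]
        pos = end


def parse_headers(data: bytes) -> list[dict[str, object]]:
    """Group header fields into records, one per Document-ID."""
    records = []
    for tag, value in iter_fields(data):
        if tag == 0x74:
            records.append({})
        if not records:
            raise ValueError("field before first Document-ID")
        name = NAMES.get(tag, f"0x{tag:02X}")
        if tag in INTEGER_TAGS:
            records[-1][name] = int.from_bytes(value, "big")
        else:
            records[-1][name] = value.decode("ascii")
    return records


# First record of B.4, transcribed from the table
record1 = (bytes.fromhex("740C") + b"0000000001WJ"
           + bytes.fromhex("750100" "760400000022" "77080000000000000001"
                           "78080000000000000033" "7903") + b"WSJ"
           + bytes.fromhex("7A06") + b"900601"
           + bytes.fromhex("7B11") + b"TMC Releases WAIS"
           + bytes.fromhex("7C0D") + b"Cambridge, MA")
for rec in parse_headers(record1):
    print(rec["Document-ID"], rec["Score"], rec["Document-Length"])  # 0000000001WJ 34 51
```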


Appendix C. DowQuest Code Formats


C.1 Company Codes

[??? TBD]


C.2 Industry Codes

[??? TBD]


C.3 Stock Codes

[??? TBD]
