ftp.nice.ch/peanuts/GeneralData/Usenet/news/1989/CSN-89.tar.gz#/comp-sys-next/1989/Aug/NeXT-Database-Prowess



Date: Sun 14-Aug-1989 23:45:09
From: Unknown
Subject: NeXT Database Prowess

Folks,

There is a ton of interest on my campus surrounding the NeXT machine. Surprisingly, or not surprisingly, depending on your viewpoint, much of the interest comes from NON-technical fields. I have a question that I've seen many people dance around on this net but no one clearly address.

Many people have seen the "Complete Works of Shakespeare" go through its paces. The next (no pun) step, obviously, for many, is to put the subject matter of their interest "on-line" in a similar fashion. I have two specific requests to pass on by way of example:

In one case, the entire written works of Sigmund Freud have been entered electronically (really! the department even got a grant from NEH to do it. They've spent almost $50,000 so far in Kurzweil time!)... they'd like to have that database "NeXTized" or whatever the process is called. A similar situation is for a unit studying the works of Plato. Although the Plato project is not nearly as far along, both projects are VERY interested in the technology.

So, the queries:

What exactly is the process going on "under" the Shakespeare icon? It can't be just a glorified fgrep. How does the cube, burdened with the unix file-system, get such good recall on that large database?

Is there a way Cornell could send the disk data to NeXT, or even a third party, and have them put the data on an OD with the proper cross-indexing? We'd want to do the front-end ourselves in IB (obviously, where's the fun without that chance? :-) )

Is there a way to licence the underlying software that drives such cross-referenced databases? Is this a NeXT-developed technology or third party?

Obviously the potential is great for any field to have their "hot topics" ready and on-line in such a fashion. Will it be part of a future OS release? Maybe something like AppKit, only this would be called DataBaseKit?

Please, we're sincere here and the money is (sort of) there or can be found. If anyone has any info please pass it along or, if you could, direct me to someone who is in the know.

Thanks in advance.

Reply to:
Roger Jagoda
System Analyst
Cornell University
Snail Mail: 220 Cornell Comp. Cent., Cornell University, Ithaca, NY 14853
AT&T: (607) 255-8960

>From: z8my@vax5.CIT.CORNELL.EDU

Date: Sun 15-Aug-1989 05:33:00
From: Unknown
Subject: Re: NeXT Database Prowess

In article <19350@vax5.CIT.CORNELL.EDU> fqoj@vax5.cit.cornell.edu () writes:
> ...they'd
>like to have that database "NeXTized" or whatever the process is
>called. A similar situation is for a unit studying the works of Plato.

>What exactly is the process going on "under" the Shakespeare icon, it
>can't be just a glorified fgrep. How does the cube, burdened with the
>unix file-system, get such good recall on that large database?

There is a fairly straightforward implementation of inverted indices. That is, keywords are sifted out from the original text, sorted, and hashed. When the Digital Librarian looks for a word, it has three files (set up previously) that are exceedingly fast to search, due to the way they are arranged (hashed and sorted). Once the keywords are found there, they reference the individual files and locations of the original text. And actually, it is possible to turn off the indexes and use fgrep, which is necessary to search for certain sophisticated patterns (parts of words) that the indexes can't handle.

This is similar to the REFER database system already implemented on any Berkeley system (and maybe Sys V, yo no se). It is a bit more sophisticated, as there are systems for indexing multitudes of different kinds of files, and more information is available about the objects searched after a key is found and before it is looked up.

>Is there a way Cornell could send the disk data to NeXT, or even a third
>party, and have them put the data on an OD with the proper
>cross-indexing?

Not necessary. Just drag the folder (i.e. directory) from the directory browser to an empty icon well in the Digital Librarian. You can index the files from a menu selection (I forget which), but be careful, as DL has a bug that makes it think it has an indexed directory when it isn't really indexed.

>We'd want to do the front-end ourselves in IB
>(obviously, where's the fun without that chance? :-) )

I am planning on immediately starting a similar project. Perhaps we should share our work. I need to expand on the capabilities that DL just doesn't provide (display troff text properly).

>Is there a way to licence the underlying software that drives such
>cross-referenced databases? Is this a NeXT-developed technology or third
>party? Obviously the potential is great for any field to have their "hot
>topics" ready and on-line in such a fashion. Will it be part of a future
>OS release. Maybe something like AppKit only this would be called
>DataBaseKit?

You already have the software. There are a bunch of poorly documented function calls (well, not all THAT poorly documented) to handle all the indexing stuff. It is not Objective C, but just the ordinary stuff. Search in the Digital Librarian for "index" and "indexing" under the release notes and the manual pages, and you'll find all sorts of stuff. I recommend you start from a terminal with "man 1 index", and follow the cross references.

I responded here because I thought some of this stuff is of general interest, but I would really like to work with you, as I think we could help each other out a lot. Please send mail.

| Dan Zerkle home:(805) 968-4683 morning:961-2434 afternoon:687-0110 |
| dz@cornu.ucsb.edu dz%cornu@ucsbuxa.bitnet ...ucbvax!hub!cornu!dz |
| Snailmail: 6681 Berkshire Terrace #5, Isla Vista, CA 93117 |
| Disclaimer: If it's wrong or stupid, pretend I didn't do it. |

>From: jordan@Morgan.COM (Jordan Hayes)

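The inverted-index scheme described in the reply above (keywords extracted, hashed, and mapped back to files and locations, with an fgrep-style scan as the fallback for parts of words) can be sketched roughly as follows. This is only an illustration, not the Digital Librarian's code: the thread does not describe the layout of its three index files, so an in-memory Python dict stands in for the hashed keyword table, and the function names and sample texts are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Yield (lowercased word, character offset) pairs from the text."""
    for match in re.finditer(r"[A-Za-z']+", text):
        yield match.group().lower(), match.start()


def build_index(documents):
    """Map each keyword to a sorted list of (file name, offset) postings."""
    index = defaultdict(list)
    for name, text in documents.items():
        for word, offset in tokenize(text):
            index[word].append((name, offset))
    for postings in index.values():
        postings.sort()
    return dict(index)  # a dict is a hash table: one probe per keyword


def lookup(index, word):
    """Whole-word search through the index; the text itself is not scanned."""
    return index.get(word.lower(), [])


def scan(documents, fragment):
    """fgrep-style fallback: linear scan for a substring (part of a word)."""
    fragment = fragment.lower()
    hits = []
    for name in sorted(documents):
        text = documents[name].lower()
        pos = text.find(fragment)
        while pos != -1:
            hits.append((name, pos))
            pos = text.find(fragment, pos + 1)
    return hits


# Hypothetical sample data, standing in for a directory of text files.
documents = {
    "hamlet.txt": "To be, or not to be, that is the question",
    "macbeth.txt": "Tomorrow, and tomorrow, and tomorrow",
}
index = build_index(documents)

print(lookup(index, "tomorrow"))   # indexed whole-word lookup
print(lookup(index, "morr"))       # a word fragment is not a key in the index
print(scan(documents, "morr"))     # so fall back to scanning the text
```

The index answers whole-word queries without touching the original text, which is where the speed comes from; the scan reads everything but can match fragments the index has no entry for.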
Date: Sun 15-Aug-1989 15:07:40
From: Unknown
Subject: Re: NeXT Database Prowess

One other point about indexed files. Remember that the index stuff takes roughly as much disk space as the original material, so if you have 50 MB of Freud, you'll need about 50 MB to store the indexes, too.

Also, perhaps we should call this software something else. In some quarters, "database" refers to the management of more structured material comprising "entities" which have "attributes" (like Employee == name, age, salary, ssn ). Since NeXT has (will have) such database capabilities, it is confusing to call the Webster's and Shakespeare capabilities "database", too.

I suggest we call them "Information Retrieval" utilities.

>From: chari@nueces.UUCP (Christopher M. Whatley)

Date: Sun 15-Aug-1989 20:24:31
From: Unknown
Subject: Re: NeXT Database Prowess

In article <89227.110740UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes:
>Also, perhaps we should call this software something else. In some quarters,
>"database" refers to the management of more structured material comprising
>"entities" which have "attributes" (like Employee == name, age, salary, ssn ).
>Since NeXT has (will have) such database capabilities, it is confusing
>to call the Webster's and Shakespeare capabilities "database", too.

What is wrong with "free-form database" and "relational database"? That is what you have with "index" and Sybase SQL.

>I suggest we call them "Information Retrieval" utilities.

Gee, it seems like I just retrieved some information from Fourth Dimension a while ago and that I just made a mod.recipes database with "index" a few days ago. Confusing?!?

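The distinction argued over in the two posts above, structured "entities" with "attributes" versus free-form indexed text, can be made concrete with a small sketch. Everything here is an assumption for illustration: Python's built-in sqlite3 module stands in for a relational system such as Sybase SQL, the employee rows and recipe texts are invented, and `files_containing` is a hypothetical helper rather than anything from NeXT's "index" tools.

```python
import re
import sqlite3

# Structured ("relational") data: each entity has named, typed attributes,
# so a query can test an attribute directly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT, age INTEGER, salary INTEGER, ssn TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("Ada", 36, 52000, "000-00-0001"), ("Bob", 51, 48000, "000-00-0002")],
)
over_40 = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE age > 40 ORDER BY name"
    )
]

# Free-form text: there are no attributes, only words, so the only
# question you can ask is which files mention a given word.
recipes = {
    "chili.txt": "Brown the beef, then add beans and chili powder.",
    "salad.txt": "Toss the greens with oil; add beans if you like.",
}


def files_containing(word):
    """Return the sorted names of files whose text contains the whole word."""
    word = word.lower()
    return sorted(
        name
        for name, text in recipes.items()
        if word in re.findall(r"[a-z]+", text.lower())
    )


print(over_40)                    # attribute query on structured records
print(files_containing("beans"))  # keyword query on free-form text
```

Both are retrieval in the everyday sense, which is the point of the objection; what differs is whether the query is phrased against attributes or against words.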
Date: Sun 21-Aug-1989 17:30:17
From: Unknown
Subject: Re: NeXT Database Prowess

Just as a comment:

Is it just me, or do more people out there think that there could have been a more logical way for NeXT to have bundled the various databases it maintains? Granted, a dictionary is a special kind of database (the keys are generally obvious), but why have the quotations in a separate application from the Shakespeare (which is effectively the complete collection of quotes of The Bard ) ?

Couldn't we benefit from the techniques used by the Webster application? I sent a message to NeXT asking for any documentation on the internal structure of the Webster database and got a (very nice) reply saying in effect that this was "private" to Webster. My reaction was one of mild amusement since, as far as I'm concerned it's naive to think that this information won't soon be available (if it isn't already).

As a side project, I was thinking of doing something similar to Webster, but using the KJV of the Bible as the text, as an academic exercise. Thus you could include some of the more well known works of art with a religious theme as the ``pictures'' (I like art history).

This isn't a flame (well, not really), just a suggestion that the various databases could have been put together in a more effective way.

-----------------------------------------------------------------------------
Alan Emtage,                  "It's currently a problem of access to
McGill University,CANADA       gigabits through punybaud."
listmaster@cs.mcgill.ca                        - Licklider
-----------------------------------------------------------------------------

>From: hollombe@ttidca.TTI.COM (The Polymath)

Date: Sun 23-Aug-1989 10:16:17
From: Unknown
Subject: Re: NeXT Database Prowess

In article <1445@opus.cs.mcgill.ca> bajan@opus.UUCP (Alan Emtage) writes:
>Couldn't we benefit from the techniques used by the Webster application?
>I sent a message to NeXT asking for any documentation on the internal
>structure of the Webster database and got a (very nice) reply saying in
>effect that this was "private" to Webster. My reaction was one of mild
>amusement since, as far as I'm concerned it's naive to think that this
>information won't soon be available (if it isn't already).

The "Webster database" is an image of the typesetter tape. No one's pulling a fast one here--NeXT's indexing is flexible enough to work on a variety of file formats in addition to straight ASCII. NeXT gave you an honest answer; you just asked the wrong question.

>As a side project, I was thinking of doing something similar to Webster,
>but using the KJV of the Bible as the text, as an academic exercise.

When NeXT did their first major demonstrations on our campus, they had (some version of) the Bible online, fully indexed and searchable in the Digital Librarian. No pictures, though.

-=EPS=-

>From: weltyc@cs.rpi.edu (Chris Welty)

Date: Sun 23-Aug-1989 05:55:00
From: Unknown
Subject: Re: NeXT Database Prowess

/* Written 12:30 pm Aug 21, 1989 by bajan@opus.cs.mcgill.ca in uxa.cso.uiuc.edu:comp.sys.next */
>As a side project, I was thinking of doing something similar to Webster,
>but using the KJV of the Bible as the text, as an academic exercise. Thus
>you could include some of the more well known works of art with a
>religious theme as the ``pictures'' (I like art history).
/* End of text from uxa.cso.uiuc.edu:comp.sys.next */

As a bit of a Bible collector, may I suggest using one of the many versions besides the KJV? I don't mean to offend, start a religious war, or a crusade, but the mis-translations in the KJV seem to make it one that should be avoided. Two better versions that I hope you might consider are either the NIV or, better yet, the New Oxford. Both of these used the original Hebrew and retranslated. The New Oxford even comes with the Apocrypha.

Michael Rutman

>From: jpd00964@uxa.cso.uiuc.edu

These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Marcel Waldvogel and Netfuture.ch.