May 2020 Issue #4
Collins Software's Newsletter
Searching for stuff...
Software development, and specifically compilers (the tools we program in), creates all of our search capabilities.

Searches are performed by index, identifier, or content. The result can be any number of matches. The information being searched can be any digital structure. The most common structures are simple text, hierarchical disk file structures, databases, and program memory.

In programming we generally use only one type of search, which is full context matches. That is, we test every data structure for a match. This is of course very inefficient, but due to the effort involved in planning and development, more complex searches are seldom implemented by the average programmer.
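
As an illustration of "test every data structure for a match", here is a minimal Python sketch. The invoice records, their field names, and the function full_scan are all hypothetical; the point is only that every record is visited, whether or not it could match.

```python
# Hypothetical invoice records; the field names are illustrative only.
invoices = [
    {"id": 101, "month": "2020-04", "customer": "Acme", "total": 250.0},
    {"id": 102, "month": "2020-05", "customer": "Baker", "total": 75.5},
    {"id": 103, "month": "2020-05", "customer": "Acme", "total": 410.0},
]


def full_scan(records, predicate):
    """Test every record against the predicate and return all matches."""
    return [record for record in records if predicate(record)]


# Every invoice is examined, even though only two can match.
acme = full_scan(invoices, lambda record: record["customer"] == "Acme")
print([record["id"] for record in acme])  # prints [101, 103]
```

The cost grows with the total number of records, not with the number of matches, which is the inefficiency described above.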


Faster Searches:
There are ways to speed up searches: partitioning, indexing (pre-searching into indexes), and cataloging (classification). All of them simply make it so that not every entry has to be searched.
  • Partitioning -- a physical separation using part of the search key. (e.g. separate invoices by month)
  • Indexing -- binary trees and similar logic
  • Cataloging -- division by subject matter
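
The first two methods can be sketched in a few lines of Python. This is only an assumption of how they might look in a conventional language, not how Jane implements them; the invoice tuples and the names partitions, index, and find_by_id are hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical invoices as (id, month, customer) tuples.
invoices = [
    (103, "2020-05", "Acme"),
    (101, "2020-04", "Acme"),
    (102, "2020-05", "Baker"),
]

# Partitioning: split the records by part of the search key (the month),
# so a search for May never touches April's invoices.
partitions = defaultdict(list)
for invoice in invoices:
    partitions[invoice[1]].append(invoice)

# Indexing: keep the ids sorted so a lookup is a binary search, not a scan.
index = sorted((invoice[0], invoice) for invoice in invoices)
ids = [key for key, _ in index]


def find_by_id(invoice_id):
    """Return the invoice with this id, or None if it is not indexed."""
    pos = bisect.bisect_left(ids, invoice_id)
    if pos < len(ids) and ids[pos] == invoice_id:
        return index[pos][1]
    return None


print(len(partitions["2020-05"]))  # prints 2
print(find_by_id(102))
```

A sorted list with binary search stands in here for the binary tree; both avoid visiting every entry.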

Jane will implement all three of these methods. Each of these search types has many variations, which deal with the specifics of a given application.


"I yearn not for the easy path, but for the right path. For 'easy' and 'right' are rarely compatible." -- Craig D. Lounsbrough

"Wrong does not cease to be wrong because the majority share in it." -- Leo Tolstoy

"The key to being a prolific discoverer is not to run with the pack." -- Steven Magee

"How come we never know what we want until we find it?" -- Kate McGahan

"Search engines are for finding things I know exist, Libraries for things I do not." -- Clif

"The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge." -- Daniel J. Boorstin

"The hardest thing of all is to find a black cat in a dark room, especially if there is no cat." -- Confucius

 

Cataloging:
Each value in Jane will have three parts: the value itself, units, and classification. Generally we use "units" as an attribute of a number, and classification as an attribute of "a body of work". A body of work, in computer terms, is generally a file or a portion of a file. Jane shall consider every value a body of work.

A = 5' as height;  units = feet, class = height

B = c:/documents/coronavirus.pdf
as Medical.Research.Virus:corona-19;

class  = "RMV:corona-19 by Johns Hopkins, Published May 5, 2020, Topic: Safety and Health Tips";
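
In a conventional language the three parts could be carried together as one record. The following Python sketch is only an illustration, not Jane syntax: the class name CatalogedValue, its fields, and the method call_letters are hypothetical, and the letters are produced in declared order (any reordering done by the catalog system is not modeled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogedValue:
    """A value carrying its units and classification (hypothetical layout)."""
    value: object
    units: str = ""
    classification: tuple = ()  # subject headings, e.g. ("Medical", "Research")
    qualifier: str = ""         # free-form specific, e.g. "corona-19"

    def call_letters(self):
        """First letter of each heading, plus the qualifier if there is one."""
        letters = "".join(word[0].upper() for word in self.classification)
        return f"{letters}:{self.qualifier}" if self.qualifier else letters


a = CatalogedValue(5, units="feet", classification=("Height",))
b = CatalogedValue(
    "c:/documents/coronavirus.pdf",
    classification=("Medical", "Research", "Virus"),
    qualifier="corona-19",
)
print(a.call_letters(), b.call_letters())
```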

The R (research) and M (medical) are swapped by the system. This is the dynamic search partitioning logic of the cataloging system. The call letters are universally known. The specific ":corona-19" is a qualifier to reduce the number of fixed subject headings, and to permit infinite classifications. The rest of a catalog entry holds the Publication Date, Publisher, Author, Topic, Illustrations, Unique Catalog ID, and Pages: basically the information we need to locate a body of work with only a vague idea of what we are looking for. The Jane retrieval can go further by searching the content of the records found by classification.

The following is an example of a catalog search augmented with a content search. The result contained in "A" is a list of all catalog entries found in the local web site. 

A = http://localhost/[MRV:corona-19 last 30 days] where the content contains the word "Safety";
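
The Jane statement above can be approximated in Python as a catalog filter followed by a content filter. This is a sketch under stated assumptions: the catalog entries are made-up sample data, the function search and its parameters are hypothetical, and the call letters are compared as an unordered set so that "MRV" finds an entry stored as "RMV".

```python
from datetime import date, timedelta

# Hypothetical catalog entries; the content strings are placeholders.
catalog = [
    {"class": "RMV", "qualifier": "corona-19",
     "published": date(2020, 5, 5), "content": "Safety and health tips"},
    {"class": "RMV", "qualifier": "corona-19",
     "published": date(2020, 1, 10), "content": "Safety overview"},
    {"class": "RMV", "qualifier": "corona-19",
     "published": date(2020, 5, 1), "content": "Laboratory methods"},
    {"class": "RH", "qualifier": "",
     "published": date(2020, 5, 2), "content": "Safety in history"},
]


def search(entries, letters, qualifier, since, word):
    """Catalog search (class, qualifier, date) narrowed by a content word."""
    wanted = frozenset(letters)  # order of the call letters does not matter
    return [
        entry for entry in entries
        if frozenset(entry["class"]) == wanted
        and entry["qualifier"] == qualifier
        and entry["published"] >= since
        and word.lower() in entry["content"].lower()
    ]


today = date(2020, 5, 15)  # fixed date so the example is repeatable
hits = search(catalog, "MRV", "corona-19", today - timedelta(days=30), "Safety")
print(len(hits))  # prints 1
```

The classification and date tests discard most entries before any content is read, so the slow word match only runs on a small remainder.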


Browsing:
The result of a catalog search should be the same as browsing the shelves of a library. You have complete access to every piece of information. You can think of it as a file system with each folder being a subject (as in "Research/Medical"), while all the other folders are collapsed (Research/History, Research/Math,...).

The big difference in Jane is that the order of classification is independent of its hierarchical order. So you can reverse the order to "Medical/Research", and the sibling folders will then be "Medical/Anatomy, Medical/Organisms,...". The system will assume you are more interested in Medical than in Research and provide a listing appropriate to your given interest. So you can browse related material you may not even know exists.
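
One way to picture this is to store each item's subjects as an unordered set and build the folder view on demand from whichever subject is named first. The Python sketch below assumes exactly that; the file names, the items table, and the function sibling_folders are hypothetical.

```python
# Hypothetical items, each classified by an unordered set of subjects.
items = {
    "virus-study.pdf": {"Research", "Medical"},
    "skeleton-atlas.pdf": {"Medical", "Anatomy"},
    "bacteria-notes.txt": {"Medical", "Organisms"},
    "roman-empire.pdf": {"Research", "History"},
    "prime-numbers.pdf": {"Research", "Math"},
}


def sibling_folders(primary):
    """List the folders seen when browsing with `primary` as the lead subject."""
    others = set()
    for subjects in items.values():
        if primary in subjects:
            others |= subjects - {primary}
    return sorted(f"{primary}/{subject}" for subject in others)


print(sibling_folders("Research"))
print(sibling_folders("Medical"))
```

The same item, virus-study.pdf, appears under Research/Medical in the first listing and under Medical/Research in the second, with different neighbors each time.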

Searching for "Science Fiction" on a search engine results in 300 million results, with access to a few hundred random results. With cataloging we could limit the result to just books, organized by author, sorted by title, publisher, date, or topic. It would also be able to collapse all redundant entries. (People, it's software...)
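
Once results carry catalog fields, this kind of organization is only a few lines of code. The Python sketch below is a hypothetical illustration: the result records use placeholder authors and titles, and books_by_author is an assumed name.

```python
from itertools import groupby

# Hypothetical search results with catalog fields attached.
results = [
    {"kind": "book", "author": "Author B", "title": "Title Two"},
    {"kind": "web page", "author": "Author C", "title": "A Fan Site"},
    {"kind": "book", "author": "Author A", "title": "Title One"},
    {"kind": "book", "author": "Author B", "title": "Title Two"},  # redundant
    {"kind": "book", "author": "Author A", "title": "Another Title"},
]


def books_by_author(entries):
    """Keep only books, collapse duplicates, group by author, sort by title."""
    unique = {(e["author"], e["title"]) for e in entries if e["kind"] == "book"}
    ordered = sorted(unique)  # sorts by author, then by title
    return {
        author: [title for _, title in group]
        for author, group in groupby(ordered, key=lambda pair: pair[0])
    }


print(books_by_author(results))
```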

 
Jane Search Engine, using a Catalog System:
Jane will have a web search engine built upon the Jane Cataloging System. Looking at the current term-based search engines, we see a huge failure to obtain meaningful content. The current classification scheme is to place all web pages from a site into "Commercial", "Organization", "Government", or "Education", which are defined by the domain of the website. The domain is no indicator of a web page's subject matter.

The Jane cataloging system will be able to partition content into areas of interest and provide filtered results. The classifications would divide each web page by subject matter. Jane will also define the classification for every file, database table, and programming variable.

The classification search scheme will be divided (somewhere) along the lines of universally known subject headings and subject-matter-specific subject headings. For example, "medical:M" would be a universal classification, and the NLM classifications would take over from there. This immediately creates a major problem for any hope of a stable classification system: too many cooks.


Conclusion:
Not knowing that something exists is a terrible waste. Whether it be customers, equipment, vendors, parts, research, inventions, contracts, patents, applications, or a million other beneficial pieces of knowledge, we need this information to run a business. What we need is an organized list by subject matter. The list should be everything, starting at the group of items that you are interested in. Organization cannot be derived.

Search engines are a good example of word search, which does not work. It will give you specifics but not general collaboration on a body of knowledge. The difference is that a search engine gives you a selection of items by content word match, while a catalog system returns all items and places you at the location where your subject matter items can be found.

The Jane cataloging system is designed to be universal, integrated into the compiler, operating system, databases, applications, and file system, and totally transparent to general users. Because the catalog system is software, the logic built on a set of well-engineered facts for organization will provide what is missing in our current technology, which is to take knowledge out of standalone applications and make it known everywhere.

Author: Clif Collins

http://CollinsSoftware.com
Houston, Texas
May 15, 2020

email: web1@collinssoftware.com