Information Format

All information will be held in data structures designed purely for management, editing, storage, compiling, and retrieval by "Jane". This is my initial design for every piece of information Jane maintains. The primary structure is as follows:
  • All values are text-based
  • All words, terms, phrases, and characters are externally stored
  • All values can have:
    • A type value attached (by index)
    • A value indexed by type
    • A format indexed by type
    • A units / classification indexed by type
  • All values are variable length
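
A minimal sketch of the structure listed above, assuming each value is a small record whose attributes are optional indexes. The class name JaneValue and all of its field names are hypothetical and chosen only for illustration; the design itself does not name them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JaneValue:
    """One stored value. Class and field names are illustrative assumptions."""
    text: str                           # every value is text based and variable length
    type_index: Optional[int] = None    # type attached by index
    value_index: Optional[int] = None   # value indexed by type
    format_index: Optional[int] = None  # format indexed by type
    units_index: Optional[int] = None   # units / classification indexed by type


# Example value whose indexes are arbitrary small numbers picked for the sketch.
first_name = JaneValue(text="clif", type_index=1, value_index=2,
                       format_index=3, units_index=4)
```

Every attribute is optional in this sketch because the list says values "can have" them, not that they must.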

Jane shall store and index every word, phrase, and character. Jane shall maintain a universal dictionary of words, phrases, and characters, and shall create and maintain user dictionaries when requested. Every document will maintain an internal dictionary, or may opt to have the global dictionary added internally.
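
The dictionary idea can be sketched as a simple interning table, assuming each distinct entry gets the next free integer index starting at 1. The class name Dictionary and the methods intern and lookup are hypothetical names for this sketch only.

```python
class Dictionary:
    """Maps each distinct word, phrase, or character to a stable integer index."""

    def __init__(self) -> None:
        self._index_of = {}   # text -> index
        self._entries = []    # position i holds the text for index i + 1

    def intern(self, text: str) -> int:
        """Return the index for text, storing the text first if it is new."""
        index = self._index_of.get(text)
        if index is None:
            self._entries.append(text)
            index = len(self._entries)  # assumption: indexes start at 1
            self._index_of[text] = index
        return index

    def lookup(self, index: int) -> str:
        """Return the text stored under index, or raise KeyError."""
        if not 1 <= index <= len(self._entries):
            raise KeyError(index)
        return self._entries[index - 1]


# A document's internal dictionary; a global one would be a second instance.
document_dictionary = Dictionary()
for entry in ["internal text", "clif", "user's first name"]:
    document_dictionary.intern(entry)
```

Interning the same text twice returns the same index, so each word, phrase, or character is stored only once per dictionary.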

#  4-bit introduction             4-bit byte length of "index"  Index / length of "length"  Length     Value
1  0001 (value)                   0001                          00000001                    0011 1100  ... phrase "internal text"
2  0001 (value)                   0001                          00000001                    0010 0100  ... word "clif"
3  0001 (value)                   0001                          00000001                    0011 0011  ... format "proper case, blue, bold, onclick..."
4  0001 (value)                   0001                          00000001                    0001 1101  ... units "user's first name"
5  0010 (index of type)           0001                          00000001
6  0011 (value indexed by type)   0001                          00000010
7  0110 (format indexed by type)  0001                          00000011
8  0111 (units indexed by type)   0001                          00000100
9  1111 (terminator)


The only limitation is that a type index is restricted to an integer of at most 15 bytes (120 bits), because its byte length is stored in a 4-bit field. That still allows 2^120 (about 1.3 x 10^36) distinct data types.
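
The index records in the table (the rows with a 4-bit introduction, a 4-bit byte length, and an index) can be sketched as follows. This is one possible reading of the layout, not a definitive implementation: the constant and function names are hypothetical, big-endian byte order is an assumption, and the value records and the terminator are left out because the text does not fully specify their layout.

```python
from typing import Tuple

# Introduction codes copied from the table; the constant names are hypothetical.
INTRO_TYPE_INDEX = 0b0010
INTRO_VALUE_BY_TYPE = 0b0011
INTRO_FORMAT_BY_TYPE = 0b0110
INTRO_UNITS_BY_TYPE = 0b0111

MAX_INDEX_BYTES = 15  # the largest byte length a 4-bit field can hold


def encode_index_record(intro: int, index: int) -> bytes:
    """Pack a 4-bit introduction, a 4-bit byte length, then the index bytes."""
    if not 0 <= intro <= 0b1111:
        raise ValueError("introduction must fit in 4 bits")
    if index < 0:
        raise ValueError("index must be non-negative")
    n_bytes = max(1, (index.bit_length() + 7) // 8)
    if n_bytes > MAX_INDEX_BYTES:
        raise ValueError("index does not fit in 15 bytes (120 bits)")
    header = (intro << 4) | n_bytes
    # Assumption: the index is written most significant byte first.
    return bytes([header]) + index.to_bytes(n_bytes, "big")


def decode_index_record(data: bytes) -> Tuple[int, int, int]:
    """Return (introduction, index, bytes consumed) for the record at data[0]."""
    if not data:
        raise ValueError("empty input")
    intro, n_bytes = data[0] >> 4, data[0] & 0x0F
    if len(data) < 1 + n_bytes:
        raise ValueError("truncated record")
    index = int.from_bytes(data[1:1 + n_bytes], "big")
    return intro, index, 1 + n_bytes
```

Under this reading, encode_index_record(INTRO_TYPE_INDEX, 1) produces the two bytes 0010 0001 00000001, which is the bit pattern shown for the "index of type" row, and any index of 2^120 or larger is rejected.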

Jane must have complete knowledge of all information that she is asked to manage. Each value has a real-world purpose, and Jane must know that purpose and manage the value in the best possible way. The goal is therefore to create information in a way that reduces overhead: the overhead of creating it, the overhead of parsing it, and the overhead of its visual display and hard-copy logic. The search and replace logic must also be universal, covering everything known about every word, character, and number: its data type, format, and classification.

find first proper case user's first name clif that is bold and not underlined with an on click event

This statement is parsed into indexes, so the search reduces to extremely simple index comparisons. Replacement is optimized in the same way.

find type index 1 value index 2 and units index 4 and (format value containing index 34 and 25 and not 22)
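
The parsed statement above can be sketched as a scan that uses only integer and set comparisons. This is a sketch under assumptions: the record layout, the name IndexedValue, and the function find_first are hypothetical, and the format value is modeled as a set of attribute indexes. Which attribute each of the numbers 34, 25, and 22 stands for is not stated in the text, so the sketch treats them as opaque indexes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence


@dataclass(frozen=True)
class IndexedValue:
    """Hypothetical record: every attribute is already reduced to indexes."""
    type_index: int
    value_index: int
    units_index: int
    format_indexes: FrozenSet[int]


def find_first(records: Sequence[IndexedValue], type_index: int,
               value_index: int, units_index: int,
               format_all: Iterable[int] = (),
               format_none: Iterable[int] = ()) -> Optional[int]:
    """Return the position of the first matching record, or None."""
    required = frozenset(format_all)
    forbidden = frozenset(format_none)
    for position, record in enumerate(records):
        if (record.type_index == type_index
                and record.value_index == value_index
                and record.units_index == units_index
                and required <= record.format_indexes         # contains all of these
                and not (forbidden & record.format_indexes)):  # contains none of these
            return position
    return None


records = [
    IndexedValue(1, 2, 4, frozenset({34, 25, 22})),  # rejected: contains 22
    IndexedValue(1, 3, 4, frozenset({34, 25})),      # rejected: value index differs
    IndexedValue(1, 2, 4, frozenset({34, 25, 7})),   # first match
]
match = find_first(records, type_index=1, value_index=2, units_index=4,
                   format_all=[34, 25], format_none=[22])
```

No text is compared during the scan; the words in the original statement are resolved to indexes once, before the search starts.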

This format is optimized for compiler parsing, storage, and transport. No separate compression scheme should be necessary, especially if we use global dictionaries. Even with internal dictionaries, the resulting size should be comparable to what the best compression schemes we use today achieve.