Source Code to Parse | Output Tokens
(Aug 31, 2021)
Jane's purpose is to have knowledge of all existing technologies, so knowledge of text and binary formatted information structures is one of Jane's foundation blocks. Every application is given instructions and information in one or both of these forms. Binary structures are already handled by Jane. Text-based information handling starts with tokenizing the text into known units of information. These examples cover some of the popular computer languages and formatted text-based information structures, and each produces an array of tokens to be processed by an application. I will not explain here how these are used, only say that they produce all of our programs, display instructions, mathematics, and databases. In essence, binary and text are the formats of the working information and actions of our computers.
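To make "tokenizing the text into known units of information" concrete, here is a minimal sketch of a hand-written tokenizer that turns a line of source text into an array of tokens. It is not Jane's code: the function name `tokenize` and the token kinds `word`, `number`, `string`, and `symbol` are assumptions made only for illustration.

```python
from typing import List, Tuple

# A token is a (kind, text) pair. The kinds used here are illustrative only.
Token = Tuple[str, str]


def tokenize(text: str) -> List[Token]:
    """Split text into words, numbers, double-quoted strings, and symbols."""
    tokens: List[Token] = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Whitespace separates tokens and is not emitted.
            i += 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(("word", text[start:i]))
        elif ch.isdigit():
            start = i
            while i < n and text[i].isdigit():
                i += 1
            tokens.append(("number", text[start:i]))
        elif ch == '"':
            start = i
            i += 1
            while i < n and text[i] != '"':
                i += 1
            # Include the closing quote when there is one.
            i = min(i + 1, n)
            tokens.append(("string", text[start:i]))
        else:
            # Anything else is a one-character symbol.
            tokens.append(("symbol", ch))
            i += 1
    return tokens
```

Calling `tokenize('x = 42 + "hi"')` yields five tokens: a word, a symbol, a number, a symbol, and a string. A real language would need many more cases (escapes, comments, multi-character operators), which is where the per-language logic comes in.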
Things I Learned About Computer Languages
(so far)
Conclusion
The current compiler technology path is a dead end. The existing compilers are fixed. "This is software, people; we can do anything we want." I will divert from this path by writing my own editor: write in words, phrases, clauses, sentences, and paragraphs, and change from a "character" based technology to a "word" and "term" based technology. All of this starts with the parser. Get rid of reserved words. Parsing should be a basic capability of the compiler, permitting user-defined syntax and language extensions. I will move away from the current "function" based programming paradigm toward table-driven logic and natural language instructions. In these examples I used a table-driven approach to call specialized code. I have a program that generates the code required to call the functions; the output is one large "case" statement. The parsers can be changed during compile time. I will define some syntax to perform this operation, probably something like "Compiler, FORTRAN follows."
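The table-driven idea and the "Compiler, FORTRAN follows." directive can be sketched together. This is a guess at the shape, not the generated "case" statement described above: the table `PARSERS`, the two toy tokenizers, and the exact directive format are all hypothetical.

```python
from typing import Callable, Dict, List


def tokenize_default(line: str) -> List[str]:
    # Toy rule: split on whitespace only.
    return line.split()


def tokenize_fortran(line: str) -> List[str]:
    # Toy rule: fold to upper case and split symbols away from names.
    out: List[str] = []
    word = ""
    for ch in line.upper():
        if ch.isalnum() or ch == "_":
            word += ch
        else:
            if word:
                out.append(word)
                word = ""
            if not ch.isspace():
                out.append(ch)
    if word:
        out.append(word)
    return out


# The table: parser name -> specialized code. All names are hypothetical.
PARSERS: Dict[str, Callable[[str], List[str]]] = {
    "DEFAULT": tokenize_default,
    "FORTRAN": tokenize_fortran,
}


def tokenize_source(text: str) -> List[str]:
    """Tokenize line by line, switching parsers when a directive appears."""
    active = PARSERS["DEFAULT"]
    out: List[str] = []
    for line in text.splitlines():
        words = line.split()
        # Assumed directive shape: "Compiler, <NAME> follows."
        if len(words) == 3 and words[0] == "Compiler," and words[2] == "follows.":
            name = words[1].upper()
            if name not in PARSERS:
                raise ValueError(f"no parser registered for {name!r}")
            active = PARSERS[name]
            continue
        out.extend(active(line))
    return out
```

With this sketch, adding a language means adding one function and one table entry; the loop that reads the source never changes. A generated "case" statement would play the same role as the dictionary lookup here.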
There are endless ways to parse text. But for these twenty-some popular languages I found it easier to hard-code the logic than to use a scripting-language approach (e.g. Backus-Naur Form or a RegExp syntax analyzer). It came to only a few thousand lines of code, which made it easy to debug and fix. There are so many special cases that a fallback to special-case handling would be required anyway. It really does not matter which approach is used; all of this is independent. The system may wind up using any number of approaches in the future.
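One example of a special case that is simple to hard-code is the nested block comment, which some languages allow and which a classic regular expression cannot match because it requires counting. The sketch below is my own illustration, not taken from the code described here; the function name `skip_nested_comment` and the `/* ... */` delimiters are assumptions.

```python
def skip_nested_comment(text: str, start: int) -> int:
    """Return the index just past the block comment that opens at `start`.

    Assumes /* ... */ delimiters that may nest. Raises ValueError if no
    comment opens at `start` or if the comment is never closed.
    """
    if not text.startswith("/*", start):
        raise ValueError("no comment opens at this position")
    depth = 0
    i = start
    n = len(text)
    while i < n:
        if text.startswith("/*", i):
            depth += 1  # one level deeper
            i += 2
        elif text.startswith("*/", i):
            depth -= 1  # one level closed
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")
```

The whole special case is a counter and a loop, and it is easy to step through in a debugger when a comment is closed in the wrong place.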