SCOUG-Programming Mailing List Archives
Return to [ 22 | 
May | 
2005 ]
 
 
 
Thanks to an email from Greg Smith in the morning with
respect to GCC 4.0, I felt a reprieve from presenting anything
on LEX and YACC.  I needed one.  I was ill-prepared to do the
topic justice.
 
Next month Greg Smith will introduce the group to GCC 3.x
and the Watcom C compiler.  He hopes to take a Linux open
source program and recompile it for OS/2 using both compilers.
While GCC is the odds-on favorite in the open source
community, the Watcom package (as opposed to compiler
only) offers a better, more comprehensive development and
debugging environment.
 
Besides, as a group we need to take on one or more open
source projects for OS/2.  Two of particular importance that
come to mind are GCC 4.0 and Mozilla.  As the Watcom compiler
is also open source, if we do the one (GCC 4.0), we should
have the means to do the other as well.  That assistance
should come as a blessing to the sole developer of Watcom at
the moment, who is threatening to stop.
 
So instead of LEX and YACC, about which I have minimal
knowledge, I chose to present on something I do know
something about as its author: the Data Directory/Repository
(DDR).  In this forum we have argued about language
decisions.  Getting bogged down in that doesn't go to the
heart of the problem in software development and
maintenance: its ever-spiraling cost, longer maintenance
cycles, and inability to reduce (if not eliminate) backlogs of
change requests.
 
That's not a language issue (though language choice does
have some importance); it's a people problem.  It's a people
problem to develop software, to maintain it (the larger
problem), and more importantly to document it.  More to the
point, it's a people productivity problem, one not addressed by
the choice of a language or compiler.
 
I operate under the general maxim (as it comes closer to my
name of Maxson) that productivity improvements come
directly from process improvements.  In the IT profession we
have, through software, increased client productivity and
reduced per-transaction costs through process improvements
based on "letting users do what software cannot and software
what users need not".  In short, shift as much clerical effort
as possible to software.  Reduce clerical effort.
 
As the primary clerical effort in IT is writing, the creation and
maintenance of source code and text, we can attempt to
reduce it in two ways: reduce the total people writing and
reduce the total different writing.  That reduction reaches a
minimum with one source code language and one source text
language.  Then we use software assistance with both sources
to create (generate) and maintain all reportable results, e.g.
UML ("Unified", not "Universal", Modeling Language)
documentation, user guides, programming reference manuals,
web-based tutorials, etc.
 
Now how do you get that "software assistance" that makes
this possible?  The Data Directory/Repository (DDR).  Having
watched a previous attempt fail in IBM's earlier AD/Cycle
(Application Development/Cycle) project, wherein IBM and
the rest of the "cooperating" vendors expended millions of
man-hours and hundreds of millions of dollars, I want to offer
you a solution more complete than anything that exists, and
at lower cost.
 
To do that I need to introduce you to A. Upton, author of
"Design for Thinking", and to Richard Samson, co-author with
Upton of the workbook "Creative Analysis" and sole author
of "The Mind Builder".  Specifically, within this set of work,
the presentation on analysis.
 
Upton and Samson offer us three possible types of analysis:
classification (taxonomy), structure, and operation (process).
Three.  Three only, covering all possible analysis.  These three
are all based on the question(s) you can ask about some
"thing".
 
To understand the questions you can visually represent the   
"things" either in your mind or in a physical drawing by a large   
circle which contains within it three non-overlapping smaller   
circles.  Do you have that picture in mind?  OK.  
 
In a classification analysis we can start with a smaller circle   
and ask "What is this a kind of?", i.e. to what larger circle (or   
circles) does it belong?  Or inversely we can ask, "What are   
the kinds of this?", i.e. what smaller circles does it contain?    
This is the classical taxonomy of biology.
 
Similarly in structure analysis we can ask, "What is this a part   
of?" or "What are the parts of this?".  Continuing on to   
operation analysis, "What is this a stage of?" or "What are the   
stages of this?".  
 
Now note here that our visual representation, in your mind or
on paper, has two different "levels of abstraction", a higher
level (the larger circle) and a lower (the smaller circles).
Each of these can itself be "contained" in a higher level
or can "contain" lower levels.  In any case we have a classic
set (higher level) and member (lower level) relationship.
 
More importantly all types of analysis (classification,   
structure, and operation) have this basic "hierarchical"   
relationship in common.  That means we need only one   
method, one process, one means of definition for this   
"common" feature.  
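As a minimal sketch (all names here are mine, not from the DDR), that one
common set/member relationship can be modeled as a single parent-to-children
mapping; both directions of each question pair become lookups against it:

```python
# Hypothetical sketch: one parent->children mapping covers the
# set/member relationship common to classification, structure,
# and operation analyses.
children = {
    "vehicle": ["car", "truck"],           # classification: kinds of vehicle
    "car": ["engine", "body"],             # structure: parts of a car
    "assembly": ["weld", "paint", "test"]  # operation: stages of assembly
}

def members_of(thing):
    """Answers 'What are the kinds/parts/stages of this?'"""
    return children.get(thing, [])

def sets_containing(thing):
    """Answers 'What is this a kind/part/stage of?'"""
    return [parent for parent, kids in children.items() if thing in kids]
```

The same two functions serve all three analysis types; only the meaning of
the pairing differs.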
 
What separates the three types, what separates classification
analysis from structure and operation analyses, is that in
classification analysis the order of the lower levels of
abstraction is unimportant (unordered), while in structure and
operation analyses order is important.
 
Thus to handle all possible classification analyses the   
"common" method suffices.  To handle all possible structure   
and operation analyses we need an additional "method" which   
allows us to depict order among the lower levels of   
abstraction (the smaller circles).  
 
Now we cannot refer to anything in the directory or store it in
the repository without giving it a "name", either user- or
software-defined.  If we implement our DDR as a database,
specifically a relational database, then we need have only a
single table with a row for each "unique" name.  This brings
up our first problem to resolve: homonyms (same name,
different referents, i.e. different things or levels of
abstraction).
 
To solve this we borrow from among many sources, but   
particularly from General Semantics, the use of indexing.  This   
means that any "stored" (unique) name has two parts, a   
proper name (text) part appended with an index part.  For   
purposes here we will assume a proper name part of 16 bytes   
and an index part of 4 bytes, thus a 20-byte unique name.  
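Under the stated assumptions (a 16-byte proper-name part plus a 4-byte
index), packing a unique name might look like this; the function name is
mine, purely for illustration:

```python
def make_unique_name(proper: str, index: int) -> bytes:
    """Pack a proper name and an index into a 20-byte unique name:
    16 bytes of text, blank-padded on the right, plus a 4-byte index."""
    name_part = proper.encode("ascii")[:16].ljust(16, b" ")
    index_part = index.to_bytes(4, "big")
    return name_part + index_part

key = make_unique_name("customer", 1)
assert len(key) == 20  # always a fixed 20-byte key
```

Two things with the same proper name differ only in the index part, which
is what makes homonym resolution possible later.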
 
That's our first table, though we have some further definitions
(attributes) to add later.  We use a second table to store the
"common" hierarchical (two-level-of-abstraction) relationship
that exists among the three types of analyses.  Each row in
this table will contain the unique name of the higher level
appended in turn with each unique name of the lower levels,
one row for each pairing.  Thus each row will have 4 fields
and total 40 bytes, with the entire row acting as its "primary"
key.
 
Now we need to define a table to depict the ordering which
occurs among the lower levels of abstraction (smaller circles)
within the higher level (larger circle).  Here we will have the
higher-level unique name appended by the unique name
pairing (one row for each such pairing) of the lower levels of
abstraction in their predecessor, successor order.  This
includes when a lower level is "paired" with itself,
representing "recursion".  This row then contains a sequence
of three unique names, 60 bytes in length, all of which make
up the "primary" key for that table.
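A sketch of such order rows (example names hypothetical): each row is
(higher level, predecessor, successor), the whole tuple serving as its own
key, with a level paired with itself denoting recursion:

```python
# Each row: (parent, predecessor, successor); the whole row is the key,
# so a plain set of tuples models the table.
order_rows = {
    ("parse", "read_token", "build_node"),
    ("parse", "build_node", "emit_code"),
    ("parse", "build_node", "build_node"),  # self-pairing = recursion
}

def successors(parent, step):
    """All steps that follow `step` within `parent`."""
    return sorted(s for (p, pred, s) in order_rows
                  if p == parent and pred == step)

def is_recursive(parent, step):
    """True when a lower level is paired with itself."""
    return (parent, step, step) in order_rows
```

Note that a step may have several successors; order rows record every
predecessor/successor pairing, not a single linear sequence.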
 
Now these three tables--unique name table, hierarchical   
table, order table--allow us to represent all possible results   
of classification, structure, and operation analyses.  Our two   
component (proper text name, index) unique name allows us   
to resolve homonyms (same (proper) name, different   
referent).  Now we need an additional table to resolve   
synonyms (different names, same referent).  
 
To do that we need to agree that only one of the possible
unique names refers to our referent.  With this determined,
we then need to define a synonym or "alias" table.  Each row
of that table will consist of a "synonym (alias)" unique name
appended by the "real" unique name, a row for every such
pairing.  Again, the entire row of two unique names makes up
the "primary" key for that table.
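A minimal sketch of the alias lookup (all names hypothetical): each alias
row pairs a synonym with the one agreed "real" unique name, and resolution
maps any name to that real name:

```python
# Alias rows: synonym unique name -> the agreed "real" unique name.
# Unique names are shown here as (proper_name, index) pairs.
alias = {
    ("cust_no", 1): ("customer_id", 1),
    ("client_id", 1): ("customer_id", 1),
}

def resolve(name):
    """Map a possibly-synonymous unique name to its real unique name."""
    return alias.get(name, name)
```

A name not present in the alias table is already "real" and resolves to
itself.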
 
So now we have defined four (4) tables.  These allow us
access to every unique and "proper text" name; to the
two-level hierarchical relationships common to classification,
structure, and operation analyses; to the ordered relationships
within structure and operation analyses; and finally to resolving
issues related to homonyms and synonyms.  Four tables.
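One way to sketch the four tables in SQL (the table and column names are
my own; the post specifies only their contents and keys), here using
Python's built-in SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 1. Unique names: the 16-byte proper text part plus the 4-byte index.
CREATE TABLE unique_name (
    proper_name TEXT NOT NULL,
    name_index  INTEGER NOT NULL,
    PRIMARY KEY (proper_name, name_index)
);
-- 2. Hierarchy: higher-level name paired with each lower-level name.
CREATE TABLE hierarchy (
    parent_name TEXT, parent_index INTEGER,
    child_name  TEXT, child_index  INTEGER,
    PRIMARY KEY (parent_name, parent_index, child_name, child_index)
);
-- 3. Ordering: predecessor/successor pairings within a higher level.
CREATE TABLE ordering (
    parent_name TEXT, parent_index INTEGER,
    pred_name   TEXT, pred_index   INTEGER,
    succ_name   TEXT, succ_index   INTEGER,
    PRIMARY KEY (parent_name, parent_index,
                 pred_name, pred_index, succ_name, succ_index)
);
-- 4. Aliases: synonym name paired with the agreed "real" name.
CREATE TABLE alias (
    alias_name TEXT, alias_index INTEGER,
    real_name  TEXT, real_index  INTEGER,
    PRIMARY KEY (alias_name, alias_index, real_name, real_index)
);
""")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
assert tables == ["alias", "hierarchy", "ordering", "unique_name"]
```

In each case the entire row forms the primary key, exactly as described
above; in a production design the fixed byte widths (16, 4) would be
enforced as well.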
 
So far we have not stored, only named, referents and   
references.  We have created the tables necessary for storing   
their relationships depending upon the type of analysis in   
effect.  Before we do that we need to borrow a page from   
the manufacturing industry.  
 
In manufacturing we deal with only two types of referents:
raw material and assemblies.  Each of these represents a "level
of abstraction".  In this scheme we have only one lowest level
of abstraction, that of "raw material".  All higher levels must
ultimately resolve to these lowest-level elements.
 
Assemblies themselves may contain other assemblies and raw   
material.  Thus higher level assemblies may contain lower   
level assemblies or lowest level raw material.  Of course the   
same applies for lower level assemblies (except lowest level)   
as for higher level.  Every level from the highest assembly   
under consideration to the lowest (raw material) exists as a   
different level of abstraction.  
 
The tables we have defined thus far allow us to represent
all levels of abstraction from the highest to the lowest.  Each
level represents either a named referent, e.g. data element
(raw material) or aggregate (assembly), or a named
reference (a source statement or sentence (raw material) or
an assembly, e.g. for text a paragraph, section, or chapter; for
a program a control structure, function, module, or procedure).
 
Thus we have provided the means for dealing with source   
code only, source text only, or either in any combination as   
well as all possible elements (raw material) and aggregates   
(assemblies) of referents and references.  
 
All referents, either elements or aggregates, have   
user-defined names in source code and in source text.  Not all   
references have user-defined names, only some of them, e.g.   
procedure or chapter names.  That means the software must   
assign names to those element and assembly references not   
assigned by the user.  
 
Here we employ the use of content-based names, namely the   
first sixteen (16) bytes of an element reference as the   
"proper text" name within the "unique" name.  If the element   
reference is less than 16 bytes, we must fill it on the right   
with blanks, and note its shorter length in an additional field.    
For element references then we have two fields for the   
unique name, one for the proper text name length, and one   
for the entire element text reference if greater than 16 bytes.  
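A sketch of that content-based naming (function and field layout are my
own reading of the scheme): the first 16 bytes become the proper-name
part, a length field records short names, and the full text is kept only
when it overflows:

```python
def content_name(ref: str, index: int):
    """Derive a unique-name entry from an element reference's content:
    first 16 bytes as the proper part (blank-padded on the right if
    shorter), the index part, the actual length, and the full text
    only when it exceeds 16 bytes."""
    raw = ref.encode("ascii")
    proper = raw[:16].ljust(16, b" ")
    overflow = ref if len(raw) > 16 else None
    return (proper, index, len(raw), overflow)

short = content_name("i = i + 1", 1)           # 9 bytes: padded, no overflow
long_ = content_name("total = subtotal + tax", 1)  # 22 bytes: overflow kept
```

Because the name is derived from content alone, the same rule serves
source code statements and source text sentences alike, which is what
lets one table hold both.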
 
Now note that this works for both source code and text.  Thus   
we need to define only one table to store both code and text   
element references.  That takes care of references, leaving us   
only with referents to resolve.  
 
Element referents we can store in another table using the
same rules as we did for references with respect to proper
text names less than 16 bytes.  We will need additional
attribute fields which provide a detailed definition of a data
element.
 
Now we need to do a bit of cleanup.  In the unique name
table we need to denote if a proper text name is less than 16
bytes; if the name is an element (raw material) or an
aggregate (assembly); if it is a synonym; if it is a referent or
a reference; and if a reference, whether source code or text.
 
Now if we understand that we can subject non-text referents,   
e.g. image, bit maps, etc., to the same analysis, the same   
naming conventions, and the same element and aggregate   
definitions, we only have to add those tables with their row   
definitions while increasing our options in the unique name   
table to account for the additional non-text data types.  
 
In short we have in relatively few tables in a relational   
database the capability to create and maintain all the   
information in use within an enterprise.  All.  All with unlimited   
capability to mix, match, and merge.  Moreover we need only   
a single tool, a single interface to do this.  
 
If you have this in mind or will have given time to mull it over   
(and over), we have one last item to note based on our   
homonym (same name, different referent) support.  What two   
different things commonly have the same name?  Answer:   
different versions of the same thing (only different).  
 
The homonym support makes versioning a built-in (as opposed
to separate) feature of the DDR.  It occurs at all levels of
abstraction for all referents and references.  In short, it's
unavoidable and more comprehensive than current, commonly
used versioning tools like CVS.
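That versioning fall-out can be sketched like this (all names mine): a new
version of a thing keeps its proper name and simply takes the next free
index, making it just another homonym:

```python
# Unique names as (proper_name, index); higher index = later version.
names = {("customer", 1), ("customer", 2), ("invoice", 1)}

def new_version(proper):
    """Register the next version: same proper name, next index."""
    nxt = max((i for p, i in names if p == proper), default=0) + 1
    names.add((proper, nxt))
    return nxt

assert new_version("customer") == 3  # third version of "customer"
assert new_version("order") == 1     # first version of a brand-new name
```

No separate version store is needed; the index part of the unique name
carries the history for referents and references alike.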
 
Just to remind you why we went through this in the first
place: all of this, all the clerical work, is done by the software:
the unique names, the creation and maintenance of source
elements and assemblies, and, well, you name it.  All because I
bothered to read a book.  Oddly enough it was an IBMer,
another staff person at the Western Data Processing Center at
UCLA in 1962, who had taken a course from Upton at Whittier
College, who introduced me to these authors and their books.
 
Power to the people.  
 
 
=====================================================  
 
To unsubscribe from this list, send an email message  
to "steward@scoug.com". In the body of the message,  
put the command "unsubscribe scoug-programming".  
 
For problems, contact the list owner at  
"postmaster@scoug.com".  
 
=====================================================  
 
  
  
  
The Southern California OS/2 User Group
 P.O. Box 26904
 Santa Ana, CA  92799-6904, USA
Copyright 2001 the Southern California OS/2 User Group.  ALL RIGHTS 
RESERVED. 
 
SCOUG, Warp Expo West, and Warpfest are trademarks of the Southern California OS/2 User Group.
OS/2, Workplace Shell, and IBM are registered trademarks of International 
Business Machines Corporation.
All other trademarks remain the property of their respective owners.
 