SCOUG-Programming Mailing List Archives
Thanks to a morning email from Greg Smith with respect to
GCC 4.0, I felt a reprieve from presenting anything on LEX
and YACC. I needed one; I was ill-prepared to do the topic
justice.
Next month Greg Smith will introduce the group to GCC 3.x
and the Watcom C compiler. He hopes to take a Linux open
source program and recompile it for OS/2 using both compilers.
While GCC is the odds-on favorite in the open source
community, the Watcom package (as opposed to compiler
only) offers a better, more comprehensive development and
debugging environment.
Besides, as a group we need to take on one or more open
source projects for OS/2. Two of particular importance that
come to mind are GCC 4.0 and Mozilla. As the Watcom compiler
is also open source, if we do the one (GCC 4.0), we should
have the means to do the other as well. That assistance
should come as a blessing to the sole developer of Watcom at
the moment, who is threatening to stop.
So instead of LEX and YACC, about which I have minimal
knowledge, I chose to present on something I do know
something about as its author: the Data Directory/Repository
(DDR). In this forum we have argued about language
decisions. Getting bogged down in that doesn't go to the
heart of the problem in software development and
maintenance: its ever-spiraling cost, longer maintenance
cycles, and inability to reduce (if not eliminate) backlogs of
change requests.
That's not a language issue (though language choice does
have some importance); it's a people problem. It's a people
problem to develop software, to maintain it (the larger
problem), and more importantly to document it. More to the
point, it's a people productivity problem, one not addressed by
the choice of a language or compiler.
I operate under the general maxim (as it comes closer to my
name of Maxson) that productivity improvements come
directly from process improvements. In the IT profession we
have, through software, increased client productivity and
reduced per-transaction costs through process improvements
based on "letting users do what software cannot and software
what users need not". In short, shift as much clerical effort
as possible to software. Reduce clerical effort.
As the primary clerical effort in IT is writing, the creation and
maintenance of source code and text, we can attempt to
reduce it in two ways: reduce the total number of people
writing and reduce the total amount of different writing. That
reduction reaches a minimum with one source code language
and one source text language. Then we use software
assistance with both sources to create (generate) and
maintain all reportable results, e.g. UML ("Unified", not
"Universal", Modelling Language) documentation, user guides,
programming reference manuals, web-based tutorials, etc.
Now how do you get the "software assistance" that makes
this possible? The Data Directory/Repository (DDR). Having
watched a previous attempt fail in IBM's earlier AD/Cycle
(Application Development/Cycle) project, wherein IBM and
the rest of the "cooperating" vendors expended millions of
man-hours and hundreds of millions of dollars, I want to offer
you a solution more complete than anything that exists, as
well as lower in cost.
To do that I need to introduce you to A. Upton, author of
"Design for Thinking", and to Richard Samson, workbook
co-author with Upton of "Creative Analysis" and sole author
of "The Mind Builder"; specifically, to the treatment of
analysis within this body of work.
Upton and Samson offer us three possible types of analysis:
classification (taxonomy), structure, and operation (process).
Three, and only three, covering all possible analysis. All
three have their basis in the question(s) you can ask about
some "thing".
To understand the questions, you can visually represent the
"things", either in your mind or in a physical drawing, by a
large circle which contains within it three non-overlapping
smaller circles. Do you have that picture in mind? OK.
In a classification analysis we can start with a smaller circle
and ask, "What is this a kind of?", i.e. to what larger circle (or
circles) does it belong? Or inversely we can ask, "What are
the kinds of this?", i.e. what smaller circles does it contain?
This is the classical taxonomy of biology.
Similarly in structure analysis we can ask, "What is this a part
of?" or "What are the parts of this?". Continuing on to
operation analysis, "What is this a stage of?" or "What are the
stages of this?".
Now note here that our visual representation, in your mind or
on paper, has two different "levels of abstraction", a higher
level (the larger circle) and a lower (the smaller circles).
Each of these can itself be "contained" in a higher level
or can "contain" lower levels. In any case we have a classic
set (higher level) and member (lower level) relationship.
More importantly all types of analysis (classification,
structure, and operation) have this basic "hierarchical"
relationship in common. That means we need only one
method, one process, one means of definition for this
"common" feature.
What separates the three types, what separates classification
analysis from structure and operation analyses, is that in
classification analysis the order of the lower levels of
abstraction is unimportant (unordered), while in structure and
operation analyses order is important.
Thus to handle all possible classification analyses the
"common" method suffices. To handle all possible structure
and operation analyses we need an additional "method" which
allows us to depict order among the lower levels of
abstraction (the smaller circles).
Now we cannot refer to anything in the directory or store it in
the repository without giving it a "name", either user- or
software-defined. If we implement our DDR as a database,
specifically a relational database, then we need only a
single table with a row for each "unique" name. This brings
up our first problem to resolve: homonyms (same name,
different referents, i.e. different things or levels of
abstraction).
To solve this we borrow from among many sources, but
particularly from General Semantics, the use of indexing. This
means that any "stored" (unique) name has two parts, a
proper name (text) part appended with an index part. For
purposes here we will assume a proper name part of 16 bytes
and an index part of 4 bytes, thus a 20-byte unique name.
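As a sketch, assuming a SQL implementation (the table and
column names here are my own illustration, not an existing
DDR definition):

    CREATE TABLE unique_name (
        proper_name CHAR(16) NOT NULL,  -- proper text name, blank-padded on the right
        name_index  INTEGER  NOT NULL,  -- 4-byte index distinguishing homonyms
        PRIMARY KEY (proper_name, name_index)  -- together, the 20-byte unique name
    );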
That's our first table, though we have some further definitions
(attributes) to add later. We use a second table to store our
"common" hierarchical (two-level-of-abstraction) relationship
that exists among the three types of analyses. Each row in
our table will contain the unique name of the higher level
appended in turn with each unique name of the lower levels,
one row for each pairing. Thus each row will have 4 fields
and total 40 bytes, with the entire row acting as its "primary"
key.
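In the same illustrative SQL (again, the names are mine), that
hierarchy table might look like:

    CREATE TABLE hierarchy (
        higher_name  CHAR(16) NOT NULL,  -- unique name of the higher level (larger circle)
        higher_index INTEGER  NOT NULL,
        lower_name   CHAR(16) NOT NULL,  -- unique name of one lower level (smaller circle)
        lower_index  INTEGER  NOT NULL,
        PRIMARY KEY (higher_name, higher_index,
                     lower_name, lower_index)  -- the entire 40-byte row
    );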
Now we need to define a table to depict the ordering which
occurs among the lower levels of abstraction (smaller circles)
within the higher level (larger circle). Here we will have the
higher-level unique name appended by the unique name
pairing (one row for each such pairing) of the lower levels of
abstraction in their predecessor, successor order. This
includes when a lower level is "paired" with itself,
representing "recursion". This row then contains a sequence
of three unique names, 60 bytes in length, all of which make
up the "primary" key for that table.
Now these three tables--unique name table, hierarchical
table, order table--allow us to represent all possible results
of classification, structure, and operation analyses. Our
two-component (proper text name, index) unique name allows
us to resolve homonyms (same (proper) name, different
referent). Now we need an additional table to resolve
synonyms (different names, same referent).
To do that we need to agree that only one of the possible
unique names refers to our referent. With this determined
we then need to define a synonym or "alias" table. Each row
of that table will consist of a "synonym (alias)" unique name
appended by the "real" unique name, a row for every such
pairing. Again the entire row of two unique names makes up
the "primary" key for that table.
So now we have defined four (4) tables. These allow us
access to every unique and "proper text" name; to the
two-level hierarchical relationships common to classification,
structure, and operation analyses; to the ordered relationships
within structure and operation analyses; and finally to resolving
issues related to homonyms and synonyms. Four tables.
So far we have not stored, only named, referents and
references. We have created the tables necessary for storing
their relationships depending upon the type of analysis in
effect. Before we do that we need to borrow a page from
the manufacturing industry.
In manufacturing we deal with only two types of referents:
raw material and assemblies. Each of these represents a "level
of abstraction". In this scheme we have only one lowest level
of abstraction, that of "raw material". All higher levels must
ultimately resolve to these lowest-level elements.
Assemblies themselves may contain other assemblies and raw
material. Thus higher level assemblies may contain lower
level assemblies or lowest level raw material. Of course the
same applies for lower level assemblies (except lowest level)
as for higher level. Every level from the highest assembly
under consideration to the lowest (raw material) exists as a
different level of abstraction.
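If such a bill of material lives in the hierarchy table
sketched earlier, resolving an assembly down to its raw
material becomes a recursive walk. One way to express it,
assuming a database with recursive query support (part of the
SQL standard, though not every product offers it) and a
hypothetical root assembly name:

    WITH RECURSIVE explosion (name, name_index) AS (
        SELECT lower_name, lower_index
          FROM hierarchy
         WHERE higher_name = 'TOP_ASSEMBLY'   -- hypothetical root assembly
           AND higher_index = 1
        UNION ALL
        SELECT h.lower_name, h.lower_index
          FROM hierarchy h, explosion e
         WHERE h.higher_name = e.name
           AND h.higher_index = e.name_index
    )
    SELECT name, name_index FROM explosion;   -- every level down to raw material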
The tables we have defined thus far allow us to represent
all levels of abstraction from the highest to the lowest. Each
level represents either a named referent, e.g. data element
(raw material) or aggregate (assembly), or a named
reference, e.g. source statement or sentence (raw material)
or assembly (for text a paragraph, section, or chapter; for a
program a control structure, function, module, or procedure).
Thus we have provided the means for dealing with source
code only, source text only, or either in any combination, as
well as all possible elements (raw material) and aggregates
(assemblies) of referents and references.
All referents, either elements or aggregates, have
user-defined names in source code and in source text. Not all
references have user-defined names, only some of them, e.g.
procedure or chapter names. That means the software must
assign names to those element and assembly references not
assigned by the user.
Here we employ content-based names, namely the
first sixteen (16) bytes of an element reference as the
"proper text" name within the "unique" name. If the element
reference is less than 16 bytes, we must fill it on the right
with blanks, and note its shorter length in an additional field.
For element references then we have two fields for the
unique name, one for the proper text name length, and one
for the entire element text reference if greater than 16 bytes.
Now note that this works for both source code and text. Thus
we need to define only one table to store both code and text
element references. That takes care of references, leaving us
only with referents to resolve.
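A sketch of that single element reference table, with the
same caveats (the VARCHAR size is an arbitrary assumption):

    CREATE TABLE element_reference (
        proper_name CHAR(16) NOT NULL,  -- first 16 bytes of the reference, blank-filled
        name_index  INTEGER  NOT NULL,
        name_length INTEGER  NOT NULL,  -- actual length when under 16 bytes
        full_text   VARCHAR(4000),      -- entire reference when over 16 bytes, else null
        PRIMARY KEY (proper_name, name_index)
    );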
Element referents we can store in another table using the
same rules as we did for references with respect to proper
text names less than 16 bytes. We will need additional
attribute fields which provide a detailed definition of a data
element.
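A sketch of that referent table; since the attribute fields are
not enumerated here, the ones below are purely illustrative
guesses:

    CREATE TABLE element_referent (
        proper_name CHAR(16) NOT NULL,
        name_index  INTEGER  NOT NULL,
        name_length INTEGER  NOT NULL,    -- actual length when under 16 bytes
        data_type   CHAR(8),              -- illustrative attribute fields only
        data_length INTEGER,
        description VARCHAR(256),
        PRIMARY KEY (proper_name, name_index)
    );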
Now we need to do a bit of cleanup. In the unique name
table we need to denote whether a proper text name is less
than 16 bytes, whether the name is an element (raw material)
or an aggregate (assembly), whether it is a synonym, whether
it is a referent or a reference, and if a reference, whether
source code or text.
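In the illustrative SQL, that cleanup might read as added flag
columns (the flag names and one-character encodings are my
own):

    ALTER TABLE unique_name ADD COLUMN short_name   CHAR(1);  -- 'Y' if under 16 bytes
    ALTER TABLE unique_name ADD COLUMN elem_or_aggr CHAR(1);  -- 'E'lement or 'A'ggregate
    ALTER TABLE unique_name ADD COLUMN is_synonym   CHAR(1);  -- 'Y' if a synonym (alias)
    ALTER TABLE unique_name ADD COLUMN name_role    CHAR(1);  -- 'R'eferent or re'F'erence
    ALTER TABLE unique_name ADD COLUMN source_kind  CHAR(1);  -- 'C'ode or 'T'ext, for references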
Now if we understand that we can subject non-text referents,
e.g. images, bitmaps, etc., to the same analysis, the same
naming conventions, and the same element and aggregate
definitions, we only have to add those tables with their row
definitions while increasing our options in the unique name
table to account for the additional non-text data types.
In short we have in relatively few tables in a relational
database the capability to create and maintain all the
information in use within an enterprise. All. All with unlimited
capability to mix, match, and merge. Moreover we need only
a single tool, a single interface to do this.
If you have this in mind, or will once you've had time to mull
it over (and over), we have one last item to note based on our
homonym (same name, different referent) support. What two
different things commonly have the same name? Answer:
different versions of the same thing (only different).
The homonym support makes versioning a built-in (as opposed
to separate) feature of the DDR. It occurs at all levels of
abstraction for all referents and references. In short it's
unavoidable and more comprehensive than current, commonly
used versioning tools like CVS.
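Because versions are simply homonyms distinguished by their
index values, listing every version of a thing reduces to a
one-table query in our sketch (the referent name here is
hypothetical):

    SELECT name_index
      FROM unique_name
     WHERE proper_name = 'PAYROLL_CALC'  -- hypothetical referent
     ORDER BY name_index;                -- index order as version history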
Just to remind you why we went through this in the first
place: all of this, all the clerical work, is done by the
software: the unique names, the creation and maintenance of
source elements and assemblies, and, well, you name it. All
because I bothered to read a book. Oddly enough it was an
IBMer, another staff person at the Western Data Processing
Center at UCLA in 1962 who had taken a course from Upton
at Whittier College, who introduced me to these authors and
their books. Power to the people.