SCOUG-Programming: Data Repository/Directory

The Southern California OS/2 User Group
USA

SCOUG-Programming Mailing List Archives

Date: Fri, 17 Mar 2006 12:05:52 PST8

From: "Lynn H. Maxson" <lmaxson@pacbell.net>

Reply-To: scoug-programming@scoug.com

To: "SCOUG Programming SIG" <scoug-programming@scoug.com>

Subject: SCOUG-Programming: Data Repository/Directory

Content Type: text/plain

Apparently some have not seen Uncle Lynn's design of the
Data Repository/Directory. Let's rectify that.

If you read the article I referenced in the previous message,
you will note that the subject speaker, McConnell, in his ideas
of an "ideal" system included "The most powerful form of
reuse is full reuse." and "Iteration and incrementalism in
software development are essential."

Full reuse includes individual source statements, which even
McConnell with his OO vision has probably overlooked. Greg
has a penchant for specifications in English, so we might as
well begin there.

In programming and in documentation of programs we have
two entities, referents (things referenced) and references.
Referents include named objects, i.e. data, and named
references. Thus within a reference we can refer to a data
object (variable or constant) or another reference, e.g. rc =
DosOpen(...). Element references occur as source code, i.e.
program statement, or as source text, i.e. documentation
sentence. These constitute the "raw material" of references,
into which all other references must ultimately decompose.

Non-element references we call "assemblies". An assembly
then contains one or more references to assemblies and raw
material. This allows us to treat source code and source text
in a purely manufacturing manner based on the use of raw
material in bills of material involving ever higher levels of
assemblies.
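
As a minimal sketch of the bill-of-material idea, the Python below
explodes an assembly into its raw material. The dictionary layout and
every name in it (explode, open_file, stmt_1 and so on) are
illustrative assumptions, not part of the design itself.

```python
# Hypothetical bill of material: each assembly maps to an ordered list of
# component names; any name that is not a key is treated as raw material.
ASSEMBLIES = {
    "program": ["open_file", "close_file"],
    "open_file": ["stmt_1", "stmt_2"],
    "close_file": ["stmt_3"],
}


def explode(name, assemblies):
    """Return the raw material of `name`, in order, by recursive descent."""
    if name not in assemblies:       # an element: cannot decompose further
        return [name]
    raw = []
    for part in assemblies[name]:    # an assembly: explode each component
        raw.extend(explode(part, assemblies))
    return raw


print(explode("program", ASSEMBLIES))  # ['stmt_1', 'stmt_2', 'stmt_3']
```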

This implies that all raw materials, referents and references,
have a name as do all their assemblies. Thus every
component of our universe of source code (programs) and
source text (documentation) has a name. That name exists in
the data directory, which in our system is the only means of
access.

Now we cannot have a name in a database manager separate
from any other without making it unique. That means we have to
have a means of distinguishing between homonyms (same
name, different referents). We do this by appending an index
to a "proper", text-based name. If we arbitrarily designate a
text-based name as 16 bytes in length, right-filled if
necessary with blanks, and append to it a full-word binary
index (32 bits or 4 bytes), we can create a 20-byte unique
name for all our referents and references.
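
The 20-byte layout can be sketched in Python as follows. The byte
order (big-endian), the unsigned index, the ASCII encoding and the
function names are assumptions made for illustration; the design only
fixes the 16-byte blank-padded name and the 4-byte binary index.

```python
import struct

NAME_LEN = 16            # text part, right-filled with blanks
KEY_FORMAT = ">16sI"     # assumed: big-endian, 16-byte string + unsigned 32-bit


def pack_unique_name(name, index):
    """Build the 20-byte key from a text name and a homonym index."""
    raw = name.encode("ascii")
    if len(raw) > NAME_LEN:
        raise ValueError("text name longer than 16 bytes")
    return struct.pack(KEY_FORMAT, raw.ljust(NAME_LEN, b" "), index)


def unpack_unique_name(key):
    """Split a 20-byte key back into (text name, index)."""
    raw, index = struct.unpack(KEY_FORMAT, key)
    return raw.decode("ascii").rstrip(" "), index


key = pack_unique_name("rc", 1)
print(len(key))                  # 20
print(unpack_unique_name(key))   # ('rc', 1)
```

Two homonyms, say ("rc", 1) and ("rc", 2), share the first 16 bytes
and differ only in the last four.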

Thus we have a unique name which we can designate as
source data (D), source code (C), or source text (T). Now
we simply need to designate if the name refers to a raw
material or element (E) or an assembly (A). This covers the
universe of possibilities for all referents and references.

The directory then consists of two columns for the unique
name, one 16-byte, fixed length character value, and the
other a 4-byte binary value. In addition it contains two
single-character columns, one to designate source type (D, C,
or T) and the other to designate source structure (E or A).
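
One way to picture the directory is as a four-column table. The
sketch below uses SQLite through Python purely as a stand-in for
whatever database manager is chosen; the table name, the column names
and the add_entry helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE directory (
        name        CHAR(16) NOT NULL,   -- text name, blank-padded
        idx         INTEGER  NOT NULL,   -- 4-byte homonym index
        source_type CHAR(1)  NOT NULL CHECK (source_type IN ('D', 'C', 'T')),
        structure   CHAR(1)  NOT NULL CHECK (structure IN ('E', 'A')),
        PRIMARY KEY (name, idx)
    )
""")


def add_entry(name, idx, source_type, structure):
    """Insert one directory row, padding the name to 16 characters."""
    conn.execute(
        "INSERT INTO directory VALUES (?, ?, ?, ?)",
        (name.ljust(16), idx, source_type, structure),
    )


add_entry("rc", 1, "D", "E")          # a data element
add_entry("open_file", 1, "C", "A")   # a code assembly
```

The CHECK constraints reject any source type other than D, C or T and
any structure other than E or A.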

A by-product of this lies in having a built-in versioning system
down to the element level for all source types, as versions
occur in practice as homonyms. This exceeds the scope of
any other existing versioning system in software AFAIK.
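
A rough sketch of versions as homonyms, in Python: each new version
of a name simply takes the next index. Starting the index at 1 and
the class and method names are assumptions for illustration.

```python
class Directory:
    """Toy in-memory directory keyed by (text name, index)."""

    def __init__(self):
        self._last_index = {}   # text name -> highest index issued so far
        self.entries = {}       # (name, index) -> (source type, structure)

    def add(self, name, source_type, structure):
        """Register a new homonym (version) of `name`; return its key."""
        index = self._last_index.get(name, 0) + 1
        self._last_index[name] = index
        self.entries[(name, index)] = (source_type, structure)
        return (name, index)

    def versions(self, name):
        """List every index issued for `name`, oldest first."""
        return sorted(i for (n, i) in self.entries if n == name)


d = Directory()
d.add("open_file", "C", "A")
d.add("open_file", "C", "A")     # a second version, same text name
print(d.versions("open_file"))   # [1, 2]
```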

Now having solved the problem of homonyms we can turn to
that of synonyms (different name, same referent). That
means we need an "alias" or "AKA" (Also Known As) table.
We construct this with a 4-column row consisting of a
"designated" primary unique name and an "alternate" unique
name with a row for all such associations.
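
The alias table might look like the following sketch, again using
SQLite from Python only as a convenient stand-in. The column names,
the resolve helper and the sample names hFile and handle are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE alias (
        primary_name CHAR(16) NOT NULL,
        primary_idx  INTEGER  NOT NULL,
        alt_name     CHAR(16) NOT NULL,
        alt_idx      INTEGER  NOT NULL,
        PRIMARY KEY (alt_name, alt_idx)   -- an alternate maps to one primary
    )
""")
conn.execute("INSERT INTO alias VALUES (?, ?, ?, ?)", ("hFile", 1, "handle", 1))


def resolve(name, idx):
    """Return the designated primary name for a synonym, else the name itself."""
    row = conn.execute(
        "SELECT primary_name, primary_idx FROM alias"
        " WHERE alt_name = ? AND alt_idx = ?",
        (name, idx),
    ).fetchone()
    return (name, idx) if row is None else (row[0], row[1])


print(resolve("handle", 1))   # ('hFile', 1)
print(resolve("hFile", 1))    # ('hFile', 1)
```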

Now we have solved the homonym and synonym problems
which beset other implementations of a data dictionary and
specifically most CASE tools. We've done it with only two
tables.

Now we have to deal with assemblies. This relies on the fact
that in software (and essentially everywhere else) we have
only three forms of analysis: classification, structure, and
operation (a structure as a process). In each of these we
have only two questions to answer. For classification, we
have, "What is this a kind of?" and "What are the kinds of
this?". For structure we have, "What is this a part of?" and
"What are the parts of this?". For operation we have, "What
is this a stage of?" and "What are the stages of this?".

If you need any textbook references on this, I refer you to
Upton's "Design for Thinking" and Upton's and Samson's
"Creative Analysis", both excellent sources for those who
want to improve their thinking in English...sentences.

As it turns out the three forms of analysis have only two
visual possibilities. Classification has a two-level "unordered"
hierarchy (container to contained) as given in any taxonomy
textbook. Structure and operation offer an "ordered"
hierarchy of the "contained" while retaining that of
classification relative to container to contained.

Thus we need only two additional tables, one for the common
unordered container to contained enumeration; the other for
the enumerated set of contained to contained (ordered)
relationships. The "container" table like the "alias" table
consists of 4 columns of two unique names with a row for
each individual container to contained relationship. The
"contained" table consists of 6 columns with a "unique" name
composed by combining the two unique names of a
"container-contained" pair with a "followed by" contained name.
This allows for immediate loops (recursion), a contained name
"followed by" itself as well as higher-level loops.
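
The two relationship tables can be sketched as below. This is one
reading of the description, not a definitive layout: SQLite, the
column names, the helper functions and the sample rows are all
assumptions. Each helper answers one of the questions posed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE container (        -- unordered: container to contained
    container_name TEXT, container_idx INTEGER,
    contained_name TEXT, contained_idx INTEGER
);
CREATE TABLE contained (        -- ordered: contained followed by contained
    container_name TEXT, container_idx INTEGER,
    contained_name TEXT, contained_idx INTEGER,
    next_name      TEXT, next_idx      INTEGER
);
""")
conn.executemany(
    "INSERT INTO container VALUES ('program', 1, ?, 1)",
    [("open",), ("read",), ("close",)],
)
conn.executemany(
    "INSERT INTO contained VALUES ('program', 1, ?, 1, ?, 1)",
    [("open", "read"), ("read", "read"), ("read", "close")],  # read loops
)


def parts_of(name, idx):
    """'What are the parts of this?'"""
    return conn.execute(
        "SELECT contained_name, contained_idx FROM container"
        " WHERE container_name = ? AND container_idx = ?"
        " ORDER BY contained_name, contained_idx", (name, idx)).fetchall()


def part_of(name, idx):
    """'What is this a part of?'"""
    return conn.execute(
        "SELECT container_name, container_idx FROM container"
        " WHERE contained_name = ? AND contained_idx = ?"
        " ORDER BY container_name, container_idx", (name, idx)).fetchall()


def followed_by(container, item):
    """Names that may follow `item` inside `container` (both are keys)."""
    return conn.execute(
        "SELECT next_name, next_idx FROM contained"
        " WHERE container_name = ? AND container_idx = ?"
        " AND contained_name = ? AND contained_idx = ?"
        " ORDER BY next_name, next_idx", container + item).fetchall()


print(parts_of("program", 1))                      # close, open, read
print(followed_by(("program", 1), ("read", 1)))    # close, read (itself)
```

The row in which read is followed by read is the immediate loop
mentioned above.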

Now let me see. That's four tables, one for a directory,
another for an alias, and two for storing relationships. Now
we need one for storing source statements which consists of
two columns for the unique name, one for the statement
length (possibly 16 bytes or less) and one variable-length for
the statement (if greater than 16 bytes).
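
A hedged sketch of the source statement table follows, with SQLite
from Python as a stand-in. The column names and helpers are
hypothetical, and the remark about statements of 16 bytes or less is
deliberately not modeled here: every statement goes into the
variable-length column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statement (
        name   CHAR(16) NOT NULL,   -- text part of the unique name
        idx    INTEGER  NOT NULL,   -- homonym (version) index
        length INTEGER  NOT NULL,   -- statement length in bytes
        body   TEXT     NOT NULL,   -- the statement itself
        PRIMARY KEY (name, idx)
    )
""")


def store_statement(name, idx, body):
    """Store one source statement under its unique name."""
    conn.execute(
        "INSERT INTO statement VALUES (?, ?, ?, ?)",
        (name, idx, len(body.encode("utf-8")), body),
    )


def fetch_statement(name, idx):
    """Return the statement text for a unique name, or None if absent."""
    row = conn.execute(
        "SELECT body FROM statement WHERE name = ? AND idx = ?",
        (name, idx),
    ).fetchone()
    return None if row is None else row[0]


store_statement("stmt_1", 1, "rc = DosOpen(...);")
print(fetch_statement("stmt_1", 1))   # rc = DosOpen(...);
```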

OK, that's five. Now we need one to store source data
elements. Again with the same number of columns as for
source statements.

OK, that's six...as promised. We might as well get generous
here and include a seventh table for source text with the same
number of columns as for source statements. Or we could choose
to use the same table as source statements for source text.

As source code and source text use (or should) the same data
names in their references we have a means of combining their
use when desired as in literate programming or separately as
in code generation or user documentation. It gives us an
easier means of synchronizing changes globally throughout all
source code and text.

Now we do have a "twitchy" part when it comes to data,
specifically data assemblies (aggregates) which may be
homogeneous, e.g. arrays, or heterogeneous, e.g. structures or
lists, or any combination thereof. This means additional
columns for our data referent (object) table. That completes
our data repository with directory.

Now save this so I don't have to repeat it in the future. I will
just in case I do.

That I think completely covers "The most powerful form of
reuse is full reuse." Probably more so than has occurred to
McConnell. Full reuse has to occur at the statement level. If
it does, it automatically occurs at all higher assembly levels.

Note that all assemblies occur as "lists" of names, whether
source code, text, or data. Note that the software will supply
names where the developer does not. The software will
automatically do versioning without developer intervention in
some instances, e.g. a change to a source statement, and with it
in others, e.g. when the developer wants to reflect a change
globally.
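
The reuse and naming points above can be sketched as follows: each
distinct statement is stored once, the software supplies a name when
the developer gives none, and assemblies are just lists of names. The
generated-name scheme ("S" plus a counter) and the class and method
names are hypothetical.

```python
class StatementStore:
    """Toy store in which identical statement text shares one unique name."""

    def __init__(self):
        self._name_of = {}   # statement text -> (name, index)
        self._text_of = {}   # (name, index) -> statement text
        self._counter = 0

    def intern(self, text):
        """Return the existing name for `text`, or generate a new one."""
        if text in self._name_of:
            return self._name_of[text]           # full reuse: no second copy
        self._counter += 1
        name = ("S%07d" % self._counter, 1)      # software-supplied name
        self._name_of[text] = name
        self._text_of[name] = text
        return name

    def count(self):
        """Number of distinct statements stored."""
        return len(self._text_of)

    def render(self, assembly):
        """Expand an assembly (a list of names) back into statement text."""
        return [self._text_of[name] for name in assembly]


store = StatementStore()
open_close = [store.intern("rc = DosOpen(...);"),
              store.intern("rc = DosClose(hFile);")]
cleanup = [store.intern("rc = DosClose(hFile);")]   # reuses the same name
print(store.count())             # 2
print(cleanup[0] == open_close[1])   # True
```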

Now as to "Iteration and incrementalism in software
development are essential", we will save that for later discourse,
although I did touch upon this in a previous response to Greg
relative to immediate feedback "brainstorming" input.

=====================================================

To unsubscribe from this list, send an email message
to "steward@scoug.com". In the body of the message,
put the command "unsubscribe scoug-programming".

For problems, contact the list owner at
"postmaster@scoug.com".

=====================================================

The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA