SCOUG-Programming: The Data Repository/Directory/Dictionary (DRDD)--Part two

The views expressed in articles on this site are those of their authors.

SCOUG, Warp Expo West, and Warpfest are trademarks of the Southern California OS/2 User Group. OS/2, Workplace Shell, and IBM are registered trademarks of International Business Machines Corporation. All other trademarks remain the property of their respective owners.

The Southern California OS/2 User Group
USA

SCOUG-Programming Mailing List Archives

Date: Sun, 22 Apr 2007 13:24:43 -0700

From: "Lynn H. Maxson" <lmaxson@pacbell.net>

Reply-To: scoug-programming@scoug.com

To: "SCOUG Programming SIG" <scoug-programming@scoug.com>

Subject: SCOUG-Programming: The Data Repository/Directory/Dictionary (DRDD)--Part two

Content Type: text/plain

Now let's understand what it means to apply manufacturing
methods to programming. Any manufactured product has two
kinds of components, raw materials and higher-level assemblies.
Every assembly consists of zero, one, or more raw materials
plus zero, one, or more assemblies. Ultimately every assembly,
decomposed downward through all lower levels, results in a
set of raw materials. In manufacturing this is known as a
"bill of material", specifying everything necessary in terms
of parts to produce all assembly levels up to the highest.

In programming we have equivalents in that our raw materials
are elements and our assemblies, aggregates. The same set of
rules applies. Every aggregate consists of zero, one, or more
elements plus zero, one, or more aggregates. Eventually all
aggregates decompose from the highest level on down to a
set of elements.

Just like manufacturing.
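
The decomposition of an aggregate into its elements can be
sketched in a few lines of Python. This is only an
illustration: the catalog and its names (program, function_a,
stmt_1, and so on) are hypothetical, and the sketch assumes no
aggregate contains itself.

```python
from collections import Counter

# Hypothetical catalog: each aggregate maps to its direct components.
# Any name that is not a key is treated as an element (raw material).
AGGREGATES = {
    "program": ["function_a", "function_b", "stmt_1"],
    "function_a": ["stmt_2", "stmt_3"],
    "function_b": ["function_a", "stmt_4"],
}

def explode(name, catalog=AGGREGATES):
    """Return a Counter of the elements an item ultimately decomposes into."""
    if name not in catalog:
        # An element decomposes to itself.
        return Counter({name: 1})
    total = Counter()
    for part in catalog[name]:
        # Recurse through every lower level (assumes no cycles).
        total += explode(part, catalog)
    return total

# The "bill of material" for the highest-level aggregate.
bill = explode("program")
```

Here function_a is used twice, once directly and once through
function_b, so its statements are counted twice in the bill.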

We have a twist however. In software we have three
element forms with corresponding aggregates. We have data
elements and data aggregates. We have source statements
(code) and their aggregates (control structures, functions,
programs). We have source statements (text) and their
aggregates (paragraphs, sections, chapters, books).

The data elements and aggregates represent our objects:
referents. The source statements, code or text, provide our
references for their use. We need changes to either a
referent (data) or its references (source code or text) to
remain synchronized, and we need to maintain this
synchronization with minimal effort and maximum software
support.

We begin with a name. By necessity all data elements and
aggregates have a people-assigned name. Beyond that,
programs and subroutines (APIs) have names, and even some
statements have labels. Therein lies the first thing we have
to resolve: we must have names for source statements
(elements and aggregates) just as we have for data.

While we can accept and use the people-named aggregates
for source statements, we must rely on the software to
automatically provide names for source statements
themselves. We use the source statements themselves, at
least a prefix of them, to form a content-based name.
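
One way such a content-based name might be formed is sketched
below. The function name, the whitespace normalization, and
the blank padding are assumptions made for illustration; the
32-character length is the limit used later in this post.

```python
PROPER_NAME_LENGTH = 32  # the 32-character limit used in this post

def content_based_name(statement, length=PROPER_NAME_LENGTH):
    """Form a proper name from the leading characters of a statement."""
    # Assumption: collapse runs of whitespace so that formatting
    # differences alone do not produce different names.
    normalized = " ".join(statement.split())
    # Take the prefix and pad with blanks, like a fixed-length char field.
    return normalized[:length].ljust(length)
```

Two different statements that share the same 32-character
prefix would receive the same proper name under this scheme,
which is why a prefix alone cannot serve as a unique name.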

Thus at this point every element (data, code, text) now has a
name as does every corresponding aggregate (data, code,
text). Moreover this must exist as a "unique" name. We
ensure this by creating a unique name made up of two parts,
a proper name appended with an index value.

So if we limit our proper name length to 32 characters (dcl
proper_name char (32);) and index value to 32 bits (dcl
index_value fixed bin (32) unsigned;), together we have a
36-byte unique name. It's not important if the real name for
which we assign a proper name is shorter than, equal to, or
longer than the 32-character proper name. The software can
compensate for this.
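
The 36-byte layout (32 bytes of name plus 4 bytes of index)
can be checked with a short Python sketch. The byte order,
the ASCII encoding, and the choice to truncate or blank-pad
the name are assumptions of this sketch, not something the
design above prescribes.

```python
import struct

# 32 bytes of proper name followed by a 32-bit unsigned index.
# Big-endian byte order is an assumption made for this sketch.
UNIQUE_NAME = struct.Struct(">32sI")

def pack_unique_name(proper_name, index_value):
    """Build the 36-byte unique name, truncating or blank-padding the name."""
    raw = proper_name.encode("ascii")[:32].ljust(32, b" ")
    return UNIQUE_NAME.pack(raw, index_value)

def unpack_unique_name(packed):
    """Split a 36-byte unique name back into (proper_name, index_value)."""
    raw, index_value = UNIQUE_NAME.unpack(packed)
    return raw.decode("ascii").rstrip(" "), index_value
```

An index outside the range 0 to 4,294,967,295 makes
struct raise an error rather than silently wrap.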

What is important lies in the fact that now every data, code,
and text element and aggregate has a software-maintained name.
Any name not supplied by the person gets supplied by the
software by a content-based method. Its importance extends
to a common problem encountered with names, that of
homonyms, the same name for different referents. The
addition of the index value to every proper name solves this
in an absolute sense.
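
A minimal sketch of how an index value separates homonyms
follows. The NameRegistry class is hypothetical, and handing
out index values sequentially from zero is an assumption; the
post only requires that each pair be unique.

```python
from collections import defaultdict

class NameRegistry:
    """Hands out (proper_name, index_value) pairs; a hypothetical helper."""

    def __init__(self):
        self._next_index = defaultdict(int)  # next free index per proper name
        self._referents = {}                 # unique name -> referent

    def register(self, proper_name, referent):
        """Give the referent a unique name, even if the proper name is reused."""
        index_value = self._next_index[proper_name]
        self._next_index[proper_name] += 1
        unique_name = (proper_name, index_value)
        self._referents[unique_name] = referent
        return unique_name

    def lookup(self, unique_name):
        """Return the single referent a unique name stands for."""
        return self._referents[unique_name]
```

Registering two different referents under the proper name
"date" yields two distinct unique names, so a lookup never
has to guess which one was meant.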

For example different versions of the same program have the
same name. We have a separate software process called
version control, e.g. CVS, for this. However, as it can only
deal with people-assigned names it has nowhere near the
same applicability to a system in which every element (data,
code, text) as well as their aggregates has a name. Thus we
allow a "universal" version control unavailable currently
elsewhere.

Moreover you cannot invoke it as a separate process. It is
integral, integrated, and inherent in the software. It's
universal, complete at this moment, so there is not this version
of a version control or that...not now, not ever.

Now having solved the homonym problem, which blows away
most data dictionaries and caused me to lose a $25,000
contract to write a redbook about my use of a data
dictionary, we move on to the second problem, that of
synonyms (different names, same referent).

If we have multiple names for the same referent, which by
the way can only occur for people-assigned names, we can
arbitrarily select one as the "source" name, relegating the
others to "alias" status. We need only a means to
cross-reference aliases with the source name. We should note
that any entry in the cross-reference signals a need to
agree upon "the" source name within an enterprise and
change all alias use instances to that source name. We
can direct the software to do that for us automatically.
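
The automatic replacement of aliases by their source name
might look like the following sketch. The alias table and its
entries are hypothetical, plain names stand in for full unique
names to keep it short, and the identifier pattern is a
simplifying assumption about what counts as a name in a
source statement.

```python
import re

# Hypothetical cross-reference: alias -> agreed source name.
ALIASES = {"cust_no": "customer_number", "custnum": "customer_number"}

def resolve(name, aliases=ALIASES):
    """Return the source name for an alias, or the name itself."""
    return aliases.get(name, name)

def rewrite_aliases(statement, aliases=ALIASES):
    """Replace every alias used as a whole identifier with its source name."""
    return re.sub(
        r"[A-Za-z_][A-Za-z0-9_]*",
        lambda match: resolve(match.group(0), aliases),
        statement,
    )
```

Matching whole identifiers keeps a name such as custnum_x
untouched even though it begins with an alias.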

Now what can we say about a name? What attributes can we
assign? Well, we can give it a unique name (proper name plus
index value). We can say whether it is an element,
aggregate, or alias. We can say whether it is data, code, or
text. That basically makes up the directory that the software
uses to access and maintain data, code, and text.

Because the tool is available I've chosen a relational
database as the means of implementation. I will continue this
throughout for now. Eventually once we are in a position to
do so for performance reasons I will use a hierarchical
database manager based on IMS and VSAM. Right now I will
stick with something that you can use immediately.

So we have defined one table for the directory and another
for the cross-referencing of aliases to source. Two tables.
As no effective difference exists between source code and
text we need only a single table for the storage of both. The
temptation remains to use this table for storing source data,
but for the moment we will use a separate table.

At this point we have four tables, which cover the directory,
aliases, source code and text elements, and source data
elements. We have then a complete directory and repository
for data, code, and text elements. That leaves us only to
account for their aggregates.
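
The four tables could be sketched as follows, using SQLite
from Python only because it is readily available. All table
and column names (directory, alias, source_statement,
data_element, form, kind, content, declaration) are
assumptions made for illustration, not a settled schema.

```python
import sqlite3

# Hypothetical schema: one directory, one alias cross-reference,
# one table for code and text statements, one for data elements.
SCHEMA = """
CREATE TABLE directory (
    proper_name TEXT    NOT NULL,
    index_value INTEGER NOT NULL,
    form        TEXT    NOT NULL CHECK (form IN ('element', 'aggregate', 'alias')),
    kind        TEXT    NOT NULL CHECK (kind IN ('data', 'code', 'text')),
    PRIMARY KEY (proper_name, index_value)
);
CREATE TABLE alias (
    alias_name   TEXT    NOT NULL,
    alias_index  INTEGER NOT NULL,
    source_name  TEXT    NOT NULL,
    source_index INTEGER NOT NULL,
    PRIMARY KEY (alias_name, alias_index),
    FOREIGN KEY (alias_name, alias_index)
        REFERENCES directory (proper_name, index_value),
    FOREIGN KEY (source_name, source_index)
        REFERENCES directory (proper_name, index_value)
);
CREATE TABLE source_statement (
    proper_name TEXT    NOT NULL,
    index_value INTEGER NOT NULL,
    content     TEXT    NOT NULL,
    PRIMARY KEY (proper_name, index_value),
    FOREIGN KEY (proper_name, index_value)
        REFERENCES directory (proper_name, index_value)
);
CREATE TABLE data_element (
    proper_name TEXT    NOT NULL,
    index_value INTEGER NOT NULL,
    declaration TEXT    NOT NULL,
    PRIMARY KEY (proper_name, index_value),
    FOREIGN KEY (proper_name, index_value)
        REFERENCES directory (proper_name, index_value)
);
"""

def create_drdd(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn
```

Every row in the other three tables must point back to a
directory entry, so the directory remains the single place
where a unique name is defined.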

Stay tuned.

The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA