The Southern California OS/2 User Group
USA

February 2003

Issues and Answers for Open Source

by Lynn Maxson

The Issues

The success of open source depends upon resolving three issues:

the timely receipt of source contributions,
how to increase the number of contributors, and
how to maximize contributors' productivity.

Timely Receipt of Source Contributions

Both open source and closed source face the same two challenges. The first is to provide, in terms of data types and operators, the closest possible match between the software, the Solution Set (SS), and the real world, the Problem Set (PS): SS = PS. The second is to have changes in the Solution Set keep pace with changes in the Problem Set: dSS/dt >= dPS/dt.

Two stages occur in the lifecycle of software. The first occurs in development, when no source yet exists. The second occurs in maintenance, when source already exists. This says that the development stage ceases when a change request in the Problem Set modifies existing source. In effect development exists as the initial state of maintenance, lasting only until the first change request impacts existing source.

We can freeze change requests from entering a development or maintenance cycle. Doing this creates a backlog for the next. Thus freezing change requests runs counter to our need to have the solution set match as closely as possible to changes in the problem set. This means we need to implement changes as close as possible to their submission. That excludes freezing along with tools and methods that cause it.

Every change request translates into a set of one or more specifications. All these aggregate in a specification pool. From this pool we extract a set representing a version of an application system, i.e. one or more programs. If we can use the specification language as our programming language, we can go directly to producing an executable version of our application system.

We need to minimize the interval from time of entry of change request to translation into specifications to their inclusion in an executable. The shortest interval occurs when we have the same language for both specification and programming.

The timely receipt of source contributions then depends upon the rate at which we can translate change requests into specifications. To avoid backlog means we must have the capability to translate faster than the change rate. That in turn means having sufficient people resource to allow parallel translation.
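The backlog argument above reduces to simple arithmetic. The following is a minimal sketch, assuming a constant change rate and a fixed translation rate per person; the function name and the numbers are illustrative, not taken from any measured project.

```python
def backlog_after(periods, change_rate, per_person_rate, people):
    """Return the backlog of untranslated change requests after each period.

    Assumes (for illustration only) that requests arrive at a constant rate
    and that each person translates a fixed number of them per period.
    """
    capacity = per_person_rate * people  # parallel translation capacity
    backlog = 0
    history = []
    for _ in range(periods):
        # New requests arrive, then the team translates up to its capacity.
        backlog = max(0, backlog + change_rate - capacity)
        history.append(backlog)
    return history


# Three people translate 9 requests per period against 10 arriving.
print(backlog_after(4, change_rate=10, per_person_rate=3, people=3))  # [1, 2, 3, 4]
# A fourth person lifts capacity to 12, so no backlog forms.
print(backlog_after(4, change_rate=10, per_person_rate=3, people=4))  # [0, 0, 0, 0]
```

With capacity even slightly below the change rate the backlog grows without bound, which is why the translation rate must at least match the change rate.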

How to Increase the Number of Contributors

You cannot increase the number of people working in parallel beyond the number you have. That number depends upon the change rate in the problem set which the solution set must match or else fall behind, i.e. create a backlog. That number depends also upon the change rate capacity of each individual: the rate at which they can translate user requirements into the different formats necessary to produce executables.

Each format has a language associated with it. For each language we have at least one tool which processes it. Each tool has a user interface which the user must master to some degree to use properly. So we have the language which the tool processes as well as that we use to communicate with the tool.

We use people to translate user requirements into specifications. In this manner people act as the tool. If we can go directly from specifications to executables, then we only need master one additional language, that of the tool interface. So we have a minimum of three languages, that used by the user requirements, that used in specifications, and that used to interface with the tool.

If we assume that the people already know the language of the user requests, then they need only learn two additional languages: one for writing specifications; the other, for interfacing with the tool. In pre-object-oriented programming we had the language of the user request, of analysis, of design, of construction, and of testing. Further, in construction, aside from the compiler, we had the source languages and interfaces of additional tools like make utilities, linkers, debuggers, etc.

In object-oriented programming, analysis and design went from two languages (dataflows and structure charts) to as many as fourteen in UML (Unified Modeling Language). The principle of logical equivalence says that nothing exists in all these different source language forms not present in the specifications themselves. Moreover instead of using people to translate specifications into UML, we have no reason not to do this with software.

In the end we can increase the number of source contributors by reducing the number of different sources to two: the informal language of user requests and the formal language of specifications. This means we can reduce the number of tools to one, that which translates source into executables. That we require more in current methods says that we have ignored already existing advances, not that we have something yet to discover.

You increase the number of source contributors by minimizing the number of different languages they need to learn. This in turn minimizes the number of different tools to learn. This lowers the barrier to learning to within the comfort zone of more people. Now you need only to increase their productivity.

How to Maximize Contributors' Productivity

We measure productivity as the ratio of work-out over work-in. We can increase productivity in two ways:

increasing the work-out for the same work-in, or
decreasing the work-in for the same work-out.

In either case we increase the ratio of work-out over work-in. In developing and maintaining software our productivity gains or losses occur with the tools we use. These tools include languages used, software tools that process the languages, and the methods that combine them into a process.

We gain in productivity by shifting more of the work-in from people to our software tools that produce the work-out. Our basic guide for this lies in letting people do what software cannot and software what people need not. Basically this occurs by shifting as much of the clerical processes as possible from people to software. No better example of this exists than in the evolution of programming languages.

Evolution of Programming Languages

We have four generations of programming languages:

actual or machine language,
symbolic assembly plus macro,
imperative higher level languages (HLLs), and
declarative HLLs.

Actually the first three generations represent progressive forms of imperative languages while the fourth generation carries the progression one step farther from imperative to declarative. This progression comes from reducing the "what" and "how" logic of imperative languages to the "what" language of declaratives. This effectively shifts the writing of the "how", the logical organization of the source, over to the software. This reduces not only the amount of manual writing but, more significantly, that of rewriting, i.e. reflecting changes in the logical organization.

The second generation, symbolic assembly plus macro, introduced the use of mnemonics for actual op codes and symbolic operands. The first eased the task of reading the source while the second allowed the resolution of operand addressing to shift from data entry, the time of writing the source, to compile time, after writing the source. The macro introduced the instruction aggregate, using one instruction to replace the use of several others. The first, shifting of the binding time from data entry to compile, significantly reduced the amount of rewriting while the second reduced the amount of writing necessary.

While the second generation maintained the machine-dependency of the first, the third generation took advantage of the second's macro form to introduce the use of machine-independent expressions. These occur in an assignment statement, e.g. "a = b + c;". This effectively says evaluate the expression "b + c" and assign the result to "a", i.e. replace the value of a. HLLs like APL use the left arrow ("<-") instead of the equal symbol for assignment, e.g. "a <- b + c".

While the second generation introduced an instruction aggregate, the third generation introduced the data aggregate in the forms of arrays, structures, and lists. Some programming languages, notably APL and PL/I, offered operator support for aggregate operands. This shifted responsibility for writing the underlying element by element processing from the programmer to the software.
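The shift described above can be sketched as follows. This is an illustration in Python rather than APL or PL/I; the `Vec` class is a hypothetical stand-in for a language's built-in aggregate support. The programmer writes "a = b + c" once, and the element-by-element loop lives inside the operator.

```python
class Vec:
    """A tiny array type (hypothetical) whose '+' works on whole aggregates."""

    def __init__(self, *items):
        self.items = list(items)

    def __add__(self, other):
        if len(self.items) != len(other.items):
            raise ValueError("aggregates must have the same length")
        # The element-by-element processing is written once, here,
        # instead of at every place the programmer adds two arrays.
        return Vec(*(x + y for x, y in zip(self.items, other.items)))

    def __eq__(self, other):
        return isinstance(other, Vec) and self.items == other.items

    def __repr__(self):
        return f"Vec{tuple(self.items)}"


b = Vec(1, 2, 3)
c = Vec(10, 20, 30)

# Without operator support the programmer writes the loop by hand.
a_loop = Vec(*[b.items[i] + c.items[i] for i in range(len(b.items))])

# With operator support the assignment statement covers the whole aggregate.
a = b + c
print(a)  # Vec(11, 22, 33)
```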

The third generation also introduced an alternate to the compiler: the interpreter. The interpreter combines the functions of the editor and compiler into a single tool with a single interface. This makes it intrinsically an IDE (Integrated Development Environment), not as the "add on" when used with a compiler.

The fourth generation introduced a significant paradigm shift from imperative to declarative mode. We need to ensure a better understanding of both modes to make the differences distinct.

The Imperative Mode

In grammar we associate the imperative mode with the issuing of commands like "come here," "shut up," "step back" and so on. Machine instructions have the same imperative purpose in stating what action takes place. The difference between English and machine imperative grammar lies in the details. You can tell someone to "Go to the store. Buy some bread. Return home." We expect that someone is then capable of determining how to execute these three commands. No such luck in programming.

A machine only executes low-level instructions. We must translate any higher level instruction like "Go to the store" into an ordered set, a sequence, of low-level instructions (or commands). The machine follows them exactly as written. While each instruction has its own internal (local) logic relative to its successful execution, the global logic falls entirely on the programmer.

In first and second generation languages, machine and symbolic assembly, the programmer sequences the global logic on an instruction-by-instruction basis. The writing process incorporates this global logic. For that reason we have said this form exists as "logic in programming," i.e. part of the writing process.

In third generation languages we move from instruction-by-instruction logic to one based on statements: statement-by-statement. Even though we now have an enhanced macro facility allowing us to write assignment statements like "a = (b + c)/(d - e);", the programmer still determines the global logic covering the sequencing of such statements. If a change in global logic occurs, the programmer must do the necessary rewriting.

To ease this task somewhat and to move away from "spaghetti" code, the early 70's saw the introduction of structured programming. This introduced the concept of "control structures" which in terms of control flow had the common connection feature of "one in/one out." In theory then these control structures of sequence, decision, and iteration became pluggable or reusable units in a manner similar to the "one in/one out" interface of subroutines.

The global logic then involved the sequencing or ordering of these control structures. If something necessitated a change in the global logic, the programmer had to make the changes to the ordering of the control structures. As these normally did not exist as reusable components, i.e. named files within an %include statement, this meant physically rewriting the source.

Understand that we had a capability of reuse of control structures that was never implemented as such. The question is, "Why?" The answer lies in the use of a compiler and editor as separate processes. In an editor you write a %include statement, but the actual loading of the file, i.e. its inclusion, does not occur until later during compilation. In theory we could implement each control structure as a separate file, creating each in turn as a separate process with the editor. Having done this we could once more use our editor to create a source program file containing no more than %include statements arranged in the proper sequence.

This means, of course, that unless we opened up separate instances of the editor to allow us to view the files, which we nevertheless now had to piece together in our heads, the only time we would see the source in its entirety occurred after the compile. We could, of course, enhance the editor by allowing it to recognize the %include statement on input, i.e. from within a file, or upon data entry from the keyboard. It could then do as the compiler does about retrieving the named file and inserting it in place within the source.
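The editor enhancement described above amounts to a recursive text expansion. A minimal sketch follows, assuming a one-statement-per-line `%include name;` form and an in-memory dictionary standing in for the files on disk; the file names and contents are hypothetical.

```python
import re

# Assumed form of the statement: "%include name;" alone on a line.
INCLUDE = re.compile(r"^\s*%include\s+(\w+)\s*;\s*$")


def expand(name, files, _active=()):
    """Return the lines of `name` with every %include replaced in place."""
    if name in _active:
        raise ValueError(f"recursive %include of {name}")
    if name not in files:
        raise KeyError(name)
    out = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Retrieve the named file and insert it where the statement stood.
            out.extend(expand(match.group(1), files, _active + (name,)))
        else:
            out.append(line)
    return out


# Hypothetical control structures, each kept as its own named file.
files = {
    "main": "%include read_input;\n%include compute;\nput list (a);",
    "read_input": "get list (b, c);",
    "compute": "a = b + c;",
}

for line in expand("main", files):
    print(line)
```

The programmer then sees the source in its entirety at edit time rather than only after the compile.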

Unfortunately editors know nothing of compilers and vice versa. Perhaps no better example of this exists than the LPEX editor IBM packages with its VisualAge products. LPEX (for Live Parsing EXecutive) performs syntax checking on opening a file or data entry, using a colorization process to denote the parsed elements by type, e.g. variable, constant, operator. As LPEX can parse and thus recognize a %include statement, it should not take more effort to then load the file named in the %include statement and parse it as well. In this way the entire source would appear as it would appear after the compile.

Having parsed the source statements, i.e. performed syntax analysis, the editor could, on completion of each statement (as well as on completion of parsing all statements), submit the parsed elements for semantic analysis. This would allow the programmer to check for spelling errors, any one of which will cause a severe error during compilation, causing it, in effect, to abort. Then, once semantic analysis completes, the editor could submit the results for code generation, eliminating the need for a separate compilation step. Putting all these functions within the editor makes it an IDE without implementing it as another function within the edit-compile-link-execute package.

Before leaving this we should note that what we have described here represents a tool, not a language failure. It is our tools which place restrictions on our use of what the language makes available.

The Declarative Mode

No better example of the advantages of fourth generation, declarative languages exists than SQL. It is probably used more than any other language worldwide by people who don't regard what they write as programming. It has an imperative form, e.g. "SELECT...FROM...WHERE...", from the outside where the "programmer" determines the order of the output (SELECT clause), the source(s) of the input (FROM clause), and the conditions under which selection occurs (WHERE clause). The "programmer" then leaves it up to the database manager to supply all the logic necessary to satisfy the query.

If we change anything of what we want to see or the order in which we want to see it, of which input sources to use, or conditions to apply, the database manager ignores what it did with the previous query in satisfying the new one. In short, it generates the code on an individual query basis.

It does this through performing the three steps of syntax analysis, semantic analysis, and proof theory. In imperative languages the proof theory consists of code generation only. In declarative the proof theory decomposes into two stages: a completeness proof and an exhaustive true/false proof. The completeness proof verifies that it has enough information to satisfy the request. It returns "false" if it doesn't; otherwise "true". If "true", it then imposes a logical organization based on the information. It then generates the code based on this logical organization.

At this point it has an executable form: the "end result" of a "true" completeness proof. It then enters the second stage: the exhaustive true/false proof. Here it evaluates each instance of input data, in this instance each row of the table(s) specified in the FROM clause, according to the conditions set out in the WHERE clause now incorporated within the executing logical organization. It includes each "true" instance, one or more, in the output according to the specifications of the SELECT clause.

It may happen that no "true" instance exists. In that case the query returns "false": no "true" instances. It cannot do this without having evaluated every input instance, i.e. each row in the table. Thus the description of "exhaustive true/false proof."
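A small runnable illustration of the above, using Python's standard sqlite3 module as the database manager. The table, its columns, and its rows are invented for the example. The "programmer" states only what to select, from where, and under which conditions; the database manager supplies the logic and examines the rows.

```python
import sqlite3

# An in-memory database with a hypothetical table of members.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, city TEXT, joined INTEGER)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [
        ("Ann", "Santa Ana", 1996),
        ("Bob", "Irvine", 1999),
        ("Cy", "Santa Ana", 2001),
    ],
)


def query(sql, params=()):
    """Run one query; the database manager works out how to satisfy it."""
    return conn.execute(sql, params).fetchall()


# Each "true" row appears in the output as the SELECT clause specifies.
early = query("SELECT name FROM members WHERE joined < ? ORDER BY name", (2000,))
print(early)  # [('Ann',), ('Bob',)]

# No row satisfies the condition, so the result is empty ("false").
nobody = query("SELECT name FROM members WHERE city = ?", ("Fresno",))
print(nobody)  # []
```

Changing the SELECT, FROM, or WHERE clause requires no rewriting of any loop or search logic on the programmer's part.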

Now SQL by its name, Structured Query Language, is not free form. Every SQL statement has a specifically ordered form. Other declarative languages, e.g. Prolog or Trilogy, have no such restrictions. They essentially allow the input source statement groups to appear in any order, i.e. unordered. Thus the programmer does not have to apply a logical organization, i.e. global logic, on the input. That responsibility for logical organization transfers from the programmer to the software as part of its completeness proof.
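The transfer of logical organization to the software can be sketched with a dependency sort. This is an illustration only, not how Prolog or Trilogy work internally: the statements, names, and the `organize` function are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical unordered pool: name defined -> (names it uses, statement text).
specs = {
    "total": ({"subtotal", "tax"}, "total = subtotal + tax"),
    "tax": ({"subtotal", "rate"}, "tax = subtotal * rate"),
    "subtotal": ({"price", "qty"}, "subtotal = price * qty"),
}
inputs = {"price", "qty", "rate"}  # values assumed to be supplied from outside


def organize(specs, inputs):
    """Check completeness, then impose an order on the unordered statements."""
    used = {name for uses, _ in specs.values() for name in uses}
    missing = used - set(specs) - set(inputs)
    if missing:
        # Not enough information to satisfy the request: "false".
        return False, sorted(missing)
    # Each statement must follow the statements that define what it uses.
    graph = {name: {u for u in uses if u in specs}
             for name, (uses, _) in specs.items()}
    order = TopologicalSorter(graph).static_order()
    return True, [specs[name][1] for name in order]


ok, ordered = organize(specs, inputs)
print(ok)
for statement in ordered:
    print(statement)
```

The statements were entered with "total" first, yet the software places "subtotal" and "tax" ahead of it. Dropping "rate" from the inputs makes the completeness check fail and name the missing item.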

This allows for named control structures on input. With the exception of COBOL with its paragraph reuse through the PERFORM verb, almost all users of other imperative languages regard such reuse through the %include statement as impractical. Yet such reuse is inherent in declarative languages other than SQL.

Basically this means that the programmer needs only to concern himself with the local or internal logic of a control structure. Now control structures themselves may contain other control structures or at least references to them. However, the programmer need only concern himself with the logical organization of the "containing" control structure, having presumably done the same for each "contained" control structure.

Now as control structures contain logic their application implies rules. Declarative languages allow the explicit declaration of rules separate from explicitly written control structures. For example, suppose we have the following data declaration:

dcl easy fixed bin (31) range (0...99999);

This specifies that easy is a fixed binary number, 31 bits plus sign, whose only allowable, i.e. "true," values range from 0 to 99999. The responsibility for applying this rule in every possible instance lies with the software and not with the programmer.
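To show what "the responsibility lies with the software" can look like, here is a rough Python analogue of the declaration above. It is a sketch under assumptions: the `Ranged` descriptor and `Record` class are hypothetical, and Python checks the rule at run time on each assignment rather than as part of a declarative language's proof.

```python
class Ranged:
    """Descriptor (hypothetical) enforcing an inclusive integer range."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def __set_name__(self, owner, name):
        self.public = name
        self.private = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private)

    def __set__(self, obj, value):
        # The rule is stated once in the declaration and applied everywhere.
        if not isinstance(value, int) or not self.low <= value <= self.high:
            raise ValueError(
                f"{self.public} must be an integer in {self.low}..{self.high}"
            )
        setattr(obj, self.private, value)


class Record:
    # Analogue of: dcl easy fixed bin (31) range (0...99999);
    easy = Ranged(0, 99999)

    def __init__(self, easy):
        self.easy = easy


r = Record(5)
r.easy = 99999          # allowed: within the declared range
try:
    r.easy = 100000     # rejected by the software, not by programmer vigilance
except ValueError as err:
    print(err)
```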

By explicitly incorporating the rules governing data usage within the declaration of that data, some hundreds and thousands of which may occur affecting the different applications used within an enterprise, we give to the software, which is not subject to memory lapses or ignorance, the responsibility for their application. We have the net effect of increased granularity of reuse down to the statement level with software assistance as well as enforcement. This does not exist in imperative languages, even object-oriented ones that boast reuse as a major feature.

Progress, not Regress

We have to ask ourselves, "Why?".

Considering what we had achieved in third generation languages like APL, LISP, and PL/I prior to 1970, why did we ever regress to a starting level of C? Why in the 30+ years since have we not reached the same capability we achieved in the ten years prior? That includes the heavy investment in object-oriented technology and languages like C++ and Java. It's not that we didn't try. Much of C++ tried to address deficiencies in C. Even so, it still has not reached the level of PL/I prior to 1970.

If we value reuse so highly, why do we not accept granularity down to the statement level? Why do we insist on involving people in implementing rules when we have software that will do it for us, providing physical replication wherever needed without failure?

We could make the list of such questions considerably longer. After we have worked our way through them it will not turn out that our languages have failed us. We can change them and at the same time ensure full backward compatibility: no conversion necessary. Instead our problems lie with our software toolset. Our editors. Our compilers.

We have four generations of programming languages. Each successive generation has provided productivity gains over the previous. Each time those gains occurred by shifting the clerical effort from people to software. We went from the low level instruction writing of machine language to the introduction of the higher-level macro in symbolic assembly. We went from the macro to the higher-level assignment statement of third generation. We went from ordered input of third generation to unordered of fourth, allowing software instead of people to construct the global logic. In doing so we saw our granularity of reuse extend down to the statement and control structure level.

We know that a programmer can make three types of errors, excluding, of course, the decision to become a programmer. Errors of syntax. Errors of semantics (spelling). And errors of logic. The first two we can catch at data entry. The last during testing. We know that operating interpretively we can test closer to the point of data entry than we can in a following compile step. Moreover, operating interpretively we can test segments in isolation regardless of the presence of other source. We can do it without even having a minimal complete program as required when using a compiler.

Speaking of testing, we have a form of fourth generation languages using predicate logic. This differs from the clausal logic used in SQL, which requires the existence of test data, i.e. the table(s) referenced in the FROM clause. Predicate logic allows us to perform an exhaustive true/false proof on code from the statement level on up using automatically created test data based on data value rules assigned to each variable. This again shifts clerical effort from people to software.
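The idea of generating test data from data value rules can be approximated by brute-force enumeration. The sketch below is an assumption-laden illustration, not the proof mechanism of any particular predicate-logic language: the rules, the variable names, and the statement under test are hypothetical, and the ranges are kept small so that every combination can be tried.

```python
from itertools import product


def exhaustive_proof(predicate, rules):
    """Evaluate `predicate` for every combination of values the rules allow.

    `rules` maps each variable name to an inclusive (low, high) range.
    Returns the list of "false" instances; an empty list means every
    instance proved "true".
    """
    names = list(rules)
    ranges = (range(low, high + 1) for low, high in rules.values())
    failures = []
    for values in product(*ranges):
        case = dict(zip(names, values))
        if not predicate(**case):
            failures.append(case)
    return failures


# Hypothetical rules: b and c may each take the values 0 through 9.
rules = {"b": (0, 9), "c": (0, 9)}


def sum_fits(b, c):
    """Claim under test: a = b + c always fits a variable ranging 0..15."""
    return 0 <= b + c <= 15


failures = exhaustive_proof(sum_fits, rules)
print(len(failures))  # 6
print(failures[0])    # {'b': 7, 'c': 9}
```

No person wrote the 100 test cases; the software derived them from the rules and reported the six that violate the claim.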

So how do we maximize contributors' productivity? By improving our toolset with proven technology.

The Answers

We need to move to fourth generation languages. In doing so we need to retain the assignment statement of the third. This allows us to have full imperative capability within a declarative language. With that we have the capability of completely writing the tools for the language in the language.

This becomes more meaningful when we recognize that Intel in volume 2 (Instruction Set) of its Pentium reference manuals offers three generational forms (1:1:1) for each instruction: machine, symbolic, and HLL. This means in essence that we can include machine or assembly language capability in an HLL form, requiring only a simple translation to machine language during code generation.

Having then defined our fourth generation, declarative language, we can then look at our toolset written entirely with it. We know from our experience with interpreters that we need only one tool and one interface. Thus only one tool language to learn.

We also know that interpreters and compilers share the same syntax and semantic analysis, differing only in their proof theory (code generation). In our fourth generation language prior to the code generation phase of the completeness proof we can indicate which of the two executable forms we desire. In effect we incorporate the complete compiler functions within an interpreter.

Take a look at the historical five stages of development:

specification,
analysis,
design,
construction, and
testing.

Using a fourth generation language shifts responsibility for analysis, design, and construction to the software as part of the completeness proof. As the completeness proof, if "true", contains the global logic organization of the unordered source, we can use this now ordered source as the input to our CASE tools, now integrated within our interpreter/compiler tool. This means we have only one source, the unordered input, to maintain. The software tool then generates the ordered source and from this can produce any of the CASE outputs: flowcharts, structure charts, UML documents.

This capability essentially eliminates the principal arguments for recent RAD (Rapid Application Development) approaches like Extreme Programming and Agile Modeling. These argue against having to produce and maintain separate source files for each document as done currently by CASE tools. The argument disappears once you have only one source with a choice of generated outputs. This ability makes every RAD slow by comparison.

Finally the use of the exhaustive true/false proof of declarative languages in conjunction with predicate logic to automatically generate any range of test data allows a level of testing not possible with the current system of beta testing regardless of the number of beta testers (users). This not only eliminates the need for beta testers, but provides more immediate feedback of verified ("true") results.

We can begin with any third generation language, any open source editor, and any open source compiler. We can incrementally enhance each, using each step along the way as empirical proof of increased productivity. In short we can take full advantage of known technology and then build from there: add some technology of our own.

The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA