The views expressed in articles on this site are those of their authors.


Copyright 1998-2024, Southern California OS/2 User Group. ALL RIGHTS RESERVED.

SCOUG, Warp Expo West, and Warpfest are trademarks of the Southern California OS/2 User Group. OS/2, Workplace Shell, and IBM are registered trademarks of International Business Machines Corporation. All other trademarks remain the property of their respective owners.

The Southern California OS/2 User Group
USA

October 2001


A Simple, General Command Line Filter In REXX

Useful for Parsing

by Dallas Legan

My 'PARSE' program began with a post on the POSSI mailing list. Someone was asking about creating a list of files containing one of a set of strings. I thought of a way of doing this with the Seek and Scan (PMSeek) utility. This led to the question of stripping out just the file names, minus the other data generated by Seek and Scan. Further reflection raised a similar need, with slightly different requirements, for file lists generated by the OS/2 'dir /n' command, a directory listing in the 'new' (versus old DOS) format. As I thought about this more, the idea of writing a REXX script came up, but one general enough to handle all these minor variations.

One of my minor complaints about REXX has been that it doesn't facilitate writing brief filters, shorter than one line, for use in command line "pipelines." If you're not familiar with pipelines, they are constructs on the command line that string several 'filter' programs together, the standard output of one going into the standard input of the next in the pipe. The symbol that connects the programs is '|', and initial input for the first program in the pipe may come from a file, symbolized with '<'. Final output, if you don't want it dumped to the console screen, may be redirected to a file using the '>' symbol. Commands that come with OS/2 and function in pipes include FIND, SORT, and MORE. As an example, you might do:

    DIR /N | FIND "garble" > garbled.lst

to find the files with the string 'garble' in their name and store them in a file named garbled.lst. To create a list sorted in the reverse order you might do:

    SORT /R < garbled.lst > rgarbled.lst

REXX has always been usable for such filter programs, but generally you have to stop and fire up an editor to do it. In my opinion, this has held back the acceptance of REXX, since many current filter tools let you write short segments of instructions interactively at the command prompt, combining several tools on one or more lines. Commonly used tools for this sort of thing are (besides FIND, SORT, and MORE) grep, cut, sed, awk and perl.

At this point I decided to try my hand at a general purpose filter written in REXX that would accept short bits of code on the command line to tailor it to a particular task. The loop for reading each line of input and printing each result would be taken care of automatically, letting you concentrate on what this particular use does in the guts of the loop.

How REXX Could Be Used

Having reached this point in my thinking, I started going over what special features REXX could bring to the table for use as a filter. I think of REXX as having three main concepts that need to be understood to use it.

First is the DO loop construct.
A lot of the functionality of the language is stuffed into the REXX DO loop, a lot that many languages break out into several types of iteration constructs. There would have to be a DO loop of some kind in this program to begin with, but I didn't see any other general use for the DO loop in this quest. The conclusion here was to just take advantage of whatever the DO loop offered, but it wouldn't be the central idea for the program.

Second, the compound or stem variable.
This is similar to associative or hash variables in such languages as Perl or Korn shell, and while useful, didn't strike me as an idea that would make a focal point for a command line filter tool.
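As a small illustration of a stem used like an associative array, the sketch below counts file extensions. The file names are taken from the directory listing used later in this article; the names count., files, base, ext and key are invented for the example.

```rexx
/* Count file extensions with a stem variable (illustrative sketch) */
count. = 0                                   /* every element starts out as 0 */
files = 'EPM.EX EXTRA.EX EPM.EXE EPMHELP.QHL'

do i = 1 to words(files)
  parse value word(files, i) with base '.' ext   /* split the name at the dot */
  count.ext = count.ext + 1                  /* the value of ext picks the element */
end

key = 'EX'
say 'Files ending in .EX:' count.key
```

The tail of a compound variable is itself a variable, so count.ext uses whatever string ext currently holds as the index.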

Lastly, the REXX PARSE command.
A tool for splitting up strings and storing the results in variables, this seemed to fit the spirit of a command line pipe tool.
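To make the idea concrete, here is a minimal sketch of two PARSE templates applied to one line of a directory listing. The sample line is built so that the file name starts in column 41, which is an assumption about the DIR /N layout; the variable names are invented for the example.

```rexx
/* Two ways to pull the file name out of one directory line (sketch) */

/* Build a sample line: date, time, size, EA size, then the name in column 41 */
line = left('10-31-94', 10) || right('7:45p', 6) || right(55076, 10) || right(0, 12) || '  EPM.EX'

/* Word template: four blank-delimited fields, then the name.         */
/* The final "." is a placeholder that discards anything left over.   */
parse var line fdate ftime fsize fea fname .

/* Column template: throw away columns 1-40, keep the rest */
parse var line . =41 bycolumn

say fname
say bycolumn
```

Both SAY instructions print EPM.EX. The word template depends on the name containing no blanks, while the column template depends on the name always starting in the same column.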

What It Should Do

The main focus of the tool would be to first allow a template to be entered for splitting up each line of input, and then allow possible further manipulation before the line was printed out. Undoubtedly, as many features as could be imagined could be added to the filter program, but I decided to keep this first version as simple as possible, and save the complications for more sophisticated programs in the future.
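One plausible way to get this behaviour is to splice the user's text into a loop held in a string and hand the whole string to INTERPRET. The sketch below is not the PARSE script from the download, only a minimal illustration under stated assumptions: the in. stem stands in for lines read from standard input, the column layout of the sample lines is assumed, and out., n, prog, template and code are names invented here. US (the output variable) and NR (the line number) are used the way the examples later in this article use them.

```rexx
/* Template-plus-code filter driven by INTERPRET (illustrative sketch) */

/* Sample input standing in for lines read from standard input */
in.1 = left('10-31-94', 10) || right('7:45p', 6) || right(55076, 10) || right(0, 12) || '  EPM.EX'
in.2 = left('10-31-94', 10) || right('7:45p', 6) || right(57321, 10) || right(0, 12) || '  EXTRA.EX'
in.0 = 2

script = '. =41 US'                      /* what the user would type in quotes */

parse var script template ';' code       /* text before the first ";" is the template */
if template = '' then template = 'US'    /* no template: the whole line goes into US */

n = 0                                    /* number of lines written so far */
prog = 'do nr = 1 to in.0;' ,
       'parse var in.nr' template ';' ,
       code ';' ,
       'n = n + 1; out.n = us; say us;' ,
       'end'
interpret prog                           /* run the assembled loop */
out.0 = n
```

Because the user's statements sit inside the interpreted loop, an instruction such as ITERATE acts on that loop. Changing script to ';IF NR < 2 THEN;ITERATE;' leaves the template empty, so each whole line lands in US, and the first line is skipped.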

After some experimentation, the program I call my PARSE tool was settled on, for a while anyway. I've tested it with OS/2 Classic REXX, and the Regina REXX interpreter running on Linux. Within the code are indicated the minor changes needed for tuning it to run on various platforms. Some changes were made to take care of differences in how the tested platforms handled Linein function failures, null inputs, and quoted parameters. Other minor changes may be needed on yet other platforms.

Basically, the program accepts no command line switches, just a quoted program/script, and must take input either from '<' redirection or a pipe. I found that it could be made to 'crash' to the help routine when '-h', '-?' or such were entered as 'switches' by means of an appropriate SIGNAL command setting. (Actually these are invalid REXX PARSE command templates.)
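The trick of letting an invalid template fall through to a help routine can be sketched with SIGNAL ON SYNTAX. This is a guess at the mechanism, not the code from the actual script; the usage text, the variable helped and the sample line are invented for the example.

```rexx
/* Route an invalid template to a help message (illustrative sketch) */
signal on syntax                     /* a syntax error now jumps to the label "syntax" */
helped = 0
line = 'some input text'
template = '-h'                      /* looks like a switch, but is not a valid template */

interpret 'parse var line' template  /* raises a syntax error for "-h" */
say 'Template accepted.'
exit 0

syntax:
  helped = 1                         /* hypothetical flag, used only in this sketch */
  say 'Usage: some-command | parse "template;statements"'
```

A valid template never reaches the label, so the help text appears only when the template cannot be parsed.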

If performance is really critical, or you find yourself using a particular script a lot, it might be worthwhile to create a dedicated script or a compiled program for that task. This would eliminate such obvious inefficiencies as the INTERPRET statement that this 'PARSE' program needs. The point is that the user's time is most valuable, not the computer's, and being able to rapidly experiment and evolve a quick one-time solution to a problem without even having to bring up an editor, much less a compiler, gives the user maximum access to computer resources.

You Can Use It

Some people might think of this program as something of a scam, leaving so much of what it does up to the user. The idea is simply to take care of boilerplate issues so the user can experiment and improvise at the command line, or possibly include a bit of REXX inside a batch file or a script written in some other language. Almost anyone seeing the idea may have their own notions of how to construct such a piece of boilerplate: what features it should have, and what defaults, switches or subroutines might fit their own methods of tackling problems. For these reasons I've been somewhat reluctant to put any copyright or license on the program, but after some thought I am sticking an essentially BSD-type open source license on it: modify it any way you want, for any purpose you want, but if any of it is recognizable as derived from this program, please mention me as the original author.

To use this yourself, download a zip file with the REXX script and numerous usage examples.

My general impression of this script is that it is somewhere between 'cut' and 'awk' in capabilities, with a dash of 'xargs' or "glue"-type language tossed in. (Which is logical since REXX originated as a "glue" language for connecting programs/utilities/applications together.)

Examples of Use

For the rest of this article, I will present an example of input to the filter along with some simple changes that can be made to it, with explanations of the brief filter scripts. Many of these tasks could be done with other tools, in many instances more specialized or faster ones, but they are shown here simply to demonstrate the generality of the 'PARSE' program. (To keep things realistic and simple, I'm including some 'ugly' parts of this listing that clearly don't relate to my original task of getting just the file names, but that further filtering or editing of a dump file could easily take care of.)

  1. First, a few lines of test input, generated on my computer using DIR /N:

    The volume label in drive C is ACC2.
    The Volume Serial Number is 25C0:E415.
    Directory of C:\OS2\APPS
    5-09-97 9:04a <DIR> 0 .
    5-09-97 9:04a <DIR> 0 ..
    5-09-97 9:04a <DIR> 0 DLL
    10-31-94 7:45p 55076 0 EPM.EX
    10-31-94 7:45p 57321 0 EXTRA.EX
    10-31-94 7:45p 32480 0 EPM.EXE
    10-31-94 7:45p 21328 0 EPMHELP.QHL

  2. DIR /N | parse
     or
     DIR /N | parse ";"

    The volume label in drive C is ACC2.
    The Volume Serial Number is 25C0:E415.
    Directory of C:\OS2\APPS
    5-09-97 9:04a <DIR> 0 .
    5-09-97 9:04a <DIR> 0 ..
    5-09-97 9:04a <DIR> 0 DLL
    10-31-94 7:45p 55076 0 EPM.EX
    10-31-94 7:45p 57321 0 EXTRA.EX
    10-31-94 7:45p 32480 0 EPM.EXE
    10-31-94 7:45p 21328 0 EPMHELP.QHL

    Notes: The program was designed so that with no script it would simply echo the input to output.

  3. DIR /N | parse ". =41 US"

    .
    ..
    DLL
    EPM.EX
    EXTRA.EX
    EPM.EXE
    EPMHELP.QHL

    Notes: Here it is doing what this program was originally intended for: stripping the file names out of a listing that includes other, unwanted columns of data. The script consists of a simple PARSE template that splits each line at column 41, using the '=41' part of the template. All data before column 41 is pitched into a throwaway 'black hole', '.', and everything from column 41 on is stored in variable 'US', the parse program's default output variable. The name US was chosen as an acronym for 'UnderScore', after Perl's '$_', which is the default for many operations when a variable isn't specified.

  4. DIR /N | parse ". . . . US"

    drive C is ACC2.
    is 25C0:E415.
    .
    ..
    DLL
    EPM.EX
    EXTRA.EX
    EPM.EXE
    EPMHELP.QHL

    Notes: This is an alternate way of doing the same thing as in case 3 above. The template is simply to put the first four columns of non-blank data in '.'s, and the fifth in US for output. If there had been trailing data after the file names, it could have been disposed of with a template of ". . . . US .", assuming no file names had embedded blanks (a very bad practice).

  5. DIR /N | parse ";us=Word(us,5)"

    drive
    is
    .
    ..
    DLL
    EPM.EX
    EXTRA.EX
    EPM.EXE
    EPMHELP.QHL

    Notes: Yet another way of performing the original task. In this case no template is given, so each line of input is stored in variable US by default, and the REXX built-in function Word picks out the fifth blank-delimited column (a 'word') of the line.

  6. DIR /N | parse ";us=SubWord(us,5,1)"

    drive
    is
    .
    ..
    DLL
    EPM.EX
    EXTRA.EX
    EPM.EXE
    EPMHELP.QHL

    Notes: And yet another way, using the slightly more versatile function SubWord, which could have included more than one blank-delimited column if that had been desired (and more were available).

  7. DIR /N | parse "; US='>' US;"

    >
    > The volume label in drive C is ACC2.
    > The Volume Serial Number is 25C0:E415.
    > Directory of C:\OS2\APPS
    >
    > 5-09-97 9:04a <DIR> 0 .
    > 5-09-97 9:04a <DIR> 0 ..
    > 5-09-97 9:04a <DIR> 0 DLL
    > 10-31-94 7:45p 55076 0 EPM.EX
    > 10-31-94 7:45p 57321 0 EXTRA.EX
    > 10-31-94 7:45p 32480 0 EPM.EXE
    > 10-31-94 7:45p 21328 0 EPMHELP.QHL

    Notes: Here, a leading '>' character was prefixed to each line, as might be done when quoting email for a reply. No template was provided (nothing precedes the first ';' in the script), so each line of input was put into variable US by default. Then the string '>' was joined to the value of US with the blank concatenation operator ' ', and the result was stored over the original value in US.

  8. DIR /N | parse ";US=NR US;"

    1
    2 The volume label in drive C is ACC2.
    3 The Volume Serial Number is 25C0:E415.
    4 Directory of C:\OS2\APPS
    5
    6 5-09-97 9:04a <DIR> 0 .
    7 5-09-97 9:04a <DIR> 0 ..
    8 5-09-97 9:04a <DIR> 0 DLL
    9 10-31-94 7:45p 55076 0 EPM.EX
    10 10-31-94 7:45p 57321 0 EXTRA.EX
    11 10-31-94 7:45p 32480 0 EPM.EXE
    12 10-31-94 7:45p 21328 0 EPMHELP.QHL

    Notes: Here, the line number (NR) was simply put in front of each line of output.

  9. DIR /N | parse ";US=Right(NR,3) US;"

      1
      2 The volume label in drive C is ACC2.
      3 The Volume Serial Number is 25C0:E415.
      4 Directory of C:\OS2\APPS
      5
      6 5-09-97 9:04a <DIR> 0 .
      7 5-09-97 9:04a <DIR> 0 ..
      8 5-09-97 9:04a <DIR> 0 DLL
      9 10-31-94 7:45p 55076 0 EPM.EX
     10 10-31-94 7:45p 57321 0 EXTRA.EX
     11 10-31-94 7:45p 32480 0 EPM.EXE
     12 10-31-94 7:45p 21328 0 EPMHELP.QHL

    Notes: Here, the line number was padded out to 3 characters with the default character of space, with the Right string function, so things line up nicer.

  10. DIR /N | parse ";us=Left(us,75) Right(NR,3);"

                                             1
    The volume label in drive C is ACC2.     2
    The Volume Serial Number is 25C0:E415.   3
    Directory of C:\OS2\APPS                 4
                                             5
    5-09-97 9:04a <DIR> 0 .                  6
    5-09-97 9:04a <DIR> 0 ..                 7
    5-09-97 9:04a <DIR> 0 DLL                8
    10-31-94 7:45p 55076 0 EPM.EX            9
    10-31-94 7:45p 57321 0 EXTRA.EX         10
    10-31-94 7:45p 32480 0 EPM.EXE          11
    10-31-94 7:45p 21328 0 EPMHELP.QHL      12

    Notes: This time, trim or pad each line to 75 characters from the left with the Left function, and pad the record numbers to three characters from the right, concatenate them with the space concatenation operator, so that the line numbering is on the right end of each line.

  11. DIR /N | parse ";IF NR < 8 THEN;ITERATE;"

    5-09-97 9:04a <DIR> 0 DLL
    10-31-94 7:45p 55076 0 EPM.EX
    10-31-94 7:45p 57321 0 EXTRA.EX
    10-31-94 7:45p 32480 0 EPM.EXE
    10-31-94 7:45p 21328 0 EPMHELP.QHL

    Notes: Skip the first 7 lines (those with NR less than 8) instead of printing them out.

  12. DIR /N | parse ". . a b .; us=a b"

    label in
    Serial Number
    C:\OS2\APPS
    <DIR> 0
    <DIR> 0
    <DIR> 0
    55076 0
    57321 0
    32480 0
    21328 0

    Notes: Parse out only the two columns for file size and extended attribute size in the listing, and assign them to variable 'us' for output.

  13. DIR /N | parse ". . a .;IF datatype(a,'N') THEN;t.1=t.1+a;END;DO;us=t.1;"

    34932530

    Notes: Parse out each file size from the appropriate column and check whether it is a number. If it is, add it to the running total in stem variable t.1. The 'END' closes the loop before anything is output on each iteration. Then, in a final one-pass 'DO' grouping, the tally is assigned to the default output variable, so the tally is automatically printed out.

  14. DIR /N | parse " a =26 b;us=reverse(a)||reverse(b);"

    evird ni lebal emulov ehT.2CCA si C
    rebmuN laireS emuloV ehT.514E:0C52 si
    SPPA\2SO\:C fo yrotceriD
    RID< a40:9 79-90-5 . 0 >
    RID< a40:9 79-90-5 .. 0 >
    RID< a40:9 79-90-5 LLD 0 >
    7055 p54:7 49-13-01XE.MPE 0 6
    2375 p54:7 49-13-01XE.ARTXE 0 1
    8423 p54:7 49-13-01EXE.MPE 0 0
    2312 p54:7 49-13-01LHQ.PLEHMPE 0 8

    Notes: Split each line in two at column 26. Reverse each half, then glue the two back together with no intervening space, using the '||' operator. (A touch of arbitrary surrealism!)

  15. DIR /N | parse "a . . . b; us =a b;"

    The drive C is ACC2.
    The is 25C0:E415.
    Directory
    5-09-97 .
    5-09-97 ..
    5-09-97 DLL
    10-31-94 EPM.EX
    10-31-94 EXTRA.EX
    10-31-94 EPM.EXE
    10-31-94 EPMHELP.QHL

    Notes: Pull out the first and fifth columns of the listing and glue them together, separated by a single blank, for output.

  16. DIR /N | parse "; IF NR > 3 THEN; ITERATE;"

    The volume label in drive C is ACC2.
    The Volume Serial Number is 25C0:E415.

    Notes: Here, 'ITERATE' the boilerplate DO loop on all lines after the third, before they are printed out, clipping them from the printout.

  17. DIR /N | parse "; US = US || RS || RS;"

    The volume label in drive C is ACC2.


    The Volume Serial Number is 25C0:E415.


    Directory of C:\OS2\APPS


    5-09-97 9:04a <DIR> 0 .


    5-09-97 9:04a <DIR> 0 ..


    5-09-97 9:04a <DIR> 0 DLL


    10-31-94 7:45p 55076 0 EPM.EX


    10-31-94 7:45p 57321 0 EXTRA.EX


    10-31-94 7:45p 32480 0 EPM.EXE


    10-31-94 7:45p 21328 0 EPMHELP.QHL

    Notes: Two additional record separators ('RS') are concatenated onto each line before it is printed out, causing the output to be triple spaced.

  18. DIR /N | parse ". . . t.1 us;IF t.1 < 400 THEN;ITERATE;"

    drive C is ACC2.
    is 25C0:E415.
    PICVIEW.EXE
    KAPP.CMD
    PU.CMD
    PARSETST.CMD
    LOADLIB.CMD
    ATLST0.CMD
    ATLST.CMD
    ATLST1.CMD
    ATLST2.CMD
    CALENDAR.CMD
    HEX.CMD

    Notes: Here, parse the extended attribute size from the directory listing into stem variable t.1, and iterate if it is less than 400 bytes. This leaves, basically, a listing of files with at least 400 bytes of extended attributes.

  19. DIR /N | parse ". . . t.1 us;IF t.1 < 400 THEN;ITERATE;'copy 'us' a:*.*';"

    Wed 10-03-2001 | 4:54:22.68 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy drive C is ACC2. a:*.*
    SYS1003: The syntax of the command is incorrect.
    drive C is ACC2.
    Wed 10-03-2001 | 4:54:22.89 | The Operating System/2 Version is 3.00
    (1)[C:\OS2\APPS]copy is 25C0:E415. a:*.*
    SYS1003: The syntax of the command is incorrect.
    is 25C0:E415.
    Wed 10-03-2001 | 4:54:23.34 | The Operating System/2 Version is 3.00
    (1)[C:\OS2\APPS]copy PICVIEW.EXE a:*.*
    1 file(s) copied.
    PICVIEW.EXE
    Wed 10-03-2001 | 4:54:33.85 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy KAPP.CMD a:*.*
    1 file(s) copied.
    KAPP.CMD
    Wed 10-03-2001 | 4:54:37.53 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy PU.CMD a:*.*
    1 file(s) copied.
    PU.CMD
    Wed 10-03-2001 | 4:54:41.53 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy PARSETST.CMD a:*.*
    1 file(s) copied.
    PARSETST.CMD
    Wed 10-03-2001 | 4:54:46.69 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy LOADLIB.CMD a:*.*
    1 file(s) copied.
    LOADLIB.CMD
    Wed 10-03-2001 | 4:54:50.90 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy ATLST0.CMD a:*.*
    1 file(s) copied.
    ATLST0.CMD
    Wed 10-03-2001 | 4:54:55.05 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy ATLST.CMD a:*.*
    1 file(s) copied.
    ATLST.CMD
    Wed 10-03-2001 | 4:54:59.85 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy ATLST1.CMD a:*.*
    1 file(s) copied.
    ATLST1.CMD
    Wed 10-03-2001 | 4:55:05.01 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy ATLST2.CMD a:*.*
    1 file(s) copied.
    ATLST2.CMD
    Wed 10-03-2001 | 4:55:10.43 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy CALENDAR.CMD a:*.*
    1 file(s) copied.
    CALENDAR.CMD
    Wed 10-03-2001 | 4:55:15.18 | The Operating System/2 Version is 3.00
    (0)[C:\OS2\APPS]copy HEX.CMD a:*.*
    1 file(s) copied.
    HEX.CMD

    Notes: This time, copy the files with greater than 400 bytes extended attributes to the a: floppy disk. The string 'copy 'us' a:*.*' , with variable US concatenated by abuttal with two literal strings, since it doesn't make up part of any recognizable REXX command or expression, is passed to the invoking environment for interpretation.
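For a task that gets repeated often, a one-liner like the tally in example 13 can be written out as an ordinary script with no INTERPRET involved. The following is a minimal sketch of that idea, not part of the downloadable package. It works on a few sample lines held in a stem (in., an invented stand-in for standard input) so that it can be run as is; the variable total is also invented for the example.

```rexx
/* Total the size column of a few sample DIR /N lines (illustrative sketch) */
in.1 = 'Directory of C:\OS2\APPS'
in.2 = '5-09-97 9:04a <DIR> 0 DLL'
in.3 = '10-31-94 7:45p 55076 0 EPM.EX'
in.4 = '10-31-94 7:45p 57321 0 EXTRA.EX'
in.5 = '10-31-94 7:45p 32480 0 EPM.EXE'
in.6 = '10-31-94 7:45p 21328 0 EPMHELP.QHL'
in.0 = 6

total = 0
do nr = 1 to in.0
  parse var in.nr . . a .             /* third word is the size on file lines */
  if datatype(a, 'N') then            /* skip header lines and <DIR> entries  */
    total = total + a
end
say total
```

For these six sample lines the script prints 166205, the sum of the four file sizes. A real dedicated filter would read its lines with LINEIN instead of from a stem.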


    The Southern California OS/2 User Group
    P.O. Box 26904
    Santa Ana, CA 92799-6904, USA

    Copyright 2001 the Southern California OS/2 User Group. ALL RIGHTS RESERVED.
