The roff tools are basically for non-WYSIWYG
(What You See Is What You Get) word processing.
One useful tool for converting their output to
plain text files, if you have it available,
is the 'col' utility, which originated in the UNIX world.
According to my notes, 'col -b' is a good way to
convert man pages to plain, readable text.
The '-b' flag tells col not to output any backspaces, so only the
last character written to each column position is kept.
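As a rough illustration of the 'col -b' approach, here is a minimal
sketch. The function name to_plain is made up for this example, and
it assumes 'col' is on your PATH. The sample input imitates the way
nroff marks bold text for a terminal (letter, backspace, same letter).

```shell
#!/bin/sh
# Sketch only: to_plain is a hypothetical helper name, not a standard tool.
# It passes its input through col -b, which drops the backspaces and keeps
# the last character written to each column position.
to_plain() {
    col -b
}

# Bold "ls" in nroff terminal output: each letter, a backspace, the letter again.
printf 'l\bls\bs\n' | to_plain

# Typical use (assumes a man page for ls is installed):
#   man ls | to_plain > ls.txt
```

The demo line should come out as plain "ls" with the overstrikes gone.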
Here is a sed script I have used to convert
man pages to plain text:
# unman.sed D.E.L. 28/10/98 modified from:
#
# sedman -- deformat nroff-formatted man pages
# from "SED and AWK", 1st edition,
# by Dale Dougherty, O'Reilly & Associates, Inc.
# Chapter 5, Stripping Out Non-Printable Characters from nroff Files,
# pages 94-96
# D.E.L. 28/10/98
#
# The control characters are written below as GNU sed escapes
# (\x08 = ^H backspace, \x1b = ^[ escape, \t = ^I tab). With a sed
# that lacks these escapes, type the literal characters instead.
#
s/_\x08\([A-Za-z]\)/\1/g
# s/_^H\([A-Za-z]\)/\1/g
# trying to get rid of underlining in man pages. D.E.L.
#
s/\(.\)\x08\1/\1/g
# s/\(.\)^H\1/\1/g
# not in D**2, but seems to be needed to eliminate overstruck
# characters, which otherwise show up as doubled. D.E.L.
#
s/\x08//g
# s/^H//g
#
s/\x1b9//g
# s/^[9//g
#
s/^[ \t]*//g
# s/^[ ^I]*//g
# (read here as: strip leading spaces and tabs)
#
s/\t/ /g
# s/^I/ /g
#
I've spelled out the control characters in the comments
(lines beginning with '#'), in case any are mangled in transit
or when you break this out into a file, so you'll know what they
are supposed to be: ^H is backspace, ^[ is escape, and ^I is tab.
Some of the extensions past the original source
(the O'Reilly Sed & Awk book) were things I found
useful for a more complete conversion from man/nroff
to plain text.
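Since literal control characters tend to get mangled in mail, one way
around it is to have the shell generate them with printf and hand the
expressions to sed. This is only a sketch of that idea, not the
original script: the names BS, ESC, TAB and unman are invented for the
example, and the whitespace rule is my reading of the ^[ ^I] comment
as "leading spaces and tabs".

```shell
#!/bin/sh
# Sketch only: BS, ESC, TAB and unman are hypothetical names.
# printf builds the control characters, so this file contains
# nothing but printable text.
BS=$(printf '\010')    # ^H, backspace
ESC=$(printf '\033')   # ^[, escape
TAB=$(printf '\011')   # ^I, tab

unman() {
    # 1. underlined letter (_ BS letter)   -> letter
    # 2. overstruck char (c BS c)          -> c
    # 3. any leftover backspaces           -> removed
    # 4. ESC 9 sequences                   -> removed
    # 5. leading spaces and tabs           -> removed (assumed reading)
    # 6. remaining tabs                    -> single space
    sed -e "s/_${BS}\([A-Za-z]\)/\1/g" \
        -e "s/\(.\)${BS}\1/\1/g" \
        -e "s/${BS}//g" \
        -e "s/${ESC}9//g" \
        -e "s/^[ ${TAB}]*//" \
        -e "s/${TAB}/ /g"
}

# Demo: a bold heading as nroff emits it for a terminal; prints NAME
printf 'N\bNA\bAM\bME\bE\n' | unman

# Typical use (assumes a man page for ls is installed):
#   man ls | unman > ls.txt
```

Nothing here depends on GNU extensions, so it should behave the same
with any POSIX sh and sed.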
Anyway, this is what I can contribute to this subject right now.
Regards,
Dallas E. Legan II / leganii@surfree.com / dallasii@kincyb.com
Powered by......Lynx, the Internet at hyperkinetic speed.
The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA
Copyright 2001 the Southern California OS/2 User Group. ALL RIGHTS
RESERVED.