
The views expressed in articles on this site are those of their authors.



Copyright 1998-2024, Southern California OS/2 User Group. ALL RIGHTS RESERVED.

SCOUG, Warp Expo West, and Warpfest are trademarks of the Southern California OS/2 User Group. OS/2, Workplace Shell, and IBM are registered trademarks of International Business Machines Corporation. All other trademarks remain the property of their respective owners.

The Southern California OS/2 User Group
USA
Free Software - Come And Get It

Hobbes site administrator changes look and feel

Builds a better search engine


by Peter Skye

LAS CRUCES, NEW MEXICO --- It's an awesome responsibility for a 19-year-old.  Being site administrator for the largest public OS/2 repository in the world could make a young man feel a little queasy.  Sort of like that feeling you get right before finals.

           Josh Shagam (that's pronounced "shag 'em") is a full-time undergrad at New Mexico State University, where he studies Computer Science.  He's also a part-time student assistant in the Information Services department of the Computing and Networking department of the Systems and Engineering division of NMSU, which just thinking about can make you queasy.  And it is in that very position of student assistant that Josh administers and modifies the well-known Hobbes OS/2 site with an incessant urge to improve it.

           "I report to Dave Rocks," says Josh.  "He's the head of Information Services, and an OS/2 fan to the extreme degree.  Dave is one of the reasons Hobbes is around and as supported as it is."

           "The Hobbes site used to be maintained by Curtis Ewing, but he was overburdened so the task was given to me.  Someone else had it before him and just handed it over to him, and he just didn't have the time to work on it.  He's completely off the Hobbes project now, except for maintaining the hardware.  Curtis works in the Systems and Engineering division, so he gets to be bothered at 2 a.m. when Hobbes catches on fire or whatever."

Sharks In The Creek

           Where do we start?  The history?  The hardware?  The database engine?

           Let's start with the lawsuit.

           "As for competing repositories, the most dangerous one is Walnut Creek's OS/2 archive, also called Hobbes," ventures Josh.  "It's not dangerous in terms of files -- it only has 900 megs or so -- it's dangerous because it confuses people."

           "I know I'm going to take a lot of heat for talking about this, but it's something that has to be said.  Walnut Creek's Hobbes has nothing to do with NMSU's.  A few years ago, Walnut Creek wanted to mirror Hobbes and put out a quarterly CD-ROM compilation for profit."

           "Mirrors aren't a problem -- we've got a good dozen official mirrors, and some unknown number of 'leech sites'."  Josh stops to explain that "leech sites" are run by people who mirror Hobbes and leech, or use up, its bandwidth by downloading everything in sight but don't inform him of their presence, so he can't put them on his mirror list for other people to use.  "Nobody knows about them, so they just end up eating up bandwidth and not really contributing anything to OS/2 users."

           "The problem with Walnut was the 'for profit' thing; either Walnut Creek was going to make a lot of money off of University work, or the University was going to get a cut of it, which is also verboten since NMSU is a non-profit organization.  Except for football, but we won't get into that issue," he adds with a laugh.

           "Anyhow, Walnut Creek mirrored us one time anyway, called their site Hobbes, and pretended to have nothing to do with us, accepted their own submissions, all that.  Later on, they decided to try to sue us for stealing the name and idea from them."

           Josh confesses that the details on all this are sketchy, since this was before his time.  But he's puzzled that Walnut Creek continues to call their Hobbes the biggest OS/2 archive out there when "it's 900 megs which includes their lackluster Java and Multimedia archives, whereas the real Hobbes is 2,500 megs of OS/2 alone.  And they don't even have a web-based interface."  (Walnut Creek's download site is FTP-only, while NMSU's Hobbes also serves HTTP and has a search engine.)

           "Fortunately," he concludes, "I've managed to clear up most of the confusion so people don't ask me 'When's the new CD coming out?' too often anymore.  Also, by strictly enforcing the format of the uploads' .txt files and the placement directories, I can be sure that uploaders know which Hobbes they're dealing with."

Almost Twenty Years Ago

           Josh Shagam was born June 14, 1978 in Tucson, Arizona and raised in Albuquerque, New Mexico.  His family still resides there; his older sister is a graduate student at the University of New Mexico in Albuquerque, and his younger brother is in high school.  The family home is graced with three cats.

           "Oh, one little ironic point I'd like to bring up," says Josh.  "My high school was about a block away from where Microsoft was founded."

           "And now I run the world's largest OS/2 archive.  Albuquerque produced both Microsoft and the Hobbes archiver.  Does that make Albuquerque good, or bad?  Or both?"

           Josh chose NMSU because of its good Computer Science department and low tuition for New Mexico residents.

The History Of Hobbes

           "The Hobbes system itself is an old RS6000-power2 box running AIX, and parts of it are intermittently failing.  It sits in the NMSU Computing and Networking machine room along with all of the important servers, and when I visit it I swear I can sometimes hear it complaining about a panel of diodes down its left side."  AIX is an IBM version of Unix.  "I do all the work on Hobbes from pc-rocks5, a P166 running OS/2.  They're both hooked up to the Internet via the campus network."

           "Officially, Hobbes is named after the philosopher.  This is to avoid copyright issues, as it was originally one of a pair of NeXT cubes, which were named Calvin and Hobbes.  Calvin hosted a multimedia archive (the details are sketchy on that, since it was so long ago), and when the cubes died their files were transplanted onto the single RS6k.  I've asked many times about the full history of Hobbes, and nobody seems to remember anymore, but from what I understand, the original NeXT cube was online in 1992 or 1993 or so, and the RS6k transmogrification occurred in 1995.  Hobbes isn't really a high spending priority for the department, which is why it's made out of pretty-much discarded hardware."

Head Out On The Highway

           Mr. Shagam is an easygoing young man with a congenial wit, and I take note of his age.  He's not quite 20, yet he's finished his first three years of college.  That would make him a 16-year-old high school graduate.  Clearly, somebody thinks he has something on the ball.

           "My two major interests in high school were computers and not getting beat up," he says.  He's a bicycling enthusiast, using his politically correct two-wheeler for both business and pleasure.  "I ride my bike pretty much everywhere, both as my primary means of transportation and as recreation."  Let's see, attach a little generator to the front wheel hub, lash a portable onto the handlebars, and Josh could compute while he's wheeling.  And a cellular phone would give him access to the Internet.  Not a bad idea, and big downloads would really keep you in shape.

The Hobbes Database Engine

           "My favorite part of the Hobbes site is the killer database engine which runs it," says Josh.

           "Why?" I ask.

           "Because I wrote it," replies Josh.

           "The old Hobbes database engine, which was called 'dls', was incredibly slow and excruciatingly buggy.  You should see some of the bug workarounds I had in the shell scripts."

           "When dls took out the entire database again last December, I decided that was the last time.  I spent my entire winter vacation slamming out new specs and code.  I wrote the new database engine completely from scratch in C++."

           Josh rattles off his favorites of the well over a dozen programming languages he knows.  "C/C++, Java, Pascal, Modula-2, Prolog, Scheme/LISP, I like a wide variety of programming languages."  That's a good attitude to have, I note next to his remark on my page.  An open mind leads to better software.

           "The new database engine has been in place since January, and it was an instant improvement over the old engine.  The old engine only had one attribute for each file (the description), and even that usually got messed up."

           Josh tells me that the engine runs on Hobbes itself, and that it's a custom extension to the existing file system that he sort of squeezed in where it would be transparent.  "Each directory has two index files, the attribute index and the search index.  The search index is just the attribute index of the current directory combined with the search index of all the subdirectories (I like recursive definitions).  The search indexes are all rebuilt once every 15 minutes to keep them current.  Part of the power in the engine is that since it is in C++, everything is modularly flexible in ways that make the search indexes very fast to rebuild."
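Josh's recursive definition is easy to picture in code.  The sketch below is a minimal illustration, not his C++ engine: it assumes a hypothetical in-memory Directory record (the names Directory, attributes, subdirs, and rebuild_search_indexes are invented here) and models each index as a plain dictionary, since the article doesn't describe the on-disk index format.  One bottom-up pass builds every directory's search index by merging its own attribute index with the search indexes of its subdirectories, each of which is built exactly once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical in-memory stand-in for one archive directory."""
    name: str
    # Attribute index: file name -> that file's attributes (description, etc.).
    attributes: dict[str, dict[str, str]] = field(default_factory=dict)
    subdirs: list[Directory] = field(default_factory=list)


def rebuild_search_indexes(d: Directory, prefix: str = "",
                           out: dict | None = None) -> dict[str, dict[str, str]]:
    """Post-order walk: a directory's search index is its own attribute index
    merged with the search indexes of all its subdirectories. Records every
    directory's search index in `out` (path -> index) and returns d's index."""
    if out is None:
        out = {}
    path = f"{prefix}/{d.name}" if prefix else d.name
    # Start with this directory's own files, keyed by full path.
    index = {f"{path}/{fname}": attrs for fname, attrs in d.attributes.items()}
    # Recursive step: fold in each subdirectory's (freshly built) search index.
    for sub in d.subdirs:
        index.update(rebuild_search_indexes(sub, path, out))
    out[path] = index
    return index


# Made-up two-level tree for illustration.
client = Directory("client", {"ftpc.zip": {"desc": "FTP client"}})
ftp = Directory("ftp", {"readme.txt": {"desc": "Notes"}}, [client])
all_indexes = {}
rebuild_search_indexes(ftp, out=all_indexes)
print(sorted(all_indexes["ftp"]))  # ['ftp/client/ftpc.zip', 'ftp/readme.txt']
```

Run on a timer (the article says every 15 minutes), a pass like this refreshes every directory's search index in a single walk of the tree.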

           "The reason that I like it so much is that it's mine," he concludes.  "Coder's pride -- never buy something worse than what you can code yourself."

Hobbes and the Hornet

           Josh drew a lot of inspiration for the new Hobbes engine from a site I'd never heard of, The Hornet Archive.  "It's the world's largest demoscene website," says the engine man.  "It has several gigabytes of programmer 'demos' where they show their skill and experiments in creative coding, as well as tracked music.  To keep the Hornet site's maintainers sane, they have a large PERL-based database system, and it is from it that I drew much of the inspiration for the Hobbes engine."

           Being an inquisitive reporter, I run a "who-is" on the Hornet url.  Back comes Hornet's address . . . in Walnut Creek.  And their servers?  Walnut Creek's.  My, it's a small world.

           "Hornet has features which the Hobbes engine doesn't have, such as the ability to tag multiple files for download and having links to the files on mirror sites.  I prefer the Hobbes engine, however.  For starters, it's a lot faster, even though we're only running an ancient RS6000 and they've got a dual-Pentium Pro 200 or something."  PERL, the Hornet site's language choice, is an interpreted language.  C++ is compiled to machine code.

           "Other things I prefer about Hobbes' engine are that it's self-cleaning and completely dynamic.  Hornet rebuilds the index.html files once a day, whereas with Hobbes you're almost always seeing the database as it is at that very second.  Plus, the Hobbes engine has plenty of redundancy.  For example, if one of the file databases gets corrupted, you can rebuild it from one of the search databases.  The search databases themselves are rebuilt every 15 minutes, and the main one is backed up daily with a week's worth of backlogs."  Josh deserves a consultancy in disaster planning.  "This is all just paranoia on my part," he continues, "after some bad experiences with the old database engine.  In truth, we've not had a single corruption since the new engine was installed."
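The redundancy Josh describes falls out of the recursive design: any search index that covers a directory also contains that directory's entries, so a damaged file database can be recovered by filtering.  The sketch below assumes a search index modeled as a dictionary from full file paths to attribute records, and uses hypothetical names (recover_attribute_index, backup_slot, search.idx).  The seven-slot weekday rotation is just one plausible way to keep "a week's worth" of daily backups, not necessarily the one Hobbes uses.

```python
import datetime
import posixpath


def recover_attribute_index(search_index: dict, dir_path: str) -> dict:
    """Rebuild one directory's attribute index from any search index that
    covers it, keeping only the entries whose parent directory is dir_path."""
    return {
        posixpath.basename(path): attrs
        for path, attrs in search_index.items()
        if posixpath.dirname(path) == dir_path
    }


def backup_slot(day: datetime.date, base: str = "search.idx") -> str:
    """Name of the daily backup file for `day`. Seven weekday slots mean each
    new backup overwrites the one from exactly a week earlier, so a week of
    history is always on hand."""
    return f"{base}.{day.weekday()}"


# Made-up search index covering pub/os2/util and one subdirectory.
search = {
    "pub/os2/util/a.zip": {"desc": "Utility A"},
    "pub/os2/util/disk/b.zip": {"desc": "Disk tool B"},
}
print(recover_attribute_index(search, "pub/os2/util"))  # {'a.zip': {'desc': 'Utility A'}}
```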

Administrator Josh

           Josh took on the Hobbes administrator's job last January, and immediately got to work on improvements.  With his new database engine tucked into the aging RS6000 and his list of ideas being steadily converted from stardreams to substance, the site has changed dramatically.

           "I've known UNIX for quite some time," Josh notes (Hobbes, again, is an AIX box), "and so within two days I was improving the process."

           "The first thing that went was the process by which files are archived; each file originally went through a good 20 steps, taking up to 5 minutes for each file.  Not fun.  I wrote a series of shell scripts to reduce that to four steps per file."

           "The next thing to go was the horrible file browser.  It just wasn't very nice to look at, was a pain to use, and excessively buggy (though a lot of that was the fault of the dls engine)."  The new browser, I note later, does sport a nice look.

           "I also noticed that about 99% of the email I got from users was people asking me if I had such-and-such a file to do such-and-such a thing, so I wrote a search engine (which the browser was actually a part of)."
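A search engine of the kind Josh describes can be sketched in a few lines over a simple in-memory index that maps file paths to their attributes.  This is a hypothetical illustration, not the Hobbes code: the function name and file names are made up, attribute values are assumed to be plain strings, and a file matches when its path or attributes contain every query word, ignoring case.

```python
from __future__ import annotations


def search(index: dict[str, dict[str, str]], query: str) -> list[str]:
    """Return, sorted, the paths whose path or attribute values contain
    every word of the query (case-insensitive substring match)."""
    words = query.lower().split()
    hits = []
    for path, attrs in index.items():
        # Search the directory path too, so "ftp client" finds files by location.
        haystack = " ".join([path, *attrs.values()]).lower()
        if all(word in haystack for word in words):
            hits.append(path)
    return sorted(hits)


index = {
    "pub/os2/apps/internet/ftp/client/ftpc.zip": {"desc": "Text-mode FTP client"},
    "pub/os2/util/convert/conv.zip": {"desc": "Win32-to-OS/2 converter"},
}
print(search(index, "ftp client"))  # ['pub/os2/apps/internet/ftp/client/ftpc.zip']
```

Matching on the path as well as the description means a meaningful directory tree does part of the searching for you.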

           "Next to go was the horrible homepage.  It took several minutes to load and was covered in pointless images and client-side imagemaps, and at the same time that I was overhauling the homepage I was also overhauling the directory structure.  I took a lot of heat for the new directory structure at first, most of which I think was unfounded; it seemed like these users actually preferred to have to look in /os2/wpsutil, /os2/textutil, /os2/network/tcpip, and a few other nonsensical places for an FTP client, and to have to browse through hundreds of files per directory and sort through it themselves.  Now you just go to /pub/os2/apps/internet/ftp/client.  Yes, it's longer directory names, but they actually make sense, and when you have 3 gigabytes of files and you're growing every day, you kind of need more than 10 directories."

Growing Every Day?

           I asked Josh for some statistics about his site, both historical and current.  Unfortunately, the prior administrators didn't keep much in the way of "metrics" for future generations, but the current stats look good indeed.

           "I have about 70 FTP users and maybe 100 HTTP users on at any given time," Josh responds to my question.  "I haven't checked the transfer statistics lately, but it's several thousand hits a day."  Let's see, "several thousand" might mean 4,000, and 70 plus 100 is 170 users at a time, so 4,000 divided by 170 is, where's my slide rule, log of 4000 minus log of 170, about 24 sessions per day for each of those 170 slots.  A slot that turns over 24 times in 24 hours holds each visitor for about an hour, so the average Hobbes camper stays about one hour.  That's good, very, very good.  It means that when they come, they download, they don't just browse.  They're finding what they're looking for.
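The back-of-the-envelope estimate above is an application of Little's law: the average number of users on at once equals the rate at which sessions arrive times the average session length.  The sketch below simply redoes that arithmetic (the function name is made up).  Like the estimate itself, it assumes "several thousand" means 4,000 and treats each hit as one session; if one visit actually produces several hits, there are fewer sessions and the true average visit is longer than this figure.

```python
def average_session_hours(concurrent_users: float, sessions_per_day: float) -> float:
    """Little's law, L = lambda * W, solved for W: the average length of a
    session equals the average number of concurrent users divided by the
    arrival rate (sessions per hour)."""
    sessions_per_hour = sessions_per_day / 24.0
    return concurrent_users / sessions_per_hour


# The article's figures: about 70 FTP + 100 HTTP users at once, and
# "several thousand" hits a day taken as 4,000, one session per hit.
print(round(average_session_hours(70 + 100, 4000), 2))  # 1.02
```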

           The biggest file on Hobbes is the 37+ MB /pub/os2/apps/misc/warpdemo.zip, and the smallest is "a many, many-way tie."  And what files are popular?  "It depends on what's going on with OS/2 at the time.  When people are getting fed up with Netscape it's os2/apps/internet/www/browser.  When the latest Win32-OS/2 converter comes out, it's os2/util/convert (there's a lot of deprived Quake fans out there).  Really, it changes all the time.  If you want to see a list of all the files, http://hobbes.nmsu.edu/pub/00global.gz is your answer."

           And the future?

           "Hobbes is constantly growing in size.  The only time I've trimmed files other than getting rid of old versions was when I was finally finishing up the directory restructuring.  There were a few hundred files which were from the NeXT-cube days which nobody knew what they were and nobody cared, and I was fed up with still having the /old/os2 directory structure around.  So I deleted them."

Welcome Back

           If you haven't visited Hobbes lately, take a look.  It looks a little different and acts a little different, and the freshness is alluring.  You'll find the old files you never bothered to download and new versions of the ones you did, and perhaps one or two you didn't even know about.

           Josh Shagam built the new engine for you, and he redesigned the site for you, and he processes the daily uploads for your benefit, not for his.  He's waiting for you, and Hobbes is waiting for you.

           It's yours for the taking.


References

Josh Shagam, archiver@hobbes.nmsu.edu

Hobbes, http://hobbes.nmsu.edu

The Hornet Archive, http://www.hornet.org

New Mexico State University, http://www.nmsu.edu

Walnut Creek CDROM, http://www.cdrom.com, ftp://ftp.cdrom.com/

Whois, http://rs.internic.net/cgi-bin/whois





The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA

Copyright 1998 the Southern California OS/2 User Group. ALL RIGHTS RESERVED.
