Introduction
Web Servers are special purpose transaction processing systems that are ubiquitous in today's data processing environment. Like all transaction processing systems these have several distinct performance bottlenecks that need to be considered during application design and production roll out. This paper discusses the ramifications of these bottlenecks as web server throughput requirements increase.
Today's computing platforms vary in size from small, single-application servers (PCs) handling several transactions per week up to mega-application servers (mainframes) handling tens of thousands of transactions per second while simultaneously running hundreds to thousands of other applications. Choosing the best computing platform for a web application can save money and reduce production problems. Choosing the best application architecture can save a company substantial money during development and maintenance.
Web server software and the much newer application server software interact with the hardware and software systems which embody today's computing platforms. Understanding how these interactions work is key to choosing the best architecture and computing platform for a web application.
Web Server Basics
In their simplest form, web servers are bulk data transfer programs. Their user interface is a web browser running on someone's PC. Whenever a web browser requests a web site, or a hyperlink is activated, a request is sent to the web server. This request has four parts. The first defines the type of file request. The second is the name or address of the web server and an optional TCP/IP port number. The third is the name of the file on the web server. The fourth is a set of parameter information. The third and fourth parts are optional.
HTTP://www.destination.com:optional port number/optional filename?optional data
or
HTTP://IP Address:optional port number/optional filename?optional data
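The four parts of a request can be picked apart with Python's standard urllib.parse module. This is only an illustrative sketch: the port number and the parameter data in the sample request are made-up values, and the dictionary keys are names chosen here for readability.

```python
from urllib.parse import urlsplit

def split_request(url):
    """Break a browser request into the four parts described above."""
    parts = urlsplit(url)
    return {
        "type": parts.scheme,            # part 1: type of request, e.g. "http"
        "server": parts.hostname,        # part 2: server name or IP address
        "port": parts.port,              # part 2 (optional): TCP/IP port, None if absent
        "file": parts.path.lstrip("/"),  # part 3 (optional): file name
        "data": parts.query,             # part 4 (optional): parameter data
    }

# Hypothetical request with every optional part filled in.
full = split_request("HTTP://www.destination.com:8080/GiveMeThisFile.html?user=42")

# The same server with the optional parts left out.
bare = split_request("HTTP://www.destination.com")
```

Note that urlsplit lowercases the request type and the server name, so "HTTP" comes back as "http".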
When the request reaches the web server it performs three simple actions. It first determines the file type. Then it reads the contents of the file. Finally it sends the contents of the file back to the web browser. If the file name was left out of the request the web server substitutes a default file name such as Welcome.html, Index.html or another installation-defined name.
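The three actions can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server is written: the function name handle_request and the choice of Index.html as the default are assumptions, and real servers add security checks (for example, against requests that escape the document directory) that are omitted here.

```python
import mimetypes
import tempfile
from pathlib import Path

DEFAULT_FILE = "Index.html"  # assumed installation-defined default name

def handle_request(doc_root, file_name=""):
    """Determine the file type, read the file, and hand both back to the caller."""
    name = file_name or DEFAULT_FILE                  # substitute the default if omitted
    file_type, _ = mimetypes.guess_type(name)         # action 1: determine the file type
    contents = (Path(doc_root) / name).read_bytes()   # action 2: read the file contents
    return file_type, contents                        # action 3: caller sends these back

# Demonstration against a throwaway document directory.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / DEFAULT_FILE).write_bytes(b"<html>hello</html>")
    file_type, body = handle_request(root)  # request with no file name
```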
There are several important points about this simple process.
- Most data travelling between the web browser and the web server is text data.
- The requests going to the web server from the web browser are very small, usually on the order of 100 bytes or fewer.
- The data returned to the web browser from the web server is usually large, ranging from 1,000 to 100,000 bytes.
- While most requests sent to a web server have a destination name in them, TCP/IP requires an IP address to direct the request transfer. The destination name must be translated into an IP address before the request is sent.
These points have some interesting ramifications. Because the data is textual, only about 1/3 of the bandwidth of the communication line is being used: text data accounts for only about 1/3 of the bit combinations of a byte, so 2/3 of each byte sent between web browsers and web servers is empty. Thus, web applications require 3 times more communications speed than binary data would. A 384k bit per second communications line will run with the effective speed of a 128k bit per second communications line.
Small requests coming out of the web browser and large data amounts coming into the web browser mean that web servers experience small requests coming in and large data amounts going out. This difference in communication amounts means that asymmetrical communications capacities make sense for the web environment. However, what is good for the web browser is bad for the web server. Many companies offer consumers high download and low upload speeds. Web servers need the opposite: high upload speeds, while they can get by with low download speeds.
Finally, using a destination name, such as www.destinationname.com turns a simple browser request into two browser requests. The first request has to go to a name server whose IP address is already known to the web browser. This name server accepts the destination name as input and returns the destination's IP address. If the name server does not contain an entry for the destination name it must make a request to the next name server up in its hierarchy, and so on until the name is finally found.
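The walk up the name server hierarchy can be modelled with a toy class. This is a simulation using in-memory tables rather than real DNS traffic; the class name, the sample address 192.0.2.10, and the step of remembering an answer obtained from a parent are all assumptions made for illustration.

```python
class NameServer:
    """Toy name server: a local table plus an optional parent to ask on a miss."""

    def __init__(self, table, parent=None):
        self.table = dict(table)
        self.parent = parent

    def resolve(self, name):
        """Return (ip_address, servers_consulted); raise KeyError if nobody knows the name."""
        if name in self.table:
            return self.table[name], 1
        if self.parent is None:
            raise KeyError(name)
        ip, consulted = self.parent.resolve(name)  # ask the next server up the hierarchy
        self.table[name] = ip                      # assumed: remember the answer in memory
        return ip, consulted + 1

# Two-level hierarchy: the local server knows nothing, its parent knows the name.
parent = NameServer({"www.destination.com": "192.0.2.10"})
local = NameServer({}, parent=parent)

first = local.resolve("www.destination.com")   # has to go up one level
second = local.resolve("www.destination.com")  # answered from local memory
```

In this sketch the first lookup consults two servers and the second only one, which is why the later timing discussion can assume the name is already known.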
One of the most important benefits of using a computer is how quickly it gets things done. Let's take a look at how long it takes to perform each part of a typical web browser request, starting from the time the user strikes the enter key or clicks on a hyperlink. Let's further assume that the web browser request is typical, consisting of a destination name, not an IP address, and a file name, but no optional data.
browser request: HTTP://www.destination.com/GiveMeThisFile.html
- Convert www.destination.com name to IP address. Send the name to the Domain Name Server. If the name is known (let's assume it is for this scenario), return the IP address. The DNS usually maintains the destination name / IP information in memory so no disk access is required.
- For a business user the Domain Name Server is usually on the same LAN as they are.
- If the LAN is based on Token Ring or 100 Mbps Ethernet and there are fewer than 256 users, then the communication time across the LAN and back is usually less than a couple of milliseconds (ms); a millisecond is 1/1000 of a second.
- If the LAN is 10 Mbps Ethernet and there are fewer than 32 users, then the communication time across the LAN and back is usually less than a couple of ms. If there are more than 32 users, the communication time back and forth is usually less than 10 ms.
- For the consumer user the Domain Name Server is usually across the Internet. For a consumer user the time to get to the DNS and back is usually about 60 ms.
- Send the web browser request to the web server. These requests are usually very small. The request in this example is 46 bytes. This small size means that the time taken is almost entirely dependent on getting the request to the server and is not dependent on the amount of time it takes to send the contents of the request itself.
- If the web server is on an Intranet (an internal company internet) riding on top of the company LAN then this request will usually take less than 10 ms.
- If the request is travelling over the Internet then this request will take from 15 to 200 ms, with a reasonable average time of about 60 ms.
The total time taken through the completion of this step is about 12 ms for a company and 120 ms for a consumer.
At this point the consumer user is already starting to run as if his web browser is a floppy disk based application. Floppy disks rotate at 360 RPM and each request for data from them waits an average of 1/2 rotation before the data can be read or written. 360 RPM means that each rotation takes 167 ms, so 1/2 rotation takes about 84 ms. Translating the destination name to an IP address would take 84 ms and writing the web request to the floppy disk would take another 84 ms for a total of 168 ms. So, it is fair to say that the Internet, when seen from a consumer's viewpoint, is like a floppy disk based application.
- Receive the web browser request at the web server and open the file specified in the request. Receiving the request can be considered to take no time. To open the file the disk subsystem has to (1) be interrogated for the location of the file on the disk subsystem, (2) reserve the file name so that no other process on the web server's computing platform can write to or delete the file while the web browser is reading its contents, and (3) allocate memory buffers to hold the file data.
If the file location information is not already in memory, a physical disk I/O must take place to the file allocation table, the system catalog, or their equivalent. If the file location is already in memory then there is only a CPU and memory cost associated with the request. The CPU cost associated with opening a file usually runs about 100,000 machine instructions, or generally less than 1 ms on most of today's computers. The memory buffers should normally be less than 8 or 9 blocks or clusters (or about 50 sectors) in size. If a disk I/O is required the time to service this step goes up.
Up until a couple of years ago almost all hard disks spun at 3600 RPM. 3600 RPM means 60 spins per second, each spin therefore takes 17 ms. Each request to read data from a hard disk usually has to wait for 1/2 of a spin, or about 8.5 ms. If the read head had to be moved to position itself above the location of the data on the hard disk that adds another 5 to 12 ms. Typically, the read head has to move, so the average read request usually takes 13.5 ms. The average disk drive now spins at 5400 or 7200 RPM, so the spin time has dropped to 8.5 ms (at 7200 RPM), and the average wait time has dropped to 4.25 ms. This gives the average disk read a time of 4.25 ms + 5 ms = 9.25 ms. Today's high performance hard drives spin at 10,000 RPM which drops the spin time to 6 ms, the average wait time to 3 ms, and the average read time to 8 ms. For this discussion we can assume a 7200 RPM hard drive with an average read time of 9.25 ms.
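The disk arithmetic above follows from two simple formulas: one rotation takes 60,000 ms divided by the RPM, and an average read waits half a rotation plus the head movement time. The sketch below assumes the 5 ms head movement used in the text's 7200 RPM example; the function names are chosen here for illustration. Exact results differ slightly from the figures in the text, which rounds spin times up (for example 8.33 ms to 8.5 ms).

```python
def rotation_ms(rpm):
    """Time for one full spin of the disk, in milliseconds."""
    return 60_000 / rpm

def average_read_ms(rpm, seek_ms=5.0):
    """Average read time: half a spin of rotational wait plus head movement."""
    return rotation_ms(rpm) / 2 + seek_ms

for rpm in (3600, 7200, 10_000):
    print(rpm, "RPM:",
          round(rotation_ms(rpm), 2), "ms per spin,",
          round(average_read_ms(rpm), 2), "ms average read")
```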
- Read the file contents. The average response returned to a web browser is getting larger every day. But today it is between 2,000 and 50,000 bytes. Using the disk information described in the previous step we can assume that the file contents can be accessed in one spin of the disk, after the read has started. This works out to another 9.25 ms to get to the start of the data file on the disk and another 8.5 ms for the disk to spin around once. The total time required is 9.25 + 8.5 = 17.75 ms. If the file is fragmented on the disk then the cost would be another 17.75 ms for each fragment.
At the end of this step the accumulated time used by this request is 29.75 ms for the business user and 137.75 ms for the consumer.
- Return the file contents to the web browser. Assuming 25,000 bytes are being returned to the web browser, it will take 25 ms for the business user, 25 ms for the consumer using a cable modem, and 4.7 seconds for a modem user.
The total time accumulated for the business user is 54.75 ms. The total time for the consumer ranges from 162.75 ms to 4838 ms (4.8 seconds), depending on their communications link.
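The running totals can be reproduced by adding up the per-step times. The sketch below uses the figures from the walkthrough, with two stated assumptions: the business name lookup is taken as 2 ms (the text says "less than a couple of ms"), and, following the text's running totals, only the 17.75 ms file read is added on the server side. The dictionary keys are labels invented for this example.

```python
# Per-step times in milliseconds, taken from the walkthrough above.
STEPS_MS = {
    "business": {
        "name_lookup": 2.0,      # assumed value for "a couple of ms"
        "send_request": 10.0,
        "read_file": 17.75,
        "return_file": 25.0,
    },
    "consumer_cable": {
        "name_lookup": 60.0,
        "send_request": 60.0,
        "read_file": 17.75,
        "return_file": 25.0,
    },
    "consumer_modem": {
        "name_lookup": 60.0,
        "send_request": 60.0,
        "read_file": 17.75,
        "return_file": 4700.0,   # 4.7 seconds to return 25,000 bytes
    },
}

def total_ms(steps):
    """Sum the step times for one kind of user."""
    return sum(steps.values())

totals = {user: total_ms(steps) for user, steps in STEPS_MS.items()}
```

The modem total comes to 4837.75 ms, which the text rounds to 4838 ms.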
Summarizing the Basic Steps
If we only consider the web server part of this sequence, we see that the web server component only takes 1 ms + 9.25 ms + 17.75 ms = 28 ms, with 27 ms of this taking place on the hard disk. If nothing else was going on in this computer then the web server could support 1000 ms / 28 ms = 35 web browser requests per second, assuming that the files being retrieved from disk were not fragmented and there was only one disk drive. If these files were cached in memory then this number could go up to 40 per second before the LAN bandwidth saturated. If additional physical LANs were implemented then each LAN could handle 40 web server requests per second. Eventually the CPU would saturate or the number of LAN adapters would reach the hardware limit.
There are two important bottlenecks identified here. The first is the 35 web requests per second per hard disk. The second is the 40 web requests per second per LAN. If the web requests could be spread across two or more disks then each disk could support 35 requests per second.
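Both bottleneck figures come from dividing one second by the time a single request occupies the resource, and the slower resource sets the overall rate. A minimal sketch of that reasoning follows; the function names are illustrative, and rates are truncated to whole requests as the text does.

```python
def requests_per_second(service_ms):
    """Whole requests a resource can serve per second, one at a time."""
    return int(1000 / service_ms)

DISK_MS = 1 + 9.25 + 17.75   # open CPU + open I/O + file read = 28 ms
LAN_MS = 25                  # time to return 25,000 bytes over the LAN

disk_rate = requests_per_second(DISK_MS)
lan_rate = requests_per_second(LAN_MS)

def bottleneck_rate(disks, lans):
    """Overall rate is limited by whichever resource saturates first."""
    return min(disks * disk_rate, lans * lan_rate)

one_disk = bottleneck_rate(disks=1, lans=1)    # disk-bound
two_disks = bottleneck_rate(disks=2, lans=1)   # now LAN-bound
```

With one disk the disk is the limit; adding a second disk moves the limit to the LAN, which is why spreading files across disks only helps up to the LAN's capacity.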
If the web server resides on a PC it generally has IDE drives. IDE drives have the simple restriction that only one drive can be accessed at a time, and only one I/O request can be handled at a time. With Enhanced IDE (or ATA), if the PC has four hard drives, there could be two I/O requests active at one time, depending on operating system support. The PC based web server is limited to 70 web requests per second (based on two I/O requests active at one time), regardless of the number of hard drives in the computer.
Servers usually have SCSI drives. SCSI drive subsystems have the simple restriction that only one I/O request can be active per drive. But each drive can have its own separate I/O request in progress simultaneously. Today's servers can have up to 15 drives per SCSI adapter and up to 4 SCSI adapters. So a server can handle up to 35 web requests per disk with up to 60 disks, for a total web request processing capacity of over 2100 per second. If the server is using the new 10,000 RPM drives it could execute 47 requests per second per drive for a total exceeding 2820 per second. Of course, the reader must bear in mind that the communication capacity is still restricting the server to 40 web requests per second per LAN adapter.
If the web server is running on a mainframe the hard drives will be spinning at 10,000 RPM and each drive will be able to support four I/O requests simultaneously, so they could support 47 x 4 = 188 web requests per second per hard drive. A typical mainframe could support over 18,800 web requests per second based on its hard disk capacity alone.
In summary, using these numbers, a typical PC based web server could support 70 web transactions per second, for an aggregate rate of 252,000 requests per hour. The average Server can support up to four LAN adapters for a total of 160 web requests per second, for an aggregate rate of 576,000 web requests per hour. The average mainframe can support as many hard drives and LAN adapters (or communications links and speeds) as are needed for an aggregate rate of 58,000,000 web transactions per hour.
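The per-second and per-hour figures for the PC and the Server can be checked with a few multiplications. The sketch below uses only numbers stated in the text; the variable names are chosen here for illustration.

```python
SECONDS_PER_HOUR = 3600

DISK_RATE = 35   # requests/second per disk, from the text
LAN_RATE = 40    # requests/second per LAN adapter, from the text

pc_rate = 2 * DISK_RATE                              # two concurrent IDE I/O requests
server_disk_rate = 15 * 4 * DISK_RATE                # 15 drives on each of 4 SCSI adapters
server_rate = min(server_disk_rate, 4 * LAN_RATE)    # four LAN adapters cap the Server
mainframe_drive_rate = 47 * 4                        # 10,000 RPM drive, four I/Os at once

pc_per_hour = pc_rate * SECONDS_PER_HOUR
server_per_hour = server_rate * SECONDS_PER_HOUR
pc_per_day = pc_per_hour * 24
```

The Server's disks could deliver 2100 requests per second, but its four LAN adapters hold it to 160, which is the figure that drives the hourly total.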
If a simple PC can handle over 6,000,000 web requests per day, why does anyone run into performance issues with web servers? The reason is that only the mainframe based web servers can actually achieve these web request rates. The other computing platforms (PCs, Servers and Minicomputers) run into throughput constraints based on attributes other than disk and communication channel speeds.
Fundamental, But Often Overlooked, Bottlenecks
There are four fundamental bottlenecks in PC, Server and Minicomputer systems. These are the hardware bus structure, hard disk congestion, poor task switching algorithms and poor I/O scheduling algorithms. Hardware buses provide a single path between all computer peripherals, memory and the processor. This single path constrains the amount of data that can flow within a computer system. Hard disks cannot achieve their rated capacity. Empirical studies have shown that once a disk drive reaches 18% of capacity its throughput degrades quickly and significantly. Poor task switching and I/O scheduling algorithms constrain the speed at which programs can execute and data can be moved. Together, these four significantly reduce the amount of work that can be performed on their computing platforms.
Bus architectures are a serial resource, much like a communications line. Serial resources, if not operating under strict rules of engagement, suffer from congestion which significantly reduces their theoretical capacity. Today's fastest PC and Server architectures use PCI based devices for maximum throughput, ISA devices for general compatibility, and a separate memory bus to speed processor capacity. The PCI bus has a 33 MHz clock speed and handles 4 bytes at a time for an aggregate data rate of 132 MB per second. The ISA bus has a clock speed of 8 MHz and handles 2 bytes at a time for an aggregate data rate of 16 MB per second. The memory subsystem runs at 100 MHz and is 4 bytes wide for a maximum data rate of 400 MB per second. Each of these three buses is competing with the others for access to the CPU. Like Ethernet and hard disk drives, when the load on these buses exceeds 15% to 18% their throughput degrades, quickly and significantly.
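Each bus figure is simply the clock speed multiplied by the number of bytes moved per clock. The sketch below recomputes the three peak rates and, as an illustration of the congestion claim, the load at which degradation would begin if the lower 15% bound is applied; the function and variable names are assumptions made for this example.

```python
def peak_mb_per_second(clock_mhz, width_bytes):
    """Peak data rate: one transfer of width_bytes on every clock cycle."""
    return clock_mhz * width_bytes

# (clock in MHz, width in bytes), as given in the text.
BUSES = {"PCI": (33, 4), "ISA": (8, 2), "memory": (100, 4)}

peak = {name: peak_mb_per_second(*spec) for name, spec in BUSES.items()}

# Load (MB per second) at which degradation begins, using the 15% lower bound.
degradation_starts = {name: round(rate * 0.15, 1) for name, rate in peak.items()}
```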
Since disk drives suffer throughput degradation when their usage reaches 18% of capacity a PC running one IDE drive can be expected to support 6.3 web requests per second. Adding a second disk raises this to 12.6 web requests per second, a third drive takes it to 18.9 and a fourth, the maximum, takes it to 25.2. A PC system could handle 25.2 web requests per second if all the file accesses were uniformly spread throughout four IDE disk drives and no other disk I/O was taking place. Since the average PC usually keeps its data on one drive, the actual data rate should be considered to be 6.3 web requests per second.
All the computing platforms suffer from this throughput degradation of 5.5 times from the theoretical maximum. Mainframe web servers can bull their way through it because the numbers of disk drives they can support far exceeds that of the other computing platforms. Mainframes can simply add more disk drives if they want more performance.
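The derated figures follow from applying the 18% threshold to the theoretical 35 requests per second per disk. A short sketch, with names chosen for illustration:

```python
THEORETICAL_RATE = 35    # requests/second per disk, from the earlier calculation
USABLE_FRACTION = 0.18   # load at which disk throughput starts to degrade

def usable_rate(drives):
    """Sustainable requests/second when load is spread evenly over the drives."""
    return drives * THEORETICAL_RATE * USABLE_FRACTION

# One to four IDE drives, rounded to one decimal place.
rates = [round(usable_rate(n), 1) for n in range(1, 5)]

# How far below the theoretical maximum the usable rate sits.
degradation_factor = 1 / USABLE_FRACTION
```

The degradation factor works out to roughly 5.56, which the text gives as 5.5 times.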
Poor task switching algorithms and poor I/O scheduling algorithms are a matter of public knowledge. PC, Server and Minicomputer (usually UNIX) operating systems are designed to serve one person at a time, or multiple people as if they are the only person on the machine. These operating systems are designed around the response time needs of a person. These operating systems are not capable of driving the CPU close to 100% of its capacity. In fact, the UNIX based operating systems start to saturate at around 35% of the hardware capacity while the PC and Server operating systems probably saturate at under 30%. Mainframe operating systems such as OS/390 run the hardware to over 99% of capacity before saturating.
Task switching and I/O scheduling can be improved by adding more memory to a computer system and this is the primary approach taken to alleviate performance problems on PC, Server and Minicomputer systems. Large memory areas increase the probability that information required by the web server will already be in a buffer somewhere, and thus the physical disk I/O can be avoided. This strategy works as long as a large percentage of the web server data can fit into the memory buffers. Unfortunately, this rarely occurs in real life.
Avalanching Web Requests
The above example described the time it takes to return one file requested by a web browser. This is not a realistic example because most web requests initiated by the user implicitly invoke additional web requests, often 10 or more. Each of these web requests will take about the same amount of time as the first. So even though the business user may be seeing a 55 ms response time for the original web request, in total they may be seeing greater than 1 second response times because of the number of implied web requests generated by their initial request. Similarly, the consumer has the same problem.
Finally, web servers see all of these implicit web requests in addition to the original explicit request made by the user. So, even if the web server disk drives could handle 35 web requests per second, the communication link running at 40 web requests per second can only handle about four explicit requests per second. Thus the perceived throughput of the web servers is actually only about 10% of the work they are performing.
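The effect of implied requests on both throughput and response time can be put into two small formulas. This sketch assumes that each user action generates a fixed number of requests in total and that they are fetched one after another; real browsers may overlap some of them. The count of 20 requests in the response time example is a hypothetical value, not a figure from the text.

```python
def explicit_rate(link_rate, requests_per_click):
    """User-visible requests per second when each click fans out into many requests."""
    return link_rate / requests_per_click

def perceived_response_ms(per_request_ms, requests_per_click):
    """Total wait if the requests behind one click are fetched sequentially."""
    return per_request_ms * requests_per_click

# A 40 request/second link with about 10 requests per click.
clicks_per_second = explicit_rate(40, 10)

# Business user at 55 ms per request, hypothetical page needing 20 requests.
page_ms = perceived_response_ms(55, 20)
```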
Web Servers As Transaction Processing Systems
Web servers also run application programs. When the file name portion of the web browser request contains a valid program name the web server will ask the operating system to execute that program. To protect against program faults, web servers originally caused these programs to execute in their own process (or address) space. When a program request came from the web browser the web server created a process space and instructed it to run the program. Note that programs are files and incur the same file I/O processing time as the original file request that was discussed above (28 ms to load). The program is executed and the results from the program execution are sent back to the web server and from there returned to the web browser.
Programs run by web servers can be written in any language. Many are written in interpreted languages such as AWK, REXX and JAVA. Before discussing the implications of the languages used, the cost of any program, regardless of language, needs to be investigated. Programs use memory, CPU, and I/O resources. The typical program contains the following three steps: Initialization, Read Write Looping, and Termination. During the initialization processing files are opened, databases are connected to, and memory areas are acquired and initialized. During the Read Write Looping records and database rows are read and written and web output is generated until some criterion is satisfied. During the termination processing files are closed, databases are disconnected and memory is freed.
Each file open uses about 1 ms of CPU time and takes about 10 ms of I/O time. Each physical file I/O performed by the program takes about 10 ms. Each file close takes under 1 ms. If a program opens many files or reads and writes many records it can easily use up a lot of time. Heavy record or byte I/O usually does not consume much time when the I/O is sequential, because the file buffer probably already contains the next record, so physical I/Os are not required very often. Depending on the buffer size (and the track capacity of the disk) it might be possible to process 40 or 50 records per physical file I/O request. If, on the other hand, the I/O requests result in random record accessing then the number of physical I/Os will be very high, regardless of buffer size. In fact, buffering large amounts of data may actually increase the physical I/Os performed, so buffer sizes for random or indexed files should be chosen carefully.
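The gap between sequential and random access can be estimated by counting physical I/Os. The sketch below uses the 10 ms per physical I/O and 40 records per I/O mentioned above; it assumes the worst case for random access, where every record needs its own physical I/O, and the 1,000-record workload is a made-up example.

```python
import math

def physical_ios(records, records_per_io=40, sequential=True):
    """Physical I/Os needed to process the given number of records."""
    if sequential:
        # Each physical I/O fills the buffer with many consecutive records.
        return math.ceil(records / records_per_io)
    # Assumed worst case: every random access misses the buffer.
    return records

def io_time_ms(records, records_per_io=40, sequential=True, ms_per_io=10):
    """Total I/O time at a fixed cost per physical I/O."""
    return physical_ios(records, records_per_io, sequential) * ms_per_io

sequential_ms = io_time_ms(1000, sequential=True)
random_ms = io_time_ms(1000, sequential=False)
```

Under these assumptions the same 1,000 records cost a quarter of a second read sequentially and ten seconds read randomly.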
Database connects are especially expensive to create. Instead of taking 1 ms, a database connect can take hundreds of milliseconds. Relational database I/O requests can explode into many different physical I/O requests too. Whenever a relational database is being used in a web application there must be considerable analysis into the performance issues of the database design and the computing platform.
The large overheads associated with process creation, file opens and database connects substantially restrict the amount of throughput that can be obtained from programs running under the control of a web server. To address these performance issues most web servers have implemented a fast path program execution facility. This fast path facility depends on the programs being reliable and bulletproof. The fast path facility almost always runs the programs in the web server's process space. There is no overhead associated with creating the process space and the web server can cache the program files so that they don't always have to be read from disk for each execution request. But, since they are in the web server's process space they could bring the web server down if they terminated abnormally. Using this facility requires that the programs using it be well architected and developed. Unfortunately, the fast path facility does not address the issue of file opens and database connects.
Application Servers
Online transaction processing systems such as IBM's CICS define the architecture to which most web server environments aspire. CICS is an execution environment where all databases are pre-connected, all files are pre-opened and all programs are pre-located. CICS also provides task switching, I/O control, full recovery from programs abnormally terminating, 24 hour by 7 day processing, and scalability to as many computing engines, memory and disk storage amounts as the user needs for their applications. Application servers are the first step towards a CICS-like execution environment. In fact, most of the operating systems running web servers already provide a gateway into CICS so that high performance web server requirements can be met. But for those without access to CICS or those who require a complete write once, run anywhere attribute, the Application Server is the vehicle to use.
Application Servers have several main features that are of interest to performance and scalability.
- They maintain a pool of open connections to the database managers.
- They talk to the web server directly through TCP/IP so there is no need for file I/O during the connection.
- They support some transaction back out facilities.
- They provide easy gateways to CICS when it is available.
- They attempt to preload or cache program files so that the program startup time is minimized as much as possible.
These features are all designed to minimize the amount of time it takes to execute the web request program. The Application Server runs in a separate address space, possibly even on a separate computer, and this minimizes its ability to interfere with the web server's main job of supplying files to the web browsers connected to it.
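The first feature in the list, a pool of open database connections, can be sketched as follows. This is a minimal illustration and not the design of any particular Application Server: the ConnectionPool class and the fake_connect stand-in are hypothetical, and a real pool would also deal with broken connections and time-outs.

```python
import queue

class ConnectionPool:
    """Open a fixed number of connections once, then lend them out repeatedly."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())   # pay the expensive connect cost at startup

    def acquire(self, timeout=None):
        """Borrow an open connection, waiting if all are in use."""
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Return a connection so another request can reuse it."""
        self._idle.put(conn)

# Hypothetical stand-in for a database connect that takes hundreds of milliseconds.
opened = []

def fake_connect():
    conn = object()
    opened.append(conn)
    return conn

pool = ConnectionPool(fake_connect, size=3)

# One hundred web requests share the three pre-opened connections.
for _ in range(100):
    conn = pool.acquire()
    pool.release(conn)
```

However many requests arrive, only three connects are ever performed, which is the saving the pooling feature is after.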
Language Considerations
Most Application Servers are JAVA based. JAVA is an interpreted language, which means that JAVA programs take a CPU hit when they execute. Interpreted languages use from 10 to 100 times more CPU cycles than compiled languages. If the web application is constrained by CPU resources it might be helpful to change to a compiled language. Alternatively, the application server programs could be converted to wrappers for CICS transactions, and get an overall performance boost.
Program size is another consideration. The more memory a program takes, or the more bytes it occupies on a disk drive, the more it affects performance. Large programs require more operating system overhead to load and initialize. They also take more time to load. Today's disk drives hold from 30,000 to 60,000 bytes of data per track. If a program is twice that size then it will take at least two spins of the disk to load. Two spins of the disk will take about 17 ms.
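The load time estimate can be written as a small function: the number of spins is the program size divided by the track capacity, rounded up. The sketch below assumes the upper track capacity of 60,000 bytes and the 8.5 ms spin time used earlier for a 7200 RPM drive, and it ignores head movement and the initial rotational wait; the function name is illustrative.

```python
import math

def load_time_ms(program_bytes, track_bytes=60_000, spin_ms=8.5):
    """Minimum disk time to load a program: one spin per track it occupies."""
    spins = math.ceil(program_bytes / track_bytes)
    return spins * spin_ms

small_program = load_time_ms(14_000)     # fits in one track
large_program = load_time_ms(120_000)    # twice the track capacity
```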
Mainframe programs written for OS/390 require much less memory than those for UNIX, Windows NT, Windows 98 and OS/2. A representative program on OS/390 takes 14,000 bytes, while its UNIX and Windows counterparts take over 100,000 bytes.
Assembler programs and interpreted programs use even fewer bytes than mainframe programs. Assembler programs run at CPU speed, so they are the highest performance alternative. Interpreted programs run 100 to 1000 times slower than assembler programs, but they load faster than compiled programs on UNIX and Windows systems, so their short load time might offset their high CPU time. More importantly, with multiple CPU servers available, the CPU time footprint of a program may be less important than the memory and disk load footprints.
Active Server and Dynamic HTML Pages
One other alternative to standard web processes and application servers is the use of Active Server Pages and Dynamic HTML. Both of these allow the developer to place program code inline with the HTML data. This inline code is executed by the web server instead of (1) firing up another complete process or (2) shipping the request off to an application server. Active Server Pages can perform SQL requests, among others, so they can get direct access to the web server's relational database.
The price paid for this additional flexibility and function is the increased CPU and I/O wait time in the web server process. Application servers and CICS transactions can run on any machine, not just the machine that the web server is running on. This is an important consideration for PC, Server and UNIX platforms, but it is not a consideration for mainframe platforms.
Cost Issues
Many Server and UNIX based web servers handle increased demand by adding processors in the web server box and adding additional computing systems. This impacts the web server cost equation in three ways. First, there is the obvious cost of the new processor or server. While new processors are relatively inexpensive to acquire, new servers run in the $50,000 to $80,000 price range. Additional disk drives might be required to address excessive I/O queueing times. Adding servers also requires that an additional load balancing server be added to the configuration simply to balance the web transaction load over the working web server platforms. Second, there are the increased software licensing costs associated with adding processors and servers. Increasing the number of processors in a server can easily trigger an additional licensing fee for the operating system and database managers. The database managers can increase in cost by $35,000 just by adding one more processor, or one more server. Third, adding servers will increase the support staff requirements and thus increase the people cost of running the servers.
Mainframes are the cheapest platform on which to run web servers. While their up front cost will start in the $200,000 to $300,000 range, people with existing mainframes will incur no additional costs to implement a web server. Mainframes already run in a 24 x 7 environment, and their people costs, as regards supporting the computing platform, will not increase. As more demand is added to the web server there is no need to add more servers, unless the demand increase is really great. Mainframes scale without adding boxes. Worst case, the existing mainframe will be maxed out on processor engines and will require a model upgrade. This upgrade can be accomplished over a weekend and a complete cut over made without ever taking the web servers down.
Summary
Understanding the performance profiles of the different hardware platforms and web server architectures is a necessity when designing high demand web server applications. The days when someone could build a web application in the evening and go live the next day are long gone. Every day there are more stories about web based services coming down for days at a time as their transaction volume hits unexpected peaks.
Many web applications can run on a simple Server based system. As more web applications are added to the server its ability to handle the work load may deteriorate.
Mainframe based web servers are virtually free to implement and operate. There are no additional licensing costs, generally there are no additional hardware costs, and there are no additional people costs. Plus, mainframe web servers can scale past any Server or Minicomputer based web server. Finally, mainframe based servers already have access to the relational database engines and data farms upon which the web application will be based.
Copyright August 1999, Robert B. Chapman; Institute of Advanced Development Strategies, Inc., 11 Mareblu, Suite 130, Aliso Viejo, CA 92656.