Performance: HTTP Compression
Owners:
John Giannandrea (jg@netscape.com),
Eric Bina (ebina@netscape.com)
Last Updated: 15-September-1998
Getting the Apache module source.
This project aims to improve real and perceived web browsing
performance by having the server send compressed HTML files to the
browser, and having the browser uncompress them before displaying them.
Assuming most machines these days have fast enough processors, the
user should see the document sooner this way than if the server sent
uncompressed HTML. Also, since the majority of network traffic these
days is HTTP traffic, compressing all HTML sent via HTTP should recover
a significant amount of wasted network bandwidth.
Stage 1 - Content-Encoding: gzip
- Status: Complete
The current Mozilla source already sends Accept-encoding: gzip
and can do streaming decompression of HTML data received with
Content-encoding: gzip. All that is needed is a server
set up to serve this data to Mozilla, while maintaining backwards
compatibility with browsers that can't handle the compressed data.
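On the wire, the negotiation looks roughly like this (the URL and
protocol version here are hypothetical):

    GET /foo.html HTTP/1.0
    Accept-Encoding: gzip

    HTTP/1.0 200 OK
    Content-Type: text/html
    Content-Encoding: gzip

    ...gzip-compressed HTML body...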
To this end a new Apache 1.3 server module has been written.
It is activated on a per-directory basis with a directive in
the access.conf file of the form:
CompressContent Yes
When activated, and only if an Accept-encoding: gzip header
is received, all requests for files from that directory
are redirected to requests for an equivalent compressed file
from that directory, if one exists. In essence, if you ask for
foo.html and both it and foo.html.gz exist,
then requests with an appropriate Accept-encoding
get the compressed file, and all other requests get the uncompressed
file.
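As a rough illustration (hypothetical names, not the actual module
source), the heart of that lookup in an Apache 1.3 module might look
like this:

    #include <string.h>
    #include <sys/stat.h>
    #include "httpd.h"
    #include "http_config.h"

    /* If the client accepts gzip and a pre-compressed sibling of the
     * requested file exists, rewrite the request to serve it. */
    static int compress_fixup(request_rec *r)
    {
        const char *accept;
        char *gzname;
        struct stat st;

        accept = ap_table_get(r->headers_in, "Accept-Encoding");
        if (accept == NULL || strstr(accept, "gzip") == NULL)
            return DECLINED;          /* browser can't handle gzip */

        gzname = ap_pstrcat(r->pool, r->filename, ".gz", NULL);
        if (stat(gzname, &st) != 0)   /* the one extra stat() call */
            return DECLINED;          /* no compressed sibling */

        r->filename = gzname;
        r->content_encoding = "gzip";
        return OK;
    }

Serving the pre-compressed file costs only that extra stat(), which
matches the small server-side overhead noted in the results below.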
This neatly solves the backwards compatibility problem for the browser,
but creates a maintenance problem on the server end. One would need to run
some sort of automated script to regularly maintain up-to-date compressed
versions of files in the directories that need them, along the lines of
the sketch below. For a real solution
to this maintenance problem, see Stage 2 below.
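In the interim, a nightly cron job along these lines would do (the
document root is hypothetical; note that bash's -nt test is also true
when the .gz file does not yet exist):

    for f in /usr/local/apache/htdocs/*.html; do
        [ "$f" -nt "$f.gz" ] && gzip -9 -c "$f" > "$f.gz"
    done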
- Results:
Here is an optimal case where all images are in the cache.
|        Local        |  ISDN 64 kbits/sec  |        28.8         |
|  No GZIP |   GZIP   |  No GZIP |   GZIP   |  No GZIP |   GZIP   |
| 56.9 sec | 61.0 sec |105.1 sec | 83.2 sec |327.9 sec |121.8 sec |
|      7% Slower      |     21% Faster      |     63% Faster      |
Notes:
- For the Local run both the client and the server are running
on the same machine, so we are seeing both the overhead of the client
unzip and the slight extra overhead for the server to locate and send
the gzipped content (one extra call to stat() the file).
A more realistic workload was then generated, simulating a user starting
with an empty cache and visiting the CNN site to read, in order:
Main Page, World, U.S., U.S. Local, Weather, Sci-Tech, Entertainment,
Travel, Health, Style, and In-Depth.
|        Local        | ISDN 128 kbits/sec  |        28.8         |        14.4         |
|  No GZIP |   GZIP   |  No GZIP |   GZIP   |  No GZIP |   GZIP   |  No GZIP |   GZIP   |
| 53.0 sec | 53.2 sec | 82.1 sec | 77.6 sec |264.7 sec |184.4 sec |474.1 sec |307.7 sec |
|     0.4% Slower     |     5.5% Faster     |     30% Faster      |     35% Faster      |
Notes:
- A much more realistic set of data with a mix of image hits and
misses after the first CNN page.
- Note that the gzip cost on the local system is basically lost in the
noise.
- Also, all the image loads make the apparent gain at 28.8 much lower.
- It is curious that the 14.4 load doesn't show a greater speedup.
These results seem promising enough to warrant moving on to implementation
of Stage 2.
Stage 2 - Transfer-Encoding: gzip
- Status: Begun
Here we hope to use the new HTTP/1.1 TE: gzip header to request
compressed versions of HTML files. The server would then need to do
streaming compression to generate the results. To minimize the overhead
on the server, it should keep a cache of the compressed files
to quickly fill future requests for the same compressed data.
The current Mozilla source can already accept and decode
Transfer-encoding: gzip data, but does not currently
send the TE: header. Work has begun on implementing
the streaming compression in the latest Netscape Enterprise
Server. (This is a general call for volunteers to implement
it as a module for Apache 1.3.)
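To give a feel for the server side, here is a minimal sketch of
streaming gzip compression with zlib. The 16+MAX_WBITS trick for gzip
framing is a zlib 1.2 convenience, and the emit callback stands in for
whatever the server uses to put bytes on the wire; both are
assumptions, not the Enterprise Server code.

    #include <string.h>
    #include <zlib.h>

    /* Compress one buffer as a gzip stream, handing each output block
     * to emit() as soon as it is full, so bytes can go on the wire
     * before the whole document has been compressed. */
    static int gzip_stream(const unsigned char *in, size_t in_len,
                           int (*emit)(const unsigned char *, size_t))
    {
        z_stream zs;
        unsigned char out[8192];

        memset(&zs, 0, sizeof zs);
        if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                         16 + MAX_WBITS,  /* ask zlib for gzip framing */
                         8, Z_DEFAULT_STRATEGY) != Z_OK)
            return -1;

        zs.next_in  = (unsigned char *)in;
        zs.avail_in = in_len;
        do {                              /* standard zlib drain loop */
            zs.next_out  = out;
            zs.avail_out = sizeof out;
            deflate(&zs, Z_FINISH);
            if (emit(out, sizeof out - zs.avail_out) != 0) {
                deflateEnd(&zs);
                return -1;
            }
        } while (zs.avail_out == 0);

        deflateEnd(&zs);
        return 0;
    }

A production module would feed the input in chunks as it reads the
file and, as described above, cache the compressed output so that
later requests for the same data are cheap.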
Stage 3 - Other compression types
The previous two stages dealt only with gzip as the form of
compression. While gzip is a good general-purpose compression scheme,
we probably want to negotiate the compression type based on the data
type requested. For example, if a request carrying a TE: gzip header
turns out to be for a JPEG image, the server should know not to
transfer-encode it with gzip, since JPEG data is already compressed.
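A first cut at that negotiation could be as simple as a table of media
types that are already compressed (the list here is illustrative, not
a proposal):

    #include <string.h>

    /* Return nonzero if a response body of this media type is worth
     * gzip transfer-coding; already-compressed types are skipped. */
    static int worth_compressing(const char *content_type)
    {
        static const char *skip[] = {
            "image/jpeg", "image/gif", "image/png",
            "application/zip", "application/x-gzip", NULL
        };
        int i;

        if (content_type == NULL)
            return 1;
        for (i = 0; skip[i] != NULL; i++)
            if (strncmp(content_type, skip[i], strlen(skip[i])) == 0)
                return 0;
        return 1;
    }

Anything on the list would go out unencoded even when the client
offers TE: gzip.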
Comments etc.
For any comments or questions, or to volunteer to do the TE-aware
Apache module or other work, contact: Eric Bina.