What is gzip? – the compression tool in focus
The data compression tool gzip was developed by Jean-loup Gailly and Mark Adler, who set out to create a powerful alternative to the Unix programme compress. The functions and behaviour of compress are defined in the POSIX standard; it compresses files using the adaptive Lempel-Ziv algorithm. However, this algorithm, as well as its extension by Terry Welch (LZW), was protected by US patents for many years (until 2003), which was one of the reasons for working on an adequate replacement. A further goal was to make gzip compress much more efficiently than the Unix tool, and this goal was achieved.
How does gzip work?
Gzip is the abbreviation of 'GNU zip' and is based on the freely usable deflate algorithm, which combines the data compression method LZ77 (Lempel-Ziv 77) with Huffman coding. Using these techniques, gzip scans files for duplicate data strings. If the programme encounters a recurring sequence, it replaces it with a reference to the earlier occurrence of that string. Such a reference can only point back into the previous 32 KB (32,768 bytes) of data. If a character string does not appear within this window, it is stored as literal data, which is then only Huffman coded. The resulting gzip file receives the .gz ending. The procedure is limited to individual files, which is why the pack programme tar is needed to bundle several files into so-called tarball archives with the endings .tar.gz or .tgz.
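The back-reference idea described above can be illustrated with a small Python sketch. It is a deliberately simplified, greedy LZ77 pass and not gzip's actual implementation: the function names are made up for this example, the search is a slow brute-force loop, and the Huffman coding step is left out entirely. The window size and match limits are the ones deflate uses.

```python
WINDOW_SIZE = 32 * 1024  # deflate can only refer back 32 KiB
MIN_MATCH = 3            # shorter repeats are kept as literals
MAX_MATCH = 258          # longest match a deflate reference can describe


def lz77_tokens(data: bytes) -> list:
    """Toy LZ77 pass: returns literal byte values and (distance, length) pairs."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window (quadratic, for illustration only).
        for cand in range(max(0, pos - WINDOW_SIZE), pos):
            length = 0
            while (length < MAX_MATCH
                   and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))  # reference to earlier data
            pos += best_len
        else:
            tokens.append(data[pos])              # literal byte
            pos += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])  # byte by byte, so overlaps work
        else:
            out.append(token)
    return bytes(out)


sample = b"abcabcabcabc"
tokens = lz77_tokens(sample)
print(tokens)
```

For the sample input, the first three bytes are emitted as literals and the remaining nine bytes collapse into a single reference that points three bytes back.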
By default, the source file is deleted after gzip compression. However, you can disable this automatic function with the optional parameter (-k).
To unpack compressed files, you can either use the gunzip application or the corresponding gzip option (-d). The properties and structure of the gzip format, into which files are converted when compressed, are described in the specifications RFC 1951 (deflate) and RFC 1952 (gzip). The format includes, for example, a 10-byte header with an identification number, the compression method, and a time stamp; optional additional header fields, which can store the original file name of the source file; and an 8-byte footer, which contains a CRC-32 checksum for detecting errors and the length of the original data.
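The header and footer layout can be inspected with Python's standard library. The following is a minimal sketch, assuming a gzip stream without optional header fields (which is what `gzip.compress` produces); the function name `parse_gzip_envelope` is chosen for this example only.

```python
import gzip
import struct
import zlib


def parse_gzip_envelope(blob: bytes) -> dict:
    """Read the fixed 10-byte header and the 8-byte footer of a gzip stream."""
    # Header: magic number (2 bytes), compression method, flags,
    # modification time (4 bytes), extra flags, operating system.
    magic, method, flags, mtime, extra_flags, os_code = struct.unpack(
        "<2sBBIBB", blob[:10]
    )
    # Footer: CRC-32 of the uncompressed data, then its size modulo 2**32.
    crc32, size = struct.unpack("<II", blob[-8:])
    return {
        "magic": magic,
        "method": method,
        "flags": flags,
        "mtime": mtime,
        "crc32": crc32,
        "size": size,
    }


payload = b"hello gzip " * 100
blob = gzip.compress(payload, mtime=0)  # mtime=0 keeps the output reproducible
info = parse_gzip_envelope(blob)

print(info["magic"] == b"\x1f\x8b")            # gzip identification bytes
print(info["method"] == 8)                     # 8 stands for deflate
print(info["crc32"] == zlib.crc32(payload))    # checksum matches the input
print(info["size"] == len(payload))            # stored length matches
```

Files created with the gzip command line tool usually also carry the original file name after the fixed header, so the flags value will differ there.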
When is the compression tool used?
Gzip was originally developed for the Unix-like GNU platform, but is now used across practically all platforms, as long as the GPL licence selected for the project is observed. On Linux systems, for example, the compression tool is usually installed automatically, or is at least available for installation through the package management. In addition to versions for various older operating systems, there are also versions for macOS and Windows on the official website.
Added to this is the fact that web server applications such as Apache have supported gzip compression for years (even if the function isn't always used), and that modern browsers are able to interpret the compressed files and unpack them during web page rendering. In web development, gzip fully shows off its strengths: when the process is activated, the web server automatically compresses website elements that have been uploaded to the webspace as well as those that are created dynamically. Since users only have to load the compressed data, the loading time of the website can be reduced significantly. The browser takes over the decompression in the background without requiring additional bandwidth. Users of mobile devices, in particular, benefit from this performance boost, which can also have an indirect positive impact on search engine ranking.
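How a server decides whether to send compressed content can be sketched in a few lines of Python. This is a simplified illustration and not how Apache or any other server implements it: the function name `encode_response` is made up, and quality values such as `gzip;q=0` in the request header are ignored.

```python
import gzip


def encode_response(body: bytes, accept_encoding: str):
    """Return (headers, body); the body is gzip-compressed only if the client accepts it."""
    # Simplification: strip any ";q=..." parameters and ignore their values.
    accepted = [
        item.split(";")[0].strip().lower()
        for item in accept_encoding.split(",")
    ]
    if "gzip" in accepted:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return headers, gzip.compress(body)
    return {}, body


page = b"<p>Hello, visitor!</p>\n" * 200
headers, wire_body = encode_response(page, "gzip, deflate, br")
print(headers, len(page), len(wire_body))
```

The browser announces the methods it understands in the Accept-Encoding request header, and the server marks a compressed answer with the Content-Encoding response header so that the browser knows it has to decompress the body.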
gzip: syntax and command overview
Even though there are graphical interfaces for different platforms, gzip can, of course, easily be operated via the terminal or the command prompt. This won't be a big challenge for beginners, since gzip is designed as a classic command line tool. The general syntax has the following form:
gzip [OPTION]…[FILE]…
Specifying the options is by no means mandatory. If the field remains empty, gzip simply reverts to the default settings. For example, this simple command is enough:
gzip example.txt
to create a compressed version of the text file example.txt. However, to unpack files later, or to give specific instructions on the compression level, the output destination, or the handling of the original file, you need to specify the right options. The following table provides an overview of the most important gzip options:
Option | Description |
---|---|
-1 … -9 | Defines the compression level (1-9), with 1 being the fastest, but weakest compression, and 9 being the best, but slowest compression; the default value is 6 |
-r | Searches the directory (including all subdirectories) recursively and compresses or decompresses all contained files |
-f | Forces the gzip compression and overwrites already existing files with the same file name, if necessary |
-d | Decompresses the selected file; the unpacked file replaces the .gz file in the same directory |
-k | Prevents deletion of the original file |
-l | Lists information such as the compressed size, uncompressed size, and compression ratio of a packed file |
-c | Writes the result to the standard output (usually the screen connected to the command line) and leaves the original file unchanged |
-q | Disables all gzip notifications |
-t | Tests the integrity of the packaged file |
-h | Lists all available options |
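If you want to trigger the same operations from a script, Python's standard gzip module offers comparable functionality. The following sketch shows rough equivalents of `gzip -k -9 FILE` and `gzip -t FILE`; the function names `compress_keep` and `is_intact` are chosen for this example and are not part of gzip itself.

```python
import gzip
import shutil
import zlib
from pathlib import Path


def compress_keep(src: Path, level: int = 9) -> Path:
    """Rough equivalent of `gzip -k -9 src`: writes src + '.gz', keeps the original."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb", compresslevel=level) as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def is_intact(path: Path) -> bool:
    """Rough equivalent of `gzip -t path`: read the whole file so the checksum is verified."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(64 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or checksum, truncated file, or corrupt deflate data.
        return False
```

A call such as `compress_keep(Path("example.txt"))` would create example.txt.gz next to the original file, assuming example.txt exists in the current directory.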
How to use gzip compression for your Apache web project
Web servers usually offer the practical compression process in the form of a module, which must first be activated. Nowadays, many webhosting providers offer this feature, but in the past this wasn't the case, probably because the compression process requires additional processor power. If you're unsure whether gzip is supported by your host, you can either contact your host directly or check manually. With an Apache web server, for example, you can check the module settings using a simple phpinfo() output: if PHP runs as an Apache module, the list of loaded modules shows whether mod_deflate or mod_gzip is active. The entry HTTP_ACCEPT_ENCODING, by contrast, tells you which compression methods the requesting browser accepts.
If gzip is available, you have several options for activating the compression.
Activating Gzip compression in the .htaccess file
You can use an .htaccess file to make directory-specific settings (these apply to the current directory and all subdirectories) and to configure your web server without restarting it. This works because the configuration file, which is typically found in the root directory, is read automatically on every request that reaches the server. With some webhosting providers, however, the .htaccess file is stored in a different folder, hidden, or even blocked from access. In this case, the only option you have is to contact the host and ask for access. If you can edit the configuration, turn on gzip compression with whichever of the two modules your server provides, mod_gzip or mod_deflate, by adding the following code to the .htaccess file:
<IfModule mod_gzip.c>
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/x-javascript.*
mod_gzip_item_exclude mime ^image/.*
mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</IfModule>
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
</IfModule>
The directives only take effect if the mod_gzip or mod_deflate module is enabled on the server. If this is not the case, ask your webhost to activate it.
Enable gzip compression via PHP
It is possible to activate the compression process using a simple PHP command. But there's a catch: the code must be entered individually for each PHP document. Therefore, you should only use this option if you don't have the necessary rights to edit the .htaccess file. The code you must place at the beginning of each document is as follows:
<?php
ob_start("ob_gzhandler");
?>
Implement gzip via CMS plugin
In addition to these two manual solutions, there is also a variant that requires only minimal effort to set up: activating gzip compression using a plugin for the content management system you are using. Such extensions, which you can install within a few minutes and adapt to your needs, are primarily available for PHP-based CMSs like WordPress. The following list contains three of the most popular plugins for WordPress:
- W3 Total Cache: The WordPress plugin W3 Total Cache promises to improve website performance ten-fold. In addition to various caching mechanisms and special mobile support, the SEO and usability suite also includes options to activate gzip compression.
- Check and enable GZIP compression: This extension, which was also developed for CMS WordPress, enables you to check if gzip compression is enabled for your project. For this purpose, the plugin relies on the online service checkgzipcompression.com. If the compression is turned off, this plugin will help you with the setup.
- WP Performance Score Booster: The WP Performance Score Booster extension has more than 150,000 downloads and 30,000 active installations. Using the plugin, you can easily activate gzip and effectively compress your web project's contents, such as text, HTML, JavaScript, CSS, XML, and more.
For other systems like Joomla!, you don’t even need an extension. Here, functions for activating compression techniques are already included as standard.
How to activate gzip on your NGINX web server
If you deliver your site content using an NGINX web server, you can also use the gzip process to improve your project's loading time. To do this, you only have to configure the module ngx_http_gzip_module. By default, the 'gzip' directive used to activate or deactivate the compression service is turned off. To change this setting, open nginx.conf and search for the 'gzip' directive. Then just change 'gzip off' to 'gzip on'. The following table shows the meaning and possible values of other directives for configuring NGINX gzip compression:
Directive | Syntax | Default setting | Description |
---|---|---|---|
gzip_buffers | gzip_buffers number size; | gzip_buffers 32 4k; or gzip_buffers 16 8k; (depending on the platform) | Defines the number and size of the buffers used for the compression process |
gzip_comp_level | gzip_comp_level level; | gzip_comp_level 1; | Specifies the compression level; possible values: 1–9 |
gzip_min_length | gzip_min_length length; | gzip_min_length 20; | Specifies the minimum length in bytes that a response must have in order to be compressed |
gzip_http_version | gzip_http_version 1.0 or 1.1; | gzip_http_version 1.1; | Specifies the minimum HTTP version a request must use in order to be answered with a compressed response |
gzip_types | gzip_types mime-type ...; | gzip_types text/html; | Regulates which content types the compression should apply to (also possible: CSS, JSON, XML, …) |
How you can test compression
Once you have configured gzip compression for your web presence, you can use various online tools to check whether the process works as intended, that is, whether your web server actually delivers compressed content when it is requested. Above all, we recommend the Google tool PageSpeed Insights. After you enter the address of one of your website's pages into the input field, the tool automatically analyses the content and then informs you about your website's performance strengths and weaknesses, including whether gzip compression is enabled. You can also run a simple gzip test using the HTTP Compression Test on WhatsMyIP.org.
Since a single test is not necessarily representative, it is recommended that you always check several pages of your web project.
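As an alternative to online tools, you can also check the response headers yourself. The following Python sketch sends a request that announces gzip support and looks at the Content-Encoding header of the answer. The function names `uses_gzip` and `check_url` are chosen for this example, and any URL you pass in is your own assumption about what should be tested.

```python
import urllib.request


def uses_gzip(headers) -> bool:
    """True if the Content-Encoding header of a response lists gzip."""
    value = headers.get("Content-Encoding", "") or ""
    return "gzip" in [part.strip().lower() for part in value.split(",")]


def check_url(url: str) -> bool:
    """Request the URL with 'Accept-Encoding: gzip' and report whether gzip was used."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return uses_gzip(response.headers)
```

To follow the advice above, call `check_url` for several pages of your project, for example the start page, a subpage, and a stylesheet, and compare the results.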