What is a hash function? Definition, usage, and examples

Given the massive increase in the amount of data being processed by local and global data networks, computer scientists are always looking for ways to speed up data access and ensure that data can be exchanged securely. One solution they use, alongside other security technologies, is the hash function. This article explains the properties of hash functions and how they are used.

The meaning of the verb ‘to hash’ – to chop or scramble something – provides a clue as to what hash functions do to data. That’s right, they ‘scramble’ data and convert it into a numerical value. And no matter how long the input is, the output value is always of the same length. Hash functions are also referred to as hashing algorithms or message digest functions. They are used across many areas of computer science, for example:

  • To help secure communication between web servers and browsers, and to generate session IDs for internet applications and data caching
  • To protect sensitive data such as passwords, web analytics, and payment details
  • To add digital signatures to emails
  • To locate identical or similar data sets via lookup functions

Definition

A hash function converts inputs of any length into fixed-length strings known as hash values or digests. You can use hashing, for example, to scramble passwords into strings made up of a defined set of authorised characters. It is not feasible to invert the output values to recover the original input.

What are the properties of hash functions?

Hash functions are designed so that they have the following properties:

One-way

Once a hash value has been generated, it must be computationally infeasible to convert it back into the original data. For instance, if the password ‘susi_562#alone’ is stored as the hash value ‘$P$Hv8rpLanTSYSA/2bP1xN.S6Mdk32.Z3’, there must be no practical way of recovering the password from that hash value.

Collision-free

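For a hash function to be collision-free, it should not be possible to find two different strings that map to the same output hash. Because the output has a fixed length while the number of possible inputs is unlimited, collisions cannot be ruled out entirely. In practice, the goal is that every input string generates a different output string and that deliberately finding a collision is computationally infeasible. A hash function that is both one-way and collision-resistant is referred to as a cryptographic hash function. Modern algorithms achieve this with long output values and designs in which even a small change to the input produces a completely different hash value.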
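
As a small illustration, the sketch below hashes two inputs that differ only in the case of one letter and counts how many of the hexadecimal positions in the two results differ. It uses PHP's built-in hash() function with SHA-256; the input strings and the helper name countDifferingPositions are assumptions chosen purely for illustration.

```php
<?php
// Hypothetical helper: counts the positions at which two equal-length
// hex digests differ.
function countDifferingPositions(string $a, string $b): int
{
    $differences = 0;
    for ($i = 0; $i < strlen($a); $i++) {
        if ($a[$i] !== $b[$i]) {
            $differences++;
        }
    }
    return $differences;
}

$first  = hash('sha256', 'apple');
$second = hash('sha256', 'Apple'); // only the first letter changed

echo $first, PHP_EOL;
echo $second, PHP_EOL;
echo 'Differing positions: ', countDifferingPositions($first, $second),
    ' of ', strlen($first), PHP_EOL;
```

With a well-designed hash function, the two outputs should look unrelated even though the inputs are almost identical, which is part of what makes collisions so hard to find on purpose.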

Lightning-fast

If it takes too long for a hash function to compute hash values, the procedure is not much use. Hash functions must, therefore, be very fast. In databases, hash values are stored in so-called hash tables to ensure fast access.

What is a hash value?

A hash value is the output string generated by a hash function. No matter the input, all of the output strings generated by a particular hash function are of the same length. The length is defined by the hashing algorithm used: SHA-256, for instance, always produces 256 bits, usually written as 64 hexadecimal characters. The output strings are created from a set of authorised characters defined in the hash function.
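
The fixed output length is easy to check. The following minimal sketch, assuming SHA-256 via PHP's built-in hash() function and a few made-up inputs, prints the length of each input next to the length of its hash value.

```php
<?php
// Inputs of very different lengths (illustrative values only).
$inputs = [
    '',
    'apple',
    str_repeat('a much longer input ', 1000),
];

foreach ($inputs as $input) {
    $digest = hash('sha256', $input);
    // For SHA-256 the hex digest has 64 characters, whatever the input length.
    printf("input length %5d -> digest length %d\n", strlen($input), strlen($digest));
}
```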

The hash value is the result calculated by the hash function. Because it is extremely unlikely that two different inputs produce the same hash value, hash values are also referred to as ‘fingerprints’, by analogy with human fingerprints. If you take the lower-case letters ‘a’ to ‘f’ and the digits ‘0’ to ‘9’ (16 characters in total) and define a hash value length of 64 characters, there are 16^64, or roughly 1.1579209e+77, possible output values. That is a 78-digit number! This shows that even relatively short strings provide more than enough distinct fingerprints.
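
The size of this output space can be checked with a little arithmetic: 16 possible characters in each of 64 positions gives 16^64 combinations. The sketch below is one way to compute this in PHP; the variable names are illustrative only.

```php
<?php
$alphabetSize = 16; // 'a' to 'f' plus '0' to '9'
$digestLength = 64; // characters in the hash value

// 16^64 is far beyond PHP's integer range, so the ** operator yields a float.
$possibleValues = $alphabetSize ** $digestLength;
printf("%.7e possible values\n", $possibleValues);

// Number of decimal digits: floor(64 * log10(16)) + 1
$decimalDigits = (int) floor($digestLength * log10($alphabetSize)) + 1;
echo $decimalDigits, ' decimal digits', PHP_EOL; // 78 decimal digits
```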

A hash value like this can be generated with just a few lines of PHP code:

<?php
// Compute the SHA-256 hash of the string 'apple' and print it as hexadecimal
echo hash('sha256', 'apple');
?>

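Here, the ‘sha256’ hashing algorithm is being used to hash the input value ‘apple’. Note that SHA-256 is a hash function, not an encryption algorithm: there is no key and no way to decrypt the result. The corresponding hash value or fingerprint is always ‘3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b’.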

Hash functions and websites

With SSL/TLS-encrypted data transmission, when the web server receives a request, it sends its server certificate to the user’s browser. The browser checks the certificate, and browser and server then negotiate the keys for the session. Hash functions are used in this handshake, for example to verify the certificate’s signature and to confirm that the handshake messages have not been altered. If the checks succeed, the encrypted HTTPS connection is established and data can be exchanged. All of the data packets exchanged are also encrypted, so it is very difficult for attackers to read them.

Session IDs are generated using data relating to a site visit, such as the IP address and time stamp, and were traditionally communicated with the URL. One common use of session IDs is to give unique identifiers to people shopping on a website. Nowadays, session IDs are rarely passed as a URL parameter (for example, as something like www.domain.tld/index?sid=d4ccaf2627557c756a0762419a4b6695). Instead, they are usually stored in a cookie, which the browser sends in the HTTP header of each request.
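
The following is a minimal sketch of how a session ID could be derived with a hash function. The function name generateSessionId, the choice of SHA-256, and the added random bytes are all assumptions made for illustration, not the method of any particular framework. The random part is included because an ID built only from an IP address and a time stamp would be easy to guess.

```php
<?php
// Hypothetical helper, not a real framework API: derives a session ID by
// hashing visit data together with random bytes.
function generateSessionId(string $ipAddress, int $timestamp): string
{
    // random_bytes() draws from a cryptographically secure source.
    $randomPart = bin2hex(random_bytes(16));

    return hash('sha256', $ipAddress . '|' . $timestamp . '|' . $randomPart);
}

// 203.0.113.7 is a documentation-only example address.
$sessionId = generateSessionId('203.0.113.7', time());
echo $sessionId, PHP_EOL;
```

In real applications it is usually better to rely on the platform's own session handling (in PHP, session_start() creates the ID and the cookie) than to build session IDs by hand.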

Hash values are also used to help protect cached data, so that unauthorised users cannot use the cache to access login and payment details or other information about a site.

Communication between an FTP server and a client using the SFTP protocol also works in a similar way.

Protection of sensitive data

Login details for online accounts are frequently the target of cyber-attacks. Hackers either want to disrupt operation of a website (for example, to reduce income generated by traffic-based ads) or access information about payment methods.

In WordPress, as in the password example above, passwords are always hashed before they are stored. Combined with the session IDs generated in the system, this ensures a high level of security. This is especially important for protection against ‘brute force’ attacks. In this kind of attack, attackers hash large numbers of candidate passwords and compare the results with the stored hash values until they find a match that gives them access. Using long passwords with high security standards makes these attacks less likely to succeed, because the amount of computing power required is so high. Remember: Never use simple passwords, and be sure to protect all of your login details and data against unauthorised access.
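
As a sketch of how hashed password storage looks in code, the example below uses PHP's built-in password_hash() and password_verify() functions. This is an assumption chosen for illustration and is not necessarily the scheme WordPress itself uses; the password is the sample value from earlier in this article.

```php
<?php
// Sample password from the article; never hard-code real passwords.
$password = 'susi_562#alone';

// password_hash() generates a random salt and embeds it, together with the
// algorithm and cost, in the returned string. Only this string is stored.
$storedHash = password_hash($password, PASSWORD_DEFAULT);
echo $storedHash, PHP_EOL;

// At login, the candidate password is hashed with the same parameters and
// the results are compared; the stored hash is never "decrypted".
var_dump(password_verify('susi_562#alone', $storedHash)); // bool(true)
var_dump(password_verify('wrong-guess', $storedHash));    // bool(false)
```

Because of the random salt, running the script twice produces two different stored strings for the same password, yet both verify correctly.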

Digital signatures

Email traffic is sent via servers that are specially designed to transmit this type of message. Hash functions are also used, together with the sender’s key pair, to add a digital signature to messages.

The steps involved in sending an email with a digital signature are:

  • Alice (the sender) converts her message into a hash value and encrypts the hash value using her private key. This encrypted hash value is the digital signature.
  • Alice sends the email and the digital signature to the recipient, Bob.
  • Bob generates a hash value of the message using the same hash function. He also decrypts the digital signature using Alice’s public key and compares the two hashes.
  • If the two hash values match, Bob knows that the message was signed with Alice’s private key and has not been tampered with during transmission.

Please note that a digital signature proves the integrity of a message but does not actually encrypt it. If you’re sending confidential data, it’s therefore best to encrypt it as well as using a digital signature.
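
The signing and verification steps described above can be sketched with PHP's OpenSSL extension. The example assumes that the extension is installed and correctly configured; the key size, the message text and the variable names are illustrative. Note that openssl_sign() performs the hashing and the private-key operation in a single call.

```php
<?php
// Alice generates an RSA key pair (2048 bits chosen for illustration).
$aliceKey = openssl_pkey_new([
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
    'private_key_bits' => 2048,
]);
if ($aliceKey === false) {
    exit('Key generation failed; check the local OpenSSL configuration.' . PHP_EOL);
}
$alicePublicKey = openssl_pkey_get_details($aliceKey)['key']; // PEM string

$message = 'Hello Bob, the meeting is at 10 am.';

// Alice: hash the message with SHA-256 and sign the hash with her private key.
openssl_sign($message, $signature, $aliceKey, OPENSSL_ALGO_SHA256);

// Bob: recompute the hash and check it against the signature using
// Alice's public key. openssl_verify() returns 1 on a match, 0 otherwise.
$untouched = openssl_verify($message, $signature, $alicePublicKey, OPENSSL_ALGO_SHA256);
$tampered  = openssl_verify($message . ' (changed)', $signature, $alicePublicKey, OPENSSL_ALGO_SHA256);

var_dump($untouched === 1); // bool(true)
var_dump($tampered === 1);  // bool(false)
```

The message itself travels in plain text in this sketch, which matches the point made above: the signature proves integrity but does not provide confidentiality.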

How can hash functions be used to perform lookups?

Searching through large quantities of data is a very resource-intensive process. Imagine you’ve got a table listing every inhabitant of a big city, with lots of different fields for each entry (first name, last name, address, etc.). Finding just one term would be very time-consuming and require a lot of computing power. To simplify the process, each entry in the table can be converted into a hash value that serves as its key. The search term is then converted to a hash value in the same way, and only the hash values need to be compared. This limits the number of letters, digits and symbols that have to be compared, which is much more efficient than searching every field that exists in the data table. Note that this works for exact matches: a search for all first names beginning with ‘Ann’, for example, cannot be answered from the hash values alone.
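
A minimal sketch of such a lookup is shown below. The records, the helper name nameKey and the choice of SHA-256 as the key function are assumptions made for illustration; the point is only that the search term is hashed and used to jump straight to the matching entries.

```php
<?php
// Illustrative records; names and addresses are made up.
$inhabitants = [
    ['first_name' => 'Anna',  'last_name' => 'Meyer', 'address' => '1 Main Street'],
    ['first_name' => 'Bob',   'last_name' => 'Smith', 'address' => '22 Park Road'],
    ['first_name' => 'Alice', 'last_name' => 'Jones', 'address' => '5 Hill Lane'],
];

// Hypothetical helper: builds the lookup key for a person's full name.
function nameKey(string $firstName, string $lastName): string
{
    return hash('sha256', strtolower($firstName . ' ' . $lastName));
}

// Build the index once: hash value -> list of matching records.
// A list is used so that two people with the same name are both kept.
$index = [];
foreach ($inhabitants as $person) {
    $index[nameKey($person['first_name'], $person['last_name'])][] = $person;
}

// Lookup: hash the search term and jump straight to the matching entries.
$matches = $index[nameKey('alice', 'jones')] ?? [];
echo $matches[0]['address'] ?? 'not found', PHP_EOL; // 5 Hill Lane
```

PHP arrays are themselves implemented as hash tables, so hashing the key explicitly is redundant in practice; it is done here only to make the step visible.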

Summary

Hash functions are used to improve security in electronic communications and to speed up access to data, and many sophisticated standards have now been developed. However, attackers are aware of this and are constantly coming up with more advanced techniques.
