What is long short-term memory?
Long short-term memory (LSTM) is a neural network architecture that enables a network to retain important information over long periods of time. To do this, it combines a short-term and a long-term memory, and it plays a crucial role in the further development of artificial intelligence.
What is long short-term memory (LSTM)?
Long short-term memory (LSTM) is a computer science technique used to store information within a neural network over a longer period of time. This is particularly important when processing sequential data. Long short-term memory allows the network to access previous events and take them into account in new calculations. This sets it apart from simple Recurrent Neural Networks (RNNs), which it builds on and can ideally complement: instead of only a ‘short-term memory’, an LSTM has an additional ‘long-term memory’ in which selected information is stored over a longer period of time.
Networks with long short-term memory can therefore store information over long periods of time and recognise long-term dependencies, which is particularly important in the field of deep learning and AI. The basis for this is a set of gates, which we explain in more detail later in this article. Such networks provide efficient models for predicting and processing time series data.
Which elements make up an LSTM cell?
A cell with a long short-term memory is made up of several building blocks that together give the network its capabilities. It must be able to store information over a long period of time and combine it with new information as required. At the same time, the cell must independently delete unimportant or outdated knowledge from its ‘memory’. For this reason, it consists of four components:
- Input gate: The input gate decides which new information is to be added to the memory and how.
- Forget gate: The forget gate determines which information should be stored in a cell and which should be removed.
- Output gate: The output gate determines how values are output from a cell. The decision is based on the current status and the respective input information.
The fourth component is the cell interior, the cell state that acts as the cell’s actual memory. It has its own linking logic, which regulates how the three gates interact and how information is passed on and stored.
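To make these components a little more tangible, the following minimal Python/NumPy sketch shows the parameters such a cell could hold: one set of weights and a bias per gate, plus one for the cell candidate that feeds the cell state. The class and variable names (LSTMCellParams, W_i, W_f, W_o, W_c) are purely illustrative assumptions and not taken from any particular library.

```python
import numpy as np

class LSTMCellParams:
    """Illustrative container for the weights of one LSTM cell."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        def weights():
            # each gate sees the current input and the previous hidden state
            return rng.normal(0.0, 0.1, (hidden_size, input_size + hidden_size))
        self.W_i, self.b_i = weights(), np.zeros(hidden_size)  # input gate
        self.W_f, self.b_f = weights(), np.zeros(hidden_size)  # forget gate
        self.W_o, self.b_o = weights(), np.zeros(hidden_size)  # output gate
        self.W_c, self.b_c = weights(), np.zeros(hidden_size)  # cell candidate (the 'cell interior')
```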
How does long short-term memory work?
Like the aforementioned Recurrent Neural Network or the simpler Feedforward Neural Network (FNN), cells with long short-term memory are organised in layers. Unlike other networks, however, they store information over long periods of time and can later process or retrieve it. To do this, each LSTM cell uses the three gates mentioned above as well as a kind of short-term memory and long-term memory.
- Short-term memory, i.e., the memory in which information from previous calculation steps is stored for a short time, is also known from other networks. In the case of long short-term memory, it’s called hidden state. Unlike other networks, however, an LSTM cell can also retain information in the long term. This information is stored in the so-called cell state. New information now passes through the three gates.
- In the input gate, the current input and the last hidden state are combined with learned weights. This is how the input gate decides how valuable the new input is. Information judged to be important is then added to the previous cell state to create the new cell state.
- The forget gate is used to decide which information should continue to be used and which should be removed. The last hidden state and the current input are taken into account. This decision is made using a sigmoid function (an S-shaped curve) that outputs values between 0 and 1: 0 means that previous information is forgotten, while 1 means that it is fully retained. The result is multiplied by the cell state, so entries weighted with 0 are dropped.
- The final output is then calculated in the output gate. A sigmoid function is applied to the current input and the last hidden state, the cell state is passed through a tanh function (hyperbolic tangent), and the two results are multiplied to determine which information leaves the cell as the new hidden state.
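Putting the three gates together, a single time step can be sketched as follows. This continues the illustrative LSTMCellParams structure from above and mirrors the standard LSTM equations; it is a simplified sketch, not the implementation of any specific framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(params, x_t, h_prev, c_prev):
    """One LSTM time step; params is the illustrative LSTMCellParams sketch."""
    z = np.concatenate([x_t, h_prev])                # current input and last hidden state
    f = sigmoid(params.W_f @ z + params.b_f)         # forget gate: 0 = forget, 1 = keep
    i = sigmoid(params.W_i @ z + params.b_i)         # input gate: how valuable is the new input?
    c_tilde = np.tanh(params.W_c @ z + params.b_c)   # candidate values for the cell state
    c_t = f * c_prev + i * c_tilde                   # new cell state (long-term memory)
    o = sigmoid(params.W_o @ z + params.b_o)         # output gate
    h_t = o * np.tanh(c_t)                           # new hidden state (short-term memory)
    return h_t, c_t
```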
What different architectures are there?
While this mode of operation is similar for all networks with long short-term memory, there are sometimes considerable differences in the architecture of LSTM variants. Peephole LSTMs, for example, are widely used; they owe their name to the fact that the individual gates can ‘peep at’ the state of the cell itself. An alternative is peephole convolutional LSTMs, which use discrete convolution in addition to matrix multiplication to calculate the activity of a neuron.
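As a rough sketch of the peephole idea, the gate computations from above can be extended so that each gate additionally sees the cell state via extra peephole weights. The weight names (p_f, p_i, p_o) are again illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(params, x_t, h_prev, c_prev, p_f, p_i, p_o):
    """Like lstm_step above, but the gates also 'peep at' the cell state."""
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(params.W_f @ z + p_f * c_prev + params.b_f)  # forget gate peeks at the old cell state
    i = sigmoid(params.W_i @ z + p_i * c_prev + params.b_i)  # input gate peeks at the old cell state
    c_t = f * c_prev + i * np.tanh(params.W_c @ z + params.b_c)
    o = sigmoid(params.W_o @ z + p_o * c_t + params.b_o)     # output gate peeks at the new cell state
    return o * np.tanh(c_t), c_t
```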
What are the most important areas of application for long short-term memory?
Countless applications now rely entirely or partially on neural networks with long short-term memory. The areas of application are very diverse. The technology makes a valuable contribution in the following areas:
- Automated text generation
- Analysis of time series data
- Speech recognition
- Forecasting stock market developments
- Composing music
Long short-term memory is also used to identify anomalies, for example in cases of attempted fraud or attacks on networks. Corresponding applications can also recommend media such as films, series, bands or books based on user data, or analyse videos, images and songs. In this way, the technology not only increases security but can also reduce costs considerably.
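For time-series tasks such as the forecasting examples above, an LSTM is typically wrapped in a small model that reads a window of past values and predicts the next one. The following PyTorch sketch shows one possible setup; the Forecaster class and the layer sizes are arbitrary assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    """Illustrative LSTM-based model that predicts the next value of a sequence."""
    def __init__(self, n_features=1, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, time, n_features)
        out, (h_n, c_n) = self.lstm(x)    # out holds the hidden state at every time step
        return self.head(out[:, -1, :])   # predict the next value from the final hidden state

model = Forecaster()
window = torch.randn(8, 30, 1)            # 8 dummy sequences, 30 time steps each
prediction = model(window)                # shape: (8, 1)
```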
Numerous large corporations use long short-term memory for their services and products. Google uses corresponding networks for its smart assistance systems, the translation program Google Translate, the gaming software AlphaGo and speech recognition in smartphones. The two voice-controlled assistants Siri (Apple) and Alexa (Amazon) are also based on long short-term memory, as is Apple’s keyboard completion.