How to program efficiently using Python generators
Python generators are special functions that produce their values step by step instead of all at once, which lets them use memory efficiently.
What are Python generators?
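Python generators are special functions that return a Python iterator, called a generator object. Defining a generator is similar to defining a normal function, but some of the details are slightly different. The body of a generator contains at least one yield statement, which hands values to the caller one at a time. A return statement may still appear, but it ends the generator instead of producing a value. Like all iterators, generator objects work with the built-in next() function.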
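As a minimal sketch of these points (the function name countdown is made up for illustration), the following shows that calling a generator function returns an iterator and that next() pulls one value at a time:

```python
def countdown(start):
    """Yield start, start - 1, ..., 1 (hypothetical example function)."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)           # nothing in the body has run yet

# A generator object is its own iterator.
print(iter(gen) is gen)      # True

first = next(gen)            # runs the body up to the first yield
second = next(gen)
third = next(gen)
print(first, second, third)  # 3 2 1

# Once the body finishes, next() raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)             # True
```

A for loop handles StopIteration for you, which is why generators are usually consumed with for rather than with explicit next() calls.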
Python generators are one of the more advanced concepts in Python programming, so this article goes beyond the basics covered in Python tutorials for beginners.
What is the keyword ‘yield’?
You may already know what a return statement is if you have experience with Python or other programming languages. A return statement passes the value calculated by a function back to the code that called it. Once the return statement has been reached, the function exits and its local state is discarded. The function can be called again if necessary, but it then starts from the beginning.
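Things are different with yield. In a generator, this keyword does the job that return does in a normal function, but without ending the function. Calling the generator function does not run its body; it returns a generator object. Each time a value is requested from that object, for example with next() or by a for loop, the body runs until it reaches a yield statement and hands the value given there back to the caller. The generator is paused rather than terminated, and its current state, including its local variables, is saved. When the next value is requested, execution resumes right after the yield.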
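The pause-and-resume behavior can be made visible by recording the order in which things happen. This is only a sketch; the names two_steps, log and events are invented for the example:

```python
def two_steps(log):
    log.append("body started")
    yield "a"
    log.append("resumed after first yield")
    yield "b"

events = []
gen = two_steps(events)
events.append("generator object created")  # the body has not run yet

value_a = next(gen)      # runs the body up to the first yield
events.append("got " + value_a)
value_b = next(gen)      # resumes right after the first yield
events.append("got " + value_b)

# Calling the function again creates a new, independent generator
# that starts from the top; it does not continue where gen left off.
fresh = two_steps([])
value_fresh = next(fresh)

for event in events:
    print(event)
print(value_fresh)
```

The recorded order shows that "body started" appears only after the first next() call, not when two_steps() itself is called.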
What can Python generators be used for?
Generator functions are ideally suited for working with very large data sets. This is because Python generators follow the ‘lazy evaluation’ principle, only evaluating values when they are needed.
A function that reads an entire file at once, for example with read() or readlines(), loads all of its contents into a variable and therefore into memory. For very large files, the available memory might not be sufficient, and you may end up with a MemoryError. A generator avoids this by reading the file line by line. The yield keyword hands back the current line and then pauses the function until the next value is requested, at which point the next line is read.
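A minimal sketch of this idea, assuming a plain text file: the helper read_lines and the sample file contents are invented for illustration, and a temporary file is created only so that the example runs on its own.

```python
import os
import tempfile

def read_lines(path):
    """Yield one line at a time, without the trailing newline."""
    with open(path, "r", encoding="utf-8") as file:
        for line in file:   # file objects hand out lines lazily
            yield line.rstrip("\n")

# Create a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\n\nbeta\ngamma\n")
    path = tmp.name

# Only one line is held in memory at a time while counting.
non_empty = sum(1 for line in read_lines(path) if line)
print(non_empty)  # 3

os.remove(path)
```

The same loop works unchanged for a file of several gigabytes, because no step ever needs the whole file at once.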
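Python generators not only make it easier to handle large amounts of data, they also let you work with infinite sequences. Since memory is finite, an infinite sequence can never be stored as a list; a generator sidesteps this by producing each element only when it is requested. Generators are a convenient way to do this in Python, although not the only one: itertools.count() and custom iterator classes work on the same principle.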
How to read CSV files with Python generators
The following program allows you to read a CSV file line by line in a memory-efficient manner:
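import csv

def csv_read(filename):
    # newline='' lets the csv module handle line endings itself
    with open(filename, 'r', newline='') as file:
        tmp = csv.reader(file)
        for line in tmp:
            # Hand back one parsed row (a list of strings) at a time
            yield line

for line in csv_read('test.csv'):
    print(line)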
In the code example above, we first import the csv module to gain access to Python’s functions for processing CSV files. Next comes the definition of the Python generator ‘csv_read’, which starts with the keyword ‘def’ just like any other function definition. After the file is opened, the Python for loop iterates through the file row by row. Each row is parsed into a list of strings and handed back with the keyword ‘yield’. Outside the generator function, a second for loop takes the rows from the generator one by one and outputs them to the console using the Python print function.
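Generators can also be chained, so that each row passes through several processing steps without the whole file ever being loaded. The following sketch assumes a CSV with a header row; the helper names csv_rows and rows_with_value and the sample data are invented, and io.StringIO stands in for a real file:

```python
import csv
import io

def csv_rows(file):
    """Yield each CSV row as a list of strings."""
    for row in csv.reader(file):
        yield row

def rows_with_value(rows, column, value):
    """Yield only the rows whose given column equals value."""
    for row in rows:
        if row[column] == value:
            yield row

# In-memory stand-in for a file on disk (sample data is invented).
sample = io.StringIO(
    "name,city\nAda,London\nLinus,Helsinki\nGrace,London\n"
)

rows = csv_rows(sample)
header = next(rows)  # consume the header row first
london = list(rows_with_value(rows, column=1, value="London"))

print(header)  # ['name', 'city']
print(london)  # [['Ada', 'London'], ['Grace', 'London']]
```

Because both steps are lazy, a row is read, filtered and passed on before the next row is touched.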
How to create infinite data structures with Python generators
As you can imagine, an infinite data structure cannot be stored in full on your computer. However, infinite data structures are essential for some applications. Generator functions are useful here because they produce the elements one by one and never have to hold the whole sequence in memory. The following Python code is an example of an infinite sequence of natural numbers:
def natural_numbers():
    n = 0
    while True:
        yield n
        n += 1

for number in natural_numbers():
    print(number)
First, a Python generator named ‘natural_numbers’ is defined. It sets the initial value of the variable ‘n’ and then starts an endless Python while loop. The variable’s current value is handed back with ‘yield’, and the execution of the generator function is paused. When the next value is requested, the generator resumes, increments ‘n’ by 1 and runs until the interpreter comes across the ‘yield’ keyword again. The for loop below the generator function requests the numbers one after another and outputs them. If the program is not interrupted manually, for example with Ctrl+C, it will run indefinitely.
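In practice you usually want only a finite part of an infinite sequence. The sketch below repeats the generator from above and shows two common ways to stop: itertools.islice from the standard library, and a break statement. The variable names first_five and below_three are chosen for the example:

```python
from itertools import islice

def natural_numbers():
    n = 0
    while True:
        yield n
        n += 1

# islice stops after five items, so list() terminates.
first_five = list(islice(natural_numbers(), 5))
print(first_five)   # [0, 1, 2, 3, 4]

# Alternatively, leave the loop once a condition is met.
below_three = []
for number in natural_numbers():
    if number >= 3:
        break
    below_three.append(number)
print(below_three)  # [0, 1, 2]
```

Calling list(natural_numbers()) without such a limit would never finish and would eventually exhaust memory.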
What is the shorthand notation for Python generators?
Python lists can be created with list comprehensions in just one line of code. A similar shorthand notation, called a generator expression, exists for generators. Let’s look at a generator that takes the numbers from 0 to 9 and yields each one incremented by 1. This example is similar to the generator previously used to generate an infinite sequence of natural numbers.
def natural_numbers():
    n = 0
    while n <= 9:
        # Yield the incremented value, matching the one-line version
        yield n + 1
        n += 1
To write this generator in one line of code, put an expression followed by a for clause in round brackets, like in the following example:
increment_generator = (n + 1 for n in range(10))
If you pass the generator object itself to print(), you will see output similar to this:
<generator object <genexpr> at 0x0000020CC5A2D6C8>
This only tells you that the object is a generator and where it is located in memory; the address will differ on your machine. Use the next() function to retrieve the generator’s values one at a time:
print(next(increment_generator))
print(next(increment_generator))
print(next(increment_generator))
Each call returns the next value, so the output shows the numbers 0 to 2, each incremented by 1:
1
2
3
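A generator can be consumed only once. The following sketch continues the idea with the same generator expression, recreated here so the example is self-contained; the names first_three, remaining and after_end are chosen for illustration:

```python
increment_generator = (n + 1 for n in range(10))

first_three = []
for _ in range(3):
    first_three.append(next(increment_generator))
print(first_three)  # [1, 2, 3]

# list() consumes whatever is left.
remaining = list(increment_generator)
print(remaining)    # [4, 5, 6, 7, 8, 9, 10]

# The generator is now exhausted. Passing a default to next()
# avoids the StopIteration exception it would otherwise raise.
after_end = next(increment_generator, None)
print(after_end)    # None
```

To iterate over the values a second time, create a new generator or store the values in a list.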
What is the difference between generators and list comprehensions?
The shorthand notation for generators is very similar to list comprehensions. The only visible difference is the brackets: square brackets create a list comprehension, while round brackets create a Python generator. But there is a more significant difference: a list comprehension builds the whole list immediately, whereas a generator produces its values on demand, so the memory requirements for generators are much smaller than for lists.
import sys
increment_list = [n + 1 for n in range(100)]
increment_generator = (n + 1 for n in range(100))
print(sys.getsizeof(increment_list))
print(sys.getsizeof(increment_generator))
The program above outputs the memory requirements of the list object, followed by those of the generator object:
912
120
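In this run, the list requires 912 bytes of memory, while the generator only needs 120 bytes. The exact figures depend on your Python version and platform. Note that sys.getsizeof() counts only the list object itself, not the integers it refers to. The difference is even greater when there is more data to process, because the list grows with the number of elements while the size of the generator stays the same.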
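To see how the two sizes develop, the comparison can be repeated for growing element counts. This is a sketch with invented variable names; the byte values it prints will vary between Python versions, so none are shown here:

```python
import sys

list_sizes = []
generator_sizes = []

for count in (100, 10_000, 1_000_000):
    as_list = [n + 1 for n in range(count)]
    as_generator = (n + 1 for n in range(count))
    list_sizes.append(sys.getsizeof(as_list))
    generator_sizes.append(sys.getsizeof(as_generator))

print(list_sizes)       # grows with the number of elements
print(generator_sizes)  # the same value three times

# A generator expression can be passed straight to a consuming
# function, so no intermediate list is ever built.
total = sum(n + 1 for n in range(1_000_000))
print(total)            # 500000500000
```

The trade-off is that a generator supports neither len() nor indexing, so a list remains the better choice when you need random access or several passes over the data.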