Python Lecture 3 - Containers and Generators
Learn about Python's container concept, list internals, and efficient iteration with generators
What is a Container?
In Python, a Container is a data structure that can hold other objects.
# Various containers
my_list = [1, 2, 3, 4, 5] # List
my_tuple = (1, 2, 3) # Tuple
my_set = {1, 2, 3} # Set
my_dict = {"name": "Python"} # Dictionary
my_string = "hello" # String is also a container!
They all share one thing in common -- they hold multiple pieces of data.
What Makes a Container
Containers have two important traits.
First, you can test membership.
numbers = [1, 2, 3, 4, 5]
# "Is this value in the container?"
print(3 in numbers) # True
print(10 in numbers) # False
print(10 not in numbers) # True
If you can use the in operator on it, that's a container.
Second, you can iterate over it.
# Walk through elements one by one with a for loop
for num in numbers:
    print(num)
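Both traits map onto special methods: the `in` operator calls `__contains__`, and a `for` loop calls `__iter__`. The sketch below shows this with a made-up class; `Playlist` and its song titles are hypothetical names used only for illustration.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __contains__(self, title):
        # Called by the `in` operator
        return title in self._songs

    def __iter__(self):
        # Called by `for` loops (and by iter())
        return iter(self._songs)


playlist = Playlist(["Intro", "Verse", "Outro"])
print("Verse" in playlist)    # True
print("Bridge" in playlist)   # False
titles = [song for song in playlist]
print(titles)                 # ['Intro', 'Verse', 'Outro']
```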
Python's Main Containers
| Container | Ordered | Duplicates | Mutable | Purpose |
|---|---|---|---|---|
| list | Yes | Yes | Yes | Ordered data collection |
| tuple | Yes | Yes | No | Immutable data |
| set | No | No | Yes | Unique data |
| dict | Yes (3.7+) | No (keys) | Yes | Key-value pairs |
| str | Yes | Yes | No | Character sequence |
Among these, the one you'll use most is List.
List - The Go-To Container
Lists are the most commonly used container in Python.
numbers = [1, 2, 3, 4, 5]
fruits = ["apple", "banana", "orange"]
mixed = [1, "hello", 3.14, True] # Can hold different types!
How Lists Work in Memory
How does Python actually store a list?
my_list = [10, 20, 30]
Lists are implemented as arrays, but with a twist.
Memory structure:
List object:
┌─────────────┐
│ size: 3     │ ← Current element count
│ capacity: 4 │ ← Allocated space size
│ items: ──┐  │ ← Pointer to actual elements
└──────────┼──┘
           │
           ▼
Element array:
┌────┬────┬────┬────┐
│ •  │ •  │ •  │    │
└─┼──┴─┼──┴─┼──┴────┘
  │    │    │
  ▼    ▼    ▼
  10   20   30
A list keeps two pieces of info: size (how many elements are actually in it) and capacity (how much space is allocated).
Why is Capacity Bigger Than Size?
Reallocating memory every time you add an element would be painfully slow. So Python pre-allocates extra space.
my_list = [] # capacity: 0
my_list.append(1) # capacity: 4 (allocates 4 slots at once)
my_list.append(2) # capacity: 4 (still has room)
my_list.append(3) # capacity: 4
my_list.append(4) # capacity: 4
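my_list.append(5) # capacity: 8 (out of space, so the array is reallocated with more room)
This is called a Dynamic Array. The capacities above reflect CPython's behavior; it grows the array by a modest proportion each time rather than strictly doubling, and the exact numbers can differ between versions.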
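One way to observe over-allocation is to watch `sys.getsizeof` while appending. This is a rough sketch that assumes CPython; the reported byte counts are implementation details and will vary by version and platform, so the code only counts how often the size changes.

```python
import sys

sizes = []
my_list = []
for i in range(20):
    my_list.append(i)
    sizes.append(sys.getsizeof(my_list))

# Count how many times the reported size changed, i.e. how many
# reallocations were visible across the 20 appends.
growth_steps = sum(
    1 for before, after in zip(sizes, sizes[1:]) if after != before
)
print(sizes)
print("size changes:", growth_steps)
```

If the list were reallocated on every append, the size would change 19 times here; seeing far fewer changes is the pre-allocation at work.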
How Fast Are List Operations?
Some operations are fast, some are slow. It's worth knowing which is which.
Fast operations (O(1) -- constant time):
# Index access
value = my_list[2] # Very fast
# Append to end
my_list.append(10) # Fast on average
# Remove from end
my_list.pop() # Fast
Slow operations (O(n) -- proportional to list size):
# Insert at the front or in the middle
my_list.insert(0, 5) # Slow (must shift every later element)
# Remove from the front or the middle
my_list.pop(0) # Slow (must shift every later element forward)
# Search for element
if 10 in my_list: # Slow (scans from the start until a match or the end)
    pass
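When the slow operations dominate, the standard library has containers with different trade-offs. As a brief sketch: `collections.deque` supports O(1) inserts and removals at both ends, and `set` gives average O(1) membership tests. The variable names here are illustrative.

```python
from collections import deque

queue = deque([1, 2, 3])
queue.appendleft(0)        # O(1) insert at the front
first = queue.popleft()    # O(1) removal from the front
print(first, list(queue))  # 0 [1, 2, 3]

allowed = set(range(1000))  # hashing makes membership O(1) on average
print(999 in allowed)       # True
```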
Copying Lists -- Be Careful
This trips people up all the time.
# This is NOT a copy
original = [1, 2, 3]
copy1 = original # Points to same list!
copy1.append(4)
print(original) # [1, 2, 3, 4] - original changed too!
# This is a real copy
copy2 = original.copy() # or original[:]
copy2.append(5)
print(original) # [1, 2, 3, 4] - original unchanged
With nested lists, it gets trickier.
# 2D list
matrix = [[1, 2], [3, 4]]
shallow = matrix.copy()
shallow[0].append(99)
print(matrix) # [[1, 2, 99], [3, 4]] - inner lists are shared!
# Deep copy
import copy
deep = copy.deepcopy(matrix)
deep[0].append(100)
print(matrix) # [[1, 2, 99], [3, 4]] - original is safe
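The `is` operator makes the three cases easy to tell apart, because it checks whether two names refer to the same object. A small sketch:

```python
import copy

matrix = [[1, 2], [3, 4]]
alias = matrix
shallow = matrix.copy()
deep = copy.deepcopy(matrix)

print(alias is matrix)          # True: same outer list
print(shallow is matrix)        # False: new outer list
print(shallow[0] is matrix[0])  # True: inner lists are still shared
print(deep[0] is matrix[0])     # False: inner lists were copied too
```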
Iterator and Iterable
To really understand how containers are traversed, you need to know about Iterator and Iterable.
What is an Iterable?
An Iterable is anything you can put in a for loop.
# These are all Iterable
for x in [1, 2, 3]: # List
    print(x)
for x in (1, 2, 3): # Tuple
    print(x)
for x in "hello": # String
    print(x)
for x in {1, 2, 3}: # Set
    print(x)
What is an Iterator?
An Iterator is the object that actually retrieves values one by one.
numbers = [1, 2, 3]
# Create Iterator with iter()
iterator = iter(numbers)
# Retrieve values one by one with next()
print(next(iterator)) # 1
print(next(iterator)) # 2
print(next(iterator)) # 3
print(next(iterator)) # Raises StopIteration (no values left)
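If you call `next()` by hand, there are two common ways to deal with exhaustion: pass a default as the second argument, or catch `StopIteration`. A short sketch of both:

```python
iterator = iter([1, 2])

print(next(iterator, None))  # 1
print(next(iterator, None))  # 2
print(next(iterator, None))  # None (exhausted, no exception)

# Catching the exception explicitly is the other option
exhausted = False
try:
    next(iterator)
except StopIteration:
    exhausted = True
print(exhausted)  # True
```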
The Secret of for Loops
The for loop you write every day actually works like this under the hood.
# Code you write
for num in [1, 2, 3]:
    print(num)
# What Python actually does
iterator = iter([1, 2, 3]) # Convert Iterable to Iterator
while True:
    try:
        num = next(iterator) # Get next value
        print(num)
    except StopIteration: # Stop when no more values
        break
Iterable vs Iterator
┌──────────────┐
│   Iterable   │ "Iterable object" (list, tuple, str, etc.)
└──────┬───────┘
       │ iter()
       ▼
┌──────────────┐
│   Iterator   │ "Object that retrieves values one by one"
└──────┬───────┘
       │ next()
       ▼
  Returns value
The key: Iterable gives you an Iterator when you call iter(). Iterator gives you the next value when you call next().
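One practical consequence is that a list hands out a fresh, independent Iterator each time you call `iter()`, while an Iterator just returns itself and can only be consumed once. A small sketch:

```python
numbers = [1, 2, 3]

first = iter(numbers)
second = iter(numbers)

print(next(first))   # 1
print(next(first))   # 2
print(next(second))  # 1 (independent position)

# An Iterator returns itself from iter()
print(iter(first) is first)  # True

remaining = list(first)
print(remaining)     # [3]
print(list(first))   # [] (exhausted)
```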
Building Your Own
class CountUp:
    """Iterable that counts from 1 to n"""
    def __init__(self, max):
        self.max = max
    def __iter__(self):
        """Returns an Iterator"""
        return CountUpIterator(self.max)
class CountUpIterator:
    """Iterator that actually returns values"""
    def __init__(self, max):
        self.max = max
        self.current = 0
    def __iter__(self):
        """An Iterator returns itself from iter()"""
        return self
    def __next__(self):
        """Returns next value"""
        if self.current >= self.max:
            raise StopIteration
        self.current += 1
        return self.current
# Usage
counter = CountUp(5)
for num in counter:
    print(num)
# 1, 2, 3, 4, 5
That's a lot of code for something simple. There's a much easier way -- Generators.
Generator - The Smart Iterator
The Problem: Memory Waste
Say you want to process a million numbers.
# Creating as a list
numbers = [i for i in range(1000000)] # All million in memory!
for num in numbers:
    print(num)
This stores all million numbers in memory even though you only need one at a time. That's wasteful.
Enter Generators
A Generator creates values one at a time, only when you need them.
# Generator version
def number_generator():
    for i in range(1000000):
        yield i # Returns values one by one
for num in number_generator():
    print(num)
The difference is dramatic:
| | List (Container) | Generator (Iterator) |
|---|---|---|
| Memory | Stores all values (1 million) | Creates one at a time (1) |
| Speed | Slow initial creation | Starts immediately |
| Reuse | Can iterate multiple times | Can iterate once |
Generators are just an easy way to create Iterators.
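To make that concrete, here is the two-class `CountUp` example from earlier rewritten as a single generator function. This is a sketch; the name `count_up` is chosen here for illustration.

```python
def count_up(maximum):
    """Generator that counts from 1 to maximum."""
    current = 0
    while current < maximum:
        current += 1
        yield current


counter = count_up(5)
print(iter(counter) is counter)  # True: a generator is its own Iterator
print(list(counter))             # [1, 2, 3, 4, 5]
print(list(counter))             # [] (already consumed)
```

Python supplies `__iter__` and `__next__` for the generator object, which is why the class boilerplate disappears.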
The yield Keyword
yield is like return, but it remembers where the function was.
def simple_generator():
    print("Creating first value")
    yield 1
    print("Creating second value")
    yield 2
    print("Creating third value")
    yield 3
gen = simple_generator()
print(next(gen)) # "Creating first value" → 1
print(next(gen)) # "Creating second value" → 2
print(next(gen)) # "Creating third value" → 3
When Python hits yield, it returns the value, pauses the function, and resumes from exactly that spot on the next call.
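The pause-and-resume behavior can be observed with `inspect.getgeneratorstate` from the standard library, which reports whether a generator has started, is suspended at a `yield`, or has finished. A minimal sketch, with `two_steps` as an illustrative name:

```python
import inspect


def two_steps():
    yield "first"
    yield "second"


gen = two_steps()
states = [inspect.getgeneratorstate(gen)]      # nothing has run yet
next(gen)
states.append(inspect.getgeneratorstate(gen))  # paused at the first yield
next(gen)
leftover = next(gen, None)                     # function body runs to the end
states.append(inspect.getgeneratorstate(gen))
print(states)  # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
```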
You Can Even Do Infinite Sequences
def infinite_numbers():
    num = 0
    while True:
        yield num
        num += 1
gen = infinite_numbers()
print(next(gen)) # 0
print(next(gen)) # 1
print(next(gen)) # 2
# Can continue indefinitely...
Try doing that with a list. You'd need infinite memory.
Practical Generator Examples
Reading Large Files
Generators shine when reading big files.
def read_large_file(file_path):
    """Generator that reads file line by line"""
    with open(file_path) as file:
        for line in file:
            yield line.strip()
# Process a 100GB file without loading it all into memory
for line in read_large_file("huge_file.txt"):
    process(line) # process() stands in for your own per-line handling
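To try the pattern without a real 100GB file, the sketch below writes a tiny temporary file and streams it through the same generator. The file contents and the length calculation are placeholders for whatever per-line work you actually need.

```python
import os
import tempfile


def read_large_file(file_path):
    """Generator that reads a file line by line."""
    with open(file_path) as file:
        for line in file:
            yield line.strip()


# Write a small stand-in file (illustrative data only)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

try:
    # Only one line is held in memory at a time while iterating
    lengths = [len(line) for line in read_large_file(path)]
finally:
    os.remove(path)

print(lengths)  # [5, 4, 5]
```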
Fibonacci Sequence
def fibonacci():
    """Infinite Fibonacci sequence generator"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
# Get first 10 numbers
fib = fibonacci()
for _ in range(10):
    print(next(fib))
# 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
Data Filtering
def even_numbers(numbers):
    """Generator that returns only even numbers"""
    for num in numbers:
        if num % 2 == 0:
            yield num
for num in even_numbers(range(10)):
    print(num)
# 0, 2, 4, 6, 8
Generator Expressions
These look like list comprehensions but use () instead of [].
# List comprehension (uses lots of memory)
squares_list = [x**2 for x in range(1000000)]
# Generator expression (saves memory)
squares_gen = (x**2 for x in range(1000000))
# Computed only when needed
print(next(squares_gen)) # 0
print(next(squares_gen)) # 1
print(next(squares_gen)) # 4
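The memory difference can be checked roughly with `sys.getsizeof`. This sketch assumes CPython; the byte counts are implementation details, and for the list they cover only the outer array of references, not the integers themselves. It also shows that a generator expression can be passed straight to an aggregate such as `sum`.

```python
import sys

squares_list = [x**2 for x in range(100_000)]
squares_gen = (x**2 for x in range(100_000))

list_bytes = sys.getsizeof(squares_list)  # outer list only, not the ints
gen_bytes = sys.getsizeof(squares_gen)    # small and independent of the range
print(list_bytes, gen_bytes)

# Generator expressions can feed aggregate functions directly
total = sum(x**2 for x in range(10))
print(total)  # 285
```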
More Generator Patterns
Conditional Generation
def numbers_until_condition(limit):
    """Generate numbers until sum exceeds limit"""
    total = 0
    num = 1
    while total < limit:
        yield num
        total += num
        num += 1
for n in numbers_until_condition(20):
    print(n)
# 1, 2, 3, 4, 5, 6 (after 5 the total is 15, still under 20, so 6 is yielded; the total then reaches 21 and the loop stops)
Stateful Generators
def countdown(start):
    """Countdown generator"""
    current = start
    while current > 0:
        yield current
        current -= 1
    yield "Liftoff!"
for count in countdown(5):
    print(count)
# 5, 4, 3, 2, 1, Liftoff!
Generator Chaining
def first_n(generator, n):
    """Take first n items from generator"""
    for _ in range(n):
        try:
            yield next(generator)
        except StopIteration:
            return # Source ran out early; end cleanly instead of raising RuntimeError
# Infinite generator + limit
def all_numbers():
    num = 0
    while True:
        yield num
        num += 1
limited = first_n(all_numbers(), 5)
print(list(limited)) # [0, 1, 2, 3, 4]
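The standard library's `itertools` module already covers this pattern: `itertools.count` is an infinite counter and `itertools.islice` takes a bounded slice of any iterator. A short sketch of the same idea using those two functions:

```python
from itertools import count, islice

limited = islice(count(), 5)    # count() yields 0, 1, 2, ... forever
print(list(limited))            # [0, 1, 2, 3, 4]

evens = islice(count(0, 2), 3)  # start at 0, step by 2, take three
print(list(evens))              # [0, 2, 4]
```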
When to Use What?
Use lists when you need access by index, the data is small, you need sorting, or you need to iterate more than once.
data = [1, 2, 3, 4, 5]
print(data[2]) # Index access
print(len(data)) # Length check
print(data[:3]) # Slicing
for x in data:
    print(x)
for x in data: # Can iterate again!
    print(x * 2)
Use generators when dealing with large data, iterating only once, working with infinite sequences, or building processing pipelines.
# Large data
huge_data = (x for x in range(10000000)) # Saves memory
# Pipeline processing
data = (x for x in range(100))
filtered = (x for x in data if x % 2 == 0)
squared = (x**2 for x in filtered)
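Nothing in that pipeline runs until something consumes the last stage. The sketch below repeats the three stages, drains them with `sum`, and then shows that a second pass yields nothing because every stage is exhausted.

```python
data = (x for x in range(100))
filtered = (x for x in data if x % 2 == 0)
squared = (x**2 for x in filtered)

total = sum(squared)        # the work happens here, one value at a time
print(total)                # 161700

second_pass = sum(squared)
print(second_pass)          # 0: the pipeline is exhausted
```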
The Big Picture
Here's how all these concepts relate to each other.
graph TD
A["Container
Most general concept
Data structures
list, tuple, set, dict, str"]
B["Iterable
Iterable objects
Can use in for loop
Calling iter() returns Iterator"]
C["Iterator
Retrieves values one by one
Returns next value with next()
Can iterate only once"]
D["Generator
Convenient Iterator
Creates values with yield
Memory efficient, infinite sequences"]
A -->|"All Containers are..."| B
B -->|"iter()"| C
C -->|"Easy way to create"| D
style A fill:#e3f2fd
style B fill:#fff3e0
style C fill:#f3e5f5
style D fill:#e8f5e9
Every Container is Iterable. Calling iter() on an Iterable gives you an Iterator. And Generators are just the easy way to make Iterators. That's the whole hierarchy.
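The relationships can be checked at runtime with the abstract base classes in `collections.abc`. This is a sketch; `gen` is a throwaway generator function defined only for the check.

```python
from collections.abc import Container, Generator, Iterable, Iterator


def gen():
    yield 1


numbers = [1, 2, 3]
checks = {
    "list is Container": isinstance(numbers, Container),          # True
    "list is Iterable": isinstance(numbers, Iterable),            # True
    "list is Iterator": isinstance(numbers, Iterator),            # False
    "iter(list) is Iterator": isinstance(iter(numbers), Iterator),  # True
    "generator is Iterator": isinstance(gen(), Iterator),         # True
    "generator is Generator": isinstance(gen(), Generator),       # True
}
for label, result in checks.items():
    print(f"{label}: {result}")
```

A list is Iterable but not itself an Iterator; it only produces one when `iter()` is called on it.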