Hash Collisions in Your Freezer

How non-unique pre-printed codes on freezer bags can lead to confusion

I recently came across the Toppits freezer bags with their 4-digit "pre-printed freshness codes". Together with the Foodsaver App, the idea is that you can organize your freezer and keep track of its contents and their expiration dates.

Toppits freezer bags with freshness codes. Image taken from toppits.de.

The 4-digit freshness code used to identify a bag allows for 10,000 possible codes. Naively, one could say: "Nobody is likely to have 10,000 freezer bags in their freezer." But wait: the freshness codes are randomly distributed, so two random bags can have the same code! Furthermore, we aren't looking at one specific bag and worrying that another bag has the same freshness code. Any two bags (a pair) with the same freshness code in our freezer are a problem, no matter which. Or, to put it in computer science terms: how likely is a hash collision?
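To get a feeling for the "any pair" argument before doing any math, here is a minimal simulation sketch. It assumes that every bag gets a uniformly random code, independent of all other bags, which is my assumption and not something Toppits documents. The function names and the default trial count are made up for this illustration.

```python
import random


def has_collision(bags: int, codes: int, rng: random.Random) -> bool:
    """Draw one random code per bag and report whether any code repeats."""
    drawn = [rng.randrange(codes) for _ in range(bags)]
    return len(set(drawn)) < bags


def estimate_collision_rate(bags: int, codes: int = 10_000,
                            trials: int = 5_000, seed: int = 0) -> float:
    """Fraction of simulated freezers that contain at least one duplicate code."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = sum(has_collision(bags, codes, rng) for _ in range(trials))
    return hits / trials


# Estimate for a well-stocked freezer; the result is a noisy estimate, not an exact value.
print(estimate_collision_rate(119))
```

The simulation checks for any duplicate among all bags, not for a duplicate of one particular bag, which is exactly why the numbers grow so quickly.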

The birthday problem

Ever heard of the birthday problem? In a group of 23 people, the probability that at least two of them share a birthday (day and month) is already over 50%. It's the same here. In general, the problem deals with distributing items (people) to buckets (birthdays). In our case, we distribute items (bags) to 10,000 buckets (freshness codes).
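As a quick sanity check of the items-and-buckets view, the following sketch computes the collision probability as one minus the probability that all items land in distinct buckets. It uses a running product instead of factorials; the function name `collision_probability` is just something I picked for this example.

```python
def collision_probability(items: int, buckets: int) -> float:
    """Probability that at least two of `items` end up in the same bucket."""
    if items > buckets:
        return 1.0  # pigeonhole principle: more items than buckets forces a collision
    p_all_distinct = 1.0
    for k in range(items):
        # The (k+1)-th item must avoid the k buckets that are already taken.
        p_all_distinct *= (buckets - k) / buckets
    return 1.0 - p_all_distinct


# Classic birthday problem: 23 people, 365 possible birthdays.
print(round(collision_probability(23, 365), 3))  # 0.507
```

The same function works for the freezer by passing the number of bags as `items` and 10,000 as `buckets`.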

Analyzing the problem with code

I grabbed the formula from Wikipedia and plotted the probability of a collision for the freshness codes:

from math import factorial
import matplotlib.pyplot as plt

buckets = 10_000
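xs = range(1, 251)  # Number of bags in the freezer: 1 to 250

# Birthday-problem formula: P(collision) = 1 - buckets! / (buckets**items * (buckets - items)!)
ys = [1 - factorial(buckets) / (buckets**items * factorial(buckets - items)) for items in xs]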

plt.plot(xs, ys)
plt.title("Freshness code collision probability")
plt.xlabel("Items in freezer")
plt.ylabel("Collision probability")
plt.grid()
plt.show()  # Not needed in a notebook, but required when running as a script

This results in the following graph:

Freshness code collision probability being 0% for 0 items, 50% for ~125 items and 95% for ~250 items.

Ok, so for 250 bags, the probability of a collision is around 95%. And for around 125 bags, the probability is around 50%. Can we get that more accurately? Sure!

next(xy for xy in zip(xs, ys) if xy[1] > 0.5)
# (119, 0.5058369938385008)

With 119 bags in your freezer, the probability of a collision is just over 50%. Better have fewer than 119 bags in your freezer then!
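If you don't want to scan a whole list, there is a well-known rule of thumb for the birthday problem: the collision probability is roughly 1 - exp(-n² / (2 · buckets)), which can be solved for n. The sketch below does that; the function name and the `target` parameter are my own choices, and the result is only an approximation of the exact threshold.

```python
from math import log, sqrt


def approx_bags_for_collision(buckets: int, target: float = 0.5) -> float:
    """Estimate the item count at which the collision probability reaches `target`.

    Uses the approximation P(collision) ~ 1 - exp(-n**2 / (2 * buckets)).
    """
    return sqrt(2 * buckets * log(1 / (1 - target)))


print(approx_bags_for_collision(10_000))  # about 117.7
```

That lands within a bag or two of the exact answer, which is good enough to see that the threshold scales with the square root of the number of codes, not with the number of codes itself.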

Making the problem worse

Now let's say you aren't such a power user and only have 30 bags in your freezer.

ys[xs.index(30)]  # look up by bag count; xs starts at 1, so ys[30] would be 31 bags
# ≈ 0.0426

About 4.3% collision probability. Ok.

But imagine you're a truly loyal user of this freshness code system and use it for 10 years. Let's assume that the whole content of your freezer is replaced every year (or throughout the year), and that each year's codes are independent of the previous ones. Then we're looking for the probability of not having a collision with 30 bags 10 times in a row.

collision_probability = ys[xs.index(30)]  # ≈ 0.0426 for 30 bags
no_collision_probability = 1 - collision_probability
years = 10
no_collision_n_times_probability = no_collision_probability**years
# no_collision_n_times_probability ≈ 0.647

So, no collision with about 64.7% probability, and a collision with about 35.3% probability.
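To play with other freezer sizes and time spans, the calculation can be wrapped in a small helper. This is only a sketch under the same simplification as above: each year is treated as an independent, completely fresh set of bags. The function names are invented for this example, and the bag count is passed in directly instead of being looked up in a list.

```python
def no_collision_probability(bags: int, codes: int = 10_000) -> float:
    """Probability that all `bags` carry distinct codes."""
    p = 1.0
    for k in range(bags):
        p *= (codes - k) / codes  # the next bag must avoid the k codes already in use
    return p


def collision_within_years(bags: int, years: int, codes: int = 10_000) -> float:
    """Probability of at least one collision over `years` independent freezer refills."""
    return 1.0 - no_collision_probability(bags, codes) ** years


# How the risk grows for a 30-bag freezer over time.
for years in (1, 5, 10, 20):
    print(years, round(collision_within_years(30, years), 3))
```

In reality bags come and go gradually rather than once a year, so treat the output as a rough indication and not as a precise prediction.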

Well, what do we learn from that investigation? Maybe just buy normal freezer bags, grab a marker pen, write a UUID onto them and store the information in a blockchain, so that no one can change the expiration date of your food /s.