Set Theory

A quick introduction to sets, particularly to support learning probability.

Buttons of various colors and shapes.
Photo by Olga Kovalski on Unsplash.

Sets are a common mathematical structure, and oft-overlooked programming data structure, for representing collections of items or elements (numbers, cards, users, etc.).

This page provides a very brief introduction to sets and set notation. A basic understanding of sets is a prerequisite for understanding, among other things, probability.

What Is a Set?

A set \(A\) is an unordered collection of distinct elements. Both of these properties are important:

  • Each element in a set only appears once; if we “add” a duplicate element, the set doesn’t change.
  • Items in a set have no inherent order.

Size and Membership

We denote sets and their contents with squiggly braces, e.g. \(A=\{Y,N\}\) for a set \(A\) containing two members \(Y\) and \(N\).

The size or cardinality of a set is denoted \(|A|\). For \(A\) above, \(|A| = 2\). Sets may have infinite cardinality.

\(\emptyset\) is the empty set (it contains no elements, and has size 0).

An element \(a\) is a member of set \(A\) (denoted \(a \in A\)) if it is in the set.

We sometimes denote a set by \(\{a: \text{<some condition>}\}\), to be the set of all elements \(a\) that satisfy the specified condition. For example, \(E=\{n \in \mathbb{N}: n \operatorname{mod} 2 = 0\}\) is the set of even numbers: natural numbers (\(\mathbb{N}\)) that are divisible by 2 (their remainder, when divided by 2, is 0).

Set Relationships

We also define several relationships between sets:

  • \(A \cup B\) is the union: all elements in either \(A\) or \(B\) (or both).
  • \(A \cap B\) is the intersection: all elements in both \(A\) and \(B\).
  • \(A \setminus B\) is the set difference: all elements in \(A\) but not in \(B\).
  • Set \(A\) is a subset of another set \(B\) (denoted \(A \subseteq B\)) if every element in \(A\) is also in \(B\). It can also be a proper subset (denoted \(A \subset B\)) if \(A\) is a subset of \(B\) and there is at least one element of \(B\) that is not also in \(A\).
  • If \(A\) is a subset of some larger set \(U\) that contains all the possible elements under consideration, then the complement \(A^c = U \setminus A\) is the set of elements not in \(A\).
  • \(\mathcal{P}(A)\) is the power set of \(A\): the set of all subsets of \(A\). For example, if \(A = \{H,T\}\) (the results of a coin flip), then \(\mathcal{P}(A) = \{\emptyset, \{H\}, \{T\}, \{H,T\}\}\).

Numeric Sets

We have standard symbols for a few common sets of numbers:

  • \(\mathbb{R}\) is the set of real numbers.
  • \(\mathbb{N}\) is the set of natural numbers (non-negative integers).
  • \(\mathbb{Z}\) is the set of integers.

Sizes and Kinds of Sets

There are, broadly speaking, three kinds of sets in terms of their cardinality:

  • Finite sets have a finite number of elements. There is some natural number \(n\) such that \(|A| = n\).
  • Countable sets (or countably infinite sets) are infinite sets with the same cardinality as the set of natural numbers (\(|A| = |\mathbb{N}|\)). Formally, there exists an isomorphism (a 1:1 onto mapping) between members of \(A\) and \(\mathbb{N}\). Natural numbers, integers, rationals, and algebraics (rationals and roots) are all countable sets.
  • Uncountable sets are infinite sets whose cardinality is larger than that of the natural numbers. The real numbers (\(\mathbb{R}\)) and the power set of the natural numbers (\(\mathcal{P}(\mathbb{N})\)) are two frequently-encountered uncountable sets.

We also talk about discrete and continuous sets:

  • A continuous set \(A\) with an ordering \(<\) is a set where we can always find an element to fit between any two other elements: for any \(a, b \in A\) such that \(a < b\), there is a \(c \in A\) such that \(a < c < b\).
  • A discrete set is a set that is not continuous: there are irreducible gaps between elements.

All finite sets are discrete. The natural numbers \(\mathbb{N}\) and integers \(\mathbb{Z}\) are also discrete. The real numbers \(\mathbb{R}\) are continuous. Rationals and algebraics are also continuous, but we don’t encounter them very often in applied data science work.

Programming with Sets

Python provides a built-in data structure, set, for computing with sets. It provides the many set operations above.

NumPy also provides methods for working on arrays as if they were sets.

Set operations are also common in SQL (IS IN, etc.)