In combinatorial mathematics, a matroid is a structure that captures the essence of a notion of "independence" (hence independence structure) that generalizes linear independence in vector spaces. There are many equivalent ways to define a matroid; the most significant include those in terms of independent sets, bases, closed sets (flats), the closure operator, circuits (minimal dependent sets), and the rank function.
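For convenience and simplicity we begin with the definition by independent sets. In this definition a matroid M on a ground set E is a pair (E, I), where I is a collection of subsets of E (called the independent sets) with the following properties:
1. The empty set is independent.
2. Every subset of an independent set is independent (the hereditary property).
3. If A and B are independent sets and A has more elements than B, then there is an element x in A \ B such that B ∪ {x} is independent (the augmentation or exchange property).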
The first two properties are simple, but the motivation behind the third property is not obvious. The examples in the examples section make its motivation clearer.
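The three properties can be checked mechanically on a small example. The sketch below assumes they are the standard independence axioms (the empty set is independent, subsets of independent sets are independent, and a smaller independent set can always be augmented from a larger one). The names `powerset` and `is_matroid` are illustrative, and the brute-force search is only practical for very small ground sets.

```python
from itertools import combinations

def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_matroid(ground, independent_sets):
    """Brute-force check of the three independence properties (assumed axioms)."""
    ground = frozenset(ground)
    family = {frozenset(s) for s in independent_sets}
    # Every independent set must be a subset of the ground set.
    if any(not s <= ground for s in family):
        return False
    # Property 1: the empty set is independent.
    if frozenset() not in family:
        return False
    # Property 2 (hereditary): every subset of an independent set is independent.
    for s in family:
        if any(t not in family for t in powerset(s)):
            return False
    # Property 3 (augmentation): a smaller independent set can be
    # extended by some element of any larger independent set.
    for a in family:
        for b in family:
            if len(a) < len(b) and not any(a | {x} in family for x in b - a):
                return False
    return True

# All subsets of {1, 2, 3} with at most two elements form a matroid.
uniform = [s for s in powerset({1, 2, 3}) if len(s) <= 2]
# A down-set that fails augmentation: {1} cannot be extended from {2, 3}.
not_matroid = [set(), {1}, {2}, {3}, {2, 3}]
```

Here `is_matroid({1, 2, 3}, uniform)` holds, while `not_matroid` satisfies the first two properties and fails only the third, which illustrates why the third property is the one that does the real work.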
A set that is not independent is called dependent. A minimal dependent set is called a circuit.
The closure cl(A) of a subset A of E in a finitary matroid M is defined to be A together with every point x in E\A, such that there is a circuit C containing x and contained in the union of A and {x}. This defines the closure operator, from P(E) to P(E), where P denotes the power set.
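For a finite matroid given by an independence test, the closure can be computed directly from this definition. The sketch below relies on the hereditary property: a circuit C containing x and lying inside A ∪ {x} exists exactly when some independent subset S of A makes S ∪ {x} dependent (take S = C \ {x} in one direction; in the other, S ∪ {x} contains a circuit, which must pass through x). The function name `closure` and the oracle `at_most_two` are illustrative assumptions, and the search is exponential in |A|.

```python
from itertools import combinations

def closure(ground, is_independent, subset):
    """Return cl(subset) for a finite matroid given by an independence oracle."""
    subset = frozenset(subset)
    result = set(subset)
    for x in frozenset(ground) - subset:
        for size in range(len(subset) + 1):
            for combo in combinations(subset, size):
                inner = frozenset(combo)
                # inner independent but inner ∪ {x} dependent:
                # x completes a circuit with elements of `subset`.
                if is_independent(inner) and not is_independent(inner | {x}):
                    result.add(x)
    return frozenset(result)

# Example oracle: subsets of E with at most two elements are independent.
E = {1, 2, 3, 4}

def at_most_two(s):
    return len(s) <= 2

closure_of_point = closure(E, at_most_two, {1})     # nothing is added
closure_of_pair = closure(E, at_most_two, {1, 2})   # every element is added
```

In this example a single element is closed, while any two-element set has closure E.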
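We can give axioms for the closure operator, thereby defining a matroid in terms of its closure. First, let E be a finite set. A function cl from P(E) to P(E) is a matroid closure if it satisfies the following conditions, for all elements a, b of E and all subsets Y, Z of E:
1. cl is a closure operator: Y is contained in cl(Y); cl(cl(Y)) = cl(Y); and if Y is contained in Z, then cl(Y) is contained in cl(Z).
2. cl has the exchange property: if a is in cl(Y ∪ {b}) but not in cl(Y), then b is in cl(Y ∪ {a}).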
The theorem is that a matroid closure is exactly the same thing as the closure operator of a finitary matroid as defined by independent sets. Thus, one can choose to define a matroid by its closure instead of its independent sets; and this is often desirable, especially in applications to geometry.
By contrast, the closure operator of a topological space tends to fail the exchange property (2); instead it has a different characteristic property.
Note that an infinite graph determines a cycle matroid in the same way, which is a finitary matroid because all cycles are finite, and is finite dimensional if the number of vertices is finite (even if the number of edges is infinite).
A subset of E is called dependent if it is not independent. A dependent set all of whose proper subsets are independent is called a circuit (with the terminology coming from the graph example above).
We say A, a subset of E, spans M if its closure is E.
A matroid is simple if every 2-element set is independent.
An independent set that is not properly contained in another independent set is called a basis (with the terminology coming from the vector space example above). Hence bases are maximal independent sets, and circuits are minimal dependent sets. An important fact is that all bases of a matroid have the same number of elements, called the rank of M. In general, the circuits of M can have different sizes.
In the linear algebra example matroid above, a basis is a basis in the sense of linear algebra of the subspace spanned by E, and a circuit is a minimal set of dependent vectors of E. In the cycle matroid, a basis is the same as a spanning forest of the graph G, and circuits are simple cycles in the graph. In the example where sets of at most k elements are independent, a basis is any subset of E with k elements, and a circuit is any subset of k + 1 elements.
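The last example can be verified by brute force. The sketch below enumerates maximal independent sets and minimal dependent sets for the matroid on E = {1, 2, 3, 4} in which sets of at most k = 2 elements are independent. The helper names are illustrative, and the circuit test uses the hereditary property (a dependent set is minimal as soon as removing any single element makes it independent).

```python
from itertools import combinations

def all_subsets(ground):
    """Return every subset of `ground` as a frozenset."""
    items = list(ground)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]

def bases(ground, is_independent):
    """Independent sets that cannot be extended by any further element."""
    ground = frozenset(ground)
    return [s for s in all_subsets(ground)
            if is_independent(s)
            and not any(is_independent(s | {x}) for x in ground - s)]

def circuits(ground, is_independent):
    """Dependent sets that become independent when any one element is removed."""
    return [s for s in all_subsets(ground)
            if not is_independent(s)
            and all(is_independent(s - {x}) for x in s)]

E = {1, 2, 3, 4}
k = 2

def at_most_k(s):
    return len(s) <= k

uniform_bases = bases(E, at_most_k)        # the six 2-element subsets
uniform_circuits = circuits(E, at_most_k)  # the four 3-element subsets
```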
If A is a subset of E, then a matroid on A can be defined by considering a subset of A independent if and only if it is independent in M. This allows us to talk about submatroids and about the rank of any subset of E.
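The rank function r assigns a natural number to every subset of E and has the following properties:
1. 0 ≤ r(A) ≤ |A| for every subset A of E.
2. If A is a subset of B, then r(A) ≤ r(B).
3. For any two subsets A and B of E, r(A ∪ B) + r(A ∩ B) ≤ r(A) + r(B).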
These three properties can be used as one of the alternative definitions of a finite matroid: the independent sets are then defined as those subsets A of E with r(A) = |A|.
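As a small illustration, the rank of a subset can be computed as the size of its largest independent subset, and the independent sets can then be recovered as those A with r(A) = |A|. The sketch below assumes the third rank property is submodularity, r(A ∪ B) + r(A ∩ B) ≤ r(A) + r(B), and checks it exhaustively on a four-element example. The names `rank` and `at_most_two` are illustrative.

```python
from itertools import combinations

def rank(is_independent, subset):
    """Size of the largest independent subset of `subset` (brute force)."""
    items = list(subset)
    for size in range(len(items), -1, -1):
        if any(is_independent(frozenset(c)) for c in combinations(items, size)):
            return size
    raise ValueError("the empty set must be independent")

def at_most_two(s):
    return len(s) <= 2

E = [1, 2, 3, 4]
subsets = [frozenset(c) for n in range(len(E) + 1) for c in combinations(E, n)]

# Submodularity, checked for every pair of subsets.
submodular = all(
    rank(at_most_two, a | b) + rank(at_most_two, a & b)
    <= rank(at_most_two, a) + rank(at_most_two, b)
    for a in subsets for b in subsets
)

# Recover the independent sets from the rank function alone.
recovered = {s for s in subsets if rank(at_most_two, s) == len(s)}
```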
If M is a finite matroid, we can define the dual matroid M* by taking the same underlying set and calling a set a basis in M* if and only if its complement is a basis in M. Thus, a set is independent in M* if and only if it is contained in the complement of some basis of M, if and only if its complement spans M. One checks easily that M* is indeed a matroid. One of the difficulties with infinite matroids is to define duality; this has never been resolved satisfactorily.
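The construction of the dual is easy to sketch when a finite matroid is given by its list of bases. In the code below, `dual_bases` and `is_independent_in_dual` are illustrative names; the example takes the matroid on {a, b, c} whose bases are the single elements, so the bases of the dual are the two-element sets.

```python
def dual_bases(ground, bases):
    """Bases of M*: the complements of the bases of M."""
    ground = frozenset(ground)
    return {ground - frozenset(b) for b in bases}

def is_independent_in_dual(ground, bases, subset):
    """A set is independent in M* if it lies inside the complement of a basis of M."""
    subset = frozenset(subset)
    return any(subset <= b for b in dual_bases(ground, bases))

E = {"a", "b", "c"}
M_bases = [{"a"}, {"b"}, {"c"}]
M_star = dual_bases(E, M_bases)
# Taking the dual twice returns the original bases.
M_star_star = dual_bases(E, M_star)
```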
Very early it was recognized that algebraic independence is a matroid independence. Let L be an extension field of a field K. A finite subset x1, ..., xk of L is algebraically independent if there is no non-zero polynomial f(t1, ..., tk), with coefficients in K, such that f(x1, ..., xk) = 0. Algebraic dependence determines a finitary matroid on the ground set L, called the full algebraic matroid of L over K. The rank of this matroid equals the transcendence degree of L over K. An algebraic matroid is any submatroid of a full algebraic matroid.
Not much later, it was found that transversal theory provides another kind of matroid, now called a transversal matroid. (The original name, due to the founder R. Rado, was independence structure.)
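There is a simple algorithm for finding a basis:
1. Let A be the empty set.
2. For each x in E, in any order: if A ∪ {x} is independent, set A to A ∪ {x}.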
The result is clearly an independent set. It is a maximal independent set because if B ∪ {x} is not independent for some subset B of A, then A ∪ {x} is not independent either (the contrapositive follows from the hereditary property). Thus if we pass up an element, we'll never have an opportunity to use it later. We will generalize this algorithm to solve a harder problem.
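A minimal sketch of this procedure, assuming the algorithm is the usual one (start from the empty set and add each element that keeps the set independent). The independence test used for the example is that of a cycle matroid, implemented with a small union-find structure; `is_forest` and `find_basis` are illustrative names.

```python
def is_forest(edges):
    """Independence oracle for a cycle matroid: True if the edges contain no cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v are already connected, so (u, v) closes a cycle
        parent[root_u] = root_v
    return True

def find_basis(ground, is_independent):
    """Scan the ground set once, keeping every element that preserves independence."""
    basis = []
    for x in ground:
        if is_independent(basis + [x]):
            basis.append(x)
    return basis

# A triangle a-b-c with a pendant edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
basis = find_basis(edges, is_forest)
```

For this graph the scan rejects only the edge ("a", "c"), which would close the triangle, and returns a spanning tree with three edges.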
A weight function w : E → R+ for a matroid M=(E, I) assigns a strictly positive weight to each element of E. We extend the function to subsets of E by summation; w(A) is the sum of w(x) over x in A. A matroid with an associated weight function is called a weighted matroid.
As a simple example, say we wish to find the maximum spanning forest of a graph. That is, given a graph and a weight for each edge, find a forest containing every vertex and maximizing the total weight of the edges in the forest. This problem arises in some clustering applications. If we look at the definition of the forest matroid above, we see that the maximum spanning forest is simply the independent set with largest total weight; such a set must span the graph, for otherwise we can add edges without creating cycles. But how do we find it?
An independent set of largest total weight is called an optimal set. Optimal sets are always bases, because if an element can be added, it should be; this only increases the total weight. As it turns out, there is a trivial greedy algorithm for computing an optimal set of a weighted matroid. It works as follows:
1. Let A be the empty set.
2. For each x in E, taken in order of decreasing weight: if A ∪ {x} is independent, set A to A ∪ {x}.
This algorithm finds a basis, since it is a special case of the above algorithm. It always chooses the element of largest weight that it can while preserving independence (thus the term "greedy"). This always produces an optimal set: suppose that it produces A = {e1, e2, ..., er}, with the elements listed in the order in which they were chosen, and that B = {f1, f2, ..., fr} is any basis, with its elements listed in order of decreasing weight. Now for any k with 1 ≤ k ≤ r, consider the independent sets O1, consisting of the first k − 1 elements of A (empty when k = 1), and O2 = {f1, ..., fk}. Since O1 is smaller than O2, there is some element of O2 which can be put into O1 with the result still being independent. However ek is an element of maximal weight that can be added to O1 to maintain independence. Thus ek is of no smaller weight than some element of O2, and hence ek is of at least as large a weight as fk. As this is true for all k, A is at least as heavy as B.
The easiest way to traverse the members of E in the desired order is to sort them. This requires O(|E| log |E|) time using a comparison sorting algorithm. We also need to test for each x whether A ∪ {x} is independent; assuming independence tests require O(f(|E|)) time, the total time for the algorithm is O(|E| log |E| + |E| f(|E|)).
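The sort-then-scan procedure can be sketched as follows, assuming the greedy algorithm processes elements in order of decreasing weight and keeps each one that preserves independence. Applied to a cycle matroid, this is the approach of Kruskal's algorithm for a maximum spanning forest. The graph, its weights, and the names `is_forest` and `greedy_optimal_set` are illustrative assumptions.

```python
def is_forest(edges):
    """Independence oracle for a cycle matroid: True if the edges contain no cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
    return True

def greedy_optimal_set(weights, is_independent):
    """Scan elements by decreasing weight, keeping those that preserve independence."""
    chosen = []
    for x in sorted(weights, key=weights.get, reverse=True):
        if is_independent(chosen + [x]):
            chosen.append(x)
    return chosen

# Hypothetical weighted graph on the vertices a, b, c, d.
edge_weights = {
    ("a", "b"): 4, ("b", "c"): 3, ("a", "c"): 5,
    ("c", "d"): 1, ("b", "d"): 2,
}
forest = greedy_optimal_set(edge_weights, is_forest)
total = sum(edge_weights[e] for e in forest)
```

On this graph the scan keeps the edges of weight 5, 4 and 2 and rejects the edges of weight 3 and 1, each of which would close a cycle. Each independence test here rebuilds the union-find structure from scratch; a practical implementation would maintain it incrementally.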
If we want to find a minimum spanning tree instead, we simply "invert" the weight function by subtracting it from a large constant. More specifically, let wmin(x) = W - w(x), where W exceeds the total weight over all graph edges. Many more optimization problems about all sorts of matroids and weight functions can be solved in this trivial way, although in many cases more efficient algorithms can be found that exploit more specialized properties.
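The weight inversion can be sketched on a small graph. The code below repeats the greedy scan, builds wmin(x) = W - w(x) with W one more than the total edge weight, and reads off a minimum spanning tree. The graph and all names are illustrative assumptions.

```python
def is_forest(edges):
    """Independence oracle for a cycle matroid: True if the edges contain no cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
    return True

def greedy_optimal_set(weights, is_independent):
    """Scan elements by decreasing weight, keeping those that preserve independence."""
    chosen = []
    for x in sorted(weights, key=weights.get, reverse=True):
        if is_independent(chosen + [x]):
            chosen.append(x)
    return chosen

edge_weights = {
    ("a", "b"): 4, ("b", "c"): 3, ("a", "c"): 5,
    ("c", "d"): 1, ("b", "d"): 2,
}
# W exceeds the total weight, so every inverted weight stays strictly positive.
W = sum(edge_weights.values()) + 1
inverted = {edge: W - w for edge, w in edge_weights.items()}

min_tree = greedy_optimal_set(inverted, is_forest)
min_total = sum(edge_weights[e] for e in min_tree)
```

Maximizing the inverted weights visits the edges from lightest to heaviest, so the result is the tree with edges of weight 1, 2 and 4.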
Note also that if we take a set of "independent" sets which is a down-set but not a matroid, then the greedy algorithm will not always work. For then there are independent sets I1 and I2 with |I1| < |I2|, but such that for no e in I2 \ I1 is I1 ∪ {e} independent.
Pick an ε > 0 and τ > 0 such that (1 + 2ε)|I1| + τ|E| < |I2|. Weight the elements of I1 ∩ I2 in the range 2 to 2 + 2ε, the elements of I1 \ I2 in the range 1 + ε to 1 + 2ε (excluding 1 + ε itself), the elements of I2 \ I1 in the range 1 to 1 + ε, and the rest in the range 0 to τ (excluding 0). The greedy algorithm will select the elements of I1, and then cannot pick any elements of I2 \ I1. Therefore the independent set it constructs will be of weight at most (1 + 2ε)|I1| + τ|E| + |I1 ∩ I2|, which is smaller than the weight of I2, since the weight of I2 is at least |I2| + |I1 ∩ I2|.
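A concrete instance makes the failure visible. The sketch below uses a hypothetical down-set on {a, b, c} in which {a} and {b, c} are independent but {a, b} and {a, c} are not, with a weighted slightly above b and c. The greedy scan takes a first and is then stuck, while {b, c} is heavier. All names and weights are illustrative.

```python
def greedy(weights, is_independent):
    """Scan elements by decreasing weight, keeping those that preserve independence."""
    chosen = set()
    for x in sorted(weights, key=weights.get, reverse=True):
        if is_independent(chosen | {x}):
            chosen.add(x)
    return chosen

# A down-set that is not a matroid: {a} cannot be augmented from {b, c}.
family = {
    frozenset(), frozenset("a"), frozenset("b"), frozenset("c"), frozenset("bc"),
}

def in_family(s):
    return frozenset(s) in family

weights = {"a": 1.5, "b": 1.0, "c": 1.0}

picked = greedy(weights, in_family)
greedy_weight = sum(weights[x] for x in picked)
# Best achievable weight over the whole family, for comparison.
best_weight = max(sum(weights[x] for x in s) for s in family)
```

Here the greedy result has weight 1.5, while the set {b, c} has weight 2.0.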
The concept of a (finite) matroid was introduced by Hassler Whitney in 1935 in his paper "On the abstract properties of linear dependence". The names independence structure (R. Rado; this is a finitary matroid) and combinatorial pregeometry (G.-C. Rota) have been used, although the latter especially is now rare. (Rota called a simple matroid a combinatorial geometry.)
The original interest was in vector, graphic, and algebraic matroids, and later in transversal matroids.
It was not until 1971 that Jack Edmonds connected weighted matroids to greedy algorithms in his paper "Matroids and the greedy algorithm". Korte and Lovász would generalize these ideas to objects called greedoids, which allow even larger classes of problems to be solved by greedy algorithms.