Editing Set cover problem

{{Short description|Classical problem in combinatorics}}

[[File:SetCover.svg|thumb|Example of an instance of set cover problem.]]
The '''set cover problem''' is a classical question in [[combinatorics]], [[computer science]], [[operations research]], and [[Computational complexity theory|complexity theory]].  

Given a [[Set (mathematics)|set]] of elements {{math|{1, 2, …, ''n''} }}(henceforth referred to as the [[Universe (mathematics)|universe]], specifying all possible elements under consideration) and a collection, referred to as {{mvar|S}}, of a given {{mvar|m}} subsets whose [[union (set theory)|union]] equals the universe, the set cover problem is to identify a smallest sub-collection of {{mvar|S}} whose union equals the universe. 

For example, consider the universe, {{math|1=''U'' = {1, 2, 3, 4, 5<nowiki>}</nowiki> }} and the collection of sets {{math|1=''S'' = { {1, 2, 3}, {2, 4}, {3, 4}, {4, 5} }.}} In this example, {{mvar|m}} is equal to 4, as there are four subsets that comprise this collection. The union of {{mvar|S}} is equal to {{mvar|U}}. However, we can cover all elements with only two sets: {{math|{ {1, 2, 3}, {4, 5} }‍}}, see picture, but not with only one set. Therefore, the solution to the set cover problem for this {{mvar|U}} and {{mvar|S}} has size 2.

More formally, given a universe <math>\mathcal{U}</math> and a family <math>\mathcal{S}</math> of subsets of <math>\mathcal{U}</math>, a '''set cover''' is a subfamily <math>\mathcal{C}\subseteq\mathcal{S}</math> of sets whose union is <math>\mathcal{U}</math>. 

* In the set cover [[decision problem]], the input is a pair <math>(\mathcal{U},\mathcal{S})</math> and an integer <math>k</math>; the question is whether there is a set cover of size <math>k</math> or less.
* In the set cover [[optimization problem]], the input is a pair <math>(\mathcal{U},\mathcal{S})</math>, and the task is to find a set cover that uses the fewest sets.

The decision version of set covering is [[NP-complete]]. It is one of [[Karp's 21 NP-complete problems]] shown to be [[NP-complete]] in 1972. The optimization/search version of set cover is [[NP-hard]].{{sfn |Korte|Vygen|2012|p=414}} It is a problem "whose study has led to the development of fundamental techniques for the entire field" of [[approximation algorithms]].<ref>{{harvtxt|Vazirani|2001|p=15}}</ref>

== Variants ==
In the '''weighted set cover''' problem, each set is assigned a positive weight (representing its cost), and the goal is to find a set cover with a smallest weight. The usual (unweighted) set cover corresponds to all sets having a weight of 1.

In the '''fractional set cover''' problem, it is allowed to select fractions of sets, rather than entire sets. A fractional set cover is an assignment of a fraction (a number in [0,1]) to each set in <math>\mathcal{S}</math>, such that for each element ''x'' in the universe, the sum of fractions of sets that contain ''x'' is at least 1. The goal is to find a fractional set cover in which the sum of fractions is as small as possible. Note that a (usual) set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1; therefore, the size of the smallest fractional cover is at most the size of the smallest cover, but may be smaller. For example, consider the universe {{math|1=''U'' = {1, 2, 3<nowiki>}</nowiki> }} and the collection of sets {{math|1=''S'' = { {1, 2}, {2, 3}, {3, 1} }.}}  The smallest set cover has a size of 2, e.g. {{math|1={ {1, 2}, {2, 3} }.}} But there is a fractional set cover of size 1.5, in which a 0.5 fraction of each set is taken. {{Covering-Packing Problem Pairs}}

==Linear program formulation{{Anchor|LP}}==
The '''set cover problem''' can be formulated as the following [[integer linear program]] (ILP).<ref>{{harvtxt|Vazirani|2001|p=108}}</ref>
{|
| minimize
| <math>\sum_{s \in \mathcal S} x_s</math>
|
| (minimize the number of sets)
|-
| subject to
| <math>\sum_{s\colon e \in s} x_s \geqslant 1 </math>
| for all <math>e\in \mathcal U</math>
| (cover every element of the universe)
|-
|
| <math>x_s \in \{0,1\}</math>
| for all <math>s\in \mathcal S</math>.
| (every set is either in the set cover or not)
|}
For a more compact representation of the covering constraint, one can define an [[incidence matrix]] ''<math>A</math>'', where each row corresponds to an element and each column corresponds to a set, and ''<math>A_{e,s}=1</math>'' if element e is in set s, and ''<math>A_{e,s}=0</math>'' otherwise. Then, the covering constraint can be written as <math>A x \geqslant 1 </math>.

'''Weighted set cover''' is described by a program identical to the one given above, except that the objective function to minimize is <math>\sum_{s \in \mathcal S} w_s x_s</math>, where <math>w_{s}</math> is the weight of set <math>s\in \mathcal{S}</math>.

'''Fractional set cover''' is described by a program identical to the one given above, except that <math>x_s</math> can be non-integer, so the last constraint is replaced by <math>0 \leq x_s\leq 1</math>.

This linear program belongs to the more general class of LPs for [[covering problem]]s, as all the coefficients in the objective function and both sides of the constraints are non-negative. The [[Linear programming relaxation#Approximation and integrality gap|integrality gap]] of the ILP is at most <math>\scriptstyle \log n</math> (where <math>\scriptstyle n</math> is the size of the universe). It has been shown that its [[Linear programming relaxation|relaxation]] indeed gives a factor-<math>\scriptstyle \log n</math> [[approximation algorithm]] for the minimum set cover problem.<ref>{{harvtxt|Vazirani|2001|pp=110–112}}</ref> See [[randomized rounding#setcover]] for a detailed explanation.

== Hitting set formulation ==
Set covering is equivalent to the '''hitting set problem'''. That is seen by observing that an instance of set covering can
be viewed as an arbitrary [[bipartite graph]], with the universe represented by vertices on the left, the sets represented by vertices on the
right, and edges representing the membership of elements to sets. The task is then to find a minimum cardinality subset of left-vertices that has a non-trivial intersection with each of the right-vertices, which is precisely the hitting set problem.

In the field of [[computational geometry]], a hitting set for a collection of geometrical objects is also called a '''stabbing set''' or '''piercing set'''.<ref>{{Cite journal |last=Nielsen |first=Frank |date=2000-09-06 |title=Fast stabbing of boxes in high dimensions |url=http://www.lix.polytechnique.fr/%7Enielsen/pdf/2000-FastStabbingBoxes-TCS.pdf |journal=Theoretical Computer Science |volume=246 |issue=1 |pages=53–72 |doi=10.1016/S0304-3975(98)00336-3 |issn=0304-3975|doi-access=free }}</ref>

== Greedy algorithm ==

There is a [[greedy algorithm]] for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a [[bucket queue]] to prioritize the sets.<ref>{{Introduction to Algorithms|edition=3|chapter=Exercise 35.3-3|pages=1122|mode=cs2}}</ref> It achieves an approximation ratio of <math>H(s)</math>, where <math>s</math> is the size of the set to be covered.<ref>{{citation
| last1=Chvatal | first1=V. | authorlink1=Václav Chvátal
| jstor=3689577
| title=A Greedy Heuristic for the Set-Covering Problem
| journal=[[Mathematics of Operations Research]]
| volume=4
| issue=3
| date=August 1979
| pages=233–235
| doi=10.1287/moor.4.3.233}}</ref> In other words, it finds a covering that may be <math>H(n)</math> times as large as the minimum one, where <math>H(n)</math> is the <math>n</math>-th [[harmonic number]]:
<math display=block> H(n) = \sum_{k=1}^{n} \frac{1}{k} \le \ln{n} +1</math>

This greedy algorithm actually achieves an approximation ratio of <math>H(s^\prime)</math> where <math>s^\prime</math> is the maximum cardinality set of <math>S</math>. For <math>\delta-</math>dense instances, however, there exists a <math>c \ln{m}</math>-approximation algorithm for every <math>c > 0</math>.<ref>
{{harvnb|Karpinski|Zelikovsky|1998}}</ref>

[[Image:SetCoverGreedy.gif|frame|Tight example for the [[greedy algorithm]] with k=3]]

There is a standard example on which the greedy algorithm achieves an approximation ratio of <math>\log_2(n)/2</math>.
The universe consists of <math>n=2^{(k+1)}-2</math> elements. The set system consists of <math>k</math> pairwise disjoint sets 
<math>S_1,\ldots,S_k</math> with sizes <math>2,4,8,\ldots,2^k</math> respectively, as well as two additional disjoint sets <math>T_0,T_1</math>,
each of which contains half of the elements from each <math>S_i</math>. On this input, the greedy algorithm takes the sets
<math>S_k,\ldots,S_1</math>, in that order, while the optimal solution consists only of <math>T_0</math> and <math>T_1</math>.
An example of such an input for <math>k=3</math> is pictured on the right.

Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms
(see [[Set cover problem#Inapproximability results|Inapproximability results]] below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly <math>\ln{n} - \ln{\ln{n}} + \Theta(1)</math>.<ref>Slavík Petr  [http://dl.acm.org/citation.cfm?id=237991 A tight analysis of the greedy algorithm for set cover]. STOC'96, Pages 435-441, {{doi|10.1145/237814.237991}}</ref>

== Low-frequency systems ==

If each element occurs in at most {{var|f}} sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of {{var|f}} using [[Linear programming relaxation|LP relaxation]].

If the constraint <math>x_S\in\{0,1\}</math> is replaced by <math>x_S \geq 0</math> for all {{var|S}} in <math>\mathcal{S}</math> in the integer linear program shown [[#Integer linear program formulation|above]], then it becomes a (non-integer) linear program {{var|L}}. The algorithm can be described as follows:
# Find an optimal solution {{var|O}} for the program {{var|L}} using some polynomial-time method of solving linear programs.
# Pick all sets {{var|S}} for which the corresponding variable {{var|x}}<sub>{{var|S}}</sub> has value at least 1/{{var|f}} in the solution {{var|O}}.<ref>{{harvtxt|Vazirani|2001|pp=118–119}}</ref>

== Inapproximability results ==

When <math> n</math> refers to the size of the universe, {{harvtxt |Lund|Yannakakis|1994}} showed that set covering cannot be approximated in polynomial time to within a factor of <math>\tfrac{1}{2}\log_2{n} \approx 0.72\ln{n}</math>, unless '''NP''' has [[quasi-polynomial time]] algorithms. [[Uriel Feige|Feige]] (1998) improved this lower bound to <math>\bigl(1-o(1)\bigr)\cdot\ln{n}</math> under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. {{harvtxt |Raz|Safra|1997}} established a lower bound
of <math>c\cdot\ln{n}</math>, where <math>c</math> is a certain constant, under the weaker assumption that '''P'''<math>\not=</math>'''NP'''.
A similar result with a higher value of <math>c</math> was recently proved by {{harvtxt |Alon|Moshkovitz|Safra|2006}}. {{harvtxt |Dinur|Steurer|2013}} showed optimal inapproximability by proving that it cannot be approximated to <math>\bigl(1 - o(1)\bigr) \cdot \ln{n}</math> unless '''P'''<math>=</math>'''NP'''.

In low-frequency systems, {{harvtxt |Dinur|Guruswami|Khot|Regev|2003}} proved it is NP-hard to approximate set cover to better than <math>f-1-\epsilon</math>.
If the [[Unique games conjecture]] is true, this can be improved to <math>f-\epsilon</math> as proven by {{harvtxt |Khot|Regev|2008}}.

{{harvtxt |Trevisan|2001}} proves that set cover instances with sets of size at most <math>\Delta</math> cannot be approximated to a factor better than <math>\ln \Delta - O(\ln \ln \Delta)</math> unless '''P'''<math>=</math>'''NP''', thus making the approximation of <math>\ln \Delta + 1</math> of the greedy algorithm essentially tight in this case.

== Weighted set cover ==
{{Expand section|date=November 2017}}

[[Linear programming relaxation|Relaxing]] the integer linear program for weighted set cover stated [[#Integer linear program formulation|above]], one may use [[randomized rounding]] to get an <math>O(\log n)</math>-factor approximation. Non weighted set cover can be adapted to the weighted case.<ref>{{harvtxt|Vazirani|2001|loc=Chapter 14}}</ref>

== Fractional set cover ==
{{Expand section|date=September 2023}}
== Related problems ==
* Hitting set is an equivalent reformulation of Set Cover.
* [[Vertex cover problem|Vertex cover]] is a special case of Hitting Set.
* [[Edge cover problem|Edge cover]] is a special case of Set Cover.
* [[Geometric Set Cover Problem|Geometric set cover]] is a special case of Set Cover when the universe is a set of points in <math>\mathbb{R}^d</math> and the sets are induced by the intersection of the universe and geometric shapes (e.g., disks, rectangles).
* [[Set packing]]
* [[Maximum coverage problem]] is to choose at most k sets to cover as many elements as possible.
* [[Dominating set]] is the problem of selecting a set of vertices (the dominating set) in a graph such that all other vertices are adjacent to at least one vertex in the dominating set. The Dominating set problem was shown to be NP complete through a reduction from Set cover.
* [[Exact cover problem]] is to choose a set cover with no element included in more than one covering set.
* Red-blue set cover.<ref>{{Cite book|last=Information.|first=Sandia National Laboratories. United States. Department of Energy. United States. Department of Energy. Office of Scientific and Technical|url=http://worldcat.org/oclc/68396743|title=On the Red-Blue Set Cover Problem.|date=1999|publisher=United States. Dept. of Energy|oclc=68396743}}</ref>
*[[Set-cover abduction]].
* [[Monotone dualization]] is a computational problem equivalent to either listing all minimal hitting sets or listing all minimal set covers of a given set family.<ref>{{citation
 | last1 = Gainer-Dewar | first1 = Andrew
 | last2 = Vera-Licona | first2 = Paola
 | arxiv = 1601.02939
 | doi = 10.1137/15M1055024
 | issue = 1
 | journal = [[SIAM Journal on Discrete Mathematics]]
 | mr = 3590650
 | pages = 63–100
 | title = The minimal hitting set generation problem: algorithms and computation
 | volume = 31
 | year = 2017| s2cid = 9240467
 }}</ref>

==Notes==

{{reflist}}

== References ==

* {{Citation | last1=Alon | first1=Noga | author1-link=Noga Alon | last2=Moshkovitz | first2=Dana|author2-link= Dana Moshkovitz | last3=Safra | first3=Shmuel | author3-link=Shmuel Safra | title=Algorithmic construction of sets for k-restrictions | year=2006 | journal=ACM Trans. Algorithms | issn=1549-6325 | volume=2 | issue=2 | pages=153–177 | doi=10.1145/1150334.1150336| citeseerx=10.1.1.138.8682 | s2cid=11922650 }}.
* {{Citation
 | last1=Cormen     | first1=Thomas H.   | author-link=Thomas H. Cormen
 | last2=Leiserson | first2=Charles E. | author-link2=Charles E. Leiserson
 | last3=Rivest    | first3=Ronald L.  | author-link3=Ronald L. Rivest
 | last4=Stein     | first4=Clifford   | author-link4=Clifford Stein
 | title=Introduction to Algorithms | year=2001 | publisher=MIT Press and McGraw-Hill | location=Cambridge, Mass.
 | isbn=978-0-262-03293-3 | pages=1033–1038}}
* {{Citation | last1=Feige | first1=Uriel | author1-link=Uriel Feige | title=A threshold of ln n for approximating set cover | year=1998 | journal=[[Journal of the ACM]] | issn=0004-5411 | volume=45 | issue=4 | pages=634–652 | doi=10.1145/285055.285059| citeseerx=10.1.1.70.5014 | s2cid=52827488 }}.
* {{Citation
 | last1=Karpinski | first1=Marek | last2=Zelikovsky|first2=Alexander
 | year=1998
 | contribution=Approximating dense cases of covering problems
 | title=Proceedings of the DIMACS Workshop on Network Design: Connectivity and Facilities Location
 | volume=40
 | pages=169–178
 | publisher=American Mathematical Society | url=https://books.google.com/books?id=IMmuF0RZk1MC&q=karpinski+zelikovsky+cover+dense&pg=PA169
| isbn=9780821870846 }}
* {{Citation | last1=Lund | first1=Carsten | author1-link=Carsten Lund | last2=Yannakakis | first2=Mihalis | author2-link=Mihalis Yannakakis | title=On the hardness of approximating minimization problems | year=1994 | journal=[[Journal of the ACM]] | issn=0004-5411 | volume=41 | issue=5 | pages=960–981 | doi=10.1145/185675.306789| s2cid=9021065 | doi-access=free }}.
* {{Citation | last1=Raz | first1=Ran | author1-link=Ran Raz | last2=Safra | first2=Shmuel | author2-link=Shmuel Safra | title=STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing | publisher=ACM | isbn=978-0-89791-888-6 | year=1997 | chapter=A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP | pages=475–484}}.
* {{Citation | last1=Dinur | first1=Irit | author1-link=Irit Dinur | last2=Steurer | first2=David | title=STOC '14: Proceedings of the forty-sixth annual ACM symposium on Theory of computing | publisher=ACM | year=2013 | chapter=Analytical approach to parallel repetition | pages=624–633}}.
* {{Citation| last = Vazirani | first = Vijay V.
 | author-link = Vijay Vazirani
 | title = Approximation Algorithms
 | year = 2001
 | publisher = Springer-Verlag
 | isbn = 978-3-540-65367-7
 | url = https://www.ics.uci.edu/~vazirani/book.pdf
 }}
* {{Citation| last1 = Korte | first1 = Bernhard | authorlink1=Bernhard Korte
 | last2 = Vygen | first2 = Jens
 | title = Combinatorial Optimization: Theory and Algorithms
 | year = 2012
 | isbn = 978-3-642-24487-2
 | edition = 5
 | publisher = Springer
 }}
* {{Citation| last1 = Cardoso | first1 = Nuno | last2 = Abreu | first2 = Rui | contribution = An Efficient Distributed Algorithm for Computing Minimal Hitting Sets | year = 2014 | title=Proceedings of the 25th International Workshop on Principles of Diagnosis |location=Graz, Austria | contribution-url = http://dx-2014.ist.tugraz.at/papers/DX14_Mon_PM_S1_paper1.pdf | doi=10.5281/zenodo.10037 }}
* {{Citation
 | last1=Dinur     | first1=Irit       | author-link=Irit Dinur
 | last2=Guruswami | first2=Venkatesan | author-link2=Venkatesan Guruswami
 | last3=Khot      | first3=Subhash    | author-link3=Subhash Khot
 | last4=Regev     | first4=Oded       | author-link4=Oded Regev (computer scientist)
 | title=A new multilayered PCP and the hardness of hypergraph vertex cover 
 | year=2003 
 | isbn=1581136749 | publisher=Association for Computing Machinery
 | pages=595–601
 | doi=10.1145/780542.780629 | url = https://doi.org/10.1145/780542.780629
}}
* {{Citation
 | last1=Khot      | first1=Subhash    | author-link=Subhash Khot
 | last2=Regev     | first2=Oded       | author-link2=Oded Regev (computer scientist)
 | title=Vertex cover might be hard to approximate to within 2−<math>\epsilon</math> 
 | year=2008
 | publisher=Journal of Computer and System Sciences
 | url = https://doi.org/10.1016/j.jcss.2007.06.019
 | pages=335–349
| doi=10.1016/j.jcss.2007.06.019 }}
* {{Citation
 | last1=Trevisan   | first1=Luca       | title=Proceedings of the thirty-third annual ACM symposium on Theory of computing | author-link=Luca Trevisan
 | chapter=Non-approximability results for optimization problems on bounded degree instances 
 | year=2001
 | publisher=Association for Computing Machinery
 | chapter-url = https://doi.org/10.1145/380752.380839
 | pages=453–461
| doi=10.1145/380752.380839 | isbn=1-58113-349-9 }}

== External links ==
{{commons category}}
* [http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/set-benchmarks.htm Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination]
* [http://www.csc.kth.se/~viggo/wwwcompendium/node146.html A compendium of NP optimization problems - Minimum Set Cover]

{{DEFAULTSORT:Set Cover Problem}}
[[Category:Families of sets]]
[[Category:NP-complete problems]]
[[Category:Linear programming]]
[[Category:Approximation algorithms]]
[[Category:Covering problems]]