11 Approximation¶

文本统计：约 1774 个字 • 40 行代码

引入近似算法的目的是什么？解决比较困难的问题（NPC）

在解决困难问题的时候，我们找不到一个很好的算法，可以找一些替代的方法

如果数据量比较小，那么 $O(2^N)$ 的算法也是可以接受的
用多项式时间解决一些重要的特殊情况
用多项式时间找到一个近似最优的solution，这个就被称为近似算法。我们这章就主要围绕这一点展开。

11.1 Definition¶

Approximation Ratio (近似比)

An algorithm has an approximation ratio of $\rho(n)$ if, for any input of size $n$, the cost $C$ of the solution produced by the algorithm is within a factor of $\rho(n)$ of the cost $C^*$ of an optimal solution:

\[ max(\frac{C}{C^*},\frac{C^*}{C})\le \rho(n) \]

If an algorithm achieves an approximation ratio of $\rho(n)$, we call it a $\rho(n)$-approximation algorithm.

An approximation scheme（近似方案） for an optimization problem is an approximation algorithm that takes as input not only an instance of the problem, but also a value $\varepsilon$ > 0 such that for any fixed $\varepsilon$, the scheme is a (1+ $\varepsilon$)-approximation algorithm.

We say that an approximation scheme is a polynomial-time approximation scheme (PTAS) if for any fixed $\varepsilon$ > 0, the scheme runs in time polynomial in the size $n$ of its input instance.

若对于每一个固定的 $\varepsilon > 0$ ，算法 A 的运行时间以实例 $I$ 的规模的多项式为上界，则称 A 是一个多项时间近似方案 (PTAS) 。比如说 $O(n^{2/$\varepsilon})$

如果进一步要求算法 A 的运行时间以实例 $I$ 的规模和 $1/\epsilon$ 的多项式为上界，则称 A 是一个完全多项时间近似方案 (FPTAS) 。比如说 $O((1/\varepsilon)^2n^3)$

11.2 Approximate Bin Packing¶

Given N items of sizes $S_1 , S_2 , …, S_N$ , such that $0 < S_i \le1$ for all $1 \le i \le N$ . Pack these items in the fewest number of bins, each of which has unit capacity.

装箱问题是一个NP-Hard的问题

我们一共有三个算法

11.2.1 Next Fit¶

void NextFit ( )
{   read item1;
    while ( read item2 ) {
        if ( item2 can be packed in the same bin as item1 )
    place item2 in the bin;
        else
    create a new bin for item2;
        item1 = item2;
    } /* end-while */
}

【Theorem】 Let $M$ be the optimal number of bins required to pack a list $I$ of items. Then next fit never uses more than $2M – 1$ bins. There exist sequences such that next fit uses $2M – 1$ bins.

近似系数不会超过2，如何证明？

先构造（假设容量是2），只需要把 $\epsilon$ 取得很小就可以了，若一共有 $2M-1$ 个，那么最优的方法就可以只用到 $M$ 个

\[ 1+\epsilon,1+2\epsilon,1-\epsilon,1+3\epsilon,1-2\epsilon,...,1+M\epsilon,1-(M-1)\epsilon \]

再证明最优解至少有 $M$ 个，由于相邻两个箱子的和至少为2，所以这 $2M-1$ 个箱子空间至少需要 $(2M-2)/2=M-1$ ，所以至少需要 $M$ 个箱子

11.2.2 First Fit/Best Fit¶

void FirstFit ( )
{   while ( read item ) {
        scan for the first bin that is large enough for item;
        if ( found )
    place item in that bin;
        else
    create a new bin for item;
    } /* end-while */
}

这个算法可以用 $O(N\log N)$ 来实现（只需要用堆即可）

【Theorem】Let $M$ be the optimal number of bins required to pack a list $I$ of items. Then first fit never uses more than $17M / 10$ bins. There exist sequences such that first fit uses $17(M – 1) / 10$ bins.

这个定理的证明比较繁琐，就不证了

Best Fit 的算法就是找到那个把它放进去后剩余空间最少的盒子。听上去好像应该更优了，实际上近似系数与First Fit类似。这个算法同样可以通过 $O(N\log N)$ 的方法实现。

11.2.3 On-line Algorithms¶

Place an item before processing the next one, and can NOT change decision. 已经放进去的不能再拿出来，我们不知道输入什么时候结束，在线算法永远无法给出最优解

【Theorem】There are inputs that force any on-line bin-packing algorithm to use at least $5/3$ the optimal number of bins. 存在一种输入使任何的在线算法的近似系数大于等于 $5/3$

证明：我们构造一种输入方法，先假设盒子的容量是50

我们先给34个1

如果这个在线算法全把它放在一个盒子中，我们再输入2个17，如果这时候两个17被分别放在两个盒子中，那么我们再输入2个34，那么这两个34又要新开两个盒子，共五个盒子；如果这两个17被放在同一个盒子中，我们再输入3个26，那么我们又要新开3个盒子，共五个盒子。但是这两个情况最优的方法都是3个盒子，于是此时近似系数就是5/3
如果这个在线算法放在两个盒子中，此时的近似系数就大于5/3了，就重新给34个1。

根据相应的在线算法，重复上述操作，就可以使其近似系数大于等与5/3

11.2.4 Off-line Algorithms¶

View the entire item list before producing an answer. 总览全局后再给出答案

由于这个问题最让我们头疼的是体积比较大的物体，于是我们的想法是将物体按照体积从大到小依次排列，然后依次处理。借鉴前面的做法，我们依次加入物体，找到编号最小的能放的箱子放入，这个方法被称为 First Fit Decreasing (FFD)

【Theorem】Let M be the optimal number of bins required to pack a list I of items. Then first fit decreasing never uses more than $11M / 9 + 6/9$ bins. There exist sequences such that first fit decreasing uses $11M / 9 + 6/9$ bins.

这个结论已经被证明不可被优化。

这个定理的证明比较繁琐，此处略。但是我们可以证明一个较为容易的版本

Note

ProblemSolution

The $FFD$ algorithm for bin packing achieves the following bounds: $FFD(L)\le (11/9)OPT(L) + 1$, for all L.

(1)Please show that $FFD(L)\le(3/2)OPT(L)$, for all $L$, with the above inequality.

(2) Prove that the factor $3/2$ is the best possible unless $P=NP$ (note that deciding if two bins are sufficient to accommodate a set of items is NP-complete).

假设箱子的容积是1

(1) 设所有的物品为 $a_1\ge a_2\ge...\ge a_n>0$，考虑第 $j=\lceil\frac{2}{3}FFD(L)\rceil$ 个箱子。

如果它包含一个 $a_i>\frac 12$ 的物体，那么 $B_j$ 前面的箱子中物品的体积都超过 $1/2$, 由于 $a_i$ 前面的物体体积都超过 $1/2$ ，因此至少有 $j$ 个体积超过 $1/2$ 的物品，这些物品都得放在不同的箱子中，于是

\[ OPT(L)\ge j\ge \frac 23FFD(L) \]

如果 $B_j$ 里面没有体积超过 $1/2$ 的物品，那么除了最后一个箱子 $B_{FFD(L)}$，$B_j$ 及其之后的箱子 $B_j,B_{j+1},...,B_{FFD(L)-1}$ 内至少都有两个体积不超过 $1/2$ 的物品（因为$B_j$ 及其之后的箱子里面的物品都不多于 $1/2$，如果只有一件的话，那么后面的箱子不可能放物品，这是 $FFD$ 的特性）。于是至少有 $2(FFD(L)-j)+1$ 个物品都无法放入 $B_1,B_2,...,B_{j-1}$ （因为前面没法放才放后面的），所以

\[ \begin{aligned} OPT(L)&\ge\sum_{i=1}^na_i\\ &>min\{j-1,2(FFD(L)-j)+1\}\\ &\ge min\{\lceil\frac{2}{3}FFD(L)\rceil-1,2(FFD(L)-(\frac 23FFD(L)+\frac23))+1\\ &=\lceil\frac{2}{3}FFD(L)\rceil-1 \end{aligned} \]

从而 $OPT(L)\ge\lceil\frac{2}{3}FFD(L)\rceil\ge \frac{2}{3}FFD(L)$

当然根据题目的意思我们直接根据题目的不等式直接得到

当 $OPT(L)>3$ 时满足

\[ FFD(I)FFD(L)\le (11/9)OPT(L) + 1<(3/2)OPT(L) \]

当 $OPT(L)=3$ 上述不等式得到 $FFD(L)\le 4<4.5$

当 $OPT(L)=2$ 上述不等式得到 $FFD(L)\le 3$ ,亦满足等式

（2）反证法。谬设存在近似比小于 $3/2$ 的多项式时间近似算法 $A$ ，那么直接用它求解装箱问题；如果 $OPT(I)=2$ ，那么 $A(I)<3$ ，于是 $A(I)=2=OPT(I)$ ；如果 $OPT(I)≥3$ ，那么 $A(I)≥3$ 。这样我们就回答了给定物品是否能用两个箱子装下的问题。而它是一个 NP-完全问题，在多项式时间内不可解，矛盾！

绝对近似比与渐进近似比

绝对近似比指的就是前面所说的那个近似系数(也叫近似比),这个要求更加严格，在任何用例下都需要满足，然后渐进近似比就不需要这么严格，只需保证在较大的样例的时候满足系数即可。所以在上面那个用例中我们说绝对近似比不可能小于3/2，但是渐进近似比可以比3/2小，因为这个绝对近似比受限于较小的样例的情况，在这种情况下那个11/9的式子是要更宽的，但是随着样本量的增大，11/9的那个式子则更紧。

11.3 The Knapsack Problem¶

A knapsack with a capacity $M$ is to be packed. Given $N$ items. Each item $i$ has a weight $w_i$ and a profit $p_i$ .

11.3.1 Fractional version¶

If $x_i$ is the percentage of the item $i$ being packed, then the packed profit will be $p_ix_i$ .

An optimal packing is a feasible one with maximum profit. That is, we are supposed to find the values of $x_i$ such that $\sum_{i=1}^np_ix_i$ obtains its maximum under the constrains $\sum_{i=1}^n w_ix_i\le M$ and $x_i\in [0,1]$ for $1\le i \le n$

一个容易的贪心算法就是永远选择性价比最高的那个物体，反正可以放一部分，所以这个算法得到的就是最优解。

11.3.2 0-1 version¶

如果只能要么放入物体，要么不放入物体。这样的问题是NP-hard的

假设我们还使用贪心算法（选择最大价值优先和最高性价比优先两者方法较大的那个），这样的近似比是多少呢？

记 $p_{max}$ 为最优价值的物品的价值，$P_{opt}$ 为最优解得到的价值，$P_{greedy}$ 为贪心算法得到的价值，$P_{frac}$为分数版本的最大价值

于是，我们可以得到下面的不等式

（第一个不等式是因为 $P_{greedy}$ 肯定比最高性价比优先的贪心算法高，而 $P_{frac}$ 就是用最高性价比优先算法的，从而两者的差距就在于分数形式的可以放一部分，而整数形式的就不能放了，故两者的差距不可能差一个最大物品的价格。第二个不等式的原因是 $P_{greedy}$ 肯定比最大价值优先的要大，也比最高性价比优先的要大。）

\[ P_{OPT}\le P_{frac}\le P_{greedy}+p_{max} \]

\[ p_{max}\le P_{greedy} \]

故 $P_{OPT}/P_{greedy}\le1+p_{max}/P_{greedy}\le2$

由此可得近似解不会超过2.

在第八章介绍动态规划的时候，我们的作业题就是0-1背包问题，这个题目也有相应的动态规划做法。

明明已经有动态规划算法了，为什么还是NPC问题

利用动态规划的确是$O(n*w)$的时间复杂度，但是要知道，n的确是输入规模的一部分，输入了n个重量与价值，但是w并不是输入规模，对于一个数W，需要m=$\log w$的位数来表示。因此，m才是输入规模的一部分。所以$O(n*w)=O(n2^m)$，所以是NPC问题。

我们也可以证明这个0-1背包问题有FPTAS的算法（将数据同时除以一个值，降低了精度（可控制））

11.4 The K-center Problem¶

定义 $\text{dist}(s_i,C)=\min_{c\in C}\text{dist}(s_i,c)$ $r(C)=\max\{\text{dist}(s_i,C),i=1,2,..,n\}$

我们要取K个中心点构成集合 $C$ 使得 $r(C)$ 最小。

11.4.1 Naive greedy¶

一个做法是，我们每次都选择最可能成为中心的那个点，具体来说：

如果是第一个点，就选取所有点的中心；
如果不是第一个点，就选取能一个最能让 r(C) 下降的；

这个做法的 bug 比较大，假设我们的点是聚类非常明显的两个点云，那么第一个点就会落在两个点云之间，这很蠢，所以我们不浪费过多时间在这个方法上。

11.4.2 2r-Greedy¶

Centers  Greedy-2r ( Sites S[ ], int n, int K, double r )
{   
    Sites  S’[ ] = S[ ]; /* S’ is the set of the remaining sites */
    Centers  C[ ] = None;
    while ( S’[ ] != None ) {
        Select any s from S’ and add it to C;
        Delete all s’ from S’ that are at dist(s’, s)  2r;
    } /* end-while */
    if ( |C| <= K ) return C;
    else ERROR(No set of K centers with covering radius at most r);
}

我们的大致思路是先猜测一个 $r$ ，然后尝试用半径为 $2r$ 的圆去覆盖（非最优）

【Theorem】 Suppose the algorithm selects more than $K$ centers. Then for any set $C^*$ of size at most $K$, the covering radius is $r(C^*) > r$. 这个结论的意思是我们使用 $2r$ 的圆覆盖如果失败了，那么最优的 $r$ 一定比 $r$ 要大。

证明：我们设最优的中心点集为 $C^*=\{C_1,...C_k\}$，最优半径为 $r^*$，根据这个点集我们对所有的顶点做一个划分 $P^*=\{P_1,P_2,...,P_k\}$，其中 $P_i$ 中的点满足到 $C_i$ 的距离是最短的，如果某个点到几个中心的距离相等，那么放入编号最小的一个。我们可以根据距离的三角形不等式，得出每个划分中的顶点之间的距离不超过 $2r^*$。如果 $r^*\le r$，则 $2r^*\le 2r$，那么根据我们的贪心算法，如果我们选择了某一个划分中的某一个点，那么这个划分中的所有点都会在这个循环中被移除，这样我们就得到相应的合法的 $2r$ 的覆盖，这与题目假设矛盾，所以 $r^*>r$。定理证明完毕。

于是我们可以得到如下结论

如果成功，那么 $r^*\le2r$
如果失败，那么 $r^*>r$

从而我们可以实行二分逼近的方法，一开始取的非常大，然后成功的话就折半，一直到失败为止。同时根据引理可以知道，第一个失败的 $r_0$，一定满足 $r_0<r^*\le2r_0$ （这个 $2r$ 是上一个成功的结果）。

这样我们得到的答案就是 $2r_0$，从而得出近似比就是2

当然我们也可以稍微改进一下上述代码，每个循环中不再选择任意一个点，而选择距离 $C_x$ 最远的那个点

Centers  Greedy-Kcenter ( Sites S[ ], int n, int K )
{  
    Centers  C[ ] = None;
    Select any s from S and add it to C;
    while ( |C| < K ) {
        Select s from S with maximum dist(s, C);
        Add s it to C;
    } /* end-while */
    return C;
}

这个方法的近似比依然还是2，内核相比于改进前的没有什么变化，引理依然是成立的（证明过程没有任何变化）

11.4.3 近似比可以小于2吗¶

答案是否定的

【Theorem】 Unless $P = NP$, there is no $\rho$ -approximation for center-selection problem for any $\rho$ < 2.

DOMINATING-SET problem: 支配集问题，对于图 $G(V,E)$ 来说，称 $V$ 的子集 $V'$ 为支配集，若对 $\forall u \in V,\exists v\in V',s.t. (u,v)\in E$，问个数最小的支配集为多少？

Note

在该问题中，我们可以定义点到点之间的距离为点到点之间最短通路的边数，那么在该问题中距离均为整数

已知这个问题是一个 NPC 问题，如果存在近似比小于2的多项式算法 $A$，那么对于该支配集问题，我们可以（下面会讨论）在多项式时间内解决该问题，这样就产生了矛盾

怎么解决？我们设顶点数为 $n$，那么我们从 $1$ 开始，依次令 $K=1,2,..n-n-1$ ，运用算法 $A$，得到最短的距离。由于这个算法的近似比小于2，那么如果最佳的最短距离为1，由于近似比小于2，那么运用算法 $A$ 得到的最短距离就为1；如果最佳的最短距离不为1，那么算法 $A$ 得到的距离显然不为1。所以我们只要不断递增 $K$, 当第一次得到的最短距离为1的时候，这时的 $K$ 就是支配集问题的解。这个算法是多项式算法。

Summary¶

关于算法的设计，我们考虑这三个维度：

最优性(optimality)：即能求准确解；
高效性(efficiency)：即算法是否高效；
普遍性(all instances)：即算法是否普遍适用于所有的情况；

倘若一个解法：

同时满足最优性和高效性，那么这个算法对特殊情况能高效求准确解；
同时满足最优性和普遍性，那么这个算法对所有情况都能求准确解；
同时满足高效性和普遍性，那么这个算法可能是个近似算法；

就算 N=NP 成立，我们仍然无法保证三个愿望一次满足。

评论区

对你有帮助的话请给我个赞和 star =>

欢迎跟我探讨！！！