You may be familiar with COMIC-CON, held each year in San Diego; you may be less familiar with the short-lived COMIC-CON Glen Burnie (CCB). Although largely forgotten, the conference is remembered for its contribution to Algorithms. Early on the morning of the opening day, fans formed long lines down B&A Boulevard, spilling onto Ritchie Highway. The conference organizers were concerned for the safety of the fans standing along busy roads and were confident that if the crowd could be taught to form a curving line in the large, open plaza rather than a long, straight line down the street, then everyone would be closer to the entrance and much safer. As the organizers looked out the fourth-floor window at the growing crowd, they noticed several strange things:
Two of the organizers, both UMBC CS grads, thought there must be some way to determine the optimal structure of the line given these constraints, and they came up with the following model. Denote the four fan groups by the letters \(H\), \(G\), \(W\), and \(T\): \(H\) for Harry Potter, \(G\) for Game of Thrones, \(W\) for Dr. Who, and \(T\) for Star Trek. The line, then, is represented by a sequence \(A = a_1, a_2, \ldots, a_n\) where each \(a_i\) is one of the characters \(H\), \(G\), \(W\), or \(T\). For example,
A = H, G, G, T, H, W, H, W, W, H, T, Grepresents a line of length 12. Then a line folding on \(A\) is a set of pairs \(S = \{(i, j)\}\) where \(i, j \in \{1, 2, \ldots n\}\) that satifies the following conditions:
Looking at our example sequence, we can write-down a valid line folding:
A = H, G, G, T, H, W, H, W, W, H, T, G A valid line folding: W H H W T-W S = {(1, 12), (3, 10), (4, 9)} G-H G T H-G
Given a line \(A\), we would like to determine the line folding that leaves as few un-matched fans as possible, in hopes that this will be the most "compact" folded line that can be made from \(A\).
There are far too many possible foldings of a long line \(A\) to try all of them, so something more clever is needed if we are to determine the number of matched pairs in an optimal solution. We have some sense that solving the "big" line folding problem could be broken down into smaller sub-problems of a similar type, which suggests the use of dynamic programming. As with many dynamic programming problems, the difficulty is describing subproblems in a way that will allow us to prove there is optimal substructure.
After several failed attempts, our two UMBC grads come up with the following. Let \(OPT(i, j)\) be the number of matchings in an optimal line folding for \(a_i, a_{i+1}, \ldots, a_j\) where \(1 \leq i, j \leq n\). We know that, by the "no sharp turns" condition, whenever \(i\geq j-4\), \(OPT(i, j) = 0\). Now, for any optimal line folding on \(a_i, a_{i+1}, \ldots, a_j\), there are two alternatives:
In both cases, they were able to relate \(OPT(i,j)\) to optimal solutions for smaller subproblems, and, ultimately, write-down a recursive formula for \(OPT(i, j)\). The recursive formula was used to prove the existence of optimal substructure and as the basis for a dynamic programming algorithm. Once they were able to compute \(OPT(i, j)\), they quickly computed the optimal solution for the entire line: \(OPT(1, n)\).
The approach to deriving a recursive expression for \(OPT(i,j)\) is similar to, but more complicated than, the rod-cutting problem. For rod-cutting, we let \(R(n)\) be the maximum revenue from cutting and selling a rod of length \(n\). We know there must be a first cut, so we try all possible first cuts, which leads to the formula $$ R(n) = \max_{1\leq i\leq n} ( p_i + R(n-i) ) $$ where \(p_i\) is the going price for a rod of length \(i\). Since \(n-i < n\), this is a recursive formula for \(R(n)\).
For line folding, we have to consider two separate cases. If \(j\) is not part of a pair in the maximum solution \(OPT(i, j)\), then it is not difficult to see that \(OPT(i, j)\) is also the maximum solution for a particular subproblem (you need to figure out which subproblem). If, on the other hand, \(j\) is paired with some \(t < j - 4\), we can relate \(OPT(i, j)\) to two subproblems: the line of fans in front of \(t\), i.e. (\(a_i, a_{i+1}, \ldots, a_{t-1}\)), and the line of fans behind \(t\), i.e. (\(a_{t+1}, a_{t+2}, \ldots, a_{j-1}\)). In practice, we do not know whether \(j\) is involved in a pair, and, if so, we do not know the value of \(t\), so our recursive expression will involve taking the maximum over three subproblems for a range of \(t\) values, much as the rod-cutting solution required taking a maximum over all possible first cuts.
The basic technique for recovering an optimal \(S\) in addition to its size \(|S|\) is the same as for the other dynamic programming solutions we have looked at: keep track of the optimal choice at each iteration and use these choices to reconstruct \(S\). For example, in rod-cutting, we just had to keep track of the best first cut at each iteration; for LCS, we had to keep track of which of three subproblems produced the optimal path at each iteration.