Due Monday, October 31, by 11:59 pm

Problem Description

You may be familiar with COMIC-CON, held each year in San Diego; you may be less familiar with the short-lived COMIC-CON Glen Burnie (CCB). Although largely forgotten, the conference is remembered for its contribution to Algorithms. Early on the morning of the opening day, fans formed long lines down B&A Boulevard, spilling onto Ritchie Highway. The conference organizers were concerned for the safety of the fans standing along busy roads and were confident that if the crowd could be taught to form a curving line in the large, open plaza rather than a long, straight line down the street, then everyone would be closer to the entrance and much safer. As the organizers looked out the fourth-floor window at the growing crowd, they noticed several strange things:

There were only four types of fans in the crowd: Harry Potter, Game of Thrones, Dr. Who, and Star Trek.
The Harry Potter fans would only talk to Game of Thrones fans, and vice versa. Similarly, the Dr. Who fans would only talk to Star Trek fans, and vice versa.
Two pairs of fans would never talk over each other.
Although with effort the crowd would make a bend in the line, they would never make a very sharp bend.

Two of the organizers, both UMBC CS grads, thought there must be some way to determine the optimal structure of the line given these constraints, and they came up with the following model. Denote the four fan groups by the letters $H$, $G$, $W$, and $T$: $H$ for Harry Potter, $G$ for Game of Thrones, $W$ for Dr. Who, and $T$ for Star Trek. The line, then, is represented by a sequence $A = a_1, a_2, \ldots, a_n$ where each $a_i$ is one of the characters $H$, $G$, $W$, or $T$. For example,

  A = H, G, G, T, H, W, H, W, W, H, T, G

represents a line of length 12. Then a line folding on $A$ is a set of pairs $S = \{(i, j)\}$ where $i, j \in \{1, 2, \ldots n\}$ that satifies the following conditions:

There are no sharp turns. The fans in each pair in $S$ are separated by at least four intervening fans, so if $(i,j)\in S$, then $i < j - 4$.
The elements of any pair in $S$ consists of either $\{H, G\}$ or $\{W, T\}$, in either order.
$S$ is a matching, meaning that no fan appears in more than one pair.
Pairs do not talk over each other. If $(i,j)$ and $(k,l)$ are two pairs in $S$, then we cannot have $i < k < j < l$.

Looking at our example sequence, we can write-down a valid line folding:

   A = H, G, G, T, H, W, H, W, W, H, T, G

   A valid line folding:

       W H    
      H   W     
       T-W       S = {(1, 12), (3, 10), (4, 9)}
       G-H  
       G  T
       H-G

Given a line $A$, we would like to determine the line folding that leaves as few un-matched fans as possible, in hopes that this will be the most "compact" folded line that can be made from $A$.

Dynamic Programming

There are far too many possible foldings of a long line $A$ to try all of them, so something more clever is needed if we are to determine the number of matched pairs in an optimal solution. We have some sense that solving the "big" line folding problem could be broken down into smaller sub-problems of a similar type, which suggests the use of dynamic programming. As with many dynamic programming problems, the difficulty is describing subproblems in a way that will allow us to prove there is optimal substructure.

After several failed attempts, our two UMBC grads come up with the following. Let $OPT(i, j)$ be the number of matchings in an optimal line folding for $a_i, a_{i+1}, \ldots, a_j$ where $1 \leq i, j \leq n$. We know that, by the "no sharp turns" condition, whenever $i\geq j-4$, $OPT(i, j) = 0$. Now, for any optimal line folding on $a_i, a_{i+1}, \ldots, a_j$, there are two alternatives:

$j$ is not involved in any pair, or
$j$ is matched with some $t$ where $t < j - 4$.

In both cases, they were able to relate $OPT(i,j)$ to optimal solutions for smaller subproblems, and, ultimately, write-down a recursive formula for $OPT(i, j)$. The recursive formula was used to prove the existence of optimal substructure and as the basis for a dynamic programming algorithm. Once they were able to compute $OPT(i, j)$, they quickly computed the optimal solution for the entire line: $OPT(1, n)$.

Hints and Suggestions

The approach to deriving a recursive expression for $OPT(i,j)$ is similar to, but more complicated than, the rod-cutting problem. For rod-cutting, we let $R(n)$ be the maximum revenue from cutting and selling a rod of length $n$. We know there must be a first cut, so we try all possible first cuts, which leads to the formula $$ R(n) = \max_{1\leq i\leq n} ( p_i + R(n-i) ) $$ where $p_i$ is the going price for a rod of length $i$. Since $n-i < n$, this is a recursive formula for $R(n)$.

For line folding, we have to consider two separate cases. If $j$ is not part of a pair in the maximum solution $OPT(i, j)$, then it is not difficult to see that $OPT(i, j)$ is also the maximum solution for a particular subproblem (you need to figure out which subproblem). If, on the other hand, $j$ is paired with some $t < j - 4$, we can relate $OPT(i, j)$ to two subproblems: the line of fans in front of $t$, i.e. ($a_i, a_{i+1}, \ldots, a_{t-1}$), and the line of fans behind $t$, i.e. ($a_{t+1}, a_{t+2}, \ldots, a_{j-1}$). In practice, we do not know whether $j$ is involved in a pair, and, if so, we do not know the value of $t$, so our recursive expression will involve taking the maximum over three subproblems for a range of $t$ values, much as the rod-cutting solution required taking a maximum over all possible first cuts.

The basic technique for recovering an optimal $S$ in addition to its size $|S|$ is the same as for the other dynamic programming solutions we have looked at: keep track of the optimal choice at each iteration and use these choices to reconstruct $S$. For example, in rod-cutting, we just had to keep track of the best first cut at each iteration; for LCS, we had to keep track of which of three subproblems produced the optimal path at each iteration.

Deliverables

Using $OPT(i, j)$ as defined previously, derive a recursive formula in terms of one or more subproblems of the same type. Prove that the problem has optimal substructure.
Use the recursive formula to design an algorithm to find the maximum size of a line folding of a line $A$. Provide pseucode and describe your algorithm in clear, complete sentences. Design your algorithm so that you can recover the maximal line folding $S$ in addition to the size $|S|$.
Determine an asymptotic upper bound on the running time for your algorithm; if possible, determine an asymptotic lower bound as well. Prove that your bounds are correct.
Implement your algorithm in Python, C, C++, or Java. Using test data, generate empirical timing data for different length lines. Compare the empirical performance to the theoretical running time; explain any discrepancies.
Write-up your work in a neat, well-written technical report. NO hand-written reports. Turn-in your report and code.