Computing & Information Science Department

Software Design Using C++



Computer Science Theory


The idea here is to present some basic computer science theory that has practical implications. The areas discussed include unsolvable problems, exponential running times, the classification of problems, and proofs of program correctness.

The General Halting Problem


The question: Is there an algorithm that, when given any program and any set of input data for that program, can decide in advance whether or not the program will run forever (i.e., never halt)?

Theorem: There can be no such decision algorithm (assuming Church's thesis, which in practice everyone accepts and which says that every function that is computable by some sort of algorithm is computable by a Turing machine).

Practical Conclusion: There are definite limits to what our computer programs can do. There may be no algorithms possible to do some desirable things. In other words, some problems are simply unsolvable.
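The standard proof of the theorem is by contradiction. The following C++-flavored pseudocode sketches the idea (the function halts is the hypothetical decision algorithm; none of this can actually be compiled, which is exactly the point):

```cpp
// Hypothetical: returns true if program p halts when run on input x.
bool halts(Program p, Input x);        // assumed to exist, for contradiction

void paradox(Program p)
{
    if (halts(p, p))                   // would p halt when fed itself?
        while (true) { }               // ...then deliberately loop forever
    // ...otherwise halt immediately
}

// Now ask: does paradox halt when run on itself?
//   If halts(paradox, paradox) is true,  then paradox(paradox) loops forever.
//   If halts(paradox, paradox) is false, then paradox(paradox) halts.
// Either way, halts gave the wrong answer, so no such halts can exist.
```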

Exponential Running Times


We have already learned that exponential running times are "horrible" and polynomial running times are fairly reasonable. It appears that some problems may have only exponential running time solutions. This means that except for extremely simple cases of these problems they cannot be solved in a reasonable amount of time on even the fastest of computers (and even though a correct computer program exists to calculate the solutions).

Example: The Traveling Salesperson Problem (TSP)


The problem: Given n interconnected cities and the distances between them, find the lowest-distance tour of all the cities, starting and ending with a given city. A tour never revisits a city (other than the starting city, which we obviously visit twice).

All known exact solutions of this problem require exponential running time. Thus as n increases, the running time increases at least as fast as c * 2^n for some positive constant c. When we have n = 10 cities, the problem is solvable in a reasonable amount of time. However, it has been calculated that by the time we reach just n = 30 cities, it would take about 84 trillion centuries on a supercomputer that could do a billion operations per second!

Other problems that we try to solve by computer may also turn out to require exponential running times and hence only be solvable in a reasonable amount of time for very small values of n.

Problem Classification


[Classes P, NP, etc.]

A nondeterministic algorithm is one that assigns different cases of a problem to different processors. It assumes that as many processors are available as needed (an assumption that is not true of real parallel processing machines since they have a definite fixed number of processors).
  • Class P: Problems solvable in polynomial time on a single CPU
  • Class NP: Problems solvable in polynomial time with a nondeterministic algorithm
Note that P is a subset of NP. It is thought that P and NP are not equal, so that NP is larger than P, though no one has been able to prove this. For example, the TSP is in NP, but as far as we know is not in P.

A problem is NP-complete provided it is in NP and every problem in NP can be reduced to this problem in polynomial time. The TSP has been shown to be NP-complete. This means that if anyone ever does find a polynomial-time solution to the TSP, then all problems in NP will be solvable in polynomial time (hence proving that P = NP after all).

Finally, recall that the General Halting Problem is unsolvable. It therefore gives an example of a problem in the final box of our problem classification scheme: the problems that no algorithm can solve at all.

Conclusion: There are many nasty problems (those not in P) that computers cannot solve at all or cannot solve in a reasonable length of time.

Optional Topic: Proofs of Program Correctness


Various efforts have been made to give formal mathematical proofs of the correctness of programs. This would avoid the problem that testing can never guarantee 100% correctness. The difficulty is that formal proofs are very hard to produce and are so far only possible for very short programs (e.g. a few pages in length). Proofs of correctness for commercial software are not possible in the foreseeable future.

Other drawbacks include:

  1. The proof method assumes that the formal specifications for the problem are correct. If the specifications are wrong, then the program may have been proven to meet these specifications, but still may not do what was intended.
  2. When unexpected input conditions arise, the program may behave in unusual ways, since these conditions were never planned for or covered in the proof. (This is called a design error.)
  3. These proofs usually ignore practical limitations of computers such as round-off error, integer overflow upon addition, running out of usable memory space, etc.
Especially tricky is the fact that proofs for programs with loops require writing a loop invariant: a condition, stated in terms of the program's variables, that remains true every time around the loop. The inference rule for a while loop goes as follows:


To prove:
   // Precondition      what is true before the loop starts
   while (Condition)
      // Invariant      what is true each time at this point
      body;
   // Postcondition     what we prove is true when the loop quits

It suffices to prove the following 3 steps:

   1) Precondition implies Invariant

   2) Invariant and Condition   assumed true before the body runs
      body                      execute the loop body once
      Invariant                 show it is true again after that one execution

   3) [Invariant and (not Condition)] implies Postcondition

Example


Suppose that array a contains integers at indices 0, 1, 2, etc. with -1 marking the last entry. An example of such an array is shown below:

[Array drawing]


total = 0;
k = 0;
// Precondition: total = 0, k = 0, array a contains data as described
while (a[k] != -1)
   // Invariant: total contains the sum of a[0] through a[k-1]
   {
   total = total + a[k];
   k++;
   }
// Postcondition: total contains the sum of a[0], a[1], etc. up to but
// not including the first -1 in the data

To prove that the postcondition would hold requires proving the 3 steps given above. In outline the proof goes like this:
  1. If the precondition holds, then the Invariant trivially holds: since k = 0, the range a[0] through a[k-1] is empty, and a total of 0 is correct for a sum involving no data.
  2. If the Invariant and Condition hold and we execute the loop body, then we have added one more number to total and advanced k by 1. Since the Condition holds we know that the number added was not the -1. Since the old total was the sum of a[0] through a[k-1], so is the new total (with 1 more number included in the sum and with k now one larger).
  3. If the Invariant and (not Condition) hold then a[k] is the -1 that marks the end of the data. Putting this together with the Invariant says that total is the sum of all items up to but not including the -1. That is what we wanted to prove.
You can see that proofs of correctness are difficult! So, although such proofs have some practical utility for small programs, their practicality for larger software remains questionable.

Back to the main page for Software Design Using C++

Author: Br. David Carlson with contributions by Br. Isidore Minerd
Last updated: August 27, 2009
Disclaimer