Practical Proofs

Proof Approximations for Practical Code

Have you ever written a correctness proof of a program? It's a lot of work, error-prone, and not always possible. The proofs themselves can have bugs, so The Tao of Programming gives us this paradox/wisdom:

“Even perfect programs have bugs.”

Faced with these difficulties, Simon Peyton Jones suggests scaling back: “I think much more productive for real life is to write down some properties that you'd like the program to have. You'd like to say, ‘This valve should never be shut at the same time as that valve.’ ‘This tree should always be balanced.’ ‘This function should always return a result that's bigger than zero.’” Maybe in the future researchers will find a way to easily prove entire programs correct, but for now, it's still an area of research.
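As a rough illustration of writing down a property, here is a minimal sketch in Python (chosen for brevity; it is not the language used later in this article). The `ensures` decorator and the `retry_delay_ms` function are hypothetical names made up for this example. The decorator checks the stated property every time the function returns.

```python
import functools

def ensures(predicate, description):
    """Attach a written-down property to a function and check it on every call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # The property is stated once, next to the code it describes.
            assert predicate(result), f"{func.__name__} violated: {description}"
            return result
        return wrapper
    return decorate

# Hypothetical example function: "should always return a result bigger than zero".
@ensures(lambda r: r > 0, "result is bigger than zero")
def retry_delay_ms(attempt):
    return 100 * 2 ** attempt
```

This only checks the property on the inputs that actually occur, so it is a test rather than a proof, and Python skips assert statements when run with -O. Its value is that the property is written down where the next reader will see it.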

Until then we have proof approximations. Good programmers are always trying to explain to themselves why a piece of code works. They try to think of every way things could go wrong. They don't always succeed, but when a failure happens, they try to think of a way to avoid that failure in the future.

A proof is a careful explanation
of why something is true.

This is the basis of practical proofs. Good programmers are already thinking with the proving mindset, but they don't always communicate their reasoning to others. For a good proof approximation, you need to have an explanation of why your code is correct, and you need to explain that reasoning to others.

Some proofs are very short. Here is a famous proof of Pythagoras' theorem. It is a single word:

[Figure: four triangles surrounding a square, together forming a larger square]

Behold!

Ideally, an author would write all the steps of the proof, but sometimes that gets too tedious, so they omit steps. Find a balance between “too tedious” and “not enough info.”
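For comparison, here are the steps that “Behold!” leaves to the reader. This is a sketch under an assumption about the picture: that it shows four copies of a right triangle with legs a ≤ b and hypotenuse c, arranged inside a square of side c around a small central square of side b − a. If the picture is the other common arrangement (outer side a + b, inner square of side c), the algebra is similar.

```latex
\begin{align*}
c^2 &= \underbrace{4 \cdot \tfrac{1}{2}ab}_{\text{four triangles}}
     + \underbrace{(b-a)^2}_{\text{central square}} \\
    &= 2ab + b^2 - 2ab + a^2 \\
    &= a^2 + b^2
\end{align*}
```

Three lines of algebra is probably the right side of “too tedious” for most readers, but whether to include them depends on your audience.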

Here is an example: threading. Threading is chosen because it's hard. You can't test every possibility with unit tests, because the number of ways the threads' steps can interleave is far too large. To avoid bugs, you need, at a minimum, to have an informal explanation of why your code is correct.
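To get a feel for “too many possibilities,” here is a rough sketch that counts the orderings in which the steps of several threads can interleave. It assumes each thread runs a fixed sequence of atomic steps and keeps its own order; `interleavings` is a hypothetical helper written for this article, and Python is used only for brevity.

```python
from math import factorial

def interleavings(step_counts):
    """Count the distinct orderings of all steps when each thread keeps its own order.

    step_counts holds the number of atomic steps per thread. The result is the
    multinomial coefficient (n1 + n2 + ...)! / (n1! * n2! * ...).
    """
    total = factorial(sum(step_counts))
    for n in step_counts:
        total //= factorial(n)  # exact division: the multinomial is an integer
    return total

two_small_threads = interleavings([2, 2])      # 6
two_larger_threads = interleavings([10, 10])   # 184756
```

Two threads of only ten steps each already allow 184,756 orderings, and real code has far more steps than that. A unit test exercises whichever few of those orderings the scheduler happens to choose.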

Here we want to prove two things: that there are no race conditions, and that there are no deadlocks. Notice how the programmer solved these problems and tried to communicate her solution.

//------------------------------------------
// All references to 'slicedMelon' should
// occur in this section. You should be
// able to visually inspect that each
// LOCK has a matching UNLOCK. This mutex
// should not be used anywhere else.
//------------------------------------------
//
// No race conditions: only touch this variable
//                        with a mutex acquired.
// No deadlocks: Coffman condition four (circular
//                wait) cannot hold; no other lock
//                is acquired while this one is held.
//
FUNCTION(getResetSlicedMelon,
`
   LOCK(slicedMutex)

   movl  _slicedMelonLocation(%rip), %r12d
   movl  $-1, _slicedMelonLocation(%rip)

   UNLOCK(slicedMutex)

   movl  %r12d, %eax
')

//
// Stores a value in slicedMelon
// No race conditions: mutex required for access.
// No deadlocks: the circular wait chain
//                is clearly broken here.
//
FUNCTION(putSlicedMelon,
`
   movl  %edi, %r12d

   LOCK(slicedMutex)
   
   movl  %r12d, _slicedMelonLocation(%rip)

   UNLOCK(slicedMutex)
')

You can make this kind of reasoning easier by keeping the code simple, as Ken Thompson does. He says, “There's one place where you add a feature and it fits; fragile code, you've got to touch ten places....If there's something that characterizes my code: it's simple, choppy, and little. Nothing fancy. Anybody can read it.”

And this particular example could be improved: it could automate unlocking with a macro, or it could use a language or library that supports higher-level threading constructs. The example uses assembly to draw attention to the salient points and avoid getting distracted by details of a language.
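As one possible shape for the higher-level version, here is a minimal sketch in Python using the standard threading module. The class and method names are hypothetical, and starting the value at -1 is an assumption (the assembly only shows -1 being stored on reset). The `with` statement releases the lock on every exit path, so the “each LOCK has a matching UNLOCK” part of the argument no longer needs visual inspection.

```python
import threading

EMPTY = -1  # assumed "no value" marker, mirroring the movl $-1 in the assembly

class SlicedMelon:
    """All access to the shared value goes through this class."""

    def __init__(self):
        self._lock = threading.Lock()  # guards _location and nothing else
        self._location = EMPTY

    def get_reset(self):
        # No race conditions: _location is only touched with the lock held.
        # No deadlocks: no other lock is acquired while this one is held.
        with self._lock:  # released automatically, even if an exception occurs
            value = self._location
            self._location = EMPTY
        return value

    def put(self, value):
        # Same argument as get_reset: one lock, held briefly, nothing nested.
        with self._lock:
            self._location = value
```

The comments still carry the proof approximation. The language construct only removes one of the things a reader has to check by hand.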

In the next example, the programmer wanted to make sure that only one valve could be open at a time. Her company decided against formal verification, so she put all the logic in one place and wrote comments explaining to future programmers why it was correct.

//-----------------------------------------------
// - Valve guard -
// All valve logic should be located here.
// Only use these functions to operate a valve.
//-----------------------------------------------
//
// Opens the valve only if the right valve
// is not open.
// Returns - true if the valve is now open
//
FUNCTION(openLeftValve,
`
   movl $ 0, rv

   ifEqual($ 0, _rightValveOpen(%rip),
      `movl $ 1, _leftValveOpen(%rip)
       movl $ 1, rv
       hardwareOpen(left)
   ')
')

//
// Opens the valve only if the left valve
// is not open.
// Returns - true if the valve is now open
//
FUNCTION(openRightValve,
`
   movl $ 0, rv

   ifEqual( $ 0, _leftValveOpen(%rip),
      `movl  $ 1, _rightValveOpen(%rip)
       movl  $ 1, rv
       hardwareOpen(right)
   ')
')

This example can be improved (I wrote it this way so you can learn how to make your own code better). What does hardwareOpen() do when an error occurs? As written, the function records the valve as open and returns true no matter what hardwareOpen() does. Understanding what the API calls do in every case is imperative for writing correct proof approximations.
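Here is one way the improvement might look, sketched in Python for brevity. Everything about the hardware call is an assumption: the sketch supposes a hypothetical `hardware_open(name)` that returns True on success and returns False only when the valve has stayed shut. If your real API can fail with the valve partly open, this argument does not hold and you need a different one. The sketch also assumes a single caller at a time; it adds no locking.

```python
class ValveGuard:
    """All valve logic lives here. Only use these methods to operate a valve."""

    def __init__(self, hardware_open):
        # hardware_open is a hypothetical callable: name -> bool (True on success).
        self._hardware_open = hardware_open
        self.left_open = False
        self.right_open = False

    def _open(self, name, other_is_open):
        if other_is_open:
            return False  # invariant: never both valves open at once
        if not self._hardware_open(name):
            return False  # assumed: the valve stayed shut, so state is unchanged
        return True

    def open_left(self):
        """Opens the left valve only if the right valve is not open."""
        if self._open("left", self.right_open):
            self.left_open = True
        return self.left_open

    def open_right(self):
        """Opens the right valve only if the left valve is not open."""
        if self._open("right", self.left_open):
            self.right_open = True
        return self.right_open
```

The return value now matches the comment “true if the valve is now open,” because the flag is only set after the hardware reports success.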

Threaded code is an area where you need to have an informal proof. Be careful also when you are dealing with input from the user: it can burn your house down no matter what language you use, and routine testing usually doesn't catch security bugs.

Beginning to think this way will improve your code, reduce your bug count, and increase your coding speed.

Some source code is available; feel free to look.