Don Knuth’s Volume 4, Fascicle 5 of The Art of Computer Programming has some great combinatorial puzzles in the exercises, including a variant of a puzzle called “Twenty Questions” invented by Donald Woods.

Donald Woods has written up some history of the problem that probably serves as a better introduction than I could give. Knuth introduces the puzzle as an exercise in backtracking (and has written a fast backtracking solver for a variant of the problem here), but you can also solve Twenty Questions using a SAT solver, and that’s what I’ll describe in this post.

I’ll use z3 (with its Python frontend) to find the solution to the puzzle. z3 is technically a little more than just a SAT solver, but the encoding of the problem in this post could easily be mapped down to a “pure” boolean formula and fed to a SAT solver if you were patient and careful enough.

The Questions

The questions start off easy, but the first few answers clearly depend on answers to other questions:

1. The first question whose answer is A is:

   (A) 1  (B) 2  (C) 3  (D) 4  (E) 5

2. The next question with the same answer as this one is:

   (A) 4  (B) 6  (C) 8  (D) 10  (E) 12

But wait, it gets harder! Instead of one question referencing another’s answer, some questions reference the distribution of all answers to the entire set of questions:

7. The answer that appears most often (possibly tied) is:

   (A) A  (B) B  (C) C  (D) D  (E) E

8. Ignoring those that occur equally often, the answer that appears least often is:

   (A) A  (B) B  (C) C  (D) D  (E) E

...

18. The number of prime-numbered questions whose answers are vowels is:

   (A) prime  (B) square  (C) odd  (D) even  (E) zero

Finally, question 20 doesn’t just require you to know answers to other questions in the quiz, you have to know the optimal score over any set of answers to the quiz:

20. The maximum score that can be achieved on this test is:

   (A) 18  (B) 19  (C) 20  (D) indeterminate
   (E) achievable only by getting this question wrong

You might want to leave question 20 out and solve 1-19 separately, then use the results to figure out 20. But remember, some of the first 19 questions refer to the distribution of answers on the quiz, which includes the answer to question 20!

Using z3 to find solutions

My general idea in encoding this problem as a boolean formula is:

Start by defining 100 variables: x1a, x1b, x1c, x1d, x1e, x2a, x2b, …, x20e. x1a means “Question 1 is marked A”, and so on.
Make sure that each question has exactly one answer by adding some constraints to x1a-x1e, x2a-x2e, etc.
Define 20 more variables: x1, x2, x3, …, x20. x1 means “The answer to question 1 is correct”. Each of these is defined by an expression involving the original 100 variables that represent answers to the questions.

Then we just let z3 run and try to satisfy as many of x1, x2, …, x20 as it can and interpret the results.

The Encoding

My program starts off by declaring the boolean variables, something like:

from z3 import *

x1 = Bool('x1')
...
x20 = Bool('x20')

x1a = Bool('x1a')
...
x20e = Bool('x20e')

I also want to write some helper functions that iterate over variables and answers, so I’ll define:

answers = [None]
answers.append(dict(zip(['A','B','C','D','E'],
                        [x1a, x1b, x1c, x1d, x1e])))
...
answers.append(dict(zip(['A','B','C','D','E'],
                        [x20a, x20b, x20c, x20d, x20e])))

so that answers[7]['D'] gives you x7d, for example. Also, there’s

correct = [None] + \
          [x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20]

so that correct[13] is x13.

I need to treat booleans as integers sometimes, so I do that with:

def btoi(x): return If(x,1,0)

Finally, I want to reduce some sequences of expressions using And, Or, and +, but functional programming is hard to read in Python, so I’ll use a few more helper functions:

def SumOf(xs): return reduce(lambda x,y: x+y, xs)
def AndOf(xs): return reduce(lambda x,y: And(x,y), xs)
def OrOf(xs): return reduce(lambda x,y: Or(x,y), xs)

Now we can start defining x1, x2, etc. in terms of the answers.

# 1. The first question whose answer is A is:
#     (A) 1  (B) 2  (C) 3  (D) 4  (E) 5

s.add(x1 == Or(x1a,
               And(x1b,x2a),
               And(x1c,x3a,Not(x2a)),
               And(x1d,x4a,Not(x2a),Not(x3a)),
               And(x1e,x5a,Not(x2a),Not(x3a),Not(x4a))))

z3 and Python make it easy to express some of the more complicated constraints:

# 3. The only two consecutive questions with identical answers are questions:
#    (A) 15 and 16  (B) 16 and 17  (C) 17 and 18  (D) 18 and 19  (E) 19 and 20

def same_answer(i,j):
    return Or(And(answers[i]['A'],answers[j]['A']),
              And(answers[i]['B'],answers[j]['B']),
              And(answers[i]['C'],answers[j]['C']),
              And(answers[i]['D'],answers[j]['D']),
              And(answers[i]['E'],answers[j]['E']))

s.add(x3 == Or(And(x3a, same_answer(15, 16),
                   AndOf(Not(same_answer(i,i+1)) for i in range(1,20) if i != 15)),
               And(x3b, same_answer(16, 17),
                   AndOf(Not(same_answer(i,i+1)) for i in range(1,20) if i != 16)),
               And(x3c, same_answer(17, 18),
                   AndOf(Not(same_answer(i,i+1)) for i in range(1,20) if i != 17)),
               And(x3d, same_answer(18, 19),
                   AndOf(Not(same_answer(i,i+1)) for i in range(1,20) if i != 18)),
               And(x3e, same_answer(19, 20),
                   AndOf(Not(same_answer(i,i+1)) for i in range(1,20) if i != 19))))

A few of these were a little tricky. Question 8 wants the answer that occurs least often among all answers that don’t occur the same number of times as another answer. So if A and B are both chosen 3 times and C is chosen 4 times, C would be the correct answer.

# 8. Ignoring those that occur equally often, the answer that appears least
#    often is:
#    (A) A  (B) B  (C) C  (D) D  (E) E

def other_answers(ans): return set(['A','B','C','D','E']) - set([ans])

def answer_tally(z): return SumOf(btoi(answers[i][z]) for i in range(1,21))

def least_of_distinct(ans):
    others = other_answers(ans)
    clauses = [answer_tally(ans) != answer_tally(x) for x in others]
    for x in others:
        remains = others - set([x])
        clauses.append(Or(answer_tally(ans) < answer_tally(x),
                          OrOf(answer_tally(x) == answer_tally(y) for y in remains)))
    return AndOf(clauses)

s.add(x8 == Or(And(x8a, least_of_distinct('A')),
               And(x8b, least_of_distinct('B')),
               And(x8c, least_of_distinct('C')),
               And(x8d, least_of_distinct('D')),
               And(x8e, least_of_distinct('E'))))

All of the other questions up through 19 are pretty straightforward. But what should we do about question 20?

20. The maximum score that can be achieved on this test is:

   (A) 18  (B) 19  (C) 20  (D) indeterminate
   (E) achievable only by getting this question wrong

We can’t correctly grade it just based on one solution to all of x1, x2, … x19. But if we can find at least one valid solution with the first 19 correct, then C is correct. And if we can’t find a solution with the first 19 correct, then maybe we can find one with 18 or 17 correct and fall back on B or A. So we’ll punt for now and define it optimistically as:

s.add(x20 == x20c)

We’ll hope to work down from there, re-defining it as needed.

Finding the Solution(s)

Once we’ve defined x1 through x20, we can use z3 to check whether our formula is satisfiable and, if so, get a model that maps variables to a satisfying assignment:

if s.check() == sat:
    print_solution(s.model())

My print_solution just prints out all of the answers using uppercase letters if the question is correct and lowercase otherwise. Now we can add one more constraint to maximize the score:

s.add(SumOf(btoi(correct[i]) for i in range(1,21)) >= 20)

When we do that, we see there’s no satisfying assignment at 20, so we reset our expectations and re-define x20 as:

s.add(x20 == x20b)

We update our maximization constraint to:

s.add(SumOf(btoi(correct[i]) for i in range(1,21)) >= 19)

and then find out that there is a solution with 19 correct:

A E D C A B C D C A C E D B C A D A A c

What would it mean if this were the only solution with 19 correct? Then 20(E), which says that the maximum score is achievable only by getting 20 wrong, would be correct! And once we knew that 20(E) was correct, maybe there would be a solution with all 20 correct. But if that optimal solution with 20(E) correct stands, then 20(E) (“the maximum score is achievable only by getting 20 wrong”) is necessarily incorrect! And if we can get stuck in that kind of circular reasoning going back and forth between 20(E) being true and false, doesn’t that mean that 20(D) (“the answer to 20 is indeterminate”) is then true?

Before we go too far down that path, we need to keep looking for solutions with 19 questions correct. Clearly, we need to iterate over all satisfying assignments at a given score to make any sense of a solution to 20. To do that, we can modify our check/print logic:

while s.check() == sat:
    m = s.model()
    print_solution(m)
    block_solution(s,m)

Where block_solution adds a new clause to the solver that prohibits the most recent solution:

def block_solution(s, m):
    ans = [v for vv in [answers[i].values() for i in range(1,21)] for v in vv]
    s.add(Not(And(AndOf(v == m[v] for v in ans),
                  AndOf(v == m[v] for v in correct[1:]))))

Now when we run the solver and iterate over all solutions with score at least 19, we get:

A E D C A B C D C A C E D B C A D A A c
D C E A B A D C D A E D A E D B D B E e
D C E A B E B C E A B E A E D B D A b B

Whew, so that settles it – 20(E) is incorrect since there is one way to achieve the maximum possible score (19) without getting question 20 incorrect. And 20(D) is clearly incorrect since 20(E) is incorrect and there is an absolute achievable best score for the quiz.

So B is the only correct answer for question 20 and the above three answers are the only three ways to achieve the maximum score of 19. We’re done!

Final Thoughts

Question 20 is supposed to be the kicker, but question 9 is also a little weird:

9. The sum of all question numbers whose answers are correct and the same as
   this one is in the range:
   (A) 59 to 62, inclusive
   (B) 52 to 55, inclusive
   (C) 44 to 49, inclusive
   (D) 61 to 67, inclusive
   (E) 44 to 53, inclusive

It’s the only one of the questions other than 20 that depends on its own correctness. You can see this in the way I encoded it:

def answer_sum(ans):
    return SumOf(If(And(correct[i],answers[i][ans]), i, 0) for i in range(1,21))

s.add(x9 == Or(And(x9a, answer_sum('A') >= 59, answer_sum('A') <= 62),
               And(x9b, answer_sum('B') >= 52, answer_sum('B') <= 55),
               And(x9c, answer_sum('C') >= 44, answer_sum('C') <= 49),
               And(x9d, answer_sum('D') >= 61, answer_sum('D') <= 67),
               And(x9e, answer_sum('E') >= 44, answer_sum('E') <= 53)))

When I say it “depends on its own correctness”, I just mean that x9 appears on both the left and the right side of the == in the definition of x9 (recall that correct[9] is just x9).

Think about any specific case where x9 is correct: for example, if question 9 is marked “A” and the sum of all question numbers with A as the answer that are correct (including question 9) is, say, 61. 9(A) is clearly correct here, but a grader would be just as correct if they marked 9 wrong: in that case, the sum of all question numbers that are correct and have A as the answer is now 61 - 9 = 52, which is no longer in the range [59, 62].

Since none of the ranges in question 9 span more than 9 numbers, any time question 9 is marked correct, it can also be marked incorrect – it’s all up to the grader! This sort of choice in grading never really comes into play since we’re always looking to maximize the score, but it does make you think about clever alternatives for question 20 that could introduce some indeterminate conclusions.

In fact, if you look back at my definition for block_solution, you’ll see that it blocks both the answers (x1a, x2b, etc.) and the whether or not they’re correct (x1, x2, …) when blocking an solution. You have to do this to get correct results, otherwise the solver could come across a solution where question 9 could be graded correct to get a score of 19 but choose to grade it incorrect instead, giving it a score of 18. If you only block the choices of answers, that solution gets blocked and the solver is never able to discover the “alternative grading” that would give it 19 points.

More Final Thoughts

My full Python solution is here if you want to run it or play around with the constraints.

If you think this is interesting, you should check out Knuth’s treatment of a variant of this problem in Exercises 7.2.2.71-72 of in Volume 4, Fascicle 5 of The Art of Computer Programming. The latter exercise plays around with constraints to see what happens when the answer to question 20 isn’t as clear-cut as it was in Donald Wood’s original problem.