In our first two approaches, we came up with the riddles ourselves, which has the advantage of immediately ensuring the riddle is easily human-interpretable. However, it also has some limitations. The main one is that we cannot go through all possible combinations of model and announcement. This would be a herculean task that is not just outside the scope of this project, but would be for anyone. Computers, however, can process logical computations such as these much faster, allowing us to simple define what a riddle should look like, and letting the program check each possibility. This is what we have done for our third and final approach to our problem, we have created a riddle generator and solver which combined should find a riddle that can be used to test Theory of Mind ability.
The code was written in python 3.14.2 inside a jupyter notebook. It uses the python packages numpy, typing, copy, networkx, matplotlib and pickle, the versions of which are available in the requirements file associated with the project, which can be found in our github repository. There are three parts to the code, which are the model generator class, the announcement generator function and the solver class. After this is a demo of how to use the code. In multiple places, the code has a certain percentage chance of doing something or a limit to how many times it can do something. The values for these chances and limits are mostly arbitrarily chosen, based on intuition and with the thought in mind that these values can easily be changed in a future iteration of this project, as we did not have the time to perform a parameter sweep nor could we find any sources on what kind of values would be reasonable choices. For similar reasons, we have decided to choose our parameters quite conservatively, as we wanted to be able to fully explore a certain parameter space. We have written our code such that it could easily be altered to explore a larger space, but due to time limitations, running this expansion has fallen outside the scope of our project.
The class Model serves the purpose of containing all parts of the model that would describe the situation of the riddle. In the case of Cheryl’s birthday, a model object would contain the different worlds with all of their possible birthdays and the relations between those worlds, indicating agents A and B knowing the month or day of the birthday. When a class object is initiated, the user can give values for each of the four variables the Model class uses, Agents, Worlds, Valuations and Relations.
Announcements are generated with a single recursive function. This function creates a single string composed of K-operators, not-operators, and-operators and prepositional atoms. The output formula always starts with a random number (between 5 to 12) of K-operators, each of which having a 20% chance of being negated. After the string of knowledge, there is a 90% chance a single prepositional atom will be appended (also having a 20% chance of being negated), and a 10% chance of an and-operator appearing instead. If the and-operator is chosen, the left and right side of the operator are generated by calling the function again, this time allowing for only 0 to 2 K-operators. Since this recursively calls the function, either side of the and-operator again has a chance of containing another and-operator, although unlikely.
We decided to use these operators as these are the operators that seemed most common in riddles like this, due to for example the or-operator being quite ambiguous in the english language, having no distinction between inclusive and exclusive or. The decision to only use one single announcement was for simplicity. Our model and announcements can get quite complicated on their own already, so we decided that this would be a good part to compromise on, leaving it for future research instead.
The Solver class is quite straightforward, and is designed to work with the model and announcement generators. It takes three inputs, a formula, a model, and an optional boolean verbosity check. As default values, the solver will generate its own Model object and announcement string, and verbose will be set to False.
The solver iterates over versions of the formula, removing one ‘depth’ of the K-operators that appear at the start of the formula after each loop. It first computes the product update of the action model belonging to the announcement formula and the model. It does this by checking the truth value of the announcement string in each world, and leaving all the worlds in which it is true. It does so recursively, adding up all the truth values throughout the string based on which characters it is reading. This recursive evaluation function has been optimized a little to remember what kind of facts it already knows are true for each world. When it computes a new truth value for a world, whether that is as simple as ‘’$!p$ holds in $w_0$’’ or a more complicated statement like ‘’$\neg K_EK_AK_EK_B\neg K_DK_EK_CK_BK_EK_D(K_Cb)\land (K_A(d)\land (K_Bc))$ does not hold in $w_1$’’, it will save this formula - world combination in a dictionary that it keeps from the start of its computation. This resets when the next model-announcement combination is tried, as the old truths will be useless in a new world.
When it knows which worlds are consistent with the announcement that it is currently processing, it checks what it can answer about the current states. If there are no worlds left over in which the announcement is true, then it will say that the announcement was inconsistent as is. If there is a single world left over, it will answer that world. If there are multiple worlds left over, it will iterate over each atom and answer whether it knows for sure if it is true, false, or whether it is not sure. Any of these three conclusions count as an answer, and get added the the list of answers that this solver has so far. After this is where the solver removes one K-operator from the formula and restarts the loop, computing the new product update.
When it has calculated all of its answers for all the different orders of ToM, it checks if all answers were unique. In this case, unique answers also account for knowing or not knowing for certain whether a prepositional atom is correct. ‘p’ (p is true), ‘!p’ (p is false) and ‘p?’ (I don’t know p) are all valid, unique answers that would be accepted. If it finds that all answers are indeed unique, a satisfactory announcement model combination has been found and we would be able to use this to create a riddle that can be used in Theory of Mind experiments.
If ‘verbose’ is set to False (the default), the solver will not print anything unless it finds a satisfactory riddle, which it will always print. If set to True, it will print the formula, the worlds and the current answer for each iteration of the formula (each order of ToM).
At the bottom of the jupyter notebook, there is some example code to run our solver. In this example code, the tried combinations of model and announcement are added to a dictionary that is checked against to make sure we don’t try the same combination multiple times. This dictionary is pickled to a file when the loop is stopped, allowing the user to come back to run it whenever they have time.
So far, we have run the solver for at least an hour after already having ran it during testing for some time as well, and the dictionary with combinations contains about 4000 unique announcements that have been tried against many different models. Very few unique combinations are now found, and most possibilities have been tried. No unique riddle has been found by the solver.
During testing, when we had other requirement for what a good riddle would be, we did find a few satisfying riddles, but these were all only valid for up to the fourth order ToM, and since riddles to this order already exist as shown by our second approach, we decided we would not report on these.