Here is an interesting scheme I encountered in the wild, generalised and made abstract for you, my intrepid reader. Let \(X\) be a set of binary variables. We are given information about subsets of \(X\), where each update assigns a probability to a concrete subset, the state of which is described by an arbitrary quantified logic formula. For example, \[P\bigg\{A \subset X \mid \exists_{x_i, x_j \in A} \big(x_i \ne x_j\big)\bigg\} = p\] The above assigns a probability \(p\) to some concrete subset \(A\), with the additional information that at least one pair of its members do not have the same value. Note that due to the existential quantification, the probability really ranges over the possible states of \(A\), since there could be many configurations which satisfy the formula above.
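To make this concrete with a toy instance of my own: take \(A = \{x_1, x_2\}\). Of the four possible states \((0,0), (0,1), (1,0), (1,1)\), the formula \(\exists_{x_i, x_j \in A}(x_i \ne x_j)\) is satisfied by exactly two, \((0,1)\) and \((1,0)\), so the update places probability \(p\) on \(A\) being in one of those two states without saying which.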
At any point in time, using all information available, the task is to estimate a probability for some set of possible states also specified as an arbitrary quantified logic formula over a subset. For example,
\[ P\bigg\{ Q \subset X \mid \forall_{x\in Q} (x=1) \bigg\} \] which requires the probability for some specific concrete subset \(Q \subset X\) such that all its members have the value \(1\).
Part of the fascination for me is that it is hard to get a Bayesian solution off the ground because (1) the statements are not about the atoms of the domain, and (2) there are no true statements to condition on. There’s a Frequentist solution in principle at the end of this post, which could be made ostensibly Bayesian by arbitrarily adding priors, but I see little added value.
A Dempster-Shafer solution
Dempster-Shafer (DS) theory offers an elegant solution. DS is a theory of evidence from the 1960s in which probabilities are assigned directly to subsets of events. Formally, there is some set of events \(X\), and beliefs are assigned to the powerset with the requirement that \(\sum_{A \in 2^X} m(A) = 1\), where \(m: 2^X \to [0,1]\) (known as a mass assignment). Surprisingly, sets which have non-zero mass assignments are allowed to overlap. Further, it is required that \(m(\emptyset)=0\) (there is no assignment to the empty set) and that any probability not assigned to a subset is assigned to \(m(X)\) – known as a vacuous assignment – so that the masses sum to \(1\). The theory defines \(bel(A) = \sum_{B\mid B \subseteq A} m(B)\) and \(pl(A) = \sum_{B\mid A \cap B \ne \emptyset} m(B)\): belief and plausibility. These roughly translate to mass directly supporting \(A\) and mass which does not contradict it: a lower and an upper bound.
In the original theory, mass assignments from multiple sources are combined using Dempster’s rule: \[ m_{1,2}(A) = \frac{1}{1-K}\sum_{B \cap C=A \ne\emptyset}m_1(B)m_2(C) \] where \(K = \sum_{B\cap C = \emptyset} m_1(B)m_2(C)\). \(K\) is interpreted as a measure of conflict, since it sums all products of mass from the two sources whose intersections are empty. Dempster’s rule has no optimality guarantees or assurances against absurd results in the way that Bayes’ rule does, so it (or one of its many variants) has to be used with consideration of possible outcomes by design. Wikipedia gives an easy-to-read and more thorough introduction to the basics here.
Applied to the problem at hand, given that arbitrary logical formulae can be used to specify information, our working set \(\bar{X}\) is the set of all possible assignments to \(X\), and our mass assignment therefore ranges over \(2^{\bar{X}}\) instead. Each new information update acts as a separate source, with its mass assigned to the particular subset of \(\bar{X}\) picked out by its formula and the remainder assigned vacuously to \(\bar{X}\). We can proceed by fusing the \(T\) updates using Dempster’s rule (\(\oplus\)) such that \(m_{\le T}=m_1 \oplus \dots \oplus m_T\). We can then use \(m_{\le T}\) to calculate belief and plausibility for any member of \(2^{\bar{X}}\), which covers every set of states expressible as a quantified formula over a subset of \(X\).
A proof of concept in GNU SETL
I recently picked up GNU SETL: a wonderful implementation of a set-based language from the 1960s. It makes for a convenient language with which to prototype a small example of the scheme above. SETL reads a lot like formal set theory, so it should still be possible to follow along even if it is unfamiliar to you.
The whole code can be found here.
Problem statement: you are given the following information.
$ Global variables.
var
  dom := {false,true},
  X := {[a,b,c,d,e]: a in dom, b in dom, c in dom,
                     d in dom, e in dom};
$ Information.
up1 := {[{[a,b,c,d,e] in X| a or b}, 0.7],
        [X, 0.3]};
up2 := {[{[a,b,c,d,e] in X| b or c}, 0.5],
        [X, 0.5]};
up3 := {[{[a,b,c,d,e] in X| b and d and e}, 0.8],
        [X, 0.2]};
up4 := {[{[a,b,c,d,e] in X| c and e}, 0.9],
        [X, 0.1]};
The task is to calculate belief and plausibility for the following sets.
$ Queries.
q1 := {[a,b,c,d,e] in X| b};
q2 := {[a,b,c,d,e] in X| c};
q3 := {[a,b,c,d,e] in X| d or e};
qs := [q1,q2,q3];
Let’s define a Dempster’s rule operator, and procedures to calculate belief and plausibility.
$ Procedures and operators.
proc fuse(a, m1, m2);
  $ Conflict K: total mass on pairs whose intersection is empty.
  k := +/[p1*p2: [b,p1] in m1, [c,p2] in m2| b*c={}] ? 0;
  $ Mass supporting a: total mass on pairs whose intersection is a.
  m := +/[p1*p2: [b,p1] in m1, [c,p2] in m2| b*c=a] ? 0;
  return [a, m/(1-k)];
end;
op dempster(m1,m2);
  $ Fuse every pairwise intersection of focal sets.
  X_ := {fuse(b*c,m1,m2):
         [b,p1] in m1, [c,p2] in m2};
  $ Keep only non-empty sets with non-zero mass
  $ (conflict mass on the empty set is discarded).
  return {[a,p] in X_| a /= {} and p > 0};
end;
proc belief(q, m);
  $ Mass directly supporting q: focal sets contained in q.
  return +/ [p: [a,p] in m| a subset q] ? 0;
end;
proc plause(q, m);
  $ Mass not contradicting q: focal sets intersecting q.
  return +/ [p: [a,p] in m| a*q /= {}] ? 0;
end;
Let’s combine the information using Dempster’s rule.
m_star := dempster/ [up1,up2,up3,up4];
Let’s calculate the belief and plausibility for the queries.
(for i in [1..3])
  print('q', i, ', Belief', belief(qs(i), m_star),
        ', Plausibility', plause(qs(i), m_star));
end;
> q 1 , Belief 0.8 , Plausibility 1
> q 2 , Belief 0.9 , Plausibility 1
> q 3 , Belief 0.98 , Plausibility 1
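These numbers can be checked by hand. Every pair of constraint sets here intersects (for example, the assignment with all five variables true satisfies all four formulae), so \(K=0\) at every fusion step, and the fused mass spreads products of the weights \(0.7, 0.5, 0.8, 0.9\) (or their vacuous complements) over the \(16\) possible intersections. For q1, the only fused components contained in the set where \(b\) is true are those including the third constraint \(b \wedge d \wedge e\), which carry total mass \(0.8\). Similarly, q2 gets \(0.9\) from the fourth constraint, and for q3 any component including the third or fourth constraint satisfies \(d \vee e\), giving \[ bel(q_3) = 1 - (1-0.8)(1-0.9) = 0.98. \] Every component intersects every query, so all plausibilities are \(1\).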
An elegant solution, as required.
In the wild, DS is used for “sensor fusion”: a kind of task in which many sensors report overlapping beliefs about the same state of affairs, and the goal is to combine them into an overall belief. I had not realised that DS theory would make for such a portable metaphor, so I am pleased to have discovered it.
A Frequentist solution
For completeness, here is a Frequentist solution in principle. Start with a probability on each member \(P(x_i=1) = p_0\). When given information, change the probabilities on the members implicated by the information to be compatible with it. This is non-identifiable in the general case because many assignments may be compatible with the information, so we can simulate, aiming to generate a representative random sample from the space of valid probability assignments. Each concrete assignment allows us to calculate \(P(Q \subset X \mid \dots)\), therefore at any given \(t\), we are able to produce \(\mathbb{E}_t\big\{ P(Q \subset X\mid \dots )\big\}\) – and confidence intervals – having used all available information as simulation constraints.
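For concreteness, here is a minimal sketch of that simulation in Python (my own illustration, not part of the original scheme). It treats the members as independent Bernoulli variables, accepts a parameter vector if its implied formula probabilities are within a tolerance of every update, and uses plain rejection sampling; the function names, the tolerance, and the restriction to the first two updates (tighter constraints need far more tries) are all choices of mine.
import itertools
import random

N = 5       # number of binary variables, as in the SETL example
TOL = 0.1   # how close a simulated P{formula} must be to the stated p

def formula_prob(params, formula):
    # Exact P{formula} under independent Bernoulli(params[i]) members,
    # by enumerating all 2^N assignments.
    total = 0.0
    for bits in itertools.product([0, 1], repeat=N):
        weight = 1.0
        for b, p in zip(bits, params):
            weight *= p if b else 1.0 - p
        if formula(bits):
            total += weight
    return total

def sample_valid_params(updates, tries=20_000):
    # Rejection sampling: draw per-member probabilities uniformly and
    # keep vectors compatible (within TOL) with every update.
    kept = []
    for _ in range(tries):
        params = [random.random() for _ in range(N)]
        if all(abs(formula_prob(params, f) - p) <= TOL for f, p in updates):
            kept.append(params)
    return kept

# The first two updates from the SETL example, read as P{formula} = p.
updates = [(lambda x: x[0] or x[1], 0.7),
           (lambda x: x[1] or x[2], 0.5)]

query = lambda x: x[1]   # q1: b is true
samples = sample_valid_params(updates)
if samples:
    estimates = [formula_prob(s, query) for s in samples]
    print(sum(estimates) / len(estimates))   # E{P(q1)} over valid assignments
else:
    print('no compatible parameter vectors found; raise tries or TOL')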
The main issue with this method, for me, is that it lacks elegance; secondly, for non-trivial state spaces, it will likely be computationally expensive if not intractable.