Candidate Elimination implemented in Haskell

by Emanuel Berg <embe8573@student.uu.se>
Updated: Sat 24 Nov 2012, 06:49

http://user.it.uu.se/~embe8573 (general)
http://user.it.uu.se/~embe8573/candelim/ce.hs (Haskell implementation)
http://user.it.uu.se/~embe8573/candelim/docu.txt (this document)
http://user.it.uu.se/~embe8573/candelim/docu_wrap.txt (ditto with line breaks)

Candidate Elimination (CE) is an AI algorithm. Its purpose is to detect patterns (concepts) as answers to yes-or-no questions. It reaches a conclusion based on a training set of uniform examples: property vectors, all tagged with "yes" or "no".

Yes-or-no question

Start with a yes-or-no question. For example, say you have some controlled environment, like an ecological system inside a big bottle. Your question is: "What organisms will survive in this bottle?" To make it work with CE (which only deals with yes-or-no questions), you rephrase it like this: "Will [some organism] survive?"

Describe the entities of your world

Second, you must find a syntax for describing organisms in a uniform way, as vectors. For example, [a, b, c] and [i, j, k] may describe two organisms in terms of three properties. Note that in the real world, distinct organisms may share hundreds of properties; still, as far as CE is concerned, two identical vectors represent the same organism.

AI note 1: Inductive bias of CE

CE only considers training set examples in terms of the properties that you define. As a consequence, CE can only learn concepts in terms of those properties. There is one exception: although examples are completely described property-by-property, concepts may also include * for "don't care". Concepts are patterns: they are generalizations of training sets (which contain *specific* examples). The purpose of the generalization is to include all positive examples, and exclude the negative ones. So, if you ask the CE what organisms will survive, and you get the answer [a, *, *], this should be read as "all organisms with a as their first property will survive. The second and third properties do not influence whether an organism will survive".

AI note 2: Search space reduction

Most AI methods have some form of search at their center. This is fine as long as it is possible to cover as much of the search space as is necessary to find an answer (within reasonable time, and with reasonable computer resources). If this is not possible, the search space is said to be intractable. In our CE case, if we end up with an intractable search space, we could reduce its size by changing the organism representation. Apart from reducing the search space size, this could also work to our advantage by streamlining the search to focus on relevant concepts (although a lot of them will probably still be meaningless). But it may also work to our disadvantage: if we pick properties carelessly or without enough understanding, we might exclude concepts (making them impossible to find) that *do* matter to our investigation.

Tag your entities

Third, you transplant the organisms into the bottle. When you feel confident whether they live or die, you tag their respective descriptions with "yes" or "no", indicating "lives" or "dies". Note that the same organism cannot be tagged both "lives" and "dies", although that would probably be common in your real-life bottle. In CE, that would represent inconsistent data, and any run on it would crash the algorithm.
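To make the representation concrete, here is a minimal Haskell sketch of how examples, concepts, and tags could be encoded. The names are my own, hypothetical ones -- not necessarily those used in ce.hs:

  -- A concrete property value, or the * ("don't care") of a concept.
  data Value a = Exactly a
               | DontCare
               deriving (Eq, Show)

  type Example a = [a]                -- a fully specified property vector
  type Concept a = [Value a]          -- may contain DontCare
  type Tagged  a = (Bool, Example a)  -- True = "yes" (lives), False = "no" (dies)

  -- Does a concept include an example? Every position must either be
  -- DontCare or match the example's value exactly.
  includes :: Eq a => Concept a -> Example a -> Bool
  includes c e = and (zipWith ok c e)
    where ok DontCare    _ = True
          ok (Exactly v) x = v == x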
Although CE cannot recover from an inconsistent (crash) state, it can detect when it cannot proceed and notify the user.

The annotation of the examples makes the yes-or-no question implicit.

The *order* in which you extract examples from the training set and feed them to the CE algorithm matters. Notably, the first example must be positive. (Apart from that, I did not examine how the order matters.) So, strictly speaking, the training "set" is not a set but a list.

AI note: CE learning is supervised

CE is an example of "supervised learning": we annotate our examples' property vectors with "yes" or "no" based on external (non-CE) observations (or calculations).

Run the training set

The first organism in the training set must be a survivor. Also, as said, all organisms must be completely described property-by-property (no unknown or don't-care values). Also, the vectors must be uniform for all organisms: the same properties, the same number of properties, in the same order. If all is good, you feed the examples (in order) into the CE, and it gives you an answer.

How it works

Besides the training set, you use a concept S and a set of concepts G. The S concept represents the *inner* bound, and the concepts of G the *outer* bound. A concept differs from a property vector only in that a concept may include an * for "don't care". In CE, the search for the concept that answers your question is done *bidirectionally*: from S to G, and from G to S, simultaneously, until they intersect (or you run out of examples): what is left is your answer. (S is for specific, G for general.)

To initialize CE, you remove the first example from the training set. You set S to the properties of that example (i.e., you remove the plus sign). This will be the starting point for S's movement outward, toward G. At the same moment, you set G to the so-called top concept, which includes everything. So, if your examples look like this: (sign, [property_a, property_b, property_c]), then G looks like this: [*, *, *]. This will be the starting point for G's movement inward, toward S.

In general, S could be thought of as a set. But I implemented it as a single concept, because: "If concepts are vectors of independent properties and * (don't care), then two concepts always have a unique most specific generalization. In that case S will always have 1 element. Thus if you remove a concept from S you have inconsistency." (http://www.it.uu.se/edu/course/homepage/ai/ht11/learning-logic.html)

What CE does on S on a positive example

On a positive example, S must be generalized to include it, if S indeed excludes it. This is done by replacing with * (don't care) every property in S that differs from the respective property of the example. E.g.,

  e = +[a, b, c]
  S =  [a, *, d]

The first property is fine. As it is the same in S and e, it does not make S exclude e. The second property is also fine. As it is * (don't care) in S, it does not make S exclude e ("don't care" includes e's second property value, b, as well as all other possible values for that property). But the third property value must be generalized in S. As it is d in S, this is equivalent to saying that e cannot be a positive example (an organism that lives), as e has value c for its third property. But e *is* a positive example (indicated by the plus sign), and this is an indisputable observation made externally of the CE algorithm. (That is, the training set examples have the final say.) So, S has to drop its insistence on d. Thus, we set S to [a, *, *], making S more general.

Last, we must make sure that the new S does not include any previous negative examples. If it does, the data is inconsistent.
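Using the types sketched above, the generalization step could look like this in Haskell (again a sketch under my hypothetical names, not necessarily the code of ce.hs):

  -- Generalize S to include a positive example: keep a position if it
  -- already matches the example's value; otherwise widen it to *.
  generalize :: Eq a => Concept a -> Example a -> Concept a
  generalize = zipWith widen
    where widen (Exactly v) x | v == x = Exactly v
          widen _           _          = DontCare

For the example above, generalize [Exactly 'a', DontCare, Exactly 'd'] "abc" evaluates to [Exactly 'a', DontCare, DontCare], i.e., [a, *, *].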
What CE does on G on a positive example

On a positive example, we must iterate over G and drop all concepts in G that do not include our positive example. E.g.,

  e = +[a, b, c]
  G = { [a, b, *], [*, *, d] }

The first concept in G is fine, as it includes e. But the second concept of G must be dropped as it, by insisting on a third property value of d, excludes e. Note the difference here compared with the action on S. As we are dropping (not modifying) concepts, we don't have to check for consistency.

What CE does on S on a negative example

On a negative example, we need to check whether it is included by S. If it is, the data (the entire training set, collectively) is inconsistent.

What CE does on G on a negative example

On a negative example, all concepts in G must be specialized to exclude the negative example (that is, those that indeed include it). In addition, checks must make sure that new concepts in G do not exclude any previous positive examples. Those that do are dropped. (I'm unsure if this is wise. Perhaps, if a drop has to be made here, the data is inconsistent.) Also, concepts that are too specific -- those overlapped by more general concepts in G -- should be removed.

Specialization of concepts in G is done in terms of S. E.g.,

  S =  [a, b, d]
  e = -[a, b, c]
  G = { [*, *, d], [*, b, *] }

Here, the first concept of G is fine: it already excludes e. The second concept of G does include e. We need to find an * in it and replace it with the corresponding value in S. The first such instance -- the first property -- does not help us. As it is the same -- a -- in both S and e, it cannot help the second concept of G exclude e. But the second (and last) such instance, the third property -- * in the G concept, d in S -- differs from the corresponding third property value of e, namely c. So, the second concept of G is redefined into [*, b, d]. However, this is a specialization of an existing concept in G, namely [*, *, d]. So, [*, b, d] should be dropped altogether, leaving G = { [*, *, d] }. As we did not introduce any new concepts in G this time around (we did, but we instantly dropped it), there is no need to check whether the concepts of G exclude any previous positive examples.

What just happened?

Recall, we just made S more *general*. Now, we made G less general, thus more *specific*. We are encircling our target (the answer to our question) from two directions: from the specific to the general, as well as the other way around. This can't be stressed enough, as it is the core of CE.

When CE terminates

When the last example has been fed (and CE is in a consistent state), the answer is all concepts that lie between the S concept (the "inside") and the "outside", whose borders are defined by the concepts of G. If G contains a single concept, and that concept is equal to the S concept, then no more narrowing down is possible.
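Putting the pieces together, here is a sketch of the specialization step and the overall loop, using the hypothetical names from the earlier sketches. It is simplified: the consistency checks and the pruning of overlapped concepts in G (as in the [*, b, d] example above) are omitted:

  -- Specialize each concept in G against a negative example e, guided
  -- by S: a concept that already excludes e is kept as-is; a concept
  -- that includes e is replaced by one new concept per * position
  -- where S holds a concrete value differing from e's value there.
  specializeAll :: Eq a => Concept a -> Example a -> [Concept a] -> [Concept a]
  specializeAll s e = concatMap spec
    where
      spec c
        | not (c `includes` e) = [c]
        | otherwise =
            [ replace i v c
            | (i, (DontCare, Exactly v)) <- zip [0..] (zip c s)
            , v /= e !! i ]
      replace i v c = take i c ++ [Exactly v] ++ drop (i + 1) c

  -- The overall loop: the first (positive) example initializes S,
  -- G starts as the top concept, and every further example moves
  -- one of the two bounds.
  candElim :: Eq a => [Tagged a] -> (Concept a, [Concept a])
  candElim ((True, e0) : rest) = foldl step (s0, g0) rest
    where
      s0 = map Exactly e0                    -- S := the first positive example
      g0 = [replicate (length e0) DontCare]  -- G := { [*, *, ..., *] }
      step (s, g) (True,  e) = (generalize s e, filter (`includes` e) g)
      step (s, g) (False, e) = (s, specializeAll s e g)
  candElim _ = error "the first example must be positive"

For instance, candElim [(True, "abc"), (False, "abd")] evaluates to S = [a, b, c] and G = { [*, *, c] }: the negative example forces the top concept to commit to c in the third position.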