Convergence Analysis of Stochastic Diffusion Search
S. Nasuto¹ & M. Bishop, Department of Cybernetics, University of Reading, Reading RG6 2AE, UK
Abstract
In this paper we present a connectionist search technique, Stochastic Diffusion Search (SDS), capable of rapidly locating a specified pattern in a noisy search space. In operation SDS finds the position of the prespecified pattern or, if it does not exist, its best instantiation in the search space. This is achieved via parallel exploration of the whole search space by an ensemble of agents searching in a competitive, cooperative manner. We prove mathematically the convergence of Stochastic Diffusion Search: SDS converges to a statistical equilibrium when it locates the best instantiation of the object in the search space. Experiments presented in this paper indicate the high robustness of SDS and show good scalability with problem size.
The convergence characteristics of SDS make it a fully adaptive algorithm and suggest applications in dynamically changing environments.
¹ sjn@cyber.rdg.ac.uk
1. Introduction
In Connex (a privately funded connectionist research programme by British Telecom) several neural architectures have been proposed as possible solutions for locating facial features in images [1, 2, 3, 4]. Typically they use a standard multilayer perceptron architecture trained with the classical steepest descent algorithm. Due to slow learning on the raw data, several preprocessing techniques were used in order to reduce the learning time. Hand [1] uses several representations of images at different resolutions. Waite [2] applies a principal component representation of images, thus reducing the dimensionality of the input data. Lisboa [3] uses Gabor functions for image representation. Finally, Vincent [4] uses heuristics to speed up the learning. In all cases the network scans sequentially over the image with a predefined window. Thus in these techniques the time and reliability of locating the desired feature are influenced by the preprocessing and scanning of images. They are also prone to converging on false positives corresponding to local minima in the error surface used by the steepest descent algorithm.
Stochastic Diffusion Search (SDS) was introduced in [5] as a part of the Stochastic Diffusion Network (SDN) for solving visual search tasks. The SDN was used in the visual domain for facial feature location [5] and in on-line, real-time location and tracking of lips in video films [6]. In both cases the SDN exhibited a high level of reliability and fast convergence in spite of imperfections in the test images. The SDN did not use any preprocessing of the data for locating target features. The Stochastic Diffusion Network, as shown in Figure 1, is a hybrid system comprising a set of n-tuple neurons [7] guided by Stochastic Diffusion Search towards potentially correct positions in the search space. In this article we will present a modified version of SDS derived from the algorithm described in [5].
The modification enables SDS to act as a standalone technique able to locate any prespecified object within a given space.

FIGURE 1: Block diagram of SDN

SDS can be placed in the family of directed random search algorithms defined by Davies in [8], which includes Simulated Annealing [9] and Genetic Algorithms [10]. Simulated Annealing is a general-purpose global optimisation technique for very large combinatorial problems. Most of the development to date has been in the area of large-scale combinatorial optimisation (problems with discrete parameters), but it has also been applied to optimisation over continuous variables. Simulated Annealing is based on the concept of physical
annealing: the gradual cooling of a molten solid to produce a solid of minimum energy. In Simulated Annealing the set of parameters defining the function to be optimised (the parameter set) is stochastically adjusted, with adjustments that worsen the system performance by a factor z being accepted with a probability defined by the Boltzmann distribution:

P{accept} = e^(-z/KT),

where K is the Boltzmann constant and T is a parameter corresponding to temperature. Genetic Algorithms are search algorithms utilising the mechanics of Darwinian natural selection and genetics. The parameter set is encoded in a manner isomorphic to genetic encoding, and a population of encoded parameter sets is evaluated on the optimisation problem to define a fitness measure for each parameter set. New parameter sets are probabilistically derived from the old in proportion to this fitness measure. Each such cycle of evaluation and derivation is defined as one cycle of evolution. Over a series of such cycles the parameter sets self-adjust to define robust and efficient solutions to the problem.
Stochastic Diffusion Search is a connectionist, probabilistic model comprising a number of processing elements called agents. All agents synchronously perform their tasks and exhibit competitive, cooperative behaviour. Unlike most connectionist models, the agents do not process information by summing their inputs and applying a nonlinear transfer function to the result. Agents in SDS do not have fixed connections, so information processing is not carried out by adjusting weights according to some learning rule. Instead agents can communicate with each other, and information processing emerges as a collective property arising from the ability of individual agents to selectively choose other agents for this information transmission. Most connectionist models provide a solution by converging to a certain point in weight space. Once they have converged, their performance will degrade if the problem is not static in time but changes dynamically. Thus after a sufficient amount of time such a network has to be retrained to produce consistently reliable solutions. Therefore, in contrast to widespread opinion, these models are not adaptive in the true sense of the word: they do not possess the capability of detecting dynamic changes in the input and adapting to them to produce a new solution.
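The Boltzmann acceptance rule quoted above for Simulated Annealing can be sketched in a few lines of Python. This is a hypothetical illustration; the function name and the default value of K are our own choices, not part of the original formulation:

```python
import math
import random

def accept_worse(z, temperature, k=1.0):
    """Boltzmann acceptance: a move worsening the system performance by a
    factor z is accepted with probability P{accept} = e^(-z/(K*T))."""
    return random.random() < math.exp(-z / (k * temperature))

# At high temperature almost any worsening move is accepted; as the
# temperature is lowered, the acceptance probability vanishes.
p_hot = math.exp(-1.0 / 10.0)   # probability of accepting z = 1 at T = 10
p_cold = math.exp(-1.0 / 0.1)   # probability of accepting z = 1 at T = 0.1
```

This gradual tightening of the acceptance probability is what lets Simulated Annealing escape local minima early on while still converging later.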
SDS, by contrast, is capable of continuous exploration of the search space even after finding the optimal solution. Therefore it can effectively adapt to changes in the environment and find the new best solution. This claim is empirically supported in [6], where SDS was used for tracking in real-time video. Successive video frames can be regarded as a dynamic environment in which the optimal solution changes over time. In this article we will explain how SDS achieves this adaptability.
We claim that SDS is in its own right an interesting connectionist algorithm, robust to noise and fast in operation. In this article we will give a full mathematical model of SDS and will discuss its convergence. It will be shown that if the target exists in the search space, all agents will eventually converge to its position. We will also discuss how to extend the notion of convergence of the algorithm to the case where there is no ideal instantiation of the target in the search space, and we will prove that convergence also occurs in this case. Finally, we will illustrate the behaviour of SDS on a simple text string search task, which is nevertheless easily extendible to more interesting problems. From the presented simulations we draw some preliminary conclusions concerning the time complexity of SDS.
2. Stochastic Diffusion Search
In this section we will introduce SDS and explain in detail the fundamental mechanism underlying its performance. Stochastic Diffusion Search effectively performs the best possible match between the objects existing in the search space and the description of the target object. It follows that SDS is able to find the target if it exists in the search space; otherwise it will locate the object with the description most similar to the target. The space and the objects are defined in terms of Atomic Data Units (ADUs), which constitute the set of basic features. Each object in the search space, as well as the target object, is described in terms of ADUs and cannot contain any other features. ADUs can be thought of as single pixel intensities if the search space is a bitmap image; they can constitute higher-level properties like vertical and horizontal lines, angles, semicircles etc., if the search space and the target are described in terms of these properties; or they can be letters (the search space being a text) or nodes of a graph (the search space being a graph with a prespecified neighbourhood). Each agent acts autonomously and, in parallel with the others, tries to locate the position of the target in the search space. The position of the target is represented as the coordinates of a predefined reference point in the target's description. The transmission or diffusion of information (the exchange of coordinates of potential reference point locations) enables agents to communicate with each other and to allocate computational resources dynamically depending on the outcome of the search. Depending on their performance in the search, agents become 'active' if they point to potentially correct positions within the search space; otherwise they remain 'inactive'. All agents have access to the search space and to the description of the target object. Initially all agents are randomly initialised to reference points in the search space (e.g.
if the task to solve is to locate the word ABRACADABRA in the Encyclopaedia Britannica, then agents will point to possible positions of the first 'A'; we assume that the word ABRACADABRA does occur in the Encyclopaedia Britannica). They are also all initially set as inactive. Each of the agents then independently performs a probabilistic check of the information at its reference point by comparing a randomly chosen ADU from the target object (e.g. one of the agents could have chosen the letter 'R' from ABRACADABRA) with the corresponding ADU in the search space (i.e. the third letter to the right of the reference point). If the test is successful then the agent becomes active, otherwise it remains inactive. In effect the activity of an agent gives an indication of the possibility that it is pointing to the correct position. However, due to the partial test (only one letter is checked at a time), neither the possibility of false positives (activation on a non-target word like 'CURVATURE', whose third letter to the right of the reference point is 'R', as in the target) nor the possibility of false negatives (failing to activate on the best match to the target when the ideal instantiation of the target does not exist in the search space, e.g. if 'ABRACADABRA' were misspelt) is excluded. In this way Stochastic Diffusion Search can escape from local minima corresponding to objects partially matching the description of the target.
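The one-letter test phase just described can be illustrated with a short Python sketch; the helper name is hypothetical, and the offsets follow the ABRACADABRA example:

```python
import random

TARGET = "ABRACADABRA"

def partial_test(search_space, ref_point, rng=random):
    """One SDS test phase for one agent: compare a single randomly chosen
    letter (ADU) of the target with the corresponding letter at the
    agent's reference point. True means the agent becomes active."""
    i = rng.randrange(len(TARGET))
    candidate = search_space[ref_point + i : ref_point + i + 1]
    return candidate == TARGET[i]

# False positives are possible: 'CURVATURE' agrees with the target at
# offset 2 (both have 'R' as the third letter), so an agent pointing at
# 'CURVATURE' and happening to test that letter would wrongly activate.
assert "CURVATURE"[2] == TARGET[2] == "R"
```

Because each test examines only one ADU, the test is cheap, and repeated testing over iterations is what filters out partially matching objects.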
Subsequently, in the diffusion phase, each inactive agent selects, at random, another agent with which to communicate. If the chosen agent is active, the choosing agent will point to the same reference point; if the chosen agent is inactive, the choosing agent will randomly reinitialise its position. Active agents do not sample other agents for communication, but they also undergo a new testing phase and, depending on the outcome, may retain activity (if an agent actually points to 'ABRACADABRA' then it will remain active and will always point towards this position regardless of the letter chosen for the testing phase) or become inactive (in our example an agent pointing to 'CURVATURE' would become inactive if it chose, for example, the sixth letter to the right of the reference point for the next testing phase). This process iterates until a statistical equilibrium state is achieved. The halting criterion used in [5] monitored the maximal number of agents pointing to the same position in the search space (in our example, the more letters a given word has in common with 'ABRACADABRA', the more likely the agent pointing to it is to remain active, and therefore to attract other agents to this word via the diffusion phase). If the number of agents in this cluster exceeds a certain threshold and remains within certain boundaries for a number of iterations, then SDS is said to have reached its equilibrium state and the procedure is terminated. Even though agents act autonomously and there is only a very weak form of probabilistic coupling between them, this coupling nevertheless enables the agents to develop cooperative behaviour. In order to establish the convergence of Stochastic Diffusion Search we will introduce in the next section a mathematical model of SDS based on Markov chain theory [11].
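Putting the test and diffusion phases together, the whole procedure can be sketched as follows. This is a minimal, hypothetical implementation: the agent count, iteration cap, cluster threshold and the simple halting rule are our own illustrative choices, not those of [5]:

```python
import random
from collections import Counter

def sds(search_space, target, n_agents=100, max_iters=500,
        threshold=0.8, seed=0):
    """Minimal SDS sketch: random initialisation, one-letter test phase,
    diffusion phase, and a simple cluster-size halting rule."""
    rng = random.Random(seed)
    n_pos = len(search_space) - len(target) + 1
    pos = [rng.randrange(n_pos) for _ in range(n_agents)]
    active = [False] * n_agents

    for _ in range(max_iters):
        # Test phase: each agent checks one randomly chosen ADU (letter)
        # of the target against the corresponding letter at its position.
        for a in range(n_agents):
            i = rng.randrange(len(target))
            active[a] = search_space[pos[a] + i] == target[i]
        # Diffusion phase: each inactive agent samples a random agent and
        # copies its position if that agent is active, otherwise
        # reinitialises at random. Active agents keep their positions.
        for a in range(n_agents):
            if not active[a]:
                b = rng.randrange(n_agents)
                pos[a] = pos[b] if active[b] else rng.randrange(n_pos)
        # Halting: stop when enough agents cluster at a single position.
        loc, count = Counter(pos).most_common(1)[0]
        if count >= threshold * n_agents:
            return loc
    return Counter(pos).most_common(1)[0][0]

text = "XXCURVATUREXXABRACADABRAXX"
print(sds(text, "ABRACADABRA"))  # expected: 13, the index of the target
```

Note that agents pointing exactly at the target pass every test and never move, so the cluster at the true position can only grow, while clusters at partial matches like 'CURVATURE' are repeatedly thinned by failed tests.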
3. Model
In the most general case, Stochastic Diffusion Search is supposed to locate the target or, if it does not exist in the search space, its best instantiation. Therefore from now on we will refer to the object sought by SDS as the target. Let the search space size be N (measured as a number of possible locations of objects). Let the probability of locating the target in a uniformly random draw be pm, and let the probability of locating a suboptimal object (one sharing to some extent common features with the target) be pd. Let the probability of a false positive be p+ and the probability of a false negative be p-.
Assume that there are M agents. The state of the search in the nth step is determined by the number of active agents pointing towards the position of the target and the number of active agents pointing towards false positives (the number of inactive agents equals the difference between the total number of agents and these two numbers). This is because, by assumption, only active agents are considered to carry potentially useful information, and effectively they influence the search directions of all other agents. Also the strong halting criterion uses only information from active agents. Thus in effect we have a finite number of discrete states, each characterised by a pair of natural numbers. Stochastic Diffusion Search changes its state in a random manner, and the possible future evolution of SDS can be influenced by the past only via the present state (agents are memoryless and the information about the past evolution of SDS is contained in its current configuration); thus it can effectively be modelled by a Markov chain.