This example demonstrates the implementation of a core subset selection problem (see example 1A and example 1B) with a different objective: maximize the average distance of each selected item to the closest other selected item (instead of the average distance between all pairs). To be able to provide an efficient delta evaluation, the objective generates custom evaluation objects that store additional metadata. Both a basic random descent as well as an advanced parallel tempering algorithm are applied to optimize the core.
To evaluate a subset we iterate over all selected items and find the closest other selected item to compute the average entry-to-nearest-entry distance. This value is to be maximized.
public class EntryToNearestEntryObjective implements Objective<SubsetSolution, CoreSubsetData>{
public Evaluation evaluate(SubsetSolution solution, CoreSubsetData data) {
double value = 0.0;
if(solution.getNumSelectedIDs() >= 2){
// sum distances to closest other selected item
int numDist = 0;
double sumDist = 0.0;
Set<Integer> selected = solution.getSelectedIDs();
for(int sel : selected){
// find closest other selected item
int closestOther = findClosest(sel, selected, data);
// add distance to sum
sumDist += data.getDistance(sel, closestOther);
numDist++;
}
// divide by number of distances
value = sumDist/numDist;
}
return SimpleEvaluation.WITH_VALUE(value);
}
// finds the item in the given collection that is closest to the given item;
// null if the collection does not contain any items different from the given item
private Integer findClosest(int item, Collection<Integer> group, CoreSubsetData data){
double dist;
Double minDist = null;
Integer closestOther = null;
for(int other : group){
if(other != item){
dist = data.getDistance(item, other);
if(minDist == null || dist < minDist){
minDist = dist;
closestOther = other;
}
}
}
return closestOther;
}
public boolean isMinimizing() {
return false;
}
}
We might also want to include an efficient delta evaluation (see example 1B) to speed up applied neighbourhood searches.
public Evaluation evaluate(Move move, SubsetSolution curSolution, Evaluation curEvaluation, CoreSubsetData data){
// provide efficient delta evaluation here ...
}
However, for this objective, it is not possible to efficiently compute the modified evaluation when only the current solution, its evaluation (value) and the applied move are available.
The full evaluation from above finds the closest other selected item for each item in the selection, but this information is lost when
returning the average of the respective distances wrapped in a SimpleEvaluation.
If we keep track of this metadata, it can be efficiently updated
when performing a delta evaluation. The new average entry-to-nearest-entry distance is then inferred from the updated metadata.
With this approach in mind we create a custom evaluation object that implements the Evaluation interface.
It keeps track of the closest neighbour of each item in the selection. The only required method getValue()
returns the average entry-to-nearest-entry distance which is updated whenever the metadata changes.
public class EntryToNearestEntryEvaluation implements Evaluation {
// maps items to closest other items (IDs)
private Map<Integer, Integer> closestItemMap;
// maps items to distance to respective closest item
private Map<Integer, Double> minDistMap;
// sum of distances from items to respective closest items
private double minDistSum;
public EntryToNearestEntryEvaluation() {
closestItemMap = new HashMap<>();
minDistMap = new HashMap<>();
minDistSum = 0.0;
}
// deep copy constructor
public EntryToNearestEntryEvaluation(EntryToNearestEntryEvaluation toCopy){
closestItemMap = new HashMap<>(toCopy.closestItemMap);
minDistMap = new HashMap<>(toCopy.minDistMap);
minDistSum = toCopy.minDistSum;
}
// add item
public void add(int itemID, int closestOtherItemID, double distance){
// update minimum distance sum
minDistSum += distance;
// update metadata
closestItemMap.put(itemID, closestOtherItemID);
minDistMap.put(itemID, distance);
}
// remove item
public boolean remove(int itemID){
if(closestItemMap.containsKey(itemID)){
// update minimum distance sum
minDistSum -= minDistMap.get(itemID);
// update metadata
closestItemMap.remove(itemID);
minDistMap.remove(itemID);
return true;
}
return false;
}
// update closest item
public boolean update(int itemID, int closestOtherItemID, double distance){
if(closestItemMap.containsKey(itemID)){
// update minimum distance sum
minDistSum -= minDistMap.get(itemID);
minDistSum += distance;
// update metadata
closestItemMap.put(itemID, closestOtherItemID);
minDistMap.put(itemID, distance);
return true;
}
return false;
}
// get closest item (null of no closest item registered)
public Integer getClosest(int itemID){
return closestItemMap.get(itemID);
}
// return average distance from each item to closest item; 0.0 if no distances
public double getValue() {
int numDistances = minDistMap.size();
if(numDistances > 0){
return minDistSum/numDistances;
} else {
return 0.0;
}
}
}
Now we can modify the objective to use this custom evaluation type and provide an efficient delta evaluation. The received
evaluation of the current solution is cast to an EntryToNearestEntryEvaluation so that we can access
and update the metadata. This cast can not fail as both evaluate(...) methods (full and delta) return an evaluation of this type.
If an item's closest neighbour is not removed from the selection it suffices to check whether any of the added items is even closer.
Of course, this is much more efficient as compared to rescanning the entire new selection to find each item's closest neighbour.
public class EntryToNearestEntryObjective implements Objective<SubsetSolution, CoreSubsetData>{
public Evaluation evaluate(SubsetSolution solution, CoreSubsetData data) {
// initialize evaluation object
EntryToNearestEntryEvaluation eval = new EntryToNearestEntryEvaluation();
// find closest neighbour of each item in the selection
Set<Integer> selected = solution.getSelectedIDs();
for(int sel : selected){
// find closest other selected item
Integer closestOther = findClosest(sel, selected, data);
// register closest item in evaluation object (if any)
if(closestOther != null){
eval.add(sel, closestOther, data.getDistance(sel, closestOther));
}
}
return eval;
}
// finds the item in the given collection that is closest to the given item;
// null if the collection does not contain any items different from the given item
private Integer findClosest(int item, Collection<Integer> group, CoreSubsetData data){
// same as before ...
}
public Evaluation evaluate(Move move, SubsetSolution curSolution, Evaluation curEvaluation, CoreSubsetData data){
// check move type
if(!(move instanceof SubsetMove)){
throw new IncompatibleDeltaEvaluationException("Entry to nearest entry objective should be used in "
+ "combination with neighbourhoods that generate moves of type SubsetMove.");
}
// cast move
SubsetMove subsetMove = (SubsetMove) move;
// cast evaluation (cannot fail as both evaluate methods return such evaluation object)
EntryToNearestEntryEvaluation eval = (EntryToNearestEntryEvaluation) curEvaluation;
// copy to initialize new evaluation
EntryToNearestEntryEvaluation newEval = new EntryToNearestEntryEvaluation(eval);
// get added and deleted IDs from move
Set<Integer> added = subsetMove.getAddedIDs();
Set<Integer> deleted = subsetMove.getDeletedIDs();
// get current selection from solution
Set<Integer> curSelection = curSolution.getSelectedIDs();
// infer new selection
List<Integer> newSelection = new ArrayList<>(curSelection);
newSelection.addAll(added);
newSelection.removeAll(deleted);
// discard contribution of removed items
for(int item : deleted){
newEval.remove(item);
}
// update closest items in new selection
for(int item : newSelection){
Integer curClosest = newEval.getClosest(item);
if(curClosest == null){
// case 1: previously unselected or no closest item set (less than two items selected);
// search for closest item in new selection
Integer newClosest = findClosest(item, newSelection, data);
// register, if any
if(newClosest != null){
newEval.add(item, newClosest, data.getDistance(item, newClosest));
}
} else {
// case 2: current closest item needs to be updated
if(deleted.contains(curClosest)){
// case 2A: current closest item removed, rescan entire new selection
Integer newClosest = findClosest(item, newSelection, data);
// update, if any
if(newClosest != null){
newEval.update(item, newClosest, data.getDistance(item, newClosest));
} else {
// no closest item left (less than two items selected); discard
newEval.remove(item);
}
} else {
// case 2B: current closest item retained; only check if any newly
// added item is closer
Integer closestAddedItem = findClosest(item, added, data);
if(closestAddedItem != null
&& data.getDistance(item, closestAddedItem) < data.getDistance(item, curClosest)){
// update closest item
newEval.update(item, closestAddedItem, data.getDistance(item, closestAddedItem));
}
}
}
}
return newEval;
}
public boolean isMinimizing() {
return false;
}
}
As in example 1A we can easily apply a basic random descent search to select a core,
using the predefined SingleSwapNeighbourhood. The only difference is the objective function.
// set name array and distance matrix (e.g. read from a file)
String[] names = ...
double[][] dist = ...
// create data object
CoreSubsetData data = new CoreSubsetData(names, dist);
// create objective
EntryToNearestEntryObjective obj = new EntryToNearestEntryObjective();
// combine data and objective in a subset problem, specify desired subset size
int subsetSize = ...
SubsetProblem<CoreSubsetData> problem = new SubsetProblem<>(data, obj, subsetSize);
// create random descent search with single swap neighbourhood
RandomDescent<SubsetSolution> search = new RandomDescent<>(problem, new SingleSwapNeighbourhood());
// set maximum runtime (in seconds)
long timeLimit = ...
search.addStopCriterion(new MaxRuntime(timeLimit, TimeUnit.SECONDS));
// attach listener (see example 1A)
search.addSearchListener(new ProgressSearchListener());
// start search
search.start();
// print best solution and evaluation
System.out.println("Best solution (IDs): " + search.getBestSolution().getSelectedIDs());
System.out.println("Best solution (names): " + search.getBestSolution().getSelectedIDs()
.stream()
.map(data::getName)
.collect(Collectors.toSet()));
System.out.println("Best solution evaluation: " + search.getBestSolutionEvaluation());
// dispose search
search.dispose();
However, you may notice a slight variation in solution quality across repeated runs of this algorithm. This means that the search gets stuck in local optima of varying quality, i.e. this objective seems harder to optimize than the average pairwise distance objective from example 1A. More advanced algorithms can be applied to escape from such local optima.
An example of an advanced method is the parallel tempering algorithm, which concurrently runs several Metropolis searches with different temperatures. Higher temperatures correspond to higher probabilities to accept inferior moves. The subsearches swap solutions so that the best solutions are pushed to the coolest systems, for convergence, and the worst solutions are pushed to the hot systems, which offer a lot of freedom for further modifications. The parallel tempering algorithm is somewhat similar to simulated annealing but takes advantage of multi-core CPUs, as the subsearches can be executed in parallel. Also, it has a built-in multi-start behaviour.
When applying parallel tempering, the number of Metropolis replicas and the minimum and maximum temperature are specified. It is important to set appropriate temperatures, taking into account the evaluation range, as the probability to accept an inferior move depends on the ratio of the difference in evaluation (delta) and temperature.
// create parallel tempering search with single swap neighbourhood
double minTemp = 1e-8;
double maxTemp = 1e-4;
int numReplicas = 10;
ParallelTempering<SubsetSolution> search = new ParallelTempering<>(problem,
new SingleSwapNeighbourhood(),
numReplicas, minTemp, maxTemp);
// same as before ...
If you run this example you will notice that applying parallel tempering reduces variability across repeated search runs and, on average, yields solutions with a slightly higher value. Nevertheless, it is a more complex and therefore somewhat slower algorithm as compared to random descent, which also finds good solutions and has no additional parameters. So, in practice, both algorithms can be useful when dealing with this problem, depending on what is desired in terms of speed and solution quality.
The complete source code of this example is available on GitHub, including some additional code to read the input from a CSV file. You can also download a ZIP file that contains the Java sources of all examples, a compiled JAR (including all dependencies) as well as some input files for in case you want to run any of the examples. To run this example, execute
$ java -cp james-examples.jar org.jamesframework.examples.coresubset3.CoreSubset3 <inputfile> <subsetsize> <runtime>from the command line. The input file should be a CSV file in which the first row contains N item names and the subsequent N rows specify an N × N distance matrix. The runtime is given in seconds.