Info

In this example the random descent metaheuristic is applied to sample a good core subset. Don't know what random descent is? Read this.

This example demonstrates the implementation of a simple core subset selection problem in JAMES. A distance matrix is given, which expresses the dissimilarity between each pair of items as a value in [0,1]. The goal is to select a subset of items so that the average distance between all pairs of selected items is maximized. Every item also has a name. Therefore, the input consists of the combination of an N × N distance matrix and an array of N item names (in corresponding order).

First create a data class CoreSubsetData by implementing the IntegerIdentifiedData interface. The data contains the distance matrix and item names, and every item is assigned a unique integer ID. Here the IDs correspond to the respective indices in the distance matrix and name array. The IntegerIdentifiedData interface has a single required method getIDs() which returns the set of all IDs that have been assigned to the items. The framework needs to know these IDs so that any subset selection problem can be translated into the generic problem of selecting a subset of IDs.

public class CoreSubsetData implements IntegerIdentifiedData {

    // item names
    private String[] names;
    // distance matrix
    private double[][] dist;
    // IDs
    private Set<Integer> ids;

    public CoreSubsetData(String[] names, double[][] dist){
        // store data
        this.names = names;
        this.dist = dist;
        // infer IDs: 0..N-1 in case of N items
        // (indices in distance matrix and name array)
        ids = new HashSet<>();
        for(int id=0; id<names.length; id++){
            ids.add(id);
        }
    }

    public Set<Integer> getIDs() {
        return ids;
    }

}

Now add two custom methods to access the underlying data based on IDs of selected items. The method getName(int id) returns the name of an item with a given ID and getDistance(int id1, int id2) yields the distance between two items.

public String getName(int id){
    return names[id];
}

public double getDistance(int id1, int id2){
    return dist[id1][id2];
}

Now create a class CoreSubsetObjective that implements the Objective interface. This interface has two type parameters: the solution and data type. Set the solution type to SubsetSolution (as for all subset selection problems) and the data type to CoreSubsetData (specifically for this example). The Objective interface requires to implement two methods:

  1. Evaluation evaluate(SubsetSolution, CoreSubsetData): evaluates a subset solution given our core subset data. The solution indicates the IDs of the selected items based on which the necessary data can be retrieved. In this example, the average distance between all selected items is computed. The value is fixed to 0.0 if less than 2 items are selected (no distances).
  2. boolean isMinimizing(): indicates whether evaluations are to be minimized or maximized, i.e. whether lower values or higher values are better, respectively. In this example, false is returned as the average distance is to be maximized.
The evaluate(...) method returns an object of type Evaluation (interface) that can be converted into a double value by calling getValue(). A predefined implementation of a SimpleEvaluation is provided that merely wraps a double value. For this example we can just compute the average distance as a double value and wrap it in such simple evaluation object when returning. Example 1B and example 1C further explain why evaluate(...) returns an evaluation object and not just a plain double value, and how this can be used to implement efficient delta evaluations. For now, we will keep things simple.

public class CoreSubsetObjective implements Objective<SubsetSolution, CoreSubsetData>{

    public Evaluation evaluate(SubsetSolution solution, CoreSubsetData data) {
        double value = 0.0;
        if(solution.getNumSelectedIDs() >= 2){
            // at least two items selected: compute average distance
            int numDist = 0;
            double sumDist = 0.0;
            Integer[] selected = new Integer[solution.getNumSelectedIDs()];
            solution.getSelectedIDs().toArray(selected);
            for(int i=0; i<selected.length; i++){
                for(int j=i+1; j<selected.length; j++){
                    sumDist += data.getDistance(selected[i], selected[j]);
                    numDist++;
                }
            }
            value = sumDist/numDist;
        }
        return SimpleEvaluation.WITH_VALUE(value);
    }

    public boolean isMinimizing() {
        return false;
    }

}

We are now ready to apply an optimization algorithm to sample a good core subset. Here we will use the random descent algorithm, starting from a random initial solution (default) with the predefined SingleSwapNeighbourhood. This neighbourhood generates moves that swap a single selected and unselected ID. The data and objective are combined in a SubsetProblem with data type CoreSubsetData. When creating the problem, the desired subset size is also specified. The problem is then passed to the search to find a good solution. Note that searches are parameterized on the solution type of the problem that is being solved, which should be set to SubsetSolution for this example.

A maximum runtime is set before starting the search. Alternatively, a different stop criterion could be used such as a maximum number of steps or a maximum time without finding any improvement. Calling search.start() blocks until the search has terminated, after which the best found solution and its evaluation are printed to the console. Here, the data is used again to map selected IDs to item names. Finally, the search is disposed so that all resources are released.

// set name array and distance matrix (e.g. read from a file)
String[] names = ...
double[][] dist = ...
// create data object
CoreSubsetData data = new CoreSubsetData(names, dist);
// create objective
CoreSubsetObjective obj = new CoreSubsetObjective();

// combine data and objective in a subset problem, specify desired subset size
int subsetSize = ...
SubsetProblem<CoreSubsetData> problem = new SubsetProblem<>(data, obj, subsetSize);

// create random descent search with single swap neighbourhood
RandomDescent<SubsetSolution> search = new RandomDescent<>(problem, new SingleSwapNeighbourhood());
// set maximum runtime (in seconds)
long timeLimit = ...
search.addStopCriterion(new MaxRuntime(timeLimit, TimeUnit.SECONDS));

// start search
search.start();

// print best solution and evaluation
System.out.println("Best solution (IDs): " + search.getBestSolution().getSelectedIDs());
System.out.println("Best solution (names): " + search.getBestSolution().getSelectedIDs()
                                                                       .stream()
                                                                       .map(data::getName)
                                                                       .collect(Collectors.toSet()));
System.out.println("Best solution evaluation: " + search.getBestSolutionEvaluation());

// dispose search
search.dispose();

It is possible to attach one or more search listeners to a search engine. These are informed by the algorithm when certain events occur. For example, an event is fired when the search has started, stopped or found a new best solution. We now create a ProgressSearchListener that prints messages to the console to stay informed about the progress of a search.

public class ProgressSearchListener implements SearchListener<Solution> {

    public void searchStarted(Search search) {
        System.out.println(" >>> Search started");
    }

    public void searchStopped(Search search) {
        System.out.println(" >>> Search stopped (" + search.getRuntime()/1000 + " sec, " + search.getSteps() + "  steps)");
    }

    public void newBestSolution(Search search,
                                Solution newBestSolution,
                                Evaluation newBestSolutionEvaluation,
                                Validation newBestSolutionValidation) {
        System.out.println(" >>> New best solution: " + newBestSolutionEvaluation);
    }

}

Search listeners are parameterized on the solution type of the searches to which they can listen. However, the solution type of a listener is allowed to be more general than the solution type of the search (i.e. a super class or interface). As our progress listener does not perform any solution type specific actions, its solution type is set to the top-level class Solution. It can thus be used to track the progress of any search with any solution type as these are all required to be subclasses of Solution.

Good to know

All methods in the SearchListener interface have a default empty implementation. Listeners can therefore simply ignore any callbacks for which no action is to be taken.

To track the progress of the applied random descent algorithm, simply add the listener before starting the search.

search.addSearchListener(new ProgressSearchListener());

The complete source code of this example is available on GitHub, including some additional code to read the input from a CSV file. You can also download a ZIP file that contains the Java sources of all examples, a compiled JAR (including all dependencies) as well as some input files for in case you want to run any of the examples. To run this example, execute

$ java -cp james-examples.jar org.jamesframework.examples.coresubset.CoreSubset <inputfile> <subsetsize> <runtime>

from the command line. The input file should be a CSV file in which the first row contains N item names and the subsequent N rows specify an N × N distance matrix. The runtime is given in seconds.

Example input file (275 items)