Cluster Analysis in Supply Chain Optimization
November 2nd, 2010 8:28 am Category: Optimization, by: Joe Litko
Cluster Analysis (CA) is a versatile tool for grouping objects based on their similarity to each other. It is applied in a wide variety of situations. Some example applications are similarity of faces, countries, and species of dogs. One starts by defining attributes of the individuals (countries, faces, species…) that can be assigned a numerical value, e.g. length of nose, separation of eyes. One of several available metrics is used to define the ‘distance’ between the individuals. Individuals are then assigned to a cluster based on the shortest distance. CA produces some appealing results and gives useful insights that spark further investigation even in the exotic applications.
In the relatively straightforward context of grouping locations, cluster analysis performs equally well. This makes CA a useful tool in supply chain optimization. Designing a distribution network often involves planning of routes over regions or deciding on locations for warehouses. One could look at a map covered with dots representing locations and draw circles around groups. But this would be an arbitrary approach. CA offers a way to group locations in a systematic way and speeds up the process of exploring several different versions of the clusters. I am going to present a few examples and describe the mechanics of the CA process.
In the examples that follow we have a set of 400+ locations from an area surrounding Denver Colorado. The latitude and longitude of the locations are the attributes that will define the similarity of locations. In this case the distance between objects and clusters really is a distance. In CA you can generally use Euclidean distance or XY distance – as though you were on a sidewalk in a city. Which to use depends on the situation. If looking at a downtown area, the XY distance might be best. But over a broader area the Euclidean or Great Circle distance makes more sense.
There are really two broad categories of clustering – hierarchical and non-hierarchical. The hierarchical methods begin by considering all the individuals (locations) as separate groups. The nearest two are joined and now we have one less group. The location of that group can be defined in a few different ways. For now picture it as the centroid of the group members – which could be weighted by something like demand for a product. The next join depends on the remaining distances between members or between a member and the existing clusters. In either case we end up with one less group. CA keeps joining until there is one large group. This is agglomerative or hierarchical clustering.
Along the way there is a point at which we have 434, 433… 5, 4, 3, 2, 1 groups. For any given number of groups we can look at the distance between the groups and the distance of group members from the centroid of their own group. One could look for a small number of clusters where the distances within a cluster are small compared to the distances between clusters.
Most of the options within hierarchical clustering are based on the way we define the distance between an individual location and a cluster and the distance from one cluster to another. One example is nearest neighbor. In this case the distance of an individual to a cluster is the distance to the closest existing location in the cluster. Farthest neighbor is also a possibility. Neither of these is well-suited to most supply chain applications. The typical method is to use the centroid as the cluster location for defining distances. Let’s take a look at three methods, average distance, farthest neighbor, and Ward’s method (which will be described soon).
Figure 1 – Clustering based on farthest neighbor (Complete Clustering)
In the complete method an individual or cluster is joined to another cluster based on the smallest maximum distance between any two individual locations. Figure 1 above is what it produces. If the groups were to be used to define clusters for delivery routes one problem might be the unequal sizes of the five groups that were created.
The average distance method joins individuals or clusters to another cluster based on the smallest average distance to the members of a cluster. This gives another result with unequal cluster membership. A method based on the median distance gives a result very similar to the complete method using the sample data, though it is less sensitive to outlying locations that are far from all others.
If reasonably equal group sizes are important, then another method called Ward’s method is one of the best choices. The results using Ward’s method are shown below. Ward’s method makes assignments that minimize the within clusters deviations from the centroid of the cluster. What you get depends on how many clusters you ask for. You can compare the results of five and six clusters.
Figure 2 – Clustering based on average distance to the existing cluster members.
Figure 3 – Ward’s Method with five clusters requested.
Figure 4 – Ward’s Method with six clusters requested.
Another way to look at the results is with a type of graph called a dendogram. The dendogram gives a visual look at the way similarity of clusters changes as individuals are merged into the clusters. The dendogram is shown in Figure 5 below. The vertical scale is just a measure of similarity. It is linear and is often just a normalized scale from zero to one hundred.
What you take away from the dendogram is this: as fewer clusters are formed, the similarity of members within a cluster decreases. Picture a horizontal line that cuts across the figure where it will intersect just six of the vertical dendogram lines. Notice that with just a small change in the vertical scale we could cross four or five vertical lines of the dendogram. That implies there is not much change going from four to six clusters in terms of the nearness of individuals within the clusters. They are still very similar (close). Things change much more rapidly going below four clusters. Of course, at the bottom when there are two hundred clusters, members of each cluster are very similar (read close) to each other.
The last method to look at is different than hierarchical clustering. It is called K Means clustering. It is not a hierarchical method so a dendogram is not really possible – joining does not occur sequentially. Without too many details, K Means clustering works like this. We start with an initial division into a certain number of clusters – let’s say six. The initial division should be done in some sensible way, e.g. Ward’s method.
Figure 5 – Dendogram based on Ward’s method.
K-Means clustering keeps examining all the observations to see if they are closer to the centroid of some group than to the group they are currently in. If so, they are moved and the centroids of both affected groups are recalculated. The method continues until no more improving moves can be made. It is important to start with a decent initial grouping. The result of K-Means clustering using Ward’s method as a starting point is shown in Figure 6.
Figure 6 – K Means Clustering with an initial group.
If we had started without an initial group the result would be a bit different. It would look like Figure 7.
Figure 7 – K Means Clustering without an initial group.
This is somewhat different than the result with an initial group.
Bottom line recommendations are these. Use K-Means clustering with Ward’s method as an initial seed. If K-Means clustering is not available, use Ward’s method. If you use a hierarchical method take a look at the dendogram as a way to decide on a number of clusters. You can reduce the number of clusters until the reduction of similarity is large. Looking at the scattergram of the points for various numbers of clusters should confirm the dendogram information. Methods like the farthest or nearest neighbor will very likely give poor results. All methods are affected by outliers to some degree. Consider managing a few far out points by hand. Minitab is a very good software product for doing cluster analysis.
CA is far from a mathematical curiosity. It can be a very useful tool for network or facility location analysis. It certainly lets the analyst explore more solutions than could be done manually. Rather than some arbitrary decisions on grouping, CA can contribute the analysis that leads to bottom line savings.