+<p><b>PART A: </b>(for simplicity, we will use 2 dimensions, the real problems typically involve a larger number of dimensions). More concretely: create a chare array called Points of N chares, each with M/N data points. The constructors initialize each data point with random X and Y coordinates, (0 <= X,Y < 1.0). The main chare generates K random points as an initial guess for centroids of the K clusters, and broadcasts them (as an array of x-y pairs) to the Points chare array, to an entry method called Assign. This method decides, for each point it owns, which centroid it is closest to. It then contributes into 2 reductions: one a sum of points it added to each cluster (so an integer array of size K) and another a sum of X and Y coordinates for each cluster (so, an array of 2K doubles). The target of the reductions are the UpdateCounts and UpdateCoords methods in the main chare. When both reductions are complete the main chare updates the centroids of each cluster (simply calclulate, for each of the K clusters, the sum of X coordinates i’th cluster divided by the count of points assigned to i’th cluster, and similarly for Y.) The algorithm then repeats the Assign (via broadcast) and Update steps, until the assignment of points to cluster remains unchanged. We will approximate this by calculating (in the main chare) the changes to any centroid, and when no centroid coordinate changes beyond a small threshold T (say 0.001), we will assume the algorithm has converged.</p>
0 commit comments