Skip to content

Added kmeans exercise #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 6, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions content/exercises.html
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ <h2>!! Under Construction, the following is currently being updated and will inc
<li>Method 2</li>
<li>Method 3</li>
</ol>
<li><a href="k-means">K-Means Clustering</a></li>
<li><a href="oddevensort">Odd-even sort</a></li>
<li><a href="particle">Particle exercise</a></li>
</ol>
Expand Down
26 changes: 26 additions & 0 deletions content/exercises/k-means.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
---
title: Chare Arrays Design Exercise
homec: home
tutorialc: tutorial
applicationsc: applications
miniAppsc: miniApps
downloadc: download
toolsc: tools
helpc: help
---
<link rel="stylesheet" type="text/css" href="../../tutorial/TutorialStyle.css">

<h1>Chare Arrays: K-Means Clustering Exercise</h1>

<h2> K-Means Clustering:</h2>

<!--\comment{
I have a design problem for you. I want you to run your solution by me
(.ci file and pseudocode, for example) before implementing it. Or even
just the idea for the algorithm:
} -->
<h3>Give a collection M points in 2 dimensions, optimally classify them in K clusters using an iterative refinement algorithm, and output only the centroid of each cluster.</h3>
<p><b>PART A: </b>(for simplicity, we will use 2 dimensions, the real problems typically involve a larger number of dimensions). More concretely: create a chare array called Points of N chares, each with M/N data points. The constructors initialize each data point with random X and Y coordinates, (0 <= X,Y < 1.0). The main chare generates K random points as an initial guess for centroids of the K clusters, and broadcasts them (as an array of x-y pairs) to the Points chare array, to an entry method called Assign. This method decides, for each point it owns, which centroid it is closest to. It then contributes into 2 reductions: one a sum of points it added to each cluster (so an integer array of size K) and another a sum of X and Y coordinates for each cluster (so, an array of 2K doubles). The target of the reductions are the UpdateCounts and UpdateCoords methods in the main chare. When both reductions are complete the main chare updates the centroids of each cluster (simply calclulate, for each of the K clusters, the sum of X coordinates i’th cluster divided by the count of points assigned to i’th cluster, and similarly for Y.) The algorithm then repeats the Assign (via broadcast) and Update steps, until the assignment of points to cluster remains unchanged. We will approximate this by calculating (in the main chare) the changes to any centroid, and when no centroid coordinate changes beyond a small threshold T (say 0.001), we will assume the algorithm has converged.</p>
<p><b>Part B: </b>Reduce the number of reductions in each iteration from 2 to 1. Hint: there are 2 ways of doing this: first involves approximating the counts to be double precision numbers and the second involves writing a custom reduction).</p>
<p><b>Part C: </b>The “no change to centroids above a threshold” method above is an approximation. Implement a more accurate method by using an additional reduction of the number of points which have changed their “allegiance” (i.e. the cluster to which they belong in each iteration. If this number is 0, the algorithm has converged. Again, as in part B, reduce the number of reductions to just 1 if possible.</p>
<p><b>Part D: </b>Change the random number generation so that you create K sets, with each set centered around a distinct coordinate (purported centroid), and see if your program is able to retrieve the clusters correctly. Use any method to restrict points around a centroid to within the 0.0:1.0 box, along each dimension.</p>