K-Means with CUDA

I'm working on a CUDA exercise and need an explanation and answer to help me learn. It uses C/C++.

The goal of this assignment is exposure to GPU programming. You will solve k-means, using CUDA and Thrust.

I have uplo

9/24/23, 10:28 AM
cs.utexas.edu/~rossbach/cs380p/lab/kmeans/workspace/gpu-kmeans-submission-cs380p.html
CS380P: Parallel Systems
Lab #2: KMeans with CUDA
Submission Guide
In this lab, you will experiment with different implementations of the Kmeans algorithm.
Among the files you have to tar and submit, one file required by our autograder is the submit file. It does not
have an extension (e.g. .txt), so just call it submit.
This file contains instructions on how to compile each variant of your code, where the binary will be located, and,
if your code requires them, any extra arguments.
A template submit file is provided. Please read the instructions carefully because the autograder relies on this
submit file to grade your implementations.
Do not rename the submit file.
As an example, the Sequential part of your submit file could look something like this, assuming you are
compiling the binaries in a folder called bin and you are going to have one binary per implementation (one for
cpu, one for CUDA, etc.):
[Sequential]
How_To_Compile: gcc main.c kmeans_cpu.c -O2 -o bin/kmeans_cpu
Executable: bin/kmeans_cpu
Extra_Args:
You could also have just one application with all implementations, and switch which should execute using
arguments (which you should implement yourself, feel free to reuse code from lab 1), for example:
[Sequential]
How_To_Compile: nvcc main.cpp kmeans_cpu.cpp kmeans_cuda.cu kmeans_thrust.cpp -O2 -o kmeans
Executable: kmeans
Extra_Args: --use_cpu
1. Inputs
In this lab, you can start to test your Kmeans implementations with the following 3 sets of multi-dimensional
points. The dimensions of the points in each set are known. However, your program should accept the dimension
of points as one of the command line arguments.
random-n2048-d16-c16.txt
random-n16384-d24-c16.txt
random-n65536-d32-c16.txt
Note that the autograder will use different inputs (different dims and number of points) to test your program.
However, sample solutions for the above inputs can be found at the links below. Note that concerns like floating
point associativity will mean that your answers can be different, so checking correctness requires verifying that
answers are within a small epsilon (e.g. 10^-4) in each dimension of each point. Also, your answer need not
produce the same order of centers or IDs, so again, make sure your own correctness checking does not assume a
particular order of centers in the output. These outputs were generated with the seed: -s 8675309 (see the
section about Random Initial Centroids Generation below).
random-n2048-d16-c16-answer.txt
random-n16384-d24-c16-answer.txt
random-n65536-d32-c16-answer.txt
Students are encouraged to share newly generated inputs and their respective solutions, calculated by their
implementation of kmeans, on Piazza for other students to compare. Just don’t share code.
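The order-insensitive comparison described above can be sketched in C as follows. This is only an illustration: the function name centroids_match, the row-major layout (centroid c, dimension j at index c * d + j), and the greedy one-to-one matching are assumptions of this sketch, not part of the lab. Greedy matching is adequate when the centroids are separated by much more than eps.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: returns true if every centroid in `mine` matches a
 * distinct centroid in `ref` to within `eps` in each dimension, regardless
 * of the order in which the centroids are listed.
 * Assumed layout: k centroids of d doubles each, row-major. */
bool centroids_match(const double *mine, const double *ref,
                     int k, int d, double eps)
{
    bool *used = calloc((size_t)k, sizeof *used);  /* ref centroids already paired */
    if (used == NULL)
        return false;

    bool ok = true;
    for (int a = 0; a < k && ok; a++) {
        bool found = false;
        for (int b = 0; b < k && !found; b++) {
            if (used[b])
                continue;
            bool same = true;
            for (int j = 0; j < d && same; j++) {
                double diff = mine[a * d + j] - ref[b * d + j];
                if (diff < 0.0)
                    diff = -diff;
                same = diff <= eps;
            }
            if (same) {
                used[b] = true;
                found = true;
            }
        }
        ok = found;  /* fail as soon as one centroid has no partner */
    }
    free(used);
    return ok;
}
```

Calling centroids_match(mine, ref, 16, 16, 1e-4) would compare a 16-cluster, 16-dimensional result against a reference answer once both have been loaded into arrays.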
1.1 CmdLine Arguments
For all your implementations, your program should at least accept the following arguments
-k num_cluster: an integer specifying the number of clusters
-d dims: an integer specifying the dimension of the points
-i inputfilename: a string specifying the input filename
-m max_num_iter: an integer specifying the maximum number of iterations
-t threshold: a double specifying the threshold for convergence test.
-c: a flag to control the output of your program. If -c is specified, your program should output the
centroids of all clusters. If -c is not specified, your program should output the labels of all points. See
details below.
-s seed: an integer specifying the seed for rand(). This is used by the autograder to simplify the
correctness checking process. See details below.
Your program can also accept more arguments to control the behavior of your program for different
implementations. These extra arguments can be specified in the submit file. Refer to the instruction in submit
file for more details.
-k, -d, and -i should depend on the input files. The maximum number of iterations -m keeps your program
from looping forever if something goes wrong. In general, your implementation is expected to converge within
150 iterations, so a value of 150 for -m should be good enough. Depending on your method for the
convergence test, you might want to use different thresholds -t for different implementations. As a reference, the
threshold for comparing centers without any non-determinism issues can be as small as 10^-10. However, for
comparing centers with non-determinism issues, you might want to use a threshold of 10^-6. The autograder will
specify -t 10^-5 for all implementations.
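As one illustration of a threshold-based convergence test, the sketch below declares convergence when no centroid has moved farther than the threshold since the previous iteration. The lab does not prescribe this particular test; the signature, the row-major layout, and the choice of Euclidean distance are assumptions of this sketch.

```c
#include <stdbool.h>

/* Hypothetical convergence test: true when every centroid moved at most
 * `threshold` (Euclidean distance) between two consecutive iterations.
 * Assumed layout: k centroids of d doubles each, row-major. */
bool converged(const double *centroids, const double *old_centroids,
               int k, int d, double threshold)
{
    /* Compare squared distances so that no square root is needed. */
    const double limit = threshold * threshold;
    for (int c = 0; c < k; c++) {
        double dist2 = 0.0;
        for (int j = 0; j < d; j++) {
            double diff = centroids[c * d + j] - old_centroids[c * d + j];
            dist2 += diff * diff;
        }
        if (dist2 > limit)
            return false;
    }
    return true;
}
```

The main loop would then stop once converged(...) returns true or the iteration count reaches the -m limit, whichever comes first.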
2. Output
For each input, each implementation is asked to classify the points into k clusters. You should measure the
elapsed time and total number of iterations for Kmeans to converge. The average elapsed time per iteration and
the number of iterations to converge should be written to STDOUT in the following form:
printf("%d,%lf\n", iter_to_converge, time_per_iter_in_ms);
Note that the time should be measured in milliseconds. Part of your grade will be based on how fast your
implementation is.
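A minimal timing sketch, assuming a POSIX system with clock_gettime, is shown below. The helper names elapsed_ms and print_timing are hypothetical; only the printf format comes from the guide.

```c
#include <stdio.h>
#include <time.h>

/* Milliseconds between two timestamps taken with clock_gettime(). */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Prints "iterations,ms_per_iteration" in the required format and returns
 * the per-iteration time that was printed. */
double print_timing(int iter_to_converge, double total_ms)
{
    double time_per_iter_in_ms =
        iter_to_converge > 0 ? total_ms / iter_to_converge : 0.0;
    printf("%d,%lf\n", iter_to_converge, time_per_iter_in_ms);
    return time_per_iter_in_ms;
}
```

A typical use is to call clock_gettime(CLOCK_MONOTONIC, &start) before the k-means loop and clock_gettime(CLOCK_MONOTONIC, &end) after it. For the GPU variants, kernel launches are asynchronous, so the device should be synchronized (for example with cudaDeviceSynchronize()) before the end timestamp is taken.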
2.1 Points Labels
If -c is not specified to your program, it needs to write the point assignments, i.e. the final cluster id for each point,
to STDOUT in the following form:
printf("clusters:");
for (int p=0; p < nPoints; p++)
    printf(" %d", clusterId_of_point[p]);
The autograder will redirect your output to a temp file, which will be further processed to check correctness.
2.1.1 Correctness of Output
To check the correctness of the output, the autograder will do the following:
1. Compute the centroid of each cluster given by the points assignment.
2. For each point, select the cluster centroid that is closest to the point.
3. Compute the distance between the point and the selected centroid. Let the result be dis0.
4. Compute the distance between the point and the centroid of its assigned cluster. Let the result be dis1.
5. If the difference between dis0 and dis1 is larger than a threshold, this point is considered to be a wrong point.
Your grade will be based on the total number of wrong points.
2.2 Centroids of All Clusters
If -c is specified, your program should output the centroids of the final clusters in the following form:
for (int clusterId = 0; clusterId < _k; clusterId ++){
    printf("%d ", clusterId);
    for (int d = 0; d < _d; d++)
        printf("%lf ", centers[clusterId + d * _k]);
    printf("\n");
}
Note that the first column is the cluster id.
2.2.1 Random Initial Centroids Generation
At the beginning of the kmeans algorithm, k points should be randomly chosen as the initial set of centroids.
Since the final set of centroids depends heavily on the initial set of centroids, the autograder will specify the seed
for random number generation so that your final set of centroids will be compared with the set of centroids
computed from the same initial set of centroids. Specifically, your program should use the seed provided by the
cmdline argument to randomly generate k integer numbers between 0 and num_points and use these k integers as
indices of the points used as initial centroids. E.g.
static unsigned long int next = 1;
static unsigned long kmeans_rmax = 32767;
int kmeans_rand() {
    next = next * 1103515245 + 12345;
    return (unsigned int)(next/65536) % (kmeans_rmax+1);
}
void kmeans_srand(unsigned int seed) {
    next = seed;
}
kmeans_srand(cmd_seed); // cmd_seed is a cmdline arg
for (int i=0; i
[...]
MAX_ITERS || converged(centroids, oldCentroids);
}
https://www.cs.utexas.edu/~rossbach/cs380p/lab/gpu-kmeans-cs380p.html
9/24/23, 10:27 AM
CS380P Assignment
For detailed instructions on how to write your code and submit it for evaluation, please see these instructions.
You should include a report that answers the following questions. In cases where we ask you to explain
performance behavior, it is fine to speculate, but be clear whether your observations are empirical or speculation.
Report the GPU hardware details, CPU hardware details, and OS version on the machine where you did your
measurements if you measured in any environment other than Codio; if you just used Codio for measurements, it's
fine to just report that fact.
Which of your implementations is fastest? Does it match your expectations of which should be fastest?
Estimate the best-case performance speedup your CUDA implementations should have based on the number of
threads in your program and the number of processing contexts actually supported by your hardware. How far
from that prediction is your best-case performance?
Which of the parallel implementations is slowest, and does it match your expectations? Why or why not?
What fraction of the end-to-end runtime in your CUDA versions is spent in data transfer?
How much time did you spend on the lab?
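For self-checking before submission, the five-step wrong-point test from section 2.1.1 can be approximated as below. This is a sketch under assumptions, not the autograder's code: the function name, the row-major point layout, and the use of squared Euclidean distances (so the threshold applies to squared values) are all choices made here for illustration.

```c
#include <stdlib.h>

/* Squared Euclidean distance between two d-dimensional points. */
static double dist2(const double *a, const double *b, int d)
{
    double s = 0.0;
    for (int j = 0; j < d; j++) {
        double t = a[j] - b[j];
        s += t * t;
    }
    return s;
}

/* Hypothetical checker: counts points whose assigned cluster is farther
 * away than the closest cluster by more than `threshold`.
 * points: n rows of d doubles (row-major); labels: n ids in [0, k).
 * Returns the number of wrong points, or -1 on allocation failure. */
int count_wrong_points(const double *points, const int *labels,
                       int n, int d, int k, double threshold)
{
    double *centroids = calloc((size_t)k * (size_t)d, sizeof *centroids);
    int *counts = calloc((size_t)k, sizeof *counts);
    if (centroids == NULL || counts == NULL) {
        free(centroids);
        free(counts);
        return -1;
    }

    /* Step 1: centroid of each cluster implied by the assignment. */
    for (int p = 0; p < n; p++) {
        counts[labels[p]]++;
        for (int j = 0; j < d; j++)
            centroids[labels[p] * d + j] += points[p * d + j];
    }
    for (int c = 0; c < k; c++)
        if (counts[c] > 0)
            for (int j = 0; j < d; j++)
                centroids[c * d + j] /= counts[c];

    int wrong = 0;
    for (int p = 0; p < n; p++) {
        const double *pt = &points[p * d];
        /* Steps 2-3: distance to the closest non-empty cluster (dis0). */
        double dis0 = -1.0;
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0)
                continue;
            double dd = dist2(pt, &centroids[c * d], d);
            if (dis0 < 0.0 || dd < dis0)
                dis0 = dd;
        }
        /* Step 4: distance to the assigned cluster (dis1). */
        double dis1 = dist2(pt, &centroids[labels[p] * d], d);
        /* Step 5: flag the point if the gap exceeds the threshold. */
        if (dis1 - dis0 > threshold)
            wrong++;
    }
    free(centroids);
    free(counts);
    return wrong;
}
```

A result of 0 means every point is assigned to its nearest centroid under these assumptions.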
9/24/23, 10:28 AM
cs.utexas.edu/~rossbach/cs380p/lab/kmeans/workspace/submit
##############################################################
# Submission Instructions                                    #
#                                                            #
# Please follow these instructions so that the autograder    #
# can compile and run your program and check the results.    #
#                                                            #
# For each implementation of kmeans (Sequential, Thrust,     #
# CUDA, Alternatives), you need to specify values to         #
# fields according to the following rules                    #
#                                                            #
# Compilation                                                #
#                                                            #
# You need to specify how to compile your program and let    #
# the autograder know the name of the executable. These      #
# can be done using the following two fields                 #
#                                                            #
#   How_To_Compile                                           #
#   Executable                                               #
#                                                            #
# Note:                                                      #
# - Your program will be run with -d dims automatically.     #
#   Make sure your program accepts -d dims as one of         #
#   the command line arguments to specify the dimension      #
#   of the points.                                           #
# - The provided commands will be run in the top level       #
#   directory of your submission directory. If you           #
#   have sub dirs inside your submission directory,          #
#   make sure the commands work at the top level dir.        #
# - Make sure to specify a one-line command to compile       #
#   your program. If more than one line is needed, put       #
#   all commands in a Makefile and specify make here.        #
#                                                            #
# Extra Flags                                                #
#                                                            #
# By default, your program will be run with the following    #
# command line arguments                                     #
#                                                            #
#   -k nClusters -t thrshd -d dims -i iFile -m 200 -s seed   #
#                                                            #
# If your implementation requires additional arguments, you  #
# should specify the following field                         #
#                                                            #
#   Extra_Args                                               #
#                                                            #
#                                                            #
# Implementation                                             #
#                                                            #
# Do not delete any field under the section of your          #
# implementation. However, if you do not have any of         #
# the following implementations, you should delete           #
# the whole section corresponding to the unimplemented       #
# solution.                                                  #
#                                                            #
# Comments                                                   #
#                                                            #
# Anything after '#' through the rest of the line is         #
# comment, which is ignored by the autograder.               #
#                                                            #
##############################################################
#
# Specification for sequential implementation of kmeans
#
[Sequential]
How_To_Compile:
Executable:
Extra_Args:
#
# Specification for GPU implementation of kmeans
# using Thrust
#
[Thrust]
How_To_Compile:
Executable:
Extra_Args:
#
# Specification for GPU implementation of kmeans
# using CUDA
#
[CUDA basic]
How_To_Compile:
Executable:
Extra_Args:
#
# Specification for GPU implementation of kmeans
# using Shared Memory
#
[CUDA shared]
How_To_Compile:
Executable:
Extra_Args:
#
# Specification for GPU implementation of kmeans
# Alternatives
#
[Alternatives]
How_To_Compile:
Executable:
Extra_Args:
