Data Clustering Characteristics, Types, and Algorithms: Discussion Replies

Discussion 1:
1. Characteristics of data
The characteristics of data that can affect cluster analysis are high dimensionality, size of
the data set, sparseness, noise and outliers that can negatively affect the performance of the
algorithm, the mathematical properties of the data space, the scale on which different variables
are measured, and the type of attributes of the data set, whether they are categorical or
continuous (Belde, 2018).
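The effect of measurement scale can be illustrated with a small sketch. The records below are
hypothetical (age in years, income in dollars), and the helper names are illustrative only. On
raw values, income dominates the Euclidean distance; after each attribute is standardized, age
contributes as well, and the nearest neighbor of the first record changes.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds))
        for row in rows
    ]

def nearest_to_first(rows):
    """Index of the row closest (Euclidean distance) to rows[0]."""
    return min(range(1, len(rows)), key=lambda i: math.dist(rows[0], rows[i]))

# Hypothetical records: (age in years, income in dollars).
people = [(25, 50_000), (60, 51_000), (27, 55_000), (45, 90_000)]

raw_neighbor = nearest_to_first(people)                  # income dominates the distance
scaled_neighbor = nearest_to_first(standardize(people))  # both attributes contribute
print(raw_neighbor, scaled_neighbor)
```

Standardization is only one possible rescaling; whether it is appropriate depends on the data
and on the clustering method.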
2. Compare the difference in each of the following clustering types: prototype-based,
density-based, graph-based.
The purpose of clustering is to group data objects with similar attribute values so that the
objects within a group are alike and differ from the objects in other groups. In prototype-based
clustering, each observation is assigned to the cluster whose most representative point, or
prototype, is nearest. The prototype is a centroid for numerical data or a medoid for categorical
variables. In density-based clustering, a cluster is a region of high point density, separated
from other clusters by a region of low density. Lastly, in graph-based clustering, each element to
be clustered is represented as a node, the distance between two objects is modeled by a weight on
the edge linking their nodes, and two points are connected only if they are within a specified
distance of each other (Belde, 2018).
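As an illustration of the prototype-based type, the following is a minimal sketch of k-means
(Lloyd's algorithm) with centroids as prototypes. The function name, the sample points, and the
choice of starting centroids are assumptions made for the example; production libraries use more
careful initialization and stopping rules.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Two hypothetical, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

Because every point is assigned to its nearest centroid, the resulting clusters tend to be
globular, which is why this type struggles with irregular or intertwined shapes.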
3. Scalable clustering algorithm
In clustering, scalability means that as the number of data objects and attributes in a data set
increases, the time required for the clustering computation grows at a manageable rate, in line
with the order of complexity of the algorithm. If a clustering algorithm requires too much memory
or too much time to run, it might not be useful for the analysis. To keep the algorithm
efficient, there are different techniques to reduce the storage and computation required by a
clustering algorithm. Depending on the data set, the data engineer can use methods such as
sampling, summarization, partitioning the data objects, bounds on proximity, distributing the
computation across several processors for parallel processing, or applying multidimensional or
spatial access methods for finding the closest centroid, the nearest neighbor of a point, or all
points within a specified distance.
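One of the techniques listed above, a spatial access method for finding all points within a
specified distance, can be sketched with a uniform grid. This is a minimal illustration for 2-D
points, not a tuned index; the class name GridIndex and the sample data are hypothetical. Because
the cell side equals the query radius, any point within the radius must lie in the query's cell
or one of its eight neighbors, so the search skips most of the data set.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over 2-D points with cell side equal to the query radius."""

    def __init__(self, points, radius):
        self.points = points
        self.radius = radius
        self.cells = defaultdict(list)
        for i, (x, y) in enumerate(points):
            self.cells[self._cell(x, y)].append(i)

    def _cell(self, x, y):
        return (math.floor(x / self.radius), math.floor(y / self.radius))

    def within_radius(self, query):
        """Indices of points within self.radius of query, checking only 9 cells."""
        cx, cy = self._cell(*query)
        hits = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in self.cells.get((cx + dx, cy + dy), []):
                    if math.dist(query, self.points[i]) <= self.radius:
                        hits.append(i)
        return sorted(hits)

def brute_force(points, query, radius):
    """Reference answer: compare the query against every point."""
    return [i for i, p in enumerate(points) if math.dist(query, p) <= radius]

points = [(0.2, 0.1), (0.9, 0.4), (3.0, 3.0), (0.5, 1.2)]
index = GridIndex(points, radius=1.0)
close = index.within_radius((0.5, 0.5))
print(close)
```

The brute-force function compares the query with every point, so its cost grows with the size of
the data set; the grid lookup only touches the points stored in nine cells.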
4. Choosing the right algorithm
Some of the factors that need to be considered when selecting the right algorithm are the
type of clustering that the application requires, the characteristics of the clusters such as shape,
size, or density, the features of the variables, the number of data objects, noise and outliers (Tan
et al., 2019). These elements impact the objective of the analysis and influence the speed,
performance, and accuracy of the algorithm.
References
Belde, G. (2018). Types of clustering – definitions, formations and limitations.
Medium. https://medium.com/@ganesh.b1114/types-of-clustering-definitions-formations-and-limitations-de709755c940
Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2019). Introduction to data mining (2nd
ed.). Pearson.
Discussion 2:
Question 1
Data has several characteristics, as discussed. First, it should be precise to enhance
accuracy. Precision saves users time when conducting research and other activities such
as data analytics. Additionally, users spend fewer resources processing precise data, thus saving
money and institutional resources. Data should adhere to compliance standards and be relevant
to user requirements. Reliability and consistency are data characteristics that ensure information
is appropriate for use. Data relevance enhances information quality while promoting user value
and comprehension during the processes that depend on it. Furthermore, quality data is easy to
access and can be processed in the future without difficulty (Sharma, 2019).
Question 2
Prototype-based clusters are center-based and are typically globular. For numerical data, the
prototype is the centroid, the average of the points in the cluster. For categorical features, the
prototype is a medoid, the most representative point of the cluster. Unlike prototype-based
clusters, density-based clusters can be irregular or intertwined, and the density-based definition
is used when noise and outliers are present. Consequently, points in low-density regions are
classified as noise and omitted. Graph-based clusters differ from the above two cluster types
because objects are connected only when they lie within a specified distance of each other.
Graph-based clusters run into trouble when noise is present, because a small bridge of points can
merge two clear-cut clusters. Additionally, graph-based clustering includes agglomerative
hierarchical and clique-based techniques, unlike prototype-based and density-based clustering
(Belde, 2018).
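The bridge problem and the density-based remedy can be shown in a short sketch. The function
below builds graph-based (contiguity) clusters as connected components of points that lie within
eps of each other. With min_pts greater than 1 it first discards points in sparse regions, which
is a simplified density filter in the spirit of DBSCAN rather than the full algorithm (border
points are not reattached). All names and the sample data are assumptions for illustration.

```python
import math

def neighbors(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def contiguity_clusters(points, eps, min_pts=1):
    """Connected components of the eps-neighborhood graph.

    Only points with at least min_pts neighbors (counting themselves) are kept
    as nodes; the rest are labeled -1 (noise). min_pts=1 keeps every point.
    """
    neigh = [neighbors(points, i, eps) for i in range(len(points))]
    dense = [len(n) >= min_pts for n in neigh]
    labels = [-1] * len(points)
    cluster = 0
    for start in range(len(points)):
        if not dense[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in neigh[i]:
                if dense[j] and labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels

# Two hypothetical dense blobs joined by a thin chain of noise points.
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
bridge = [(2.5, 0.5), (4, 0.5), (5.5, 0.5), (7, 0.5), (8.5, 0.5)]
blob_b = [(10, 0), (10, 1), (11, 0), (11, 1), (10.5, 0.5)]
points = blob_a + bridge + blob_b

graph_labels = contiguity_clusters(points, eps=1.6)             # bridge merges the blobs
dense_labels = contiguity_clusters(points, eps=1.6, min_pts=5)  # sparse bridge dropped
print(graph_labels)
print(dense_labels)
```

With the plain graph-based rule, the chain of bridge points links the two blobs into a single
cluster. Requiring a minimum local density marks the bridge as noise and recovers two clusters.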
Question 3
A scalable clustering algorithm identifies homogeneous groups of data, based on size or profile,
while remaining practical as the data set grows. The process relies on distance metrics to
measure the similarity of data points, and dividing the points into partitions reduces the
complexity of the clustering computation. Scalability also involves adding more processors to the
cluster and adding aggregate interconnection bandwidth without interfering with symmetric
multiprocessing programs.
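One way to read the point about partitions and added processors is a data-parallel centroid
update, sketched below under stated assumptions. Each partition computes only per-cluster sums and
counts, and a merge step combines those small summaries into new centroids. The partitions are
processed sequentially here to keep the example self-contained; in a real deployment each
partition could be handled by a separate worker. The function names and sample data are
hypothetical.

```python
import math

def partial_stats(chunk, centroids):
    """Per-partition work: sum and count of the points nearest each centroid."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        k = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        counts[k] += 1
        for d in range(dims):
            sums[k][d] += p[d]
    return sums, counts

def merge_and_update(stats, centroids):
    """Combine per-partition statistics into new centroids."""
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in stats)
        if total == 0:
            new_centroids.append(tuple(old))  # no points assigned: keep the old centroid
            continue
        new_centroids.append(tuple(
            sum(sums[k][d] for sums, _ in stats) / total for d in range(len(old))
        ))
    return new_centroids

points = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
partitions = [points[:3], points[3:]]  # in practice, one partition per worker

stats = [partial_stats(chunk, centroids) for chunk in partitions]
updated = merge_and_update(stats, centroids)
print(updated)
```

Only the sums and counts cross partition boundaries, so the data exchanged between workers stays
small regardless of how many points each partition holds.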
Question 4
Several considerations help users choose an appropriate algorithm. Examples include the size of
the training data needed for reliable predictions, the number of observations when data
availability is constrained, and a very large number of features, as in genetics and textual
data. When observations are limited, the chosen algorithm should have low variance; linear
regression is one example. Other considerations include accuracy, output interpretability,
training time, and linearity.
References
Belde, G. (2018). Types of clustering – definitions, formations and limitations. Retrieved 13 July 2021,
from https://medium.com/@ganesh.b1114/types-of-clustering-definitions-formations-and-limitations-de709755c940
Sharma, J. (2019). What is data and its characteristics? Retrieved 13 July 2021,
from http://www.rebellionrider.com/data-definition-and-characteristics/
