November 21, 2018 | Biostatistics & Bioinformatics | Brianne
K-Clustering
What is it? K-clustering groups samples that are most similar to each other. Each cluster (or group) is formed around one centroid; the number of centroids is determined by the user.
When is it used? This method is used to profile samples by sorting them into groups. You dictate how many clusters are made.
How does it work?
K-Clustering: Example
We profile the expression of 1,000 proteins in 100 breast cancer patients using an antibody-based microarray. We believe that there are five sub-types of breast cancer based on cellular phenotypes. We want to determine whether the patients cluster according to their diagnosed sub-type.
- The expression of each protein across 8 patients is centered by subtracting its mean and then "scaled" by dividing by its standard deviation (Figure 1).
- We tell the software that we want 5 sub-types. The software picks five starting points on the plot, called centroids (Figure 2).
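- The Euclidean distance, or the straight-line distance between each centroid and each sample, is calculated, and each sample is assigned to its closest centroid. Numerous iterations are performed, moving the centroids and reassigning the samples each time, to find the optimal centroid position and grouping.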
- The samples are clustered into 5 final groups (Figure 3). These final groupings are the result that we care about.
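The steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard k-means procedure (assign each sample to its nearest centroid, then move each centroid to the mean of its samples); the software used for the example may differ in how it picks starting centroids or decides when to stop. The function names (`standardize`, `euclidean`, `k_means`) and the toy data are made up for illustration, with 6 patients, 2 proteins, and 2 clusters instead of 100, 1,000, and 5.

```python
import math
import random

def standardize(rows):
    """Center each column (protein) on its mean and scale by its standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(samples, k, iterations=100, seed=0):
    """Assumed standard k-means: returns (cluster label per sample, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen samples as the initial centroids.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(iterations):
        # Assign every sample to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(s, centroids[j]))
                      for s in samples]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of the samples assigned to it.
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Toy data (made up): rows are patients, columns are two proteins.
patients = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
            [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]]
scaled = standardize(patients)
labels, centroids = k_means(scaled, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which samples share a number.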



What does the data look like? K-Clustering results can be shown as a 2-D plot like Figure 3 and/or as a table that lists the clusters and the samples assigned to each cluster.
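As a rough illustration of the table form, the sketch below groups sample names by their assigned cluster. The sample IDs and cluster labels are invented for the example, and `cluster_table` is a hypothetical helper, not part of any particular software package.

```python
from collections import defaultdict

# Hypothetical output of a clustering run: one cluster label per sample.
sample_ids = ["P01", "P02", "P03", "P04", "P05", "P06"]
cluster_labels = [0, 0, 1, 2, 1, 0]

def cluster_table(ids, labels):
    """Map each cluster number to the list of samples assigned to it."""
    table = defaultdict(list)
    for sample_id, label in zip(ids, labels):
        table[label].append(sample_id)
    return dict(sorted(table.items()))

table = cluster_table(sample_ids, cluster_labels)
for cluster, members in table.items():
    print(f"Cluster {cluster}: {', '.join(members)}")
# Cluster 0: P01, P02, P06
# Cluster 1: P03, P05
# Cluster 2: P04
```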