What is it? K-clustering groups samples that are most similar to each other. Each cluster (or group) is formed around one centroid; the number of centroids (the "K") is determined by the user.

When is it used? K-clustering is used to profile samples, that is, to sort them into groups that share similar characteristics. You dictate how many clusters are made.

How does it work?

K-Clustering: Example

We analyze the expression of 1,000 proteins in 100 breast cancer patients using an antibody-based microarray. We believe that there are five sub-types of breast cancer based on cellular phenotypes. We want to determine whether the patients fall into their diagnosed sub-types.

  1. The expression of each protein across the 100 patients is centered by subtracting the mean and then "scaled" by dividing by the standard deviation (Figure 1).
  2. We tell the software that we want 5 sub-types. The software picks five starting points on the plot, called centroids (Figure 2).
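  3. The Euclidean distance, that is, the straight-line distance between each sample and each centroid, is calculated, and each sample is assigned to its closest centroid. Numerous iterations are performed, repositioning the centroids each time, to find the optimal centroid positions and grouping.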
  4. The samples are clustered into 5 final groups (Figure 3). This is the data that we care about.
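
Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular software package: it assumes the data have already been centered and scaled, it starts the centroids at randomly chosen samples (one common choice among several), and the function names `euclidean` and `k_means` and the toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(samples, k, max_iter=100, seed=0):
    """Basic k-means loop; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct samples chosen at random.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: six "patients", each described by two standardized values.
patients = [
    (-1.2, -1.0), (-1.0, -1.3), (-0.9, -1.1),
    (1.1, 0.9), (1.3, 1.2), (0.8, 1.0),
]
labels, centroids = k_means(patients, k=2)
```

In the breast cancer example, `samples` would hold 100 patients with 1,000 values each and `k` would be 5. The cluster numbers themselves are arbitrary; only which samples share a number is meaningful.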
Figure 1. Example of Centering and Scaling Data. A) Expression levels of Protein "X" across two datasets are B) centered and C) scaled so that all datasets have a mean of 0 and a standard deviation of 1.
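
The centering and scaling shown in panels B and C can be sketched as follows. This is an illustrative example only: the function name `standardize` and the expression values are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption that may differ from what a given software package does.

```python
import statistics


def standardize(values):
    """Center to a mean of 0, then scale to a standard deviation of 1."""
    mean = statistics.mean(values)
    centered = [v - mean for v in values]   # panel B: centering
    sd = statistics.stdev(values)           # assumption: sample standard deviation
    return [c / sd for c in centered]       # panel C: scaling


# Hypothetical expression levels of one protein across five patients.
protein_x = [2.0, 4.0, 6.0, 8.0, 10.0]
z = standardize(protein_x)
```

After this step every protein contributes on the same scale, so a protein with large raw values does not dominate the distance calculation.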
Figure 2. 2-D plot representing "reduced dimension" data and the clusters assigned in the first iteration. Each spot represents the reduced dimension data from one patient.
Figure 3. 2-D plot representing "reduced dimension" data and the clusters assigned in the last (optimal) iteration. Each spot represents the reduced dimension data from one patient.

What does the data look like? K-clustering can produce a 2-D plot like Figure 3 and/or a table that lists the clusters and the samples assigned to each cluster.
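
As a rough illustration of the table form, the sketch below turns a list of cluster labels into a cluster-by-sample listing. The patient IDs, the labels, and the function name `cluster_table` are all hypothetical and are not taken from the example dataset.

```python
from collections import defaultdict


def cluster_table(sample_ids, labels):
    """Map each cluster label to the sorted list of samples assigned to it."""
    table = defaultdict(list)
    for sample_id, label in zip(sample_ids, labels):
        table[label].append(sample_id)
    return {label: sorted(members) for label, members in sorted(table.items())}


# Hypothetical patient IDs and cluster labels from a finished run.
sample_ids = ["P01", "P02", "P03", "P04", "P05", "P06"]
labels = [0, 1, 0, 2, 1, 0]

table = cluster_table(sample_ids, labels)
for label, members in table.items():
    print(f"Cluster {label}: {', '.join(members)}")
```

A table like this can then be compared with each patient's diagnosed sub-type to see how well the clusters match the diagnoses.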