
Data clustering

SCaVis contains a framework for clustering analysis, i.e. for unsupervised learning in which the classification process does not depend on a priori information. It includes the following algorithms:

  • K-means clustering analysis (single and multi pass)
  • C-means (fuzzy) algorithm
  • Agglomerative hierarchical clustering
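The fuzzy C-means entry in this list differs from K-means in that each point receives a degree of membership in every cluster rather than a single label. The sketch below evaluates the textbook membership formula in plain Python for fixed cluster centers. It is not the jMinHEP implementation; the function name `fuzzy_memberships` and the default fuzziness `m=2.0` are assumptions made for this illustration.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Textbook fuzzy C-means memberships of one point for fixed centers.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the Euclidean
    distance to center j and m > 1 is the fuzziness (m=2.0 is an assumed default).
    """
    dists = [math.sqrt(sum((x - c) ** 2 for x, c in zip(point, center)))
             for center in centers]
    # A point lying exactly on one or more centers is shared equally among them.
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        hits = float(sum(on_center))
        return [1.0 / hits if hit else 0.0 for hit in on_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]

# A point at x=0.5 between centers at x=0 and x=2 belongs mostly to the first.
u = fuzzy_memberships([0.5, 0.0], [[0.0, 0.0], [2.0, 0.0]])
```

Here `u` comes out as roughly 0.9 and 0.1, and the memberships of a point always sum to one.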

All algorithms can be run either in a fixed-cluster mode or in a best-estimate mode, in which the number of clusters is not given a priori but is determined from an estimate of the cluster compactness. The data points can be defined in a multidimensional space.
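As a rough illustration of these two modes, the following plain-Python sketch runs K-means with a fixed number of clusters and then scans the number of clusters to pick a best estimate. It does not use jMinHEP. The compactness definition (RMS distance of the points to their cluster centers), the farthest-point seeding, and the 20% stopping threshold are all assumptions made for this example, and the criteria used by jMinHEP may differ.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seed_centers(points, k):
    """Farthest-point seeding (an assumed, deterministic choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    return centers

def kmeans(points, k, max_iter=100):
    """Fixed-cluster mode: K-means with k clusters; returns (centers, labels)."""
    centers = seed_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):  # move each center to the mean of its members
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def compactness(points, centers, labels):
    """RMS distance of the points to their assigned centers (assumed definition)."""
    total = sum(dist2(p, centers[l]) for p, l in zip(points, labels))
    return math.sqrt(total / len(points))

def best_estimate(points, k_max=6, min_gain=0.2):
    """Smallest k for which one more cluster improves compactness by < min_gain."""
    previous = compactness(points, *kmeans(points, 1))
    for k in range(1, k_max):
        current = compactness(points, *kmeans(points, k + 1))
        if previous - current < min_gain * previous:
            return k
        previous = current
    return k_max

# Three well-separated Gaussian blobs in 3D, 50 points each.
rng = random.Random(42)
blobs = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
points = [[rng.gauss(m, 0.5) for m in b] for b in blobs for _ in range(50)]

centers, labels = kmeans(points, 3)       # fixed-cluster mode
rms = compactness(points, centers, labels)
k_best = best_estimate(points)            # best-estimate mode
```

With this definition the compactness shrinks as clusters are added, so the scan stops at the point where an extra cluster no longer brings a sizeable gain instead of simply minimizing the value.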

Data clustering is based on the jMinHEP package, which can also be run in a completely stand-alone mode, without SCaVis. SCaVis integrates this Java program and enables Java scripting.

Using GUI

The easiest approach is to use the GUI editor to perform clustering. In the example below, we fill a data holder with 100 Gaussian-distributed points in 3D and then pass it to the GUI for clustering analysis:

from java.util import Random
from jminhep.cluster import *
from jhplot import *

data = DataHolder("Build clusters")
r = Random()
for i in range(100):  # fill 3D data with Gaussian random numbers
    a = []
    a.append(10*r.nextGaussian())
    a.append(2*r.nextGaussian() + 1)
    a.append(10*r.nextGaussian() + 3)
    data.add(DataPoint(a))
c1 = HCluster(data)  # start jMinHEP GUI

This brings up a GUI editor in which you can select and run an algorithm.

Using Jython scripts

Alternatively, you can run any clustering algorithm in batch mode, without the GUI. You can use Java or a scripting language such as Jython.

Below we show a script that creates a data sample in 3D and then runs several clustering algorithms in one go. You can optionally print the positions of the clusters and the membership of the data points. The following modes are used:

  • K-means algorithm fixed cluster mode with single seed event
  • K-means algorithm for multiple iterations
  • K-means clustering using exchange method for best estimate
  • K-means clustering using exchange method
  • Hierarchical clustering algorithm
  • Hierarchical clustering algorithm, best estimate


The output of the above script is shown below:

test 0
algorithm: kmeans algorithm fixed cluster mode with single seed event
Compactness: 1.98271418832
No of final clusters: 3
test= 1
algorithm: kmeans algorithm for multiple iterations
Compactness: 1.31227526642
No of final clusters: 3
test= 2
algorithm: K-means clustering using exchange method for best estimate
Compactness: 1.35529140568
No of final clusters: 5
test= 3
algorithm: K-means clustering using exchange method
Compactness: 1.35529140568
No of final clusters: 5
test= 4
algorithm: Hierarchical clustering algorithm
Compactness: 1.41987639705
No of final clusters: 5
test= 5
algorithm: Hierarchical clustering algorithm, best estimate
Compactness: 1.20134128248
No of final clusters: 6

You can print the centers of the clusters as follows:

Centers = pat.getCenters()
for i in range(Centers.getSize()):
    g = Centers.getRow(i)
    n = g.getDimension()  # dimension of the center point
    print i, " ", g.getAttribute(0), g.getAttribute(1), g.getAttribute(2)

In the above example, “Centers” is a “DataHolder” container.
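For data of arbitrary dimension, the three hard-coded attributes can be replaced by a loop over `getDimension()`. The sketch below relies only on the four methods used in the snippet above (`getSize`, `getRow`, `getDimension`, `getAttribute`) and assumes that `getAttribute(j)` is valid for every `j` below `getDimension()`. The helper name `format_centers` and the two `Fake...` classes are hypothetical; the stand-ins exist only so that the example also runs outside SCaVis, where you would pass `pat.getCenters()` instead.

```python
def format_centers(centers):
    """Return one text line per cluster center: index, then all coordinates."""
    lines = []
    for i in range(centers.getSize()):
        row = centers.getRow(i)
        coords = [row.getAttribute(j) for j in range(row.getDimension())]
        lines.append("%d  %s" % (i, " ".join(str(c) for c in coords)))
    return lines

# Hypothetical stand-ins that mimic only the four methods used above.
class FakePoint(object):
    def __init__(self, values):
        self.values = list(values)
    def getDimension(self):
        return len(self.values)
    def getAttribute(self, j):
        return self.values[j]

class FakeHolder(object):
    def __init__(self, rows):
        self.rows = [FakePoint(r) for r in rows]
    def getSize(self):
        return len(self.rows)
    def getRow(self, i):
        return self.rows[i]

demo = FakeHolder([[1.5, 2.0, 3.0], [4.0, 5.5, 6.0]])
for line in format_centers(demo):
    print(line)
```

Returning the lines rather than printing them inside the helper keeps the formatting reusable, for example for writing the centers to a file.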

Read the book "Scientific data analysis using Jython scripting and Java" for more details.

Sergei Chekanov 2010/03/07 16:37
