# Data structures

The SCaVis data containers are designed for scientific data analysis and are well suited for data manipulation, input/output and data representation on various canvases. Unlike Data Collections, they are designed mainly for data visualisation and high-level operations. If you are interested in high-speed numerical calculations, use Data Collections, which typically outperform these data structures (by about a factor of 5) and have a very small memory footprint.

Examples are:

- P0D - (double) array in 1 dimension. *High-performance collection*
- P0I - (integer) array in 1 dimension. *High-performance collection*
- P1D - (double) array in two dimensions (X,Y), with optional 2-level errors on X and Y. *High-performance collection*
- P2D - array in 3D (X,Y,Z). *High-performance collection*
- P3D - array in 3D (X,Y,Z) with extension
- PND - array with double values (arbitrary dimension)
- PNI - array with integer values (arbitrary dimension)
- HStatData - keeps and manipulates time series (PRO edition)

# 1D-arrays. P0D and P0I classes

Such arrays are based on the Java class P0D. One can fill such arrays using the method “add” and display their content. A statistical summary can also be easily obtained.

The example below shows how to build such an array using the Python syntax. A complete statistical summary is evaluated later in this section. Here we fill an array with 10 sequential numbers from 0 to 9 and then convert it into a string for printing:

    from jhplot import *

    p0=P0D("test")
    for i in range(10):
        p0.add(i)
    print p0.toString()

This prints in the prompt:

P0D test 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

One can view the data containers using several methods. One is “toString()”, which converts the data into a string. One can write the data into a file (including compression) using the method “toFile(file)”. One can also view the data in a sortable table using the method “toTable()”, or by calling “HTable(obj)” directly, where “obj” is one of the containers discussed above. This works even for SCaVis histograms. For example, executing this line after the above example

    HTable(p0)

brings up a table where the data can be sorted and searched.

As for any SCaVis data object, one can write an array into a file using the method “toFile()”:

    p0.toFile("data.txt")

and read it back as:

    p0.read("data.txt")

If the file was zipped, use the method “readZip()”. Read more about I/O in the section io.

One can access various statistical characteristics of the P0D arrays as:

    print p0.getStatString()

The output of this script is shown below:

    cern.hep.aida.bin.DynamicBin1D
    -------------
    Size: 10
    Sum: 45.0
    SumOfSquares: 285.0
    Min: 0.0
    Max: 9.0
    Mean: 4.5
    RMS: 5.338539126015656
    Variance: 9.166666666666666
    Standard deviation: 3.0276503540974917
    Standard error: 0.9574271077563381
    Geometric mean: 0.0
    Product: 0.0
    Harmonic mean: 0.0
    Sum of inversions: Infinity
    Skew: 0.0
    Kurtosis: -1.5616363636363637
    Sum of powers(3): 2025.0
    Sum of powers(4): 15333.0
    Sum of powers(5): 120825.0
    Sum of powers(6): 978405.0
    Moment(0,0): 1.0
    Moment(1,0): 4.5
    Moment(2,0): 28.5
    Moment(3,0): 202.5
    Moment(4,0): 1533.3
    Moment(5,0): 12082.5
    Moment(6,0): 97840.5
    Moment(0,mean()): 1.0
    Moment(1,mean()): 0.0
    Moment(2,mean()): 8.25
    Moment(3,mean()): 0.0
    Moment(4,mean()): 120.8625
    Moment(5,mean()): 0.0
    Moment(6,mean()): 2079.515625
    25%, 50%, 75% Quantiles: 2.25, 4.5, 6.75
    quantileInverse(median): 0.55
    Distinct elements: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    Frequencies: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

You can access all such characteristics using the method “getStat()”, which returns a Java Map (or Jython dictionary) where each key identifies a statistical value.
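To see where the main numbers in the summary come from, the sketch below recomputes a few of them in plain Python for the same ten values. It does not use the SCaVis API: the function name `summary` and the dictionary keys are hypothetical and chosen only for illustration. Note that the variance in the printout is the sample variance (divided by N-1), while the RMS is the square root of the mean of squares.

```python
import math

def summary(values):
    """Recompute a few entries of the statistical summary (illustrative only)."""
    n = len(values)
    total = float(sum(values))
    sum_sq = float(sum(v * v for v in values))
    mean = total / n
    # Sample variance (divides by n - 1), matching "Variance" in the printout
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "size": n,
        "sum": total,
        "sumOfSquares": sum_sq,
        "mean": mean,
        "rms": math.sqrt(sum_sq / n),          # root of the mean of squares
        "variance": variance,
        "standardDeviation": std_dev,
        "standardError": std_dev / math.sqrt(n),
    }

stats = summary(range(10))
for key in sorted(stats):
    print("%s: %s" % (key, stats[key]))
```

The values agree with the corresponding lines of the summary shown above, for example a standard error equal to the standard deviation divided by the square root of the size.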

Since it is useful to rebin 1D arrays using histograms, consider using the HPlotJas canvas, which offers a GUI with a slider for on-the-fly rebinning. This is explained in more detail in the section Interactive fit.

# 2D arrays. P1D class

2D arrays are based on the Java class `jhplot.P1D`. This is one of the richest data collections: it can keep (X,Y) values, (X,Y,Err) values, where “Err” is an error on “Y”, and any (X,Y) values with asymmetric errors on X and Y.

As before, one can fill such arrays using the method “add(x,y)”. It is one of the most advanced containers, since each data point can carry up to 8 errors: 1st-level errors (usually statistical) and 2nd-level errors (usually systematic). The dimension of this array can grow and shrink. If you need to keep only 2 values, X and Y, set the dimension of this object to 2 (which is the default anyway). The dimension can be as large as 10, with the 8 additional values used to set the errors on X and Y.

This is also a high-performance container, which is faster than a Jython list or a Java list. For example, you would need 2 Python lists to keep X and Y; instead, use a single `jhplot.P1D`. Read about the performance of this container in the section data_collections.

The supported dimensions of this array are listed below. Use “setDimension(dimension)” to initialize the container:

data | dimension |
---|---|
X,Y | 2 (default) |
X,Y, errY-, errY+ (symmetric) | 3 |
X,Y, errY-, errY+ | 4 |
X,Y, errX-, errX+, errY-, errY+ | 6 |
X,Y, errX-, errX+, errY-, errY+, errXsys-, errXsys+, errYsys-, errYsys+ | 10 |

Here errY means a 1st-level error (usually statistical), while errYsys means a 2nd-level error (usually systematic).
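The table can be read as a mapping from the dimension to the values stored for every point. The plain-Python sketch below spells this out; it is not part of the SCaVis API, the names `LAYOUTS` and `columns` are hypothetical, and for dimension 3 it assumes that a single symmetric error on Y is stored.

```python
# Column layout for each supported dimension, following the table above.
LAYOUTS = {
    2: ["X", "Y"],
    3: ["X", "Y", "errY"],  # assumption: one symmetric error on Y
    4: ["X", "Y", "errY-", "errY+"],
    6: ["X", "Y", "errX-", "errX+", "errY-", "errY+"],
    10: ["X", "Y", "errX-", "errX+", "errY-", "errY+",
         "errXsys-", "errXsys+", "errYsys-", "errYsys+"],
}

def columns(dimension):
    """Return the names of the values stored per point (hypothetical helper)."""
    if dimension not in LAYOUTS:
        raise ValueError("unsupported dimension: %d" % dimension)
    return LAYOUTS[dimension]

for dim in sorted(LAYOUTS):
    print("%2d -> %s" % (dim, ", ".join(columns(dim))))

# Values stored per point: all 2-level errors versus plain (X, Y)
ratio = len(columns(10)) / float(len(columns(2)))
print("ratio: %.1f" % ratio)
```

The last line shows that a point with all 1st- and 2nd-level errors stores five times as many values as a plain (X,Y) point.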

The memory footprint of `jhplot.P1D` depends on its dimension. If you keep only (X,Y) values with the default dimension 2, the memory footprint is 5 times smaller than if you store (X,Y) together with all 8 errors on X and Y (1st and 2nd level), since each point then holds 2 values instead of 10.
For example, let us make a simple (X,Y) array:

    from jhplot import *

    p=P1D("XY")
    x,y=1.0,2.0   # example values
    p.add(x,y)

You can get arrays back as:

    x=p.getArrayX()
    y=p.getArrayY()

The main advantage of using `jhplot.P1D` is its ability to handle various operations together with error propagation.

The example below shows how to build such arrays using the Python syntax. We fill two P1D arrays with three points each, set the color of the second one to red, and draw both on the same canvas:

    from jhplot import *
    from java.awt import Color

    p1=P1D("1st data set")
    p1.add(1,2)
    p1.add(2,3)
    p1.add(4,5)

    p2=P1D("2nd data set")
    p2.add(-1,3)
    p2.add(5,-2)
    p2.add(1,0)
    p2.setColor(Color.red)

    c1 = HPlot("Canvas")
    c1.visible()
    c1.setAutoRange()

    c1.draw(p1)
    c1.draw(p2)

The output of this script is shown here

The P1D data container can also show errors on X and Y for each data point, and it supports advanced mathematical operations with proper error propagation. For example,

    p1.add(x,y,err)   # fills X, Y and a symmetric error on Y

where “err” is a statistical error on the Y value, assuming that yUpper=yLower. You can get the values back as arrays using:

    x=p1.getArrayX()
    y=p1.getArrayY()
    error=p1.getArrayErr()

If the error on Y is asymmetric, use this method:

    p1.add(x,y,err_up,err_down)

where “err_up” and “err_down” are the upper and lower errors on Y.

SCaVis contains advanced error propagation algorithms and can handle statistical errors (on X and Y) as well as systematic errors (2nd-level errors). Here is a small example which illustrates how to draw points with 1st-level (statistical) errors:

    from jhplot import *

    c1 = HPlot("Canvas")
    c1.visible()
    c1.setAutoRange()
    c1.setGTitle("Systematic errors")  # put global title

    p1= P1D("Data")
    p1.add(1,100,7,5)  # x, y, error_UP, error_Down
    p1.add(2,80,5,4)
    p1.add(3,90,5,2)
    c1.draw(p1)

In this code, the errors on the y-axis are asymmetric (just to show that this is possible). The “add()” method has many variations, so one can assign errors for the x-axis and the y-axis (plus 2nd-level errors).
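As a rough illustration of what error propagation means for such data, the sketch below adds two Y values with symmetric statistical errors and scales a value by a constant. It is written in plain Python and uses the textbook rule that independent uncertainties combine in quadrature; it is an assumption that SCaVis applies the same rule, and both function names are hypothetical.

```python
import math

def add_with_errors(y1, err1, y2, err2):
    """Sum of two values; independent errors combine in quadrature."""
    return y1 + y2, math.sqrt(err1 ** 2 + err2 ** 2)

def scale_with_error(y, err, factor):
    """Multiply a value by an exact constant; the error scales the same way."""
    return y * factor, abs(factor) * err

total, total_err = add_with_errors(100.0, 3.0, 80.0, 4.0)
print("sum = %.1f +/- %.1f" % (total, total_err))

scaled, scaled_err = scale_with_error(90.0, 5.0, 0.5)
print("scaled = %.1f +/- %.1f" % (scaled, scaled_err))
```

For asymmetric errors the same rule would have to be applied separately to the upper and lower errors, which is one of the reasons to let the container do this bookkeeping.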

# 3D arrays. P2D class

Analogously, one can plot data in 3D. Use the P2D class to add values and plot them:

    from jhplot import *

    c1 = HPlot3D("Canvas")
    c1.visible()
    c1.setAutoRange()

    p1=P2D("3D")
    p1.add(1,2,3)
    p1.add(2,1,3)
    p1.add(3,2,0)
    c1.draw(p1)

Here is a more advanced example:

    # Canvas3D example. Show data points using 2 pads.

    from java.awt import Color
    from jhplot import *

    d1 = P2D("data1")
    d2 = P2D("data2")
    d3 = P2D("data3")
    d1.setSymbolColor(Color.red)
    d2.setSymbolColor(Color.blue)
    d3.setSymbolSize(2)
    for i in range(0,10):
        for j in range(0,10):
            d1.add(i,j,0.5)
            d2.add(i,j,0.6)

    for i in range(0,50):
        for j in range(0,50):
            d3.add(0.2*i,0.2*j,0.9)

    c1 = HPlot3D("plot",600,700,1,2)
    c1.setRange(0.0,10,0.0,10,0,2)
    c1.visible()

    c1.cd(1,1)
    c1.draw(d1)
    c1.draw(d2)

    c1.cd(1,2)
    c1.setRange(0.0,10,0.0,10,0.2,1.0)
    c1.draw(d1)
    c1.draw(d2)
    c1.draw(d3)

Which generates the output:

# Multi-dimensional arrays

Let us assume that we have a matrix of numbers organized as

    # this is a multi-dimensional data
    1 2 3 4
    5 6 7 8
    .......

(the number of rows and columns can be arbitrary). We can load and work with this data using the PND class. The first step is to read the data into a SCaVis data container designed to keep such data and to do some manipulation. Here we read the data from a prepared file located on the Web:

    from jhplot import *

    pn=PND('data','/scavis/examples/data/pnd.d')
    print pn.toString()

Here we create a PND object from the file “pnd.d” stored on the Web and print it for checking. The file has exactly the same structure as shown before, i.e. each row is separated by a new line. From now on, we use the Python syntax to print a string returned by the method “toString()”; the numbers are printed in the Jython shell, which is used for the output of the print command. Alternatively, one can use the “pn.toTable()” method to display all numbers in a sortable and searchable table.

Let us continue with the analysis of our data. The first thing we want to do is to extract the numbers from the 2nd column and display them as a histogram. Assuming that the “pn” object is created as shown before, we extract the second column using the index 1 (the first column has the index 0):

    p0=pn.getP0D(1)                  # extract 2nd column and put it into a 1D array
    print p0.getStat()               # print detailed statistical characteristics
    c1=HPlot('Plot')                 # create a canvas to display a histogram
    c1.visible(); c1.setAutoRange()  # set auto-range
    h1=p0.getH1D(10)                 # convert 1D array into a histogram with 10 bins
    c1.draw(h1)                      # draw the histogram
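The two operations used here, taking one column out of a table of numbers and binning it, can be sketched in plain Python as follows. This is only an illustration of the idea with made-up rows, not the implementation behind “getP0D” or “getH1D”; the helper names are hypothetical, and the choice of equal-width bins between the minimum and maximum is an assumption.

```python
def get_column(rows, index):
    """Return one column of a row-major table as a flat list."""
    return [row[index] for row in rows]

def histogram(values, nbins):
    """Count values in nbins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(nbins)
    counts = [0] * nbins
    for v in values:
        if width == 0:
            i = nbins - 1            # all values identical
        else:
            # The maximum value is placed into the last bin
            i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    return counts

# Made-up rows, one list per line of a data file
rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
second = get_column(rows, 1)   # index 1 selects the 2nd column
print(second)
print(histogram(second, 2))
```

As in the SCaVis example, the column index starts at 0, so index 1 selects the second column.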

The next step in our analysis is to extract 2 columns and to make an X-Y scatter plot in order to find a correlation between the numbers from these columns. In the example below we extract the 2nd and 3rd columns and plot them on an X-Y canvas, as the starting point for a least-squares linear regression (the regression step itself is not part of this snippet):

    from jhplot.stat import *

    p1=pn.getP1D(1,2)                # extract 2nd and 3rd columns
    c1=HPlot('X-Y plot')
    c1.visible(); c1.setAutoRange()  # set autorange
    c1.draw(p1)

This code should follow the code which creates the object “pn” as discussed before. The execution of this code makes an X-Y graph with the values of the 2nd and 3rd columns.
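For readers who want to see what a least-squares linear regression on two such columns computes, here is a minimal plain-Python sketch. It does not use the `jhplot.stat` package; the function name `linear_fit` is hypothetical and the data points are made up so that the result is easy to check.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up points lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = linear_fit(xs, ys)
print("intercept = %.2f, slope = %.2f" % (intercept, slope))
```

With real data the points scatter around the fitted line, and the slope quantifies the correlation seen in the X-Y plot.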

# Information for advanced users

If you are a registered SCaVis user, please go to this link: Data structures for advanced users

A complete description of how to use Java, Jython and SCaVis for scientific analysis is given in the book Scientific data analysis using Jython and Java by S.V.Chekanov, published by Springer Verlag, London, 2010.