Questions and Answers

At the CERN School of Computing in Marathon there were presentations given on JAS (presentation), Root and LHC++/Lizard. Students were given the opportunity to ask questions to compare the different systems. The answers to these questions for JAS are given below.

Summary Questions

Provide summary information for each product stating the basic philosophy or approach to solving the analysis/reconstruction problem
Leverage the power of Java as much as possible because:

It provides many of the facilities we need as standard.
It is easy to learn and well matched (in terms of complexity) to physics analysis.
It is a mainstream language, so time spent learning it is well spent.
It is a high performance language (see my talk).
It is a highly productive language (no time wasted debugging core dumps).

Age of product (how long has it been in development)
4 years (since Hepvis 96)
Platforms supported
Windows (95/98/NT/2000), Linux, Solaris, or any platform with a Java VM.
Number of components and total number of lines of code produced.
As of JAS 2.2.1 -- 53894 lines of Java code. Note that this is less than 10% of the number of lines in Root. This is not because JAS has 10% of the features of Root (in fact I think it has a broadly comparable feature set), but because large amounts of Root code replicate features already directly supported by Java (GUI, IO, reflection). Over time it is quite likely that the number of lines will go down, as we are better able to use standard tools (e.g. XML instead of custom parsers). I believe a significant fraction of Root code deals with IO, yet I estimate, based on our initial very simple implementation of Root IO in Java, that the entire Root IO package for reading and writing Root files (including random access, compression, trees, automatic splits, pointer following, StreamerInfo) can be implemented in <1000 lines of Java. You will have to check back later to see if I am right.

I'm not quite sure how to enumerate "components", but as I showed in my talk the system is composed of (maybe 20?) highly modular subcomponents which can be used together or independently.
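
To illustrate what "already directly supported by Java" means for IO, here is a minimal sketch of an object round trip using only the standard java.io serialization classes. The names Point3 and BuiltInIO are invented for this illustration and are not part of JAS or of the Root IO implementation mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical data class; implementing Serializable is all the IO code it needs.
class Point3 implements Serializable {
    private static final long serialVersionUID = 1L;
    final double x, y, z;

    Point3(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

// Hypothetical helper showing a write/read round trip through a byte array.
class BuiltInIO {
    static byte[] toBytes(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object); // fields are discovered by the runtime
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

No hand-written streamer code is needed for Point3; the runtime inspects the class itself, which is the kind of saving referred to above.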

List of external packages used
Binary distributions of JAS are self contained and can be run "out-of-the-box" with no requirements (except Java itself). Internally JAS uses many other packages, such as JavaHelp, jEdit editor, XML parsers, but these are all freely redistributable and are included in the JAS distribution.
Number of FTEs involved in development
Approximately 2 full-time developers, plus contributions from many others via collaborations (with Wired, LCD, Babar, FreeHEP) and via the "open source" model.
List of experiments where the product is in active use
CLEO (online monitoring), Babar (online monitoring), LCD (reconstruction+analysis), µLAN (online monitoring), CMS and Atlas (test beam work, evaluation), SLD (mini-dst analysis)
Process by which decisions are made and feedback handled
Currently most decisions have been made by the developers, with feedback from people using JAS for specific experiments (particularly LCD, Babar). We have a mailing list and bug report page and encourage feedback and suggestions from anyone (negative feedback is very welcome, especially if accompanied by suggestions for improvements).
Interfacing of product with: Experiment software
Interfacing directly with C++ code is currently a weak point of Java, so a direct interface with C++ experiment code is currently difficult. We expect more and more experiments to adopt Java as the huge productivity benefits of using Java become more widely appreciated. Meanwhile we are attempting to address this issue via the development of tools such as JACO, and plan to test this in the context of the Atlas event model. Interfacing with experiment software in Java (such as LCD) or via some intermediate storage format (e.g. PAW, Objectivity, ROOT) is comparatively straightforward.
Common HEP software packages (including G4)
We have interfaces with Root, Objectivity, PAW, WIRED, G4, StdHEP, and AIDA. Due to the simple "plugin" mechanism we expect to develop many more.
Existence within GRID context.
JAS has been designed from the outset to run in a "client-server" mode, and to support distributed data analysis. There are Java bindings to many of the GRID components (e.g. GLOBUS) and we expect that features of the GRID such as global authentication will be easy to interface to JAS. We believe that the model of moving the code to the data (rather than vice-versa) is most applicable to HEP data, and think Java is the best language for exploiting this due to its high performance and built-in network and code portability features.
Alternative products (e.g. JAS & ROOT)
The Data Interface Model and Plugin architecture used by JAS means that unlike other systems it is not tied to a particular data format and is easily able to inter-operate with other tools.
Alternative components (for GUI, data storage, fitting etc)
We have designed JAS so you can take individual components out and use them alone (many people are using our plot bean by itself, or in the context of Java servlets). Using C++ components directly in a Java GUI is not easy. We plan to add an interface to the LHC++ Gemini fitter, which will allow fitting using either Minuit or NAG fitters.
Explain product capacity for scaling in the following areas: more concurrent/distributed users access same dataset, access and processing of very large datasets
To some extent, since the data access is "external" to JAS, this question is not directly relevant. We believe the JAS data access model will scale to very large datasets, but this has not been tested extensively to date.
Use of scripting language with product

Initially JAS was designed to be operated by the GUI, and by writing Java programs. We have received many requests for scripting capabilities and have therefore started to implement this. We demonstrated the use of Beanshell with JAS during the CSC talk on JAS. Many other scripting languages for Java are available, including JPython - a complete implementation of Python in Java. Interfacing these to JAS is almost trivial. Although I rarely remember to talk about it, JAS histogramming can be used in a "batch" mode - with no GUI, and analysis can be written in Java, or in any Java scripting language.
Future directions of development: new features, major changes, short-term and long-term.
Near term we expect to:

Add more plotting features (e.g. 3D lego plots)
Add more powerful GUI features for n-tuple analysis (interactive cuts etc.), and provide features for creating n-tuples (in memory or on disk) from object data
Improve newer plugins, e.g. Root interface, scripting support
Add support for AIDA XML format for interoperability with Lizard, Open Scientist etc.

Longer term:

Exploit GRID facilities for distributed data analysis
Probably change to more AIDA-like histogramming classes

More Summary Questions

To what extent does Xxx allow a user or experiment to choose their scripting language (e.g. Java, Python, CINT, etc)? Can an experiment choose more than one?
First, Java is NOT a scripting language. Scripting languages are designed differently from compiled languages such as Java, C++ and Fortran, and to use a compiled language as a scripting language or vice-versa would be unwise. Having said that, Java does exhibit some of the advantages sometimes associated with scripting languages, such as a very fast compile, load, run cycle (especially when using dynamic loading to load only your analysis routines, as in JAS).
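
The dynamic loading mentioned above relies on standard Java facilities. The following is a minimal sketch, assuming a hypothetical AnalysisRoutine interface and a DynamicLoader helper (neither is the actual JAS API): a routine is located by class name at run time and instantiated through reflection, so only the routine itself needs recompiling between runs.

```java
// Hypothetical contract for a user-written analysis routine (not the JAS API).
interface AnalysisRoutine {
    double process(double value);
}

// An example routine that a user might recompile between runs.
class SquareRoutine implements AnalysisRoutine {
    public double process(double value) {
        return value * value;
    }
}

// Hypothetical helper that loads a routine by name at run time.
class DynamicLoader {
    static AnalysisRoutine load(String className) {
        try {
            // Class.forName locates and loads the class on demand.
            Class<?> loaded = Class.forName(className);
            // asSubclass checks that the class really implements the contract.
            return loaded.asSubclass(AnalysisRoutine.class)
                         .getDeclaredConstructor()
                         .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

A host program would call DynamicLoader.load with the routine's class name and then invoke process on the result, without having been compiled against SquareRoutine itself.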

We are currently adding support for scripting languages to JAS, and we demoed BeanShell as a scripting language during the talk. There are many other scripting languages available for Java, including JPython, a complete and very fast implementation of Python in Java. Any Java scripting language can be very easily used with JAS (or any Java program). There is no technical reason why an experiment should not use more than one.
How does Xxx work with non-native data storage? If an experiment defines its own storage system, can Xxx use it? Also, can ROOT/JAS work with HepODBMS/Objectivity? Can JAS/LHC++ work with ROOT files? ("Work" may not be the right word here - perhaps something like "What capabilities are lost when using .... data" is a better phrasing?)
JAS does not have a "native" data format; it can work with any data format for which a DIM exists. DIMs already exist for PAW, ROOT, Objectivity and many other formats, and it is fairly easy to create new DIMs for experiment-specific data.

The more detailed question is harder to answer; the specifics depend mainly on how completely the DIM has been implemented. For example, the current Objectivity DIM is only able to read HEPTuple data from Objectivity databases. Objectivity does have a Java binding, so writing a more fully functioned interface is possible, although there are some complications when attempting to read data initially stored into Objectivity from C++, especially if no thought was given to Java access up front.
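
The format independence described above can be pictured with a small sketch. The interface and class names below (RecordSource, ArraySource, TextSource, MeanAnalysis) are hypothetical and are not the JAS DIM API; the point is only that the analysis code sees one interface, while each data format supplies its own implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data interface; NOT the real JAS DIM API.
interface RecordSource {
    int rowCount();
    double value(int row);
}

// One "format": values already held in memory.
class ArraySource implements RecordSource {
    private final double[] data;

    ArraySource(double[] data) {
        this.data = data.clone();
    }

    public int rowCount() { return data.length; }
    public double value(int row) { return data[row]; }
}

// Another "format": whitespace-separated numbers in a text string.
class TextSource implements RecordSource {
    private final List<Double> data = new ArrayList<>();

    TextSource(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                data.add(Double.parseDouble(token));
            }
        }
    }

    public int rowCount() { return data.size(); }
    public double value(int row) { return data.get(row); }
}

// The analysis depends only on the interface, never on the format.
class MeanAnalysis {
    static double mean(RecordSource source) {
        int n = source.rowCount();
        if (n == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int row = 0; row < n; row++) {
            sum += source.value(row);
        }
        return sum / n;
    }
}
```

Supporting a new format in this picture means writing one more RecordSource implementation; how much of the format is usable then depends on how complete that implementation is, as noted above for Objectivity.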
What will need to be developed in Xxx to handle the expected size of LHC data analysis? What are the current strengths and weaknesses of Xxx for storing very large amounts of data?
The strengths of JAS are in its ability to adapt to whatever data format is eventually decided upon, and to support access to very large datasets using its distributed client-server mode. There are some weaknesses in the current java.io package when dealing with large amounts of binary data, but these will be addressed by the addition of a new java.nio package in the next release of Java (JDK 1.4 scheduled for release next summer), after which there is no reason to expect Java IO will be any less efficient than C++ IO.
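
As a minimal sketch of the channel-and-buffer style of bulk binary IO that java.nio is meant to provide, the following writes and reads an array of doubles in one block rather than value by value. It assumes the API as found in released JDKs and is written against a current JDK; the class BulkDoubleIO and its methods are invented for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper for block-wise IO of double arrays.
class BulkDoubleIO {
    static void write(File file, double[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Double.BYTES);
        // The double view fills the bytes without moving the byte buffer's position.
        buffer.asDoubleBuffer().put(values);
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            raf.setLength(0); // discard any previous content
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static double[] read(File file) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                // keep reading until the buffer is full or end of file
            }
            buffer.flip();
            double[] values = new double[buffer.remaining() / Double.BYTES];
            buffer.asDoubleBuffer().get(values);
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for trying the two methods together on a temporary file.
    static double[] roundTrip(double[] values) {
        try {
            File file = File.createTempFile("bulk", ".bin");
            try {
                write(file, values);
                return read(file);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```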
How does Xxx work with external software such as GEANT4? GEANT3? What can you do and not do via Xxx?
We demonstrated the use of JAS with Geant4 during the workshop. The Geant4 collaboration is considering a proposal to adopt the AIDA interface as a standard interface to histogramming in Geant4, meaning that it will be easy for Geant4 to interact with any AIDA compliant analysis tool.
If an experiment has an existing software package, how do you interface it, and how much its capability will be available via Xxx?
In principle, using a combination of plugins and DIMs you should be able to interface any experiment to JAS. In practice it depends how "Java friendly" the experiment is (extensive use of C++ features such as templates tends to make it more difficult). Well designed, modular experiment software also helps. The person who builds the JAS interface will need to learn a fair bit about Java and JAS, but once that is done it should be easy for other collaborators to use the interface.
How does Xxx utilize large parallel farms for computation?
The "Client-server" model in JAS was designed to support distributed computing. It has not yet been tested with very large datasets on large farms, but that will hopefully be done in the coming year.
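
The general idea of farming out an analysis can be sketched as follows: each worker fills a partial histogram from its own slice of the data, and only the small partial results are sent back and summed. This is a sketch under stated assumptions, not the JAS client-server protocol; threads stand in for farm nodes, and the class FarmSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; threads play the role of remote servers.
class FarmSketch {
    // Work done next to the data: fixed-binned counts over [0,1).
    static long[] fillPartial(double[] data, int nBins) {
        long[] counts = new long[nBins];
        for (double x : data) {
            if (x >= 0.0 && x < 1.0) {
                int bin = Math.min((int) (x * nBins), nBins - 1);
                counts[bin]++;
            }
        }
        return counts;
    }

    // Work done on the client: sum the partial histograms bin by bin.
    static long[] merge(List<long[]> partials, int nBins) {
        long[] total = new long[nBins];
        for (long[] partial : partials) {
            for (int i = 0; i < nBins; i++) {
                total[i] += partial[i];
            }
        }
        return total;
    }

    static long[] analyze(List<double[]> partitions, int nBins) {
        ExecutorService farm = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<long[]>> pending = new ArrayList<>();
            for (double[] partition : partitions) {
                Callable<long[]> task = () -> fillPartial(partition, nBins);
                pending.add(farm.submit(task));
            }
            List<long[]> partials = new ArrayList<>();
            for (Future<long[]> future : pending) {
                partials.add(future.get());
            }
            return merge(partials, nBins);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            farm.shutdown();
        }
    }
}
```

Because the merged result is just a bin-wise sum, the amount of data returned to the client depends on the number of bins and not on the size of the dataset.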
If I want to make an improvement in Xxx, how do I go about it?
JAS is an "Open Source" project. All of the source code is easily available and we use a Java version of make which allows you to build the system yourself, in the same way on any platform, using the simple instructions on our web site. (Building JAS from scratch takes less than one minute, and much less if you only need to recompile files you have changed). Any changes or additions you make are likely to be happily accepted back into the project.

In addition you can often extend JAS without having to learn the internals of the program by writing a plugin which adds the extra functionality you require.

Individual Questions

Could you show the min. COMPILED program to:

Create 100 1D histograms with range [0,1]; each histogram should have a title like "histo number:xxx"
make a loop 100000 times with:
random selection of one of the 100 histogram
fill the selected histogram with a flat generator
save the histograms to a file
what is the file size in bytes?
what is the total RealTime and CpuTime to execute this program?

The program is given below. Note that JAS supports many different histogramming algorithms, but to make this test as comparable to Root and LHC++ as possible I have chosen to use HBOOK style "fixed" binning in this example. The question did not state how many bins should be used for the histograms, so the size of the histogram file is somewhat arbitrary (I took the JAS default of 50 bins for fixed binned histograms).

The size of the file was 12667 bytes, and the program took 891 ms to execute on my 300MHz PII laptop (probably dominated by startup time).

import hep.analysis.*;
import hep.analysis.partition.*;
import java.util.Random;

public class CSCTest
{
   private final static int NHISTS = 100;
   private final static int NITER = 100000;
   
   // This is designed to run as a standalone (Batch) job.
   public static void main(String[] args)
   {
      final int niter = args.length == 0 ? NITER : Integer.parseInt(args[0]);
      long start = System.currentTimeMillis();
      
      // Create a Job, which will own all the histograms
      Job job = new Job("CSCTest");
      Histogram[] hists = new Histogram[NHISTS];
   
      // Java's built in random number generator
      Random random = new Random(); 
      
      for (int i=0; i<NHISTS; i++)
      {
         hists[i] = new Histogram("histo number: "+i);
         // Force HBOOK style fixed binning 
         // (by default JAS keeps all the points for rebinning)
         hists[i].setPartition(new FixedPartition(0,1));
      }
      for (int i=0; i<niter; i++)
      {
         int index = random.nextInt(NHISTS);
         hists[index].fill(random.nextDouble());
      }
      job.save(); // saves histograms (as CSCTest.javahist)
      
      // Print out timing info
      long end = System.currentTimeMillis();
      System.out.println("Elapsed time "+(end-start)+"ms");
   }
}
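
For readers unfamiliar with HBOOK style fixed binning, the following minimal sketch shows the usual bin-index arithmetic for equal-width bins. It is an illustration only and is not the code of the FixedPartition class used above; the class FixedBinning and the underflow/overflow conventions (-1 and nBins) are assumptions made for this example.

```java
// Hypothetical illustration of equal-width binning; not JAS code.
class FixedBinning {
    /**
     * Returns the bin index for x in [min, max) split into nBins equal bins,
     * -1 for underflow (x < min) and nBins for overflow (x >= max).
     */
    static int binIndex(double x, double min, double max, int nBins) {
        if (x < min) {
            return -1;
        }
        if (x >= max) {
            return nBins;
        }
        int bin = (int) ((x - min) / (max - min) * nBins);
        // Guard against rounding pushing a value just below max into bin nBins.
        return Math.min(bin, nBins - 1);
    }
}
```

With the range [0,1] and 50 bins, as in the test program, each bin is 0.02 wide, so a value of 0.5 falls in bin 25 (counting from 0).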

Could you show the minimum INTERACTIVE program to:

connect the file of histograms generated above
create a window with 2 zones (top and bottom)
fit histogram 10 with a straight line and draw it in top zone
fit histogram 20 with a straight line and draw it in bottom zone
generate a ps file corresponding to the window.

This is very easy to do with the JAS interactive GUI: just open the javahist file, create a new 1x2 plot page, display the desired histograms, and perform the fits. Currently JAS supports saving plots in XML or GIF format, but the next release of JAS will also support encapsulated PostScript and other vector graphics formats (using the FreeHEP org.freehep.graphics2d package). You can generate a PS file today by using "Print" and then selecting "Print to File".

As explained above, support for scripting languages such as BeanShell is just being added to JAS and is not yet complete; in particular, fitting is not yet (easily) usable from the scripting language. This will be fixed very soon and you will then be able to use the following script.

Histograms.read("CSCTest.javahist");

h10 = Histograms.find("histo number: 10");
h20 = Histograms.find("histo number: 20");

fitter = new Fitter(); // default fitter
f10 = fitter.fit(h10,new StraightLineFunction());
f20 = fitter.fit(h20,new StraightLineFunction());

page = new Page();
// x,y,width,height (in percent)
page.add(new Plot(h10),0,0,100,50);
page.add(new Plot(h20),0,50,100,50);
page.show();
page.save("CSCTest.eps");