Thursday, June 24, 2010

Java theory and practice: A brief history of garbage collection

Summary:  The Java language may be the most widely used programming language to rely on garbage collection, but it is by no means the first. Garbage collection has been an integral part of many programming languages, including Lisp, Smalltalk, Eiffel, Haskell, ML, Scheme, and Modula-3, and has been in use since the early 1960s. In this installment of Java theory and practice, Brian Goetz describes the most common techniques for garbage collection. Over the next several months, he'll look at the garbage collection strategies employed by the 1.4 JVM, some performance implications of various garbage collection strategies, and how (as well as how not) to assist the garbage collector to yield better performance.





The benefits of garbage collection are indisputable -- increased reliability, decoupling of memory management from class interface design, and less developer time spent chasing memory management errors. The well-known problems of dangling pointers and memory leaks simply do not occur in Java programs. (Java programs can exhibit a form of memory leak, more accurately called unintentional object retention, but this is a different problem.) However, garbage collection is not without its costs -- among them performance impact, pauses, configuration complexity, and nondeterministic finalization.
An ideal garbage collection implementation would be totally invisible -- there would be no garbage collection pauses, no CPU time would be lost to garbage collection, the garbage collector wouldn't interact negatively with virtual memory or the cache, and the heap wouldn't need to be any larger than the residency (heap occupancy) of the application. Of course, there are no perfect garbage collectors, but garbage collectors have improved significantly over the past ten years.
The 1.3 JDK includes three different garbage collection strategies; the 1.4.1 JDK includes six, and over a dozen command-line options for configuring and tuning garbage collection. How do they differ? Why do we need so many?
The various garbage collection implementations use different strategies for identification and reclamation of unreachable objects, and they interact differently with the user program and scheduler. Different sorts of applications will have different requirements for garbage collection -- real-time applications will demand short and bounded-duration collection pauses, whereas enterprise applications may tolerate longer or less predictable pauses in favor of higher throughput.
There are several basic strategies for garbage collection: reference counting, mark-sweep, mark-compact, and copying. In addition, some algorithms can do their job incrementally (the entire heap need not be collected at once, resulting in shorter collection pauses), and some can run while the user program runs (concurrent collectors). Others must perform an entire collection at once while the user program is suspended (so-called stop-the-world collectors). Finally, there are hybrid collectors, such as the generational collector employed by the 1.2 and later JDKs, which use different collection algorithms on different areas of the heap.
When evaluating a garbage collection algorithm, we might consider any or all of the following criteria:
  • Pause time. Does the collector stop the world to perform collection? For how long? Can pauses be bounded in time?
  • Pause predictability. Can garbage collection pauses be scheduled at times that are convenient for the user program, rather than for the garbage collector?
  • CPU usage. What percentage of the total available CPU time is spent in garbage collection?
  • Memory footprint. Many garbage collection algorithms require dividing the heap into separate memory spaces, some of which may be inaccessible to the user program at certain times. This means that the actual size of the heap may be several times bigger than the maximum heap residency of the user program.
  • Virtual memory interaction. On systems with limited physical memory, a full garbage collection may fault nonresident pages into memory to examine them during the collection process. Because the cost of a page fault is high, it is desirable that a garbage collector properly manage locality of reference.
  • Cache interaction. Even on systems where the entire heap can fit into main memory, which is true of virtually all Java applications, garbage collection will often have the effect of flushing data used by the user program out of the cache, imposing a performance cost on the user program.
  • Effects on program locality. While some believe that the job of the garbage collector is simply to reclaim unreachable memory, others believe that the garbage collector should also attempt to improve the reference locality of the user program. Compacting and copying collectors relocate objects during collection, which has the potential to improve locality.
  • Compiler and runtime impact. Some garbage collection algorithms require significant cooperation from the compiler or runtime environment, such as updating reference counts whenever a pointer assignment is performed. This creates both work for the compiler, which must generate these bookkeeping instructions, and overhead for the runtime environment, which must execute these additional instructions. What is the performance impact of these requirements? Does it interfere with compile-time optimizations?
Regardless of the algorithm chosen, trends in hardware and software have made garbage collection far more practical. Empirical studies from the 1970s and 1980s show garbage collection consuming between 25 percent and 40 percent of the runtime in large Lisp programs. While garbage collection may not yet be totally invisible, it sure has come a long way.
The problem faced by all garbage collection algorithms is the same -- identify blocks of memory that have been dispensed by the allocator, but are unreachable by the user program. What do we mean by unreachable? Memory blocks can be reached in one of two ways -- if the user program holds a reference to that block in a root, or if there is a reference to that block held in another reachable block. In a Java program, a root is a reference to an object held in a static variable or in a local variable on an active stack frame. The set of reachable objects is the transitive closure of the root set under the points-to relation.
The most straightforward garbage collection strategy is reference counting. Reference counting is simple, but requires significant assistance from the compiler and imposes overhead on the mutator (the term for the user program, from the perspective of the garbage collector). Each object has an associated reference count -- the number of active references to that object. If an object's reference count is zero, it is garbage (unreachable from the user program) and can be recycled. Every time a pointer reference is modified, such as through an assignment statement, or when a reference goes out of scope, the compiler must generate code to update the referenced object's reference count. If an object's reference count goes to zero, the runtime can reclaim the block immediately (and decrement the reference counts of any blocks that the reclaimed block references), or place it on a queue for deferred collection.
Many ANSI C++ library classes, such as string, employ reference counting to provide the appearance of garbage collection. By overloading the assignment operator and exploiting the deterministic finalization provided by C++ scoping, C++ programs can use the string class as if it were garbage collected. Reference counting is simple, lends itself well to incremental collection, and the collection process tends to have good locality of reference, but it is rarely used in production garbage collectors for a number of reasons, such as its inability to reclaim unreachable cyclic structures (objects that reference each other directly or indirectly, like a circularly linked list or a tree that contains back-pointers to the parent node).
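The cyclic-structure problem is easier to see in a toy model. The following is a minimal sketch, not how any real runtime does it: RcObject, retain(), release(), and addField() are hypothetical names invented for illustration, and the "reclaimed" flag stands in for returning the block to the allocator.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting. All names here are hypothetical.
class RcObject {
    final String name;
    int refCount = 0;
    boolean reclaimed = false;
    final List<RcObject> fields = new ArrayList<>();

    RcObject(String name) { this.name = name; }

    // Storing a reference to target in this object bumps target's count.
    void addField(RcObject target) {
        fields.add(target);
        target.retain();
    }

    void retain() { refCount++; }

    // Drop one reference; at zero, reclaim this block and release
    // everything it points to.
    void release() {
        if (--refCount == 0) {
            reclaimed = true;
            for (RcObject child : fields) child.release();
            fields.clear();
        }
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        // Acyclic case: root -> a -> b
        RcObject a = new RcObject("a"), b = new RcObject("b");
        a.retain();      // a root references a
        a.addField(b);
        a.release();     // root dropped: a and b are both reclaimed
        System.out.println(a.reclaimed + " " + b.reclaimed);

        // Cyclic case: x <-> y, plus one root reference to x
        RcObject x = new RcObject("x"), y = new RcObject("y");
        x.retain();
        x.addField(y);
        y.addField(x);
        x.release();     // root dropped, but y still counts as a referrer
        System.out.println(x.reclaimed + " " + y.reclaimed);
    }
}
```

In the cyclic case neither count ever reaches zero, even though nothing outside the cycle can reach x or y any longer.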
None of the standard garbage collectors in the JDK uses reference counting; instead, they all use some form of tracing collector. A tracing collector stops the world (although not necessarily for the entire duration of the collection) and starts tracing objects, starting at the root set and following references until all reachable objects have been examined. Roots can be found in program registers, in local (stack-based) variables in each thread's stack, and in static variables.
The most basic form of tracing collector, first proposed by Lisp inventor John McCarthy in 1960, is the mark-sweep collector, in which the world is stopped and the collector visits each live node, starting from the roots, and marks each node it visits. When there are no more references to follow, collection is complete, and then the heap is swept (that is, every object in the heap is examined), and any object not marked is reclaimed as garbage and returned to the free list. Figure 1 illustrates a heap prior to garbage collection; the shaded blocks are garbage because they are unreachable by the user program:

Figure 1. Reachable and unreachable objects
Mark-sweep is simple to implement, can reclaim cyclic structures easily, and doesn't place any burden on the compiler or mutator like reference counting does. But it has deficiencies -- collection pauses can be long, and the entire heap is visited in the sweep phase, which can have very negative performance consequences on virtual memory systems where the heap may be paged.
The big problem with mark-sweep is that every active (that is, allocated) object, whether reachable or not, is visited during the sweep phase. Because a significant percentage of objects are likely to be garbage, this means that the collector is spending considerable effort examining and handling garbage. Mark-sweep collectors also tend to leave the heap fragmented, which can cause locality issues and can also cause allocation failures even when sufficient free memory appears to be available.
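The two phases can be sketched with a toy heap. This is an illustration under simplifying assumptions, not a real collector: MsNode, MsHeap, and collect() are hypothetical names, and removing a node from the allocated list stands in for returning it to the free list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy heap for illustration. All names here are hypothetical.
class MsNode {
    final String name;
    boolean marked = false;
    final List<MsNode> refs = new ArrayList<>();
    MsNode(String name) { this.name = name; }
}

class MsHeap {
    final List<MsNode> allocated = new ArrayList<>(); // every block handed out
    final List<MsNode> roots = new ArrayList<>();

    MsNode allocate(String name) {
        MsNode n = new MsNode(name);
        allocated.add(n);
        return n;
    }

    // Runs one collection and returns the number of objects reclaimed.
    int collect() {
        // Mark phase: visit only objects reachable from the roots.
        Deque<MsNode> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            MsNode n = pending.pop();
            if (n.marked) continue;
            n.marked = true;
            pending.addAll(n.refs);
        }
        // Sweep phase: visit every allocated object, live or not.
        int reclaimed = 0;
        for (Iterator<MsNode> it = allocated.iterator(); it.hasNext(); ) {
            MsNode n = it.next();
            if (n.marked) {
                n.marked = false;   // reset the mark for the next cycle
            } else {
                it.remove();        // stands in for "return to the free list"
                reclaimed++;
            }
        }
        return reclaimed;
    }
}

class MarkSweepDemo {
    public static void main(String[] args) {
        MsHeap heap = new MsHeap();
        MsNode a = heap.allocate("a"), b = heap.allocate("b");
        MsNode c = heap.allocate("c"), d = heap.allocate("d");
        heap.roots.add(a);
        a.refs.add(b);
        c.refs.add(d);   // c and d form an unreachable cycle
        d.refs.add(c);
        System.out.println(heap.collect());         // c and d are reclaimed
        System.out.println(heap.allocated.size());  // a and b survive
    }
}
```

Note that the sweep loop walks the whole allocated list, which is exactly the cost described above, and that the unreachable cycle is reclaimed without any special handling.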
In a copying collector, another form of tracing collector, the heap is divided into two equally sized semi-spaces, one of which contains active data and the other is unused. When the active space fills up, the world is stopped and live objects are copied from the active space into the inactive space. The roles of the spaces are then flipped, with the old inactive space becoming the new active space.
Copying collection has the advantage of only visiting live objects, which means garbage objects will not be examined, nor will they need to be paged into memory or brought into the cache. The duration of collection cycles in a copying collector is driven by the number of live objects. However, copying collectors have the added cost of copying the data from one space to another, adjusting all references to point to the new copy. In particular, long-lived objects will be copied back and forth on every collection.
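A rough sketch of the copy-and-flip idea follows. It makes several simplifying assumptions: the two semi-spaces are modeled as Java lists rather than raw memory, the "copy" allocates a fresh Java object, and CpObj, SemiSpaceHeap, and the copies counter are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy semi-space collector. All names here are hypothetical.
class CpObj {
    final String name;
    final List<CpObj> refs = new ArrayList<>();
    CpObj forward;   // set once this object has been copied to the other space
    CpObj(String name) { this.name = name; }
}

class SemiSpaceHeap {
    List<CpObj> active = new ArrayList<>();
    final List<CpObj> roots = new ArrayList<>();
    int copies = 0;  // number of objects the last collection touched

    CpObj allocate(String name) {
        CpObj o = new CpObj(name);
        active.add(o);
        return o;
    }

    void collect() {
        List<CpObj> toSpace = new ArrayList<>();
        copies = 0;
        // Copy the objects the roots point to and redirect the roots.
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace));
        }
        // toSpace doubles as the work queue: scan each copied object and
        // redirect its references to the new copies.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            List<CpObj> refs = toSpace.get(scan).refs;
            for (int i = 0; i < refs.size(); i++) {
                refs.set(i, copy(refs.get(i), toSpace));
            }
        }
        active = toSpace;   // flip: the old space is abandoned wholesale
    }

    private CpObj copy(CpObj old, List<CpObj> toSpace) {
        if (old.forward == null) {          // not copied yet
            CpObj fresh = new CpObj(old.name);
            fresh.refs.addAll(old.refs);    // fixed up later by the scan loop
            old.forward = fresh;
            toSpace.add(fresh);
            copies++;
        }
        return old.forward;
    }
}

class CopyingDemo {
    public static void main(String[] args) {
        SemiSpaceHeap heap = new SemiSpaceHeap();
        CpObj a = heap.allocate("a"), b = heap.allocate("b");
        heap.allocate("c");   // garbage: never referenced
        heap.allocate("d");   // garbage: never referenced
        heap.roots.add(a);
        a.refs.add(b);
        b.refs.add(a);        // live cycle
        heap.collect();
        System.out.println(heap.copies);         // only live objects were touched
        System.out.println(heap.active.size());
    }
}
```

The garbage objects c and d are never examined; the work done is proportional to the two live objects. After the flip, the toy mutator has to reach objects through heap.roots, because the local variables a and b still refer to the abandoned copies.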
Copying collectors have another benefit, which is that the set of live objects is compacted into the bottom of the heap. This not only improves locality of reference of the user program and eliminates heap fragmentation, but also greatly reduces the cost of object allocation -- object allocation becomes a simple pointer addition on the top-of-heap pointer. There is no need to maintain free lists or look-aside lists, or perform best-fit or first-fit algorithms -- allocating N bytes is as simple as adding N to the top-of-heap pointer and returning its previous value, as suggested in Listing 1:

Listing 1. Inexpensive memory allocation in a copying collector
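void *malloc(int n) { 
    /* heapStart: next free byte; heapTop: end of the active space */
    if (heapTop - heapStart < n)
        doGarbageCollection();   /* compacts live data, resets heapStart */

    void *wasStart = heapStart;
    heapStart += n;              /* bump the pointer past the new block */
    return wasStart;
}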

Developers who have implemented sophisticated memory management schemes for non-garbage-collected languages may be surprised at how inexpensive allocation is -- a simple pointer addition -- in a copying collector. This may be one of the reasons for the pervasive belief that object allocation is expensive -- earlier JVM implementations did not use copying collectors, and developers are still implicitly assuming allocation cost is similar to other languages, like C, when in fact it may be significantly cheaper in the Java runtime. Not only is the cost of allocation smaller, but for objects that become garbage before the next collection cycle, the deallocation cost is zero, as the garbage object will be neither visited nor copied.
The copying algorithm has excellent performance characteristics, but it has the drawback of requiring twice as much memory as a mark-sweep collector. The mark-compact algorithm combines mark-sweep and copying in a way that avoids this problem, at the cost of some increased collection complexity. Like mark-sweep, mark-compact is a two-phase process, where each live object is visited and marked in the marking phase. Then, marked objects are copied such that all the live objects are compacted at the bottom of the heap. If a complete compaction is performed at every collection, the resulting heap is similar to the result of a copying collector -- there is a clear demarcation between the active portion of the heap and the free area, so that allocation costs are comparable to a copying collector. Long-lived objects tend to accumulate at the bottom of the heap, so they are not copied repeatedly as they are in a copying collector.
OK, so which of these approaches does the JDK take for garbage collection? In some sense, all of them. Early JDKs used a single-threaded mark-sweep or mark-sweep-compact collector. JDKs 1.2 and later employ a hybrid approach, called generational collection, where the heap is divided into several sections based on an object's age, and different generations are collected separately using different collection algorithms.
Generational garbage collection turns out to be very effective, although it introduces several additional bookkeeping requirements at runtime. In next month's Java theory and practice, we'll explore how generational garbage collection works and how it is employed by the 1.4.1 JVM, in addition to all the other garbage collection options offered by the 1.4.1 JVM. In the installment following that, we'll look at the performance impact of garbage collection, including debunking some performance myths related to memory management.


---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Friday, June 11, 2010

Java - Stack and Heap Memory


In this section we will look at how variables are stored in memory in Java. We are examining memory in Java at this point so that you can understand at a lower level what happens when you create and manipulate the objects that make up your programs.
Primitive data types have just one value to store. For instance:
int i = 1;
The appropriate amount of space is allocated given the data type, and the variable is stored in memory just as it is.
Objects must be stored differently because they are more complex. They often hold multiple values, each of which must be stored in memory. The association between each value and the object must be maintained throughout its life. An object reference variable must then hold a reference to those values. This reference represents the location where the object and its metadata are stored.
There are two kinds of memory used in Java. These are called stack memory and heap memory. Stack memory stores local variables: primitive values and the addresses of objects. The objects themselves, including the values of their fields, are stored in heap memory. An object reference on the stack is only an address that refers to the place in heap memory where that object is kept.
Say you've got two Test objects, and you assign the first to the second, like this:
Test test1 = new Test();
Test test2 = new Test();

test2 = test1;
What you're actually doing when you write this is assigning the address held by test1 to the test2 variable. Assume that test1's memory address was 0x33d444 and that test2's address was 0x99f775. After performing the above assignment, test2 now holds this address in stack memory: 0x33d444, which refers to the same object as test1. The object that test2 originally referred to (at 0x99f775) still exists on the heap, but it cannot be accessed. That's because this reassignment overwrote the old address that test2 was keeping on the stack. This kind of reassignment makes two stack references to the same object on the heap.
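A small sketch makes the effect of the reassignment visible. AliasTest and its value field are hypothetical, added here only so that there is some state to observe; the Test class in the text is not shown with any members.

```java
// AliasTest is a hypothetical stand-in for the Test class in the text.
class AliasTest {
    int value;
    AliasTest(int value) { this.value = value; }
}

class AliasDemo {
    public static void main(String[] args) {
        AliasTest test1 = new AliasTest(1);
        AliasTest test2 = new AliasTest(2);

        test2 = test1;        // copies the reference, not the object

        test2.value = 42;     // the change is visible through test1 as well
        System.out.println(test1.value);       // prints 42
        System.out.println(test1 == test2);    // prints true
    }
}
```

The object originally created for test2 (the one holding 2) is no longer reachable after the assignment and becomes eligible for garbage collection.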
It is useful to know that these two different kinds of memory exist in Java. Each thread has its own stack, which holds the local variables of its active method calls, while the heap is shared by all threads running in the Virtual Machine.
As a Java programmer, you do not have to directly address memory allocation and recovery of memory space, which is a common headache for C++ programmers. When you need a new object, Java allocates the required memory. When you are done with an object, the memory is reclaimed for you automatically via Java's garbage collection facility.
Garbage collection runs as a thread in the background, looking for objects that no longer have a usable reference. When it finds them, it destroys them and reclaims the memory.
The implementation of garbage collection varies between Java Virtual Machines. They generally follow the same process, however. First, the garbage collector gets a snapshot of all running threads and all loaded classes. Then, all objects that are referred to by this thread set are marked as current. The process stops when all objects that it is possible to reach have been marked and the rest have been discarded.
In order to help the Virtual Machine, it is a good idea to remove your references to unneeded objects. This is often done by simply setting your reference to null:
Test t = new Test();
t.someAction();
// all done
t = null;

7.8.1 Finalizers

Constructors create objects. Most OOP languages provide methods for you to clean up after yourself. That is, they provide a way for you to destroy the objects you've created after you no longer reference them. Such methods are called finalizers.
NOTE
ColdFusion does not have finalizers for the same reason it doesn't have constructors: No objects are created. Memory is allocated in ColdFusion for what CF developers refer to as objects, such as the application object, the session object, and the query object. And good ColdFusion programming practice dictates that you shouldn't have a bunch of stuff lying around memory resident, so when you're done with a session variable but you still have an active session, you can set references you don't need to null.
Because all objects in Java extend java.lang.Object, which defines a finalize() method, any class can override finalize(). The default protected void finalize() method returns nothing and does nothing. The JVM never invokes the finalize() method more than once for a given object.
The typical example for when to use finalizers is when you're working with file input/output. When you open a connection to a resource, such as a file, you might not have a chance to close it before an exception is thrown. In this case, you've got invalid processes consuming system resources unnecessarily.
Finalizers are generally regarded as the sort of thing to do in a last-ditch effort to save system resources. This is because they do not necessarily run as soon as it is possible. That means that the programmer cannot be certain when exactly they will run. Finalizers should therefore be used sparingly. We will see alternatives in Chapter 8, "Exceptions."
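As a hedged sketch of one such alternative, a try/finally block releases a resource at a known point instead of whenever the finalizer happens to run. TrackedResource and its methods are hypothetical stand-ins for a file or connection; they are not part of any Java library.

```java
// TrackedResource is a hypothetical stand-in for a file or connection.
class TrackedResource {
    boolean open = true;

    void use(boolean fail) {
        if (fail) throw new IllegalStateException("simulated I/O failure");
    }

    void close() { open = false; }
}

class CleanupDemo {
    // Returns the resource so the caller can check that it was closed.
    static TrackedResource process(boolean fail) {
        TrackedResource r = new TrackedResource();
        try {
            r.use(fail);
        } catch (IllegalStateException e) {
            // handled here; the finally block below still runs
        } finally {
            r.close();   // released deterministically, success or failure
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(process(false).open);  // prints false
        System.out.println(process(true).open);   // prints false
    }
}
```

In both the normal and the failing case the resource is closed before process() returns, with no dependence on when (or whether) the garbage collector runs.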

Wednesday, June 9, 2010

Mysql installation on linux

   1. Become the superuser if you are working in your account. (Type "su" at the prompt and give the root password).
   2. Change to the directory that has the RPM download.
   3. Type the following command at the prompt:

      rpm -ivh "mysql_file_name.rpm"

      Similarly you can also install the MySQL client and MySQL development RPMs if you've downloaded them.
      Alternatively, you can install the RPMs through GnoRPM (found under System).
   4. Now we'll set a password for the root user. Issue the following at the prompt.

      mysqladmin -u root password mysqldata

      where mysqldata is the password for the root. (Change this to anything you like).
   5. It is now time to test the programs. Typing the following at the prompt starts the mysql client program.

      mysql -u root -p

      The system asks for the password. Type the root password (mysqldata).
      If you don't get the prompt for password, it might be because MySQL Server is not running. To start the server, change to /etc/rc.d/init.d/ directory and issue the command ./mysql start (or mysql start depending on the value of the PATH variable on your system). Now invoke mysql client program.
   6. Once MySQL client is running, you should get the mysql> prompt. Type the following at this prompt:

      show databases;

   7. You should now get a display similar to:

      +----------------+
      | Database       |
      +----------------+
      | mysql          |
      | test           |
      +----------------+
      2 rows in set (0.00 sec)

Okay, we've successfully installed MySQL on your system. Now let's look at some MySQL basics.

Java installation on linux box.

Install java on linux machine  (http://www.meritonlinesystems.com/docs/apache_tomcat_redhat.html)

Download page for java : http://java.sun.com/j2se/1.4.2/download.html

I chose the J2SE v1_4_2_12 SDK Linux self-extracting binary file. (j2re-1_4_2_12-linux-i586.bin,j2sdk-1_4_2_12-linux-i586.bin present in /usr/local/java directory)

1.Change to the directory where you downloaded the SDK and make the self-extracting binary executable:

chmod +x j2sdk-1_4_2_12-linux-i586.bin

2.Run the self-extracting binary:

./j2sdk-1_4_2_12-linux-i586.bin

3.There should now be a directory called j2sdk1.4.2.12 in the download directory.
Move the SDK directory to where you want it to be installed. I chose to install it in /usr/local/java.
Create /usr/local/java if it doesn't exist. Here is the command I used from inside the download directory:

mv j2sdk1.4.2.12 /usr/local/java

4.Set the JAVA_HOME environment variable, by modifying /etc/profile so it includes the following:
JAVA_HOME="/usr/local/java/j2sdk1.4.2.12"
export JAVA_HOME

Note : /etc/profile is run at startup and when a user logs into the system, so you will need to log out and log back in for JAVA_HOME to be defined.
exit
su -

5.Check to make sure JAVA_HOME is defined correctly using the command below. You should see the path to your Java SDK.
echo $JAVA_HOME

Also do the following if java is not working:
PATH=$PATH:/usr/java/j2sdk1.4.2_03/bin
JAVA_HOME=/usr/java/j2sdk1.4.2_03
export PATH


Installation of RPM File

It requires root access to install.

1. Run the rpm command to install the packages that comprise the Java 2 SDK:

rpm -iv jdk-6u2-linux-i586.rpm

2. Delete the bin and rpm file if you want to save disk space.



Note: By default, the .rpm is installed in /usr/java. Use which javac to ensure the PATH was set up correctly.

To uninstall any RPM, use the following command, followed by the name of the package:
rpm -e

Linux - Samba Mount

Below are the steps for Samba mounting:

1)  Share the folder on the Windows machine, e.g. on the "1521.1101.1491.2271" Windows machine I have shared the "DocumentRepository" folder.
    Give full access to any user e.g administrator

2) Create the directory on the Linux box on which you want to mount.
    e.g  /u/ServicesPortal/DocMgmtAttachments

3) Edit the /etc/fstab file and make an entry for your mount. Please refer to the attached file for your reference.
    e.g //1521.1101.1491.2271/DocumentRepository /u/ServicesPortal/DocMgmtAttachments smbfs credentials=/etc/samba/siebelFile.credential 0 0
   Note : In fstab file, line starting with "#" represents comment.

4) Create the credential file required for mounting. Please find the attached credential file for your reference.
    In above example we have credential file at /etc/samba/siebelFile.credential  location.
    It contains following entry :
  username=administrator
  password=administrator

5) Now run following command for mounting
    Command is : "mount -a"

6) To check whether mounting is done or not, run the following command, which will show you all mounts
    Command is :  "mount"

7)  For a manual mount use
mount -t smbfs -o username=Nihilent,password=N3wp@55,workgroup=augsoa //1521.1101.01.361/DocumentRepository  /u/ServicesPortal/DocMgmtAttachments


Note :
a) The fstab file looks as below


# This file is edited by fstab-sync - see 'man fstab-sync' for details
/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /dev/shm                tmpfs   defaults        0 0
none                    /proc                   proc    defaults        0 0
none                    /sys                    sysfs   defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0
/dev/hdc                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
/dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0

#Following mount is used as a alternative to CMDB mount (Represents portal file server location)
//1521.1101.1491.2271/DocumentRepository /u/ServicesPortal/DocMgmtAttachments smbfs credentials=/etc/samba/siebelFile.credential 0 0

----------------------------------------------------------------------------------------------------------
b) The siebel.credential file looks as below:


username=siebfiles
password=siebfiles
gid=UKLHCSVSBLWEBBL

Apache-Tomcat integration

Below are the steps for Apache-Tomcat integration:
An AJP Connector is used for integration of Tomcat with Apache. The AJP connector will provide faster performance than proxied HTTP. AJP clustering is the most efficient from Tomcat perspective.
The native connectors supported with this Tomcat release are:
o JK Connector 1.2.x with any of the supported servers.

Configuring Tomcat
o Add the following Connector tag in the $CATALINA_HOME/conf/server.xml file.


Configuring Apache
o Define the following properties in the $APACHE_HOME/conf/workers.properties file.
worker.list=ServicesPortal
worker.ServicesPortal.port=8009
worker.ServicesPortal.host=
worker.ServicesPortal.socket_keepalive=600000
worker.ServicesPortal.type=ajp13

o Add the following directives in “$APACHE_HOME/conf/httpd.conf.proxy” to configure mod_jk connector
LoadModule jk_module  libexec/mod_jk.so
AddModule mod_jk.c
JkWorkersFile $APACHE_HOME/conf/workers.properties
JkLogFile /var/log/httpd/mod_jk.log
JkLogLevel info
JkLogStampFormat "[%a %b %d %H:%M:%S %Y]"
JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories

o Add the following directive in “$APACHE_HOME/conf/httpd.conf.proxy” to re-direct all the requests for Services Portal to Tomcat worker called ServicesPortal
JkMount /ServicesPortal/* ServicesPortal

o Add the following rewrite rule in $APACHE_HOME/conf/httpd.conf.proxy file.
RewriteRule ^/ServicesPortal/.* - [L,PT]

o Add the following directive in “$APACHE_HOME/conf/httpd.conf.proxy” to provide RSA authentication for Insite customer


                   AuthType      "SecurID"
                       require  valid-user


Tomcat installation on Linux

Following are the steps related to tomcat installation on Linux box :

Note:

JAVA must be installed before installing tomcat and JAVA_HOME must be setup.

Downloads :

Install Tomcat on linux machine  (Refer this link for JAVA and Tomcat installation : http://www.meritonlinesystems.com/docs/apache_tomcat_redhat.html)
download java from : http://java.sun.com/javase/downloads/index_jdk5.jsp

Download the latest release binary build from http://www.apache.org/dist/jakarta/tomcat-5/. Since Tomcat runs directly on top of a standard JDK, I chose the gzipped tar file (jakarta-tomcat-5.0.28.tar.gz).

Steps :

1.Unzip Tomcat by issuing the following command from your download directory: (Currently we have jakarta-tomcat-5.0.28.tar.gz inside /usr/local directory)
tar xvzf jakarta-tomcat-5.0.28.tar.gz

2.The directory where Tomcat is installed is referred to as CATALINA_HOME in the Tomcat documentation.
In this installation CATALINA_HOME=/usr/local/jakarta-tomcat-5.0.28.

add CATALINA_HOME=/usr/local/jakarta-tomcat-5.0.28 in /etc/profile file

it will look like

/******* /etc/profile *************/
 CATALINA_HOME=/usr/local/jakarta-tomcat-5.0.28
/*******************************/



Optional : following are the optional steps

1.I recommend setting up a symbolic link to point to your current Tomcat version. This will save you from having to make changes to startup and shutdown scripts each time
you upgrade Tomcat. It also allows you to keep several versions of Tomcat on your system and easily switch amongst them.
Here is the command I issued from inside /usr/local to create a symbolic link called /usr/local/jakarta-tomcat that points
to /usr/local/jakarta-tomcat-5.0.28:

ln -s jakarta-tomcat-5.0.28 jakarta-tomcat

2.Change the group and owner of the /usr/local/jakarta-tomcat and /usr/local/jakarta-tomcat-5.0.28 directories to tomcat:

chown tomcat.tomcat /usr/local/jakarta-tomcat
chown -R tomcat.tomcat /usr/local/jakarta-tomcat-5.0.28

Setting Up SSL on Tomcat


Step 1. Generating the KeyStore file

i) Go to the bin directory of the JDK:
cd %JAVA_HOME%/bin on Windows
cd $JAVA_HOME/bin on Linux

ii) Execute the following command:
keytool -genkey -alias <alias> -keypass <keypass> -keystore <keystore_name>.bin -storepass <storepass>

For example:
keytool -genkey -alias techtracer -keypass ttadmin -keystore techtracer.bin -storepass ttadmin

Please note that the values in angle brackets, i.e. inside <>, can be changed as per your choice,
but make sure the keypass and storepass passwords are the same.
The .bin file is actually your keystore file

When you press Enter, it will ask you some questions. Look below for a reference as to what to answer for the questions.

What is your first and last name?
[Unknown]: yogita Sananse
What is the name of your organizational unit?
[Unknown]: home
What is the name of your organization?
[Unknown]: mycert
What is the name of your City or Locality?
[Unknown]: pune
What is the name of your State or Province?
[Unknown]: maharashtra
What is the two-letter country code for this unit?
[Unknown]: IN
Is CN=yogita Sananse, OU=home, O=mycert, L=pune, ST=maharashtra, C=IN correct?
[no]: yes

The command will then conclude. It creates a .bin file with the name you provided, inside the bin directory itself.
In this case it is techtracer.bin, which will be located in

C:\Program Files\Java\jdk1.6.0_02\bin\
Note : if you do not find it in bin directory please check c:\Documents and Settings\yogita.sananse\ folder it may be created here

iii) Put the .bin file in the webapps directory of Tomcat.
This is required to avoid the need to give an absolute path of the file in the next step.


Step 2.Configuring Tomcat for using the Keystore file

i) Modify server.xml :
Open the file server.xml, which can be found at: $CATALINA_HOME/conf/server.xml

Now you have to modify it. Find the Connector element which has port="8443" and uncomment it if that has not already been done. Add two lines;
the last two lines in the tag below are the newly added ones.

<Connector port="8443"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="true" disableUploadTimeout="true"
acceptCount="100" debug="0" scheme="https" secure="true"
clientAuth="false" sslProtocol="TLS"
keystoreFile="../webapps/techtracer.bin"
keystorePass="ttadmin" />


Now all you have to do is start your server and check the working of SSL by pointing your browser to the URL to:

https://localhost:8443/
Note : Now that you have your Tomcat running in SSL mode, you are ready to deploy an application to test its working. Note that Tomcat still runs in normal mode at the same time, i.e. on port 8080 with http. So any application deployed to the server will be reachable over http and https at the same time. This is something that we don’t want. We want our application to run only in the secured mode.
Step 3 : Configuring your web application to work with SSL

Modify the web.xml of the application that you want to make https-only, with http access disabled.
(In order to do this for our test, take any application which has already been deployed successfully in Tomcat and first access it through http and https to see if it works fine.)

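If yes, then open the web.xml of that application and add this XML fragment just before the web-app element ends, i.e. before </web-app>:

<security-constraint>
<web-resource-collection>
<web-resource-name>securedapp</web-resource-name>
<url-pattern>/*</url-pattern>
</web-resource-collection>
<user-data-constraint>
<transport-guarantee>CONFIDENTIAL</transport-guarantee>
</user-data-constraint>
</security-constraint>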

The term CONFIDENTIAL is what tells the server to make the application work over SSL. If you want to turn SSL mode off for this application, don't delete the fragment; just set the value to NONE instead of CONFIDENTIAL. That's it!

Tuesday, May 18, 2010

The Java IAQ: Infrequently Answered Questions

Q: What is an Infrequently Answered Question?

A question is infrequently answered either because few people know the answer or because it is about an obscure, subtle point (but a point that may be crucial to you). I thought I had invented the term, but it also shows up at the very informative About.com Urban Legends site. There are lots of Java FAQs around, but this is the only Java IAQ. (There are a few Infrequently Asked Questions lists, including a satirical one on C.)

Q: The code in a finally clause will never fail to execute, right?

Well, hardly ever. But here's an example where the finally code will not execute, regardless of the value of the boolean choice:
try {
  if (choice) {
    while (true) ;
  } else {
    System.exit(1);
  }
} finally {
  code.to.cleanup();
}



Q: Within a method m in a class C, isn't this.getClass() always C?

No. It's possible that, for some object x that is an instance of some subclass C1 of C, either there is no C1.m() method or some method on x called super.m(). In either case, this.getClass() is C1, not C, within the body of C.m(). If C is final, then you're ok.
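
A minimal sketch of the first case, where the subclass simply inherits m(). The class names follow the C and C1 of the answer above; GetClassDemo is a hypothetical driver added only for illustration.

```java
// C and C1 mirror the names used in the text.
class C {
    // Reports the runtime class of the receiver, as seen from inside C.m().
    Class<?> m() {
        return this.getClass();
    }
}

// C1 inherits m() without overriding it.
class C1 extends C {
}

// Hypothetical driver, not part of the original example.
class GetClassDemo {
    public static void main(String[] args) {
        System.out.println(new C().m().getSimpleName());  // prints "C"
        System.out.println(new C1().m().getSimpleName()); // prints "C1"
    }
}
```

The body of m() is textually inside C, yet the second call reports C1 because getClass() always answers for the actual object.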

Q: I defined an equals method, but Hashtable ignores it. Why?

equals methods are surprisingly hard to get right. Here are the places to look first for a problem:
  1. You defined the wrong equals method. For example, you wrote:
    public class C {
      public boolean equals(C that) { return id(this) == id(that); }
    }

    But in order for table.get(c) to work you need to make the equals method take an Object as the argument, not a C:

    public class C {
      public boolean equals(Object that) { 
        return (that instanceof C) && id(this) == id((C)that); 
      } 
    }

    Why? The code for Hashtable.get looks something like this:

    public class Hashtable {
      public Object get(Object key) {
        Object entry;
        ...
        if (entry.equals(key)) ...
      }
    }

    Now the method invoked by entry.equals(key) depends upon the actual run-time type of the object referenced by entry, and the declared, compile-time type of the variable key. So when you as a user call table.get(new C(...)), this looks in class C for the equals method with argument of type Object. If you happen to have defined an equals method with argument of type C, that's irrelevant. It ignores that method, and looks for a method with signature equals(Object), eventually finding Object.equals(Object). If you want to override a method, you need to match argument types exactly. In some cases, you may want to have two methods, so that you don't pay the overhead of casting when you know you have an object of the right class:

    public class C {
      public boolean equals(Object that) {
        return (this == that) 
                || ((that instanceof C) && this.equals((C)that)); 
      }
    
      public boolean equals(C that) { 
        return id(this) == id(that); // Or whatever is appropriate for class C
      } 
    }
    

  2. You didn't properly implement equals as an equality predicate: equals must be symmetric, transitive, and reflexive. Symmetric means a.equals(b) must have the same value as b.equals(a). (This is the one most people mess up.) Transitive means that if a.equals(b) and b.equals(c) then a.equals(c) must be true. Reflexive means that a.equals(a) must be true, and is the reason for the (this == that) test above (it's also often good practice to include this test for efficiency, since testing for == is faster than looking at all the slots of an object, and because it partially breaks the recursion problem on objects that might have circular pointer chains).
  3. You forgot the hashCode method. Any time you define an equals method, you should also define a hashCode method. You must make sure that two equal objects have the same hashCode, and if you want better hashtable performance, you should try to make most non-equal objects have different hashCodes. Some classes cache the hash code in a private slot of an object, so that it need be computed only once. If that is the case then you will probably save time in equals if you include a line that says if (this.hashSlot != that.hashSlot) return false.
  4. You didn't handle inheritance properly. First of all, consider if two objects of different class can be equal. Before you say "NO! Of course not!" consider a class Rectangle with width and height fields, and a Box class, which has the above two fields plus depth. Is a Box with depth == 0 equal to the equivalent Rectangle? You might want to say yes. If you are dealing with a non-final class, then it is possible that your class might be subclassed, and you will want to be a good citizen with respect to your subclass. In particular, you will want to allow an extender of your class C to use your C.equals method using super as follows:
    public class C2 extends C {
    
      int newField = 0;
    
      public boolean equals(Object that) {
        if (this == that) return true;
        else if (!(that instanceof C2)) return false;
        else return this.newField == ((C2)that).newField && super.equals(that);
      }
    
    } 

    To allow this to work, you have to be careful about how you treat classes in your definition of C.equals. For example, check for that instanceof C rather than that.getClass() == C.class. See the previous IAQ question to learn why. Use this.getClass() == that.getClass() if you are sure that two objects must be of the same class to be considered equals.
  5. You didn't handle circular references properly. Consider:
    public class LinkedList {
    
      Object contents;
      LinkedList next = null;
    
      public boolean equals(Object that) {
        return (this == that) 
          || ((that instanceof LinkedList) && this.equals((LinkedList)that)); 
      }
    
      public boolean equals(LinkedList that) { // Buggy!
       return Util.equals(this.contents, that.contents) &&
              Util.equals(this.next, that.next); 
      }
    
    } 

    Here I have assumed there is a Util class with:

    public static boolean equals(Object x, Object y) {
        return (x == y) || (x != null && x.equals(y));
      } 

    I wish this method were in Object; without it you always have to throw in tests against null. Anyway, the LinkedList.equals method will never return if asked to compare two LinkedLists with circular references in them (a pointer from one element of the linked list back to another element). See the description of the Common Lisp function list-length for an explanation of how to handle this problem in linear time with only two words of extra storage. (I don't give the answer here in case you want to try to figure it out for yourself first.)
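
Points 1 and 3 of the list above can be put together in one small sketch. Key and KeyDemo are hypothetical names, and the int field id merely stands in for whatever identifies an object of your class.

```java
import java.util.Hashtable;

// Hypothetical value class; "id" stands in for the real identifying state.
class Key {
    private final int id;

    Key(int id) { this.id = id; }

    // Takes Object, so it overrides Object.equals and is what Hashtable calls.
    @Override
    public boolean equals(Object that) {
        if (this == that) return true;            // reflexive shortcut
        if (!(that instanceof Key)) return false; // also rejects null
        return this.id == ((Key) that).id;
    }

    // Equal keys must report equal hash codes.
    @Override
    public int hashCode() { return id; }
}

// Hypothetical driver, added for illustration only.
class KeyDemo {
    static Object lookup() {
        Hashtable<Key, String> table = new Hashtable<Key, String>();
        table.put(new Key(7), "seven");
        return table.get(new Key(7)); // a different but equal Key object
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // prints "seven"
    }
}
```

If either method is dropped, or equals is declared to take a Key instead of an Object, the lookup with a fresh Key will normally miss.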

Q: I tried to forward a method to super, but it occasionally doesn't work. Why?

This is the code in question, simplified for this example:
/** A version of Hashtable that lets you do
 * table.put("dog", "canine");, and then have
 * table.get("dogs") return "canine". **/

public class HashtableWithPlurals extends Hashtable {

  /** Make the table map both key and key + "s" to value. **/
  public Object put(Object key, Object value) {
    super.put(key + "s", value);
    return super.put(key, value);
  }
}

You need to be careful when passing to super that you fully understand what the super method does. In this case, the contract for Hashtable.put is that it will record a mapping between the key and the value in the table. However, if the hashtable gets too full, then Hashtable.put will allocate a larger array for the table, copy all the old objects over, and then recursively re-call table.put(key, value). Now, because Java resolves methods based on the runtime type of the target, in our example this recursive call within the code for Hashtable will go to HashtableWithPlurals.put(key, value), and the net result is that occasionally (when the size of the table overflows at just the wrong time), you will get an entry for "dogss" as well as for "dogs" and "dog". Now, does it state anywhere in the documentation for put that doing this recursive call is a possibility? No. In cases like this, it sure helps to have source code access to the JDK.
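
The mechanism can be reproduced without depending on the internals of any particular JDK. The sketch below uses a hypothetical stand-in base class, SimpleTable, whose put re-calls itself once after pretending to grow; it is not the real Hashtable code, only an illustration of how the self-call is dispatched to the subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a base class whose put() sometimes calls itself.
class SimpleTable {
    protected final List<String> keys = new ArrayList<String>();
    private boolean full = true; // pretend the table is full on the first put

    public void put(String key) {
        if (full) {
            full = false; // "grow" the table ...
            put(key);     // ... then re-call put, dispatched on the runtime type
            return;
        }
        keys.add(key);
    }
}

// Same shape as HashtableWithPlurals in the text.
class TableWithPlurals extends SimpleTable {
    @Override
    public void put(String key) {
        super.put(key + "s");
        super.put(key);
    }
}

// Hypothetical driver, added for illustration only.
class PluralsDemo {
    static List<String> keysAfterPut() {
        TableWithPlurals t = new TableWithPlurals();
        t.put("dog");
        return t.keys;
    }

    public static void main(String[] args) {
        System.out.println(keysAfterPut()); // prints "[dogss, dogs, dog]"
    }
}
```

The unwanted "dogss" appears because the self-call inside SimpleTable.put lands in TableWithPlurals.put, which appends another "s".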

Q: Why does my Properties object ignore the defaults when I do a get?

You shouldn't do a get on a Properties object; you should do a getProperty instead. Many people assume that the only difference is that getProperty has a declared return type of String, while get is declared to return an Object. But actually there is a bigger difference: getProperty looks at the defaults. get is inherited from Hashtable, and it ignores the default, thereby doing exactly what is documented in the Hashtable class, but probably not what you expect. Other methods that are inherited from Hashtable (like isEmpty and toString) will also ignore defaults. Example code:
Properties defaults = new Properties();
defaults.put("color", "black");

Properties props = new Properties(defaults);

System.out.println(props.get("color") + ", " + 
props.getProperty("color"));
// This prints "null, black"

Is this justified by the documentation? Maybe. The documentation in Hashtable talks about entries in the table, and the behavior of Properties is consistent if you assume that defaults are not entries in the table. If for some reason you thought defaults were entries (as you might be led to believe by the behavior of getProperty) then you will be confused.

Q: Inheritance seems error-prone. How can I guard against these errors?

The previous two questions show that a programmer needs to be very careful when extending a class, and sometimes just in using a class that extends another class. Problems like these two lead John Ousterhout to say "Implementation inheritance causes the same intertwining and brittleness that have been observed when goto statements are overused. As a result, OO systems often suffer from complexity and lack of reuse." (Scripting, IEEE Computer, March 1998) and Edsger Dijkstra to allegedly say "Object-oriented programming is an exceptionally bad idea which could only have originated in California." (from a collection of signature files). I don't think there's a general way to ensure being safe, but there are a few things to be aware of:
  • Extending a class that you don't have source code for is always risky; the documentation may be incomplete in ways you can't foresee.
  • Calling super tends to make these unforeseen problems jump out.
  • You need to pay as much attention to the methods that you don't over-ride as the methods that you do. This is one of the big fallacies of Object-Oriented design using inheritance. It is true that inheritance lets you write less code. But you still have to think about the code you don't write.
  • You're especially looking for trouble if the subclass changes the contract of any of the methods, or of the class as a whole. It is difficult to tell when a contract is changed, since contracts are informal (there is a formal part in the type signature, but the rest appears only in comments). In the Properties example, it is not clear if a contract is being broken, because it is not clear if the defaults are to be considered "entries" in the table or not.

Q: What are some alternatives to inheritance?

Delegation is an alternative to inheritance. Delegation means that you include an instance of another class as an instance variable, and forward messages to the instance. It is often safer than inheritance because it forces you to think about each message you forward, because the instance is of a known class, rather than a new class, and because it doesn't force you to accept all the methods of the super class: you can provide only the methods that really make sense. On the other hand, it makes you write more code, and it is harder to re-use (because it is not a subclass). For the HashtableWithPlurals example, delegation would give you this (note: as of JDK 1.2, Dictionary is considered obsolete; use Map instead):

/** A version of Hashtable that lets you do
 * table.put("dog", "canine");, and then have
 * table.get("dogs") return "canine". **/

public class HashtableWithPlurals extends Dictionary {

  Hashtable table = new Hashtable();

  /** Make the table map both key and key + "s" to value. **/
  public Object put(Object key, Object value) {
    table.put(key + "s", value);
    return table.put(key, value);
  }

  ... // Need to implement other methods as well
}

The Properties example, if you wanted to enforce the interpretation that default values are entries, would be better done with delegation. Why was it done with inheritance, then? Because the Java implementation team was rushed, and took the course that required writing less code.
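
Following the note about preferring Map, here is a rough sketch of the same delegation idea with a HashMap held as a private field. PluralTable is a hypothetical name, and exposing only put, get and size is an arbitrary choice made for the example; you would forward whichever methods make sense for your class.

```java
import java.util.HashMap;
import java.util.Map;

// Delegation sketch: holds a Map rather than extending one.
class PluralTable {
    private final Map<String, Object> table = new HashMap<String, Object>();

    // Maps both key and key + "s" to value; returns the previous value for key.
    public Object put(String key, Object value) {
        table.put(key + "s", value);
        return table.put(key, value);
    }

    // Forwarded deliberately; methods that are not forwarded do not exist here.
    public Object get(String key) {
        return table.get(key);
    }

    public int size() {
        return table.size();
    }
}
```

Because the calls go to a plain HashMap, nothing inside the map can call back into PluralTable.put, so no "dogss" entry can appear however the map chooses to resize.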

Q: Why are there no global variables in Java?

Global variables are considered bad form for a variety of reasons:
  • Adding state variables breaks referential transparency (you no longer can understand a statement or expression on its own: you need to understand it in the context of the settings of the global variables).
  • State variables lessen the cohesion of a program: you need to know more to understand how something works. A major point of Object-Oriented programming is to break up global state into more easily understood collections of local state.
  • When you add one variable, you limit the use of your program to one instance. What you thought was global, someone else might think of as local: they may want to run two copies of your program at once.
For these reasons, Java decided to ban global variables.

Q: I still miss global variables. What can I do instead?

That depends on what you want to do. In each case, you need to decide two things: how many copies of this so-called global variable do I need? And where would be a convenient place to put it? Here are some common solutions:
If you really want only one copy each time a user invokes Java by starting up a Java virtual machine, then you probably want a static variable. For example, you have a MainWindow class in your application, and you want to count the number of windows that the user has opened, and initiate the "Really quit?" dialog when the user has closed the last one. For that, you want:
// One variable per class (per JVM)
public class MainWindow {
  static int numWindows = 0;
  ...
  // when opening: MainWindow.numWindows++;  
  // when closing: MainWindow.numWindows--;
}
In many cases, you really want a class instance variable. For example, suppose you wrote a web browser and wanted to have the history list as a global variable. In Java, it would make more sense to have the history list be an instance variable in the Browser class. Then a user could run two copies of the browser at once, in the same JVM, without having them step on each other.
// One variable per instance
public class Browser {
  HistoryList history = new HistoryList();
  ...
  // Make entries in this.history
}
Now suppose that you have completed the design and most of the implementation of your browser, and you discover that, deep down in the details of, say, the Cookie class, inside the Http class, you want to display an error message. But you don't know where to display the message. You could easily add an instance variable to the Browser class to hold the display stream or frame, but you haven't passed the current instance of the browser down into the methods in the Cookie class. You don't want to change the signatures of many methods to pass the browser along. You can't use a static variable, because there might be multiple browsers running. However, if you can guarantee that there will be only one browser running per thread (even if each browser may have multiple threads) then there is a good solution: store a table of thread-to-browser mappings as a static variable in the Browser class, and look up the right browser (and hence display) to use via the current thread:
// One "variable" per thread
public class Browser {
  static Hashtable browsers = new Hashtable();
  public Browser() { // Constructor
    browsers.put(Thread.currentThread(), this);
  }
  ...
  public void reportError(String message) {
    Thread t = Thread.currentThread();
    ((Browser)Browser.browsers.get(t))
      .show(message);
  }
}
Finally, suppose you want the value of a global variable to persist between invocations of the JVM, or to be shared among multiple JVMs in a network of machines. Then you probably should use a database which you access through JDBC, or you should serialize data and write it to a file.
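
For the one-per-thread case described above, the standard class java.lang.ThreadLocal (in the JDK since 1.2) packages up the thread-to-value table. This is a swapped-in technique rather than the Hashtable approach shown earlier, and the sketch is a hypothetical reduction: the show and shown methods, and the use of a StringBuilder as the "display", are assumptions made only so the example is self-contained.

```java
// Hypothetical Browser with a per-thread "current browser" slot.
class Browser {
    private static final ThreadLocal<Browser> CURRENT = new ThreadLocal<Browser>();

    // Stands in for the display stream or frame.
    private final StringBuilder display = new StringBuilder();

    Browser() {
        CURRENT.set(this); // the constructing thread now maps to this browser
    }

    static Browser current() {
        return CURRENT.get();
    }

    void show(String message) {
        display.append(message);
    }

    String shown() {
        return display.toString();
    }

    // Callable from deep inside code that was never handed a Browser.
    static void reportError(String message) {
        current().show(message);
    }
}
```

As with the Hashtable version, only the thread that ran the constructor is registered; any other thread working for the same browser would have to be registered separately.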


Q: Can I write sin(x) instead of Math.sin(x)?

Short answer: Before Java 1.5, no. As of Java 1.5, yes, using static imports; you can now write import static java.lang.Math.* and then use sin(x) with impunity. But note the warning from Sun: "So when should you use static import? Very sparingly!" Here are some of the options that could be used before Java 1.5:

If you only want a few methods, you can put in calls to them within your own class:
public static double sin(double x) { return Math.sin(x); }
public static double cos(double x) { return Math.cos(x); }
...
sin(x)
Static methods take a target (thing to the left of the dot) that is either a class name, or is an object whose value is ignored, but must be declared to be of the right class. So you could save three characters per call by doing:
// Can't instantiate Math, so it must be null.
Math m = null; 
... 
m.sin(x)
java.lang.Math is a final class, so you can't inherit from it, but if you have your own set of static methods that you would like to share among many of your own classes, then you can package them up and inherit them:
public abstract class MyStaticMethods { 
  public static double mysin(double x) { ... } 
}

public class MyClass1 extends MyStaticMethods { 
  ... 
  mysin(x)
}

Peter van der Linden, author of Just Java, recommends against both of the last two practices in his FAQ. I agree with him that Math m = null is a bad idea in most cases, but I'm not convinced that the MyStaticMethods demonstrates "very poor OOP style to use inheritance to obtain a trivial name abbreviation (rather than to express a type hierarchy)." First of all, trivial is in the eye of the beholder; the abbreviation may be substantial. (See an example of how I used this approach to what I thought was good effect.) Second, it is rather presumptuous to say that this is very bad OOP style. You could make a case that it is bad Java style, but in languages with multiple inheritance, this idiom would be more acceptable.
Another way of looking at it is that features of Java (and any language) necessarily involve trade-offs, and conflate many issues. I agree it is bad to use inheritance in such a way that you mislead the user into thinking that MyClass1 is inheriting behavior from MyStaticMethods, and it is bad to prohibit MyClass1 from extending whatever other class it really wants to extend. But in Java the class is also the unit of encapsulation, compilation (mostly), and name scope. The MyStaticMethods approach scores negative points on the type hierarchy front, but positive points on the name scope front. If you say that the type hierarchy view is more important, I won't argue with you. But I will argue if you think of a class as doing only one thing, rather than many things at once, and if you think of style guides as absolute rather than as trade-offs.
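
For comparison with the pre-1.5 workarounds, the static import from the short answer looks like the sketch below. It imports two named members instead of using the .* form, in the spirit of "very sparingly"; StaticImportDemo and halfTurn are hypothetical names.

```java
import static java.lang.Math.PI;
import static java.lang.Math.sin;

// Hypothetical class, added for illustration only.
class StaticImportDemo {
    static double halfTurn() {
        return sin(PI / 2); // resolves to Math.sin and Math.PI
    }

    public static void main(String[] args) {
        System.out.println(halfTurn());
    }
}
```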

Q: Is null an Object?

Absolutely not. By that, I mean (null instanceof Object) is false. Some other things you should know about null:
  1. You can't call a method on null: x.m() is an error when x is null and m is a non-static method. (When m is a static method it is fine, because it is the class of x that matters; the value is ignored.)
  2. There is only one null, not one for each class. Thus, ((String) null == (Hashtable) null), for example.
  3. It is ok to pass null as an argument to a method, as long as the method is expecting it. Some methods do; some do not. So, for example, System.out.println((Object) null) is ok (the cast is needed only to select an overload), but string.compareTo(null) is not. For methods you write, your javadoc comments should say whether null is ok, unless it is obvious.
  4. In JDK 1.1 to 1.1.5, passing null as the literal argument to a constructor of an anonymous inner class (e.g., new SomeClass(null) { ...}) caused a compiler error. It's ok to pass an expression whose value is null, or to pass a coerced null, like new SomeClass((String) null) { ...}
  5. There are at least three different meanings that null is commonly used to express:
    • Uninitialized. A variable or slot that hasn't yet been assigned its real value.
    • Non-existent/not applicable. For example, terminal nodes in a binary tree might be represented by a regular node with null child pointers.
    • Empty. For example, you might use null to represent the empty tree. Note that this is subtly different from the previous case, although some people make the mistake of confusing the two cases. The difference is whether null is an acceptable tree node, or whether it is a signal to not treat the value as a tree node. Compare the following three implementations of binary tree nodes with an in-order print method:

// null means not applicable
// There is no empty tree.

class Node {
  Object data;
  Node left, right;

  void print() {
    if (left != null)
      left.print();
    System.out.println(data);
    if (right != null)
      right.print();
  }
}
// null means empty tree
// Note static, non-static methods

class Node {
  Object data;
  Node left, right;

  static void print(Node node) {
    if (node != null) node.print();
  }

  void print() {
    print(left);
    System.out.println(data);
    print(right);
  }
}
// Separate class for Empty
// null is never used

interface Node { void print(); }

class DataNode implements Node {
  Object data;
  Node left, right;

  // Interface methods must be implemented as public.
  public void print() {
    left.print();
    System.out.println(data);
    right.print();
  }
}

class EmptyNode implements Node {
  public void print() { }
}
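
The first points in the list above (null is not an instance of anything, instance calls on null fail, static calls through a null reference work) can be checked with a short sketch. NullDemo and its method names are hypothetical.

```java
// Hypothetical class, added for illustration only.
class NullDemo {
    static String describe() {
        return "static call";
    }

    static boolean nullIsObject() {
        Object x = null;
        return x instanceof Object; // false: null is not an instance of anything
    }

    static String callStaticThroughNull() {
        NullDemo d = null;
        return d.describe(); // legal: only the declared type of d matters
    }

    static boolean instanceCallFails() {
        String s = null;
        try {
            s.length(); // non-static method on null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullIsObject());          // prints "false"
        System.out.println(callStaticThroughNull()); // prints "static call"
        System.out.println(instanceCallFails());     // prints "true"
    }
}
```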


Q: How big is an Object? Why is there no sizeof?

C has a sizeof operator, and it needs to have one, because the user has to manage calls to malloc, and because the size of primitive types (like long) is not standardized. Java doesn't need a sizeof, but it would still have been a convenient aid. Since it's not there, you can do this:
static Runtime runtime = Runtime.getRuntime();
...
long start, end;
Object obj;
runtime.gc();
start = runtime.freeMemory();
obj = new Object(); // Or whatever you want to look at
end = runtime.freeMemory();
System.out.println("That took " + (start-end) + " bytes.");

This method is not foolproof, because a garbage collection could occur in the middle of the code you are instrumenting, throwing off the byte count. Also, if you are using a just-in-time compiler, some bytes may come from generating code.
You might be surprised to find that an Object takes 16 bytes, or 4 words, in the Sun JDK VM. This breaks down as follows: There is a two-word header, where one word is a pointer to the object's class, and the other points to the instance variables. Even though Object has no instance variables, Java still allocates one word for the variables. Finally, there is a "handle", which is another pointer to the two-word header. Sun says that this extra level of indirection makes garbage collection simpler. (There have been high performance Lisp and Smalltalk garbage collectors that do not use the extra level for at least 15 years. I have heard but have not confirmed that the Microsoft JVM does not have the extra level of indirection.)
An empty new String() takes 40 bytes, or 10 words: 3 words of pointer overhead, 3 words for the instance variables (the start index, end index, and character array), and 4 words for the empty char array. Creating a substring of an existing string takes "only" 6 words, because the char array is shared. Putting an Integer key and Integer value into a Hashtable takes 64 bytes (in addition to the four bytes that were pre-allocated in the Hashtable array): I'll let you work out why.

Q: In what order is initialization code executed? What should I put where?

Instance variable initialization code can go in three places within a class:
In an instance variable initializer for a class (or a superclass).
class C {
    String var = "val";
In a constructor for a class (or a superclass).
public C() { var = "val"; }
In an object initializer block. This is new in Java 1.1; it's just like a static initializer block but without the keyword static.
{ var = "val"; }
}

The order of evaluation (ignoring out of memory problems) when you say new C() is:
  1. Call a constructor for C's superclass (unless C is Object, in which case it has no superclass). It will always be the no-argument constructor, unless the programmer explicitly coded super(...) as the very first statement of the constructor.
  2. Once the super constructor has returned, execute any instance variable initializers and object initializer blocks in textual (left-to-right) order. Don't be confused by the fact that javadoc and javap use alphabetical ordering; that's not important here.
  3. Now execute the remainder of the body for the constructor. This can set instance variables or do anything else.
In general, you have a lot of freedom to choose any of these three forms. My recommendation is to use instance variable initializers in cases where there is a variable that takes the same value regardless of which constructor is used. Use object initializer blocks only when initialization is complex (e.g. it requires a loop) and you don't want to repeat it in multiple constructors. Use a constructor for the rest. Here's another example:
Program:
class A {
    String a1 = ABC.echo(" 1: a1");
    String a2 = ABC.echo(" 2: a2");
    public A() {ABC.echo(" 3: A()");}
}

class B extends A {
    String b1 = ABC.echo(" 4: b1");
    String b2;
    public B() { 
        ABC.echo(" 5: B()"); 
        b1 = ABC.echo(" 6: b1 reset"); 
        a2 = ABC.echo(" 7: a2 reset"); 
    }
}

class C extends B {
    String c1; 
    { c1 = ABC.echo(" 8: c1"); }
    String c2;
    String c3 = ABC.echo(" 9: c3");

    public C() { 
        ABC.echo("10: C()"); 
        c2 = ABC.echo("11: c2");
        b2 = ABC.echo("12: b2");
    }
}

public class ABC {
    static String echo(String arg) {
        System.out.println(arg);
        return arg;
    }

    public static void main(String[] args) { 
        new C(); 
    }
}
Output:
 1: a1
 2: a2
 3: A()
 4: b1
 5: B()
 6: b1 reset
 7: a2 reset
 8: c1
 9: c3
10: C()
11: c2
12: b2



Q: What about class initialization?

It is important to distinguish class initialization from instance creation. An instance is created when you call a constructor with new. A class C is initialized the first time it is actively used. At that time, the initialization code for the class is run, in textual order. There are two kinds of class initialization code: static initializer blocks (static { ... }), and class variable initializers (static String var = ...). Active use is defined as the first time you do any one of the following:
  1. Create an instance of C by calling a constructor;
  2. Call a static method that is defined in C (not inherited);
  3. Assign or access a static variable that is declared (not inherited) in C. It does not count if the static variable is initialized with a constant expression (one involving only primitive operators (like + or ||), literals, and static final variables), because these are initialized at compile time.
Here is an example:

Program:
class A {
    static String a1 = ABC.echo(" 1: a1");
    static String a2 = ABC.echo(" 2: a2");
}

class B extends A {
    static String b1 = ABC.echo(" 3: b1");
    static String b2;
    static { 
        ABC.echo(" 4: B()"); 
        b1 = ABC.echo(" 5: b1 reset"); 
        a2 = ABC.echo(" 6: a2 reset"); 
    }
}

class C extends B {
    static String c1; 
    static { c1 = ABC.echo(" 7: c1"); }
    static String c2;
    static String c3 = ABC.echo(" 8: c3");

    static { 
        ABC.echo(" 9: C()"); 
        c2 = ABC.echo("10: c2");
        b2 = ABC.echo("11: b2");
    }
}

public class ABC {
    static String echo(String arg) {
        System.out.println(arg);
        return arg;
    }

    public static void main(String[] args) { 
        new C(); 
    }
}
Output:
 1: a1
 2: a2
 3: b1
 4: B()
 5: b1 reset
 6: a2 reset
 7: c1
 8: c3
 9: C()
10: c2
11: b2
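
The exception in item 3 of the active-use list, that reading a compile-time constant does not initialize the class, is easy to miss, so here is a small sketch of it. Holder, InitLog and InitDemo are hypothetical names; the log list exists only to record when the static initializer runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log that records when Holder's static initializer runs.
class InitLog {
    static final List<String> events = new ArrayList<String>();
}

class Holder {
    static final int CONSTANT = 42; // compile-time constant
    static int counter = 0;         // ordinary static variable

    static {
        InitLog.events.add("Holder initialized");
    }
}

class InitDemo {
    // The constant is inlined by the compiler; Holder is not initialized.
    static int readConstant() {
        return Holder.CONSTANT;
    }

    // Reading a non-constant static variable is an active use of Holder.
    static int readCounter() {
        return Holder.counter;
    }

    public static void main(String[] args) {
        System.out.println(readConstant() + " " + InitLog.events);
        System.out.println(readCounter() + " " + InitLog.events);
    }
}
```

Run from a fresh JVM, the first line shows an empty log and the second shows the single initialization event.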



Q: I have a class with six instance variables, each of which could be initialized or not. Should I write 64 constructors?

Of course you don't need 64 (that is, 2^6) constructors. Let's say you have a class C defined as follows:
public class C { int a,b,c,d,e,f; }

Here are some things you can do for constructors:
  1. Guess at what combinations of variables will likely be wanted, and provide constructors for those combinations. Pro: That's how it's usually done. Con: Difficult to guess correctly; lots of redundant code to write.
  2. Define setters that can be cascaded because they return this. That is, define a setter for each instance variable, then use them after a call to the default constructor:
    public C setA(int val) { a = val; return this; }
    ...
    new C().setA(1).setC(3).setE(5);

    Pro: This is a reasonably simple and efficient approach. A similar idea is discussed by Bjarne Stroustrup on page 156 of The Design and Evolution of C++. Con: You need to write all the little setters, they aren't JavaBean-compliant (since they return this, not void), and they don't work if there are interactions between two values.
  3. Use the default constructor for an anonymous sub-class with a non-static initializer:
    new C() {{ a = 1; c = 3; e = 5; }}

    Pro: Very concise; no mess with setters. Con: The instance variables can't be private, you have the overhead of a sub-class, your object won't actually have C as its class (although it will still be an instanceof C), it only works if you have accessible instance variables, and many people, including experienced Java programmers, won't understand it. Actually, it's quite simple: You are defining a new, unnamed (anonymous) subclass of C, with no new methods or variables, but with an initialization block that initializes a, c, and e. Along with defining this class, you are also making an instance. When I showed this to Guy Steele, he said "heh, heh! That's pretty cute, all right, but I'm not sure I would advocate widespread use..." As usual, Guy is right. (By the way, you can also use this to create and initialize a vector. You know how great it is to create and initialize, say, a String array with new String[] {"one", "two", "three"}. Now with inner classes you can do the same thing for a vector, where previously you thought you'd have to use assignment statements: new Vector(3) {{add("one"); add("two"); add("three");}}.)
  4. You can switch to a language that directly supports this idiom. For example, C++ has optional arguments. So you can do this:
    class C {
    public: C(int a=1, int b=2, int c=3, int d=4, int e=5);
    };
    ...
    new C(10); // Construct an instance with defaults for b,c,d,e

    Common Lisp and Python have keyword arguments as well as optional arguments, so you can do this:

    C(a=10, c=30, e=50)            # Construct an instance; use defaults for b and d.
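
    For comparison, the two Java-only options above (chained setters and the anonymous subclass with an initializer block) can be sketched as follows. This is only an illustration: the field defaults and the InitDemo class are assumptions made for the example, and the fields are left non-private because the anonymous subclass trick requires that.

```java
import java.util.Vector;

class C {
    // Defaults for all five values; not private, so the subclass initializer can reach them.
    int a = 1, b = 2, c = 3, d = 4, e = 5;

    // Setters return this so that calls can be chained in one expression.
    C setA(int a) { this.a = a; return this; }
    C setC(int c) { this.c = c; return this; }
    C setE(int e) { this.e = e; return this; }
}

// Hypothetical demo class, not part of the original text.
class InitDemo {
    static C chained() {
        return new C().setA(10).setC(30).setE(50);
    }

    // Anonymous subclass of C whose initializer block assigns the fields.
    static C anonymous() {
        return new C() {{ a = 10; c = 30; e = 50; }};
    }

    // The same trick applied to a Vector.
    static Vector<String> words() {
        return new Vector<String>(3) {{ add("one"); add("two"); add("three"); }};
    }

    public static void main(String[] args) {
        C x = anonymous();
        System.out.println(chained().c);              // 30
        System.out.println(x.a + " " + x.b);          // 10 2
        System.out.println(x instanceof C);           // true
        System.out.println(x.getClass() == C.class);  // false: x is an anonymous subclass
        System.out.println(words());                  // [one, two, three]
    }
}
```

    Note how the last two lines of output for x show the Con mentioned above: the object is an instanceof C, but its class is not C.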


Q:When should I use constructors, and when should I use other methods?

The glib answer is to use constructors when you want a new object; that's what the keyword new is for. The infrequent answer is that constructors are often over-used, both in when they are called and in how much they have to do. Here are some points to consider:
  • Modifiers: As we saw in the previous question, one can go overboard in providing too many constructors. It is usually better to minimize the number of constructors, and then provide modifier methods that do the rest of the initialization. If the modifiers return this, then you can create a useful object in one expression; if not, you will need to use a series of statements. Modifiers are good because often the changes you want to make during construction are also changes you will want to make later, so why duplicate code between constructors and methods?
  • Factories: Often you want to create something that is an instance of some class or interface, but you either don't care exactly which subclass to create, or you want to defer that decision to runtime. For example, if you are writing a calculator applet, you might wish that you could call new Number(string), and have this return a Double if string is in floating point format, or a Long if string is in integer format. But you can't do that for two reasons: Number is an abstract class, so you can't invoke its constructor directly, and any call to a constructor must return a new instance of that class directly, not of a subclass. A method which returns objects like a constructor but that has more freedom in how the object is made (and what type it is) is called a factory. Java has no built-in support or conventions for factories, but you will want to invent conventions for using them in your code.
  • Caching and Recycling: A constructor must create a new object. But creating a new object is a fairly expensive operation. Just as in the real world, you can avoid costly garbage collection by recycling. For example, new Boolean(x) creates a new Boolean, but you should almost always use instead (x ? Boolean.TRUE : Boolean.FALSE), which recycles an existing value rather than wastefully creating a new one. Java would have been better off if it advertised a method that did just this, rather than advertising the constructor. Boolean is just one example; you should also consider recycling of other immutable classes, including Character, Integer, and perhaps many of your own classes. Below is an example of a recycling factory for Numbers. If I had my choice, I would call this Number.make, but of course I can't add methods to the Number class, so it will have to go somewhere else.

    public Number numberFactory(String str) throws NumberFormatException {
      try {
        long l = Long.parseLong(str);
        if (l >= 0 && l < cachedLongs.length) {
          int i = (int)l;
          if (cachedLongs[i] != null) return cachedLongs[i];
          else return cachedLongs[i] = new Long(str);
        } else {
          return new Long(l);
        }
      } catch (NumberFormatException e) {
        double d = Double.parseDouble(str);
        return d == 0.0 ? ZERO : d == 1.0 ? ONE : new Double(d);
      }
    }

    private Long[] cachedLongs = new Long[100];
    private Double ZERO = new Double(0.0);
    private Double ONE = new Double(1.0);
We see that new is a useful convention, but that factories and recycling are also useful. Java chose to support only new because it is the simplest possibility, and the Java philosophy is to keep the language itself as simple as possible. But that doesn't mean your class libraries need to stick to the lowest denominator. (And it shouldn't have meant that the built-in libraries stuck to it, but alas, they did.)
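
The same recycling idea carries over to your own immutable classes. Here is a minimal sketch, assuming a made-up immutable class Percent whose constructor is private so that all callers go through a caching factory method; the names Percent, of, and RecyclingDemo are hypothetical, and the cache is deliberately simple (it is not thread-safe).

```java
// Hypothetical immutable class, used only to illustrate a recycling factory.
final class Percent {
    private static final Percent[] CACHE = new Percent[101];
    private final int value;

    // Private constructor: callers must use of(), which may recycle an instance.
    private Percent(int value) { this.value = value; }

    static Percent of(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        if (CACHE[value] == null) CACHE[value] = new Percent(value);
        return CACHE[value];
    }

    int value() { return value; }
}

class RecyclingDemo {
    // The Boolean idiom from the text: reuse the two existing instances.
    static Boolean box(boolean x) { return x ? Boolean.TRUE : Boolean.FALSE; }

    public static void main(String[] args) {
        System.out.println(box(true) == box(true));             // true: same object
        System.out.println(Percent.of(42) == Percent.of(42));   // true: recycled
        System.out.println(Percent.of(42).value());             // 42
    }
}
```

Because the class is immutable, handing the same instance to many callers is safe; with a mutable class this kind of sharing would be a bug.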

Q: Will I get killed by the overhead of object creation and GC?

Suppose the application has to do with manipulating lots of 3D geometric points. The obvious Java way to do it is to have a class Point with doubles for x,y,z coordinates. But allocating and garbage collecting lots of points can indeed cause a performance problem. You can help by managing your own storage in a resource pool. Instead of allocating each point when you need it, you can allocate a large array of Points at the start of the program. The array (wrapped in a class) acts as a factory for Points, but it is a socially-conscious recycling factory. The method call pool.point(x,y,z) takes the first unused Point in the array, sets its 3 fields to the specified values, and marks it as used. Now you as a programmer are responsible for returning Points to the pool once they are no longer needed. There are several ways to do this. The simplest is when you know you will be allocating Points in blocks that are used for a while, and then discarded. Then you do int pos = pool.mark() to mark the current position of the pool. When you are done with the section of code, you call pool.restore(pos) to set the mark back to the position. If there are a few Points that you would like to keep, just allocate them from a different pool.
The resource pool saves you from garbage collection costs (if you have a good model of when your objects will be freed) but you still have the initial object creation costs. You can get around that by going "back to Fortran": using arrays of x,y and z coordinates rather than individual point objects. You have a class of Points but no class for an individual point. Consider this resource pool class:

public class PointPool {
  /** Allocate a pool of n Points. **/
  public PointPool(int n) {
    x = new double[n];
    y = new double[n];
    z = new double[n];
    next = 0;
  }
  public double x[], y[], z[];

  /** Initialize the next point, represented as an integer index. **/
  int point(double x1, double y1, double z1) { 
    x[next] = x1; y[next] = y1; z[next] = z1;
    return next++; 
  }

  /** Initialize the next point, initialized to zeros. **/
  int point() { return point(0.0, 0.0, 0.0); }

  /** Initialize the next point as a copy of a point in some pool. **/
  int point(PointPool pool, int p) {
    return point(pool.x[p], pool.y[p], pool.z[p]);
  }

  public int next;
}
You would use this class as follows:

PointPool pool = new PointPool(1000000);
PointPool results = new PointPool(100);
...
int pos = pool.next;
doComplexCalculation(...);
pool.next = pos;

...

void doComplexCalculation(...) {
  ...
  int p1 = pool.point(x, y, z);
  int p2 = pool.point(p, q, r);
  double diff = pool.x[p1] - pool.x[p2];
  ...
  int p_final = results.point(pool,p1);
  ...
}

Allocating a million points took half a second for the PointPool approach, and 6 seconds for the straightforward approach that allocates a million instances of a Point class, so that's a 12-fold speedup.
Wouldn't it be nice if you could declare p1, p2 and p_final as Point rather than int? In C or C++, you could just do typedef int Point, but Java doesn't allow that. If you're adventurous, you can set up make files to run your files through the C preprocessor before the Java compiler, and then you can do #define Point int.
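
The first variant described in this answer, a pool of real Point objects with pool.point(x,y,z), pool.mark(), and pool.restore(pos), is given only in prose. A minimal sketch of how it might look follows; the class name ObjectPointPool and the bare-bones Point class are assumptions made for the example, and there is no checking beyond the array bounds.

```java
// Bare-bones point class, assumed for this sketch.
class Point {
    double x, y, z;
}

// Hypothetical object-based pool: all Points are created once, then recycled.
class ObjectPointPool {
    private final Point[] points;
    private int next = 0;   // index of the first unused Point

    ObjectPointPool(int n) {
        points = new Point[n];
        for (int i = 0; i < n; i++) points[i] = new Point();  // creation cost paid up front
    }

    /** Take the first unused Point, set its fields, and mark it as used. */
    Point point(double x, double y, double z) {
        Point p = points[next++];  // ArrayIndexOutOfBoundsException if the pool is exhausted
        p.x = x; p.y = y; p.z = z;
        return p;
    }

    /** Remember the current position of the pool. */
    int mark() { return next; }

    /** Release every Point handed out since the matching mark(). */
    void restore(int pos) { next = pos; }

    public static void main(String[] args) {
        ObjectPointPool pool = new ObjectPointPool(1000);
        int pos = pool.mark();
        Point p1 = pool.point(1, 2, 3);
        Point p2 = pool.point(4, 5, 6);
        System.out.println(p1.x - p2.x);   // -3.0
        pool.restore(pos);                 // p1 and p2 must not be used after this
        Point p3 = pool.point(7, 8, 9);
        System.out.println(p3 == p1);      // true: the same object was recycled
    }
}
```

The danger is visible in the last lines: after restore, p1 and p3 are the same object, so any code still holding p1 will see its coordinates change.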



Q: I have a complex expression inside a loop. For efficiency, I'd like the computation to be done only once. But for readability, I want it to stay inside the loop where it is used. What can I do?

Let's assume an example where match is a regular expression pattern match routine, and compile compiles a string into a finite state machine that can be used by match:
for(;;) {
  ...
  String str = ...
  match(str, compile("a*b*c*"));
  ...
}

Since Java has no macros, and little control over time of execution, your choices are limited here. One possibility, although not very pretty, is to use an inner interface with a variable initializer:

for(;;) {
  ...
  String str = ...
  interface P1 {FSA f = compile("a*b*c*");}
  match(str, P1.f);
  ...
}

The value for P1.f gets initialized on the first use of P1, and is not changed, since variables in interfaces are implicitly static final.
If you don't like that, you can switch to a language that gives you better control. In Common Lisp, the character sequence #. means to evaluate the following expression at read (compile) time, not run time. So you could write:

(loop
  ...
  (match str #.(compile "a*b*c*"))
  ...)
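
Going back to the Java version, here is a self-contained sketch of the interface trick that can be run to confirm the compile step happens only once. It rests on some assumptions: java.util.regex.Pattern (from later JDKs) stands in for the FSA type, the compile method here is a hypothetical wrapper that counts its calls, and the interface is declared as a member of the class rather than inside the loop body.

```java
import java.util.regex.Pattern;

class HoistDemo {
    static int compileCalls = 0;   // counts how often the expensive step runs

    // Hypothetical stand-in for the compile() routine in the text.
    static Pattern compile(String regex) {
        compileCalls++;
        return Pattern.compile(regex);
    }

    // Interface fields are implicitly public static final, so F is
    // initialized once, the first time P1.F is used.
    interface P1 { Pattern F = compile("a*b*c*"); }

    static int countMatches(String[] inputs) {
        int n = 0;
        for (String str : inputs) {
            if (P1.F.matcher(str).matches()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String[] inputs = {"aabbcc", "abc", "cba", ""};
        System.out.println(countMatches(inputs));  // 3
        System.out.println(compileCalls);          // 1
    }
}
```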



Q: What other operations are surprisingly slow?

Where do I begin? Here are a few that are most useful to know about. I wrote a timing utility that runs snippets of code in a loop, reporting the results in terms of thousands of iterations per second (K/sec) and microseconds per iteration (uSecs). Timing was done on a Sparc 20 with the JDK 1.1.4 JIT compiler. I note the following:
  • These were all done in 1998. Compilers have changed since then.
  • Counting down (i.e. for (int i=n; i>0; i--)) is twice as fast as counting up: my machine can count down to 144 million in a second, but up to only 72 million.
  • Calling Math.max(a,b) is 7 times slower than (a > b) ? a : b. This is the cost of a method call.
  • Arrays are 15 to 30 times faster than Vectors. Hashtables are 2/3 as fast as Vectors.
  • Using bitset.get(i) is 60 times slower than bits & 1 << i. This is the cost of a synchronized method call, mostly. Of course, if you want more than 64 bits, you can't use my bit-twiddling example. Here's a chart of times for getting and setting elements of various data structures:
    K/sec     uSecs          Code           Operation 
    =========  ======= ====================  ===========
      147,058    0.007 a = a & 0x100;        get element of int bits
          314    3.180 bitset.get(3);        get element of Bitset
       20,000    0.050 obj = objs[1];        get element of Array
        5,263    0.190 str.charAt(5);        get element of String
          361    2.770 buf.charAt(5);        get element of StringBuffer
          337    2.960 objs2.elementAt(1);   get element of Vector
          241    4.140 hash.get("a");        get element of Hashtable
    
          336    2.970 bitset.set(3);        set element of Bitset
        5,555    0.180 objs[1] = obj;        set element of Array
          355    2.810 buf.setCharAt(5,' ')  set element of StringBuffer
          308    3.240 objs2.setElementAt(1  set element of Vector
          237    4.210 hash.put("a", obj);   set element of Hashtable

  • Java compilers are very poor at lifting constant expressions out of loops. The C/Java for loop is a bad abstraction, because it encourages re-computation of the end value in the most typical case. So for(int i=0; i<str.length(); i++) is three times slower than int len = str.length(); for(int i=0; i<len; i++).
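
The loop comparison in the last bullet can be tried with a crude timer in the spirit of the utility described above. This is a minimal sketch, not the original timing utility: the class and method names are made up for illustration, it relies on System.nanoTime and lambdas from later JDKs, and since compilers have changed since 1998 the ratio it reports may be nothing like three to one.

```java
class LoopTiming {
    // Recomputes str.length() on every iteration.
    static int countSpacesNaive(String str) {
        int n = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == ' ') n++;
        }
        return n;
    }

    // Hoists the end value into a local variable before the loop.
    static int countSpacesHoisted(String str) {
        int n = 0;
        int len = str.length();
        for (int i = 0; i < len; i++) {
            if (str.charAt(i) == ' ') n++;
        }
        return n;
    }

    // Runs the task the given number of times; returns microseconds per iteration.
    static double microsPerIteration(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) task.run();
        return (System.nanoTime() - start) / 1000.0 / iterations;
    }

    public static void main(String[] args) {
        String s = "the quick brown fox jumps over the lazy dog";
        int[] sink = new int[1];  // accumulates results so the work is not trivially dead code
        double naive = microsPerIteration(() -> sink[0] += countSpacesNaive(s), 1_000_000);
        double hoisted = microsPerIteration(() -> sink[0] += countSpacesHoisted(s), 1_000_000);
        System.out.printf("naive %.4f uSecs, hoisted %.4f uSecs (sink=%d)%n",
                          naive, hoisted, sink[0]);
    }
}
```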


Q: Can I get good advice from books on Java?

There are a lot of Java books out there, falling into three classes:
Bad. Most Java books are written by people who couldn't get a job as a Java programmer (since programming almost always pays more than book writing; I know because I've done both). These books are full of errors, bad advice, and bad programs. These books are dangerous to the beginner, but are easily recognized and rejected by a programmer with even a little experience in another language.
Excellent. There are a small number of excellent Java books. I like the official specification and the books by Arnold and Gosling, Marty Hall, and Peter van der Linden. For reference I like the Java in a Nutshell series and the online references at Sun (I copy the javadoc APIs and the language specification and its amendments to my local disk and bookmark them in my browser so I'll always have fast access.)
Iffy. In between these two extremes is a collection of sloppy writing by people who should know better, but either haven't taken the time to really understand how Java works, or are just rushing to get something published fast. One such example of half-truths is Edward Yourdon's Java and the new Internet programming paradigm from Rise and Resurrection of the American Programmer[footnote on Yourdon]. Here's what Yourdon says about how different Java is:
  • "Functions have been eliminated" It's true that there is no "function" keyword in Java. Java calls them methods (and Perl calls them subroutines, and Scheme calls them procedures, but you wouldn't say these languages have eliminated functions). One could reasonably say that there are no global functions in Java. But I think it would be more precise to say that there are functions with global extent; it's just that they must be defined within a class, and are called "static method C.f" instead of "function f".
  • "Automatic coercions of data types have been eliminated" It's true that there are limits in the coercions that are made, but they are far from eliminated. You can still say (1.0 + 2) and 2 will be automatically coerced to a double. Or you can say ("one" + 2) and 2 will be coerced to a string.
  • "Pointers and pointer arithmetic have been eliminated" It's true that explicit pointer arithmetic has been eliminated (and good riddance). But pointers remain; in fact, every reference to an object is a pointer. (That's why we have NullPointerException.) It is impossible to be a competent Java programmer without understanding this. Every Java programmer needs to know that when you do:
    int[] a = {0, 1, 2};
    int[] b = a;
    b[0] = 99;
    
    then a[0] is 99 because a and b are pointers (or references) to the same object.
  • "Because structures are gone, and arrays and strings are represented as objects, the need for pointers has largely disappeared." This is also misleading. First of all, structures aren't gone, they're just renamed "classes". What is gone is programmer control over whether structure/class instances are allocated on the heap or on the stack. In Java all objects are allocated on the heap. That is why there is no need for syntactic markers (such as *) for pointers--if it references an object in Java, it's a pointer. Yourdon is correct in saying that having pointers to the middle of a string or array is considered good idiomatic usage in C and assembly language (and by some people in C++), but it is neither supported nor missed in other languages.
  • Yourdon also includes a number of minor typos, like saying that arrays have a length() method (instead of a length field) and that modifiable strings are represented by StringClass (instead of StringBuffer). These are annoying, but not as harmful as the more basic half-truths.
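
Several of the points above are easy to check for yourself. The following is a small sketch, with a made-up class name, that exercises the coercion examples, the array aliasing example, and the length field and StringBuffer class mentioned in the last bullet.

```java
class ReferenceChecks {
    // The int 2 is automatically widened to a double before the addition.
    static double widen() { return 1.0 + 2; }

    // The int 2 is automatically converted to a string by concatenation.
    static String concat() { return "one" + 2; }

    // a and b refer to the same array, so a write through b is visible through a.
    static int alias() {
        int[] a = {0, 1, 2};
        int[] b = a;
        b[0] = 99;
        return a[0];
    }

    public static void main(String[] args) {
        System.out.println(widen());            // 3.0
        System.out.println(concat());           // one2
        System.out.println(alias());            // 99
        System.out.println(new int[3].length);  // 3: length is a field, not a method
        System.out.println(new StringBuffer("ab").append('c'));  // abc
    }
}
```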