Saturday, September 22, 2007

I've added some timing information for running Impala in interactive test mode, and the numbers are quite interesting.

I start by firing up a JUnit test class, called WineDAOTest, which looks like the following:


public class WineDAOTest extends BaseDataTest {

    public static void main(String[] args) {
        PluginTestRunner.run(WineDAOTest.class);
    }

    public void testDAO() {

        WineDAO dao = DynamicContextHolder.getBean(this, "wineDAO", WineDAO.class);

        Wine wine = new Wine();
        wine.setColor("red");
        wine.setVineyard("Chateau X");
        wine.setTitle("Cabernet");
        wine.setVintage(1996);
        dao.save(wine);

        Collection<Wine> winesOfVintage = dao.getWinesOfVintage(1996);
        System.out.println("Wines of vintage 1996: " + winesOfVintage.size());
        assertEquals(1, winesOfVintage.size());

        wine.setVintage(2000);
        wine.setColor("rose");
        dao.update(wine);

        Wine updated = dao.findById(wine.getId());
        assertEquals(2000, updated.getVintage());
    }

    public PluginSpec getPluginSpec() {
        return new PluginSpec("parent-context.xml", new String[] { "dao", "hibernate", "merchant" });
    }
}






The Impala features have been added by

  • exposing a main method which calls PluginTestRunner
  • obtaining the Spring context bean using DynamicContextHolder.getBean(...)
  • providing an implementation of getPluginSpec(), which tells the PluginTestRunner which plugins to include as part of the test. Note that we have plugins for the basic Hibernate configuration, for the DAO implementations, and for the service classes (the merchant plugin)

We now run the class in interactive mode by running it as a Java application in Eclipse (instead of running it as a JUnit test). Here's the output.


--------------------------------
Enter u to show usage
>u
l [testClass] to load test class
[testName] to run test
reload [plugin name] to reload plugin
reload to reload parent context
s to show test methods
r to rerun last command
r to rerun last run test
e to exit
--------------------------------
Enter u to show usage
>s
Available test methods:
testDAO
--------------------------------
Enter u to show usage
>testDAO
Running test tests.WineDAOTest
.Wines of vintage 1996: 1
Time: 3.264
OK (1 test)

--------------------------------
Enter u to show usage
>reload
Parent context loaded in 0.551 seconds
Used memory: 3.4MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>reload
Parent context loaded in 0.34 seconds
Used memory: 4.2MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>reload
Parent context loaded in 0.521 seconds
Used memory: 3.8MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>reload hibernate
Plugin hibernate loaded in 0.21 seconds
Used memory: 4.3MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>reload merchant
Plugin merchant loaded in 0.16 seconds
Used memory: 4.6MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>reload dao
Plugin dao loaded in 0.141 seconds
Used memory: 4.5MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>reload
Parent context loaded in 0.511 seconds
Used memory: 4.1MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>reload
Parent context loaded in 0.43 seconds
Used memory: 3.9MB
Max available memory: 63.6MB

--------------------------------
Enter u to show usage
>testDAO
Running test tests.WineDAOTest
.Wines of vintage 1996: 1
Time: 0.081
OK (1 test)

--------------------------------
Enter u to show usage
>testDAO
Running test tests.WineDAOTest
.Wines of vintage 1996: 1
Time: 0.07
OK (1 test)

--------------------------------
Enter u to show usage
>



Notice how the initial test run took 3.2 seconds, which is how long we'd expect a small Spring/Hibernate test run to take. After that it gets more interesting.

Subsequent reloads of the application context (shown by the reload calls) take only a fraction of the original load time: 0.55, 0.34, 0.52, 0.51 and 0.43 seconds.

Reloading individual plugins is much quicker still: hibernate took 0.21 seconds, dao took 0.14 and merchant took 0.16 seconds. This means, at least for a small application, we can reflect changes almost instantly. Even for an application which loads 10 times more slowly, the numbers are still quite acceptable.

Running the test without doing the reload is also extremely fast: compare 0.07 and 0.08 seconds with the original 3.2 seconds.

Wednesday, September 19, 2007

The Rationale of Micro Hot Deployment

A big part of Impala's reason for being is that it supports micro hot deployment of Java applications. So what is micro hot deployment, and why is it a valuable concept?

Traditional hot deployment in Java is all about being able to update an application on the fly, typically a WAR or EAR on an application server. In reality, this kind of hot deployment is not terribly useful. Firstly, it often comes with memory leaks, which mean that after a couple of redeployments you may end up running out of memory. Secondly, because it is a coarse-grained redeployment, it can take quite a long time. For example, if the server itself only takes two or three seconds to start up, and the application takes 20 to 30 seconds to load, then in terms of downtime you are little worse off doing a full server restart.

Micro hot deployment involves hot (re)deployment of parts of an application. Java Servlet containers such as Tomcat already support this in a very limited sense. For example, JSPs, which are compiled into Servlets, can typically be updated without an application restart. This is because it is safe to associate a JSP with its own class loader: the class which the JSP compiles to will never be referenced by another application class. It is very much at the end of the application dependency chain.

The only other form of hot deployment considered safe by application servers is redeployment of full applications. This is a fairly brute force tactic for getting around the limitations and pitfalls of Java classloaders.

Other technologies, such as PHP and Ruby on Rails, do a much better job of implementing micro hot deployment. Even within the Java camp, scripting based solutions such as Grails, based on Groovy, have tackled this problem head-on.

Unfortunately, Java frameworks have been pretty slow to follow suit. Tapestry 5 now promises that application classes will be reloadable on the fly in production, not just development, and Wicket has a reloading filter which can be used to hot-redeploy pages. But these are the exception, not the rule, and most Java frameworks are less ambitious in this department.

Solving hot redeployment is something that Java application frameworks need to get right if they want developers to stay with the platform in the long term. This means working with classloaders, which can be a tricky business. But tricky does not mean impossible.
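To see why classloaders are at the heart of the problem, here is a minimal sketch (the class and method names are purely illustrative, not part of Impala): the same class file, loaded through two separate class loaders, produces two distinct runtime classes. This is exactly what makes fine-grained reloading possible, and also what makes it tricky, since instances from the old and new copies are not assignment-compatible.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Demonstrates why module reloading needs per-module class loaders:
 * the same class bytes, defined by two different loaders, yield two
 * distinct runtime classes.
 */
public class ClassIdentityDemo {

    public static boolean loadedClassesAreSame() throws Exception {
        // Where this class's own .class file lives (a classpath directory or jar)
        URL codeSource = ClassIdentityDemo.class.getProtectionDomain()
                .getCodeSource().getLocation();
        // Null parent: neither loader delegates to the application class loader,
        // so each defines its own copy of the class from the same bytes
        try (URLClassLoader first = new URLClassLoader(new URL[] { codeSource }, null);
             URLClassLoader second = new URLClassLoader(new URL[] { codeSource }, null)) {
            Class<?> a = first.loadClass("ClassIdentityDemo");
            Class<?> b = second.loadClass("ClassIdentityDemo");
            return a == b;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Same class? " + loadedClassesAreSame());
    }
}
```

Run from a normal classpath, this prints Same class? false. Dropping all references to a module's loader and creating a fresh one is, in essence, how a module gets reloaded without restarting the whole application.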

Impala tackles the problem of micro hot deployment for Spring-based applications. It allows for the division of the application into modules which can be reloaded separately. One of the important principles it recognises is that, in terms of frequency of change, not all application artifacts are created equal. Let's start with the most frequently changed parts of an application:
  • configuration flags: application specific properties which allow for switchable behaviour of the system at runtime. A trivial example would be a flag testMode which would be switched off in production.
  • UI templates: without any changes to the structure of the application, changing these can change the way the application appears.
  • infrastructure configuration: here we're talking about resources such as database connection pools, which exist independently of any application classes.
  • business rules: parts of the application which carry out the business logic of the application. These can, for example, be changed without having to change the domain model of the application.
  • domain model: we're moving closer to the root of the dependency graph here - changes to domain model objects typically can have downstream effects on all of the items listed above.
  • shared utility classes: these are units of code which are shared by different parts of the application, that don't relate directly to the domain model or business processes of the application. Since they haven't been packaged into separate third party libraries, they are technically still part of the application.
  • third party libraries: these tend to change much less often than the artifacts of the application itself.
Impala recognises the different life cycles of the different types of artifacts within an application. For example:
  • it is possible to reload the core of the application (domain model plus shared utility classes) and all of the business components without reloading any of the third party classes. This is important, because one of the things that takes Java apps so long to start is the need to load classes from third party libraries. Typically, the number of third party classes used, directly or indirectly, is much larger than the number of application classes.
  • it is possible to reload one of the business components without reloading the application core or any other business components
  • it will be possible to reload infrastructure configurations without reloading the core of the application, and vice versa. Note that the latter is possible because infrastructure components don't depend directly on the core application classes.
  • it is possible to reload tests without having to reload any of the application classes they are testing. This can dramatically cut down the time to write integration tests.
  • configuration flags and UI templates are less of a challenge to reload dynamically. The former can be done via reloadable configuration files, while the latter is usually supported by good web frameworks or servlet containers.
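As an example of the first of these, a configuration flag can be made reloadable with nothing more than a properties file and a timestamp check. The sketch below is just an illustration of the idea, not Impala's API; the class and method names are my own:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/**
 * A minimal reloadable configuration flag: rereads the backing
 * properties file whenever its timestamp changes.
 */
public class ReloadableFlags {
    private final File file;
    private long lastModified = -1;
    private Properties props = new Properties();

    public ReloadableFlags(File file) {
        this.file = file;
    }

    public synchronized boolean getFlag(String name) {
        long stamp = file.lastModified();
        if (stamp != lastModified) {
            // File changed (or first read): load a fresh copy
            Properties fresh = new Properties();
            try (InputStream in = new FileInputStream(file)) {
                fresh.load(in);
                props = fresh;          // swap in the new values
                lastModified = stamp;
            } catch (IOException e) {
                // keep the previous values if the reload fails
            }
        }
        return Boolean.parseBoolean(props.getProperty(name, "false"));
    }
}
```

With this in place, switching testMode off in production is a matter of editing the file, with no redeployment of any kind.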
Micro hot deployment is about finding ways to minimise the granularity of artifact reloading, so that only artifacts that have changed, and those which depend on them, have to reload when changes are made. The benefits for improving developer productivity are obvious, and there are important potential benefits for live updates in deployment environments, too.






Friday, September 14, 2007

Impala's non-Maven approach to simpler builds

One of the goals of Impala is to have a pure out-of-the-box feeling. If you download the distribution, you should be able to set up a new project with just one or two commands. Once you've done this, the project structure should be ready for you. The build infrastructure should be just there. As long as you obey the project structure conventions, you should be able to plug an existing build system into your new project.

All of these are ideas that drive Maven. Maven defines standardised folder locations, and an existing build infrastructure which you can just plug in and use. All of these ideas are great, but I don't want the project to depend on Maven. I'm still trying to decide whether I should make the project structure conventions conform to those of Maven. The advantage is that Maven users would be able to simply Mavenize their project by adding a POM xml. The disadvantage is personal - I don't particularly like the Maven project structure conventions. I wouldn't have chosen them for myself.

Right now I'm pretty close to having a pure out-of-the-box ANT based build system ready for Impala. It's taken quite a lot of time, but it's starting to feel much more right. Basically, the build will be enabled using the following combination:
  • a build.xml in the project root directory. The build.xml needs to have a property called impala.home, which defines where the Impala install files have been dumped to on the file system
  • a set of other project-specific properties which need to be specified, either within the build.xml itself or in a properties file
  • a set of imports of build scripts sitting in the impala.home folder
Here's an example:

<?xml version="1.0"?>
<project name="Build" basedir=".">

<property name = "workspace.root" location = ".."/>
<property name = "impala.home" location = "${workspace.root}/../impala"/>
<property file = "build.properties"/>
<import file = "${impala.home}/project-build.xml"/>
<import file = "${impala.home}/download-build.xml"/>

</project>
Note that the clean, compile, jar and test targets typical in a build system are found in project-build.xml. This file in turn relies on your project structure conventions to find the resources it needs. Similarly, adding download-build.xml adds support for obtaining dependencies, for example from a Maven ibiblio repository. You can make this build file the master build file for a multi-project build, simply by adding an import to shared-build.xml and adding the project.list property, as shown in this example:
<?xml version="1.0"?>
<project name="Build" basedir=".">

<property name = "workspace.root" location = ".."/>
<property name = "impala.home" location = "${workspace.root}/../impala"/>

<echo>Project using workspace.root: ${workspace.root}</echo>
<echo>Project using impala home: ${impala.home}</echo>

<property file = "build.properties"/>
<import file = "${impala.home}/project-build.xml"/>
<import file = "${impala.home}/shared-build.xml"/>
<import file = "${impala.home}/download-build.xml"/>
<import file = "${impala.home}/repository-build.xml"/>

<target name = "get" depends = "shared:get"/>
<target name = "fetch" depends = "repository:fetch-impala"/>
<target name = "clean" depends = "shared:clean"/>
<target name = "dist" depends = "shared:all-no-test"/>
<target name = "test" depends = "shared:test"/>

</project>
with the extra property in build.properties:
project.list=wineorder,wineorder-hibernate,\
wineorder-dao,wineorder-merchant,wineorder-web
I'm looking forward to getting all of this work done, so I can get back to what Impala is really supposed to be doing, which is supporting dynamic Spring modules. But all of this functionality is terribly important for making the whole experience as painless as possible for end users.

Thursday, September 13, 2007

Simple dependency management, Impala style

I've been spending a bit of time working on a simple dependency management system for Impala. You're probably asking: why don't you just use something like Maven or Ivy? Aren't they supposed to be doing that job for you?

Well, the problem is that I've decided I don't want transitive dependency management. What I do want is a simple way for users of the project (including myself, of course) to easily find and download the dependencies they do need. In a funny way, Maven does come to the rescue. It defines a standard format for repository jar entries, which goes like this:
[base url]/[organisation name]/[artifact name]/[version]/[artifact name]-[version].jar
For example, I can get hold of the Spring framework jar using the URL
http://ibiblio.org/pub/packages/maven2/org/springframework/spring/2.0.6/spring-2.0.6.jar

Thankfully, Maven also defines a standard format for source jars.
[base url]/[organisation name]/[artifact name]/[version]/[artifact name]-[version]-sources.jar
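The two formats above are mechanical enough that building the URLs from "organisation:artifact:version" coordinates takes only a few lines. This sketch is illustrative, not Impala's actual download code; the class and method names are my own:

```java
/**
 * Builds Maven-style repository URLs from
 * "organisation:artifact:version" coordinates.
 */
public final class RepositoryUrls {

    public static String jarUrl(String baseUrl, String coordinates) {
        String[] parts = coordinates.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected organisation:artifact:version: " + coordinates);
        }
        // Dots in the organisation name become path separators
        String organisation = parts[0].replace('.', '/');
        String artifact = parts[1];
        String version = parts[2];
        return baseUrl + "/" + organisation + "/" + artifact + "/" + version
                + "/" + artifact + "-" + version + ".jar";
    }

    public static String sourcesUrl(String baseUrl, String coordinates) {
        // Source jars differ only by the -sources suffix
        String jar = jarUrl(baseUrl, coordinates);
        return jar.substring(0, jar.length() - ".jar".length()) + "-sources.jar";
    }

    public static void main(String[] args) {
        String base = "http://ibiblio.org/pub/packages/maven2";
        System.out.println(jarUrl(base, "org.springframework:spring:2.0.6"));
    }
}
```

For the Spring example this yields exactly the ibiblio URL shown earlier.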
Okay, so I don't want transitive dependencies, then what do I want?
  • an easy, reliable way to get hold of dependencies without having to go to each and every individual project's web site
  • source jars that match the downloaded binaries, so that I can easily attach source for debugging
  • a simple way of saying where the downloaded files should go
Impala uses the concept of a project-specific repository: a simple structure of parent folders (one per target, such as main and test) with subfolders for the downloaded artifacts.



Now, all I need is a simple mechanism which says how individual artifacts are added to this repository. The mechanism Impala uses is a simple text file, with entries such as the ones below:

main from commons-logging:commons-logging:1.1
main from commons-io:commons-io:1.3
main from log4j:log4j:1.2.13
main from org.springframework:spring:2.0.6 source=true
main from cglib:cglib-nodep:2.1_3
test from junit:junit:3.8.1
test from org.easymock:easymock:2.2
test from org.easymock:easymockclassextension:2.2
For each artifact, I say optionally whether I want to include source (overriding whatever the default setting happens to be).
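The line format is simple enough that a parser for it fits in a single small class. The sketch below is my own illustration of how such a line could be parsed; the class and field names are hypothetical, and Impala's actual parser may differ:

```java
/**
 * Parses one entry of the dependency file, e.g.
 * "main from org.springframework:spring:2.0.6 source=true".
 */
public class DependencyEntry {
    public final String target;        // e.g. "main" or "test"
    public final String organisation;
    public final String artifact;
    public final String version;
    public final boolean includeSource;

    private DependencyEntry(String target, String organisation,
                            String artifact, String version, boolean includeSource) {
        this.target = target;
        this.organisation = organisation;
        this.artifact = artifact;
        this.version = version;
        this.includeSource = includeSource;
    }

    public static DependencyEntry parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        // tokens: [target, "from", coordinates, optional "source=true"]
        if (tokens.length < 3 || !"from".equals(tokens[1])) {
            throw new IllegalArgumentException(
                    "Expected '<target> from <org>:<artifact>:<version>': " + line);
        }
        String[] coords = tokens[2].split(":");
        if (coords.length != 3) {
            throw new IllegalArgumentException("Bad coordinates: " + tokens[2]);
        }
        boolean source = tokens.length > 3 && "source=true".equals(tokens[3]);
        return new DependencyEntry(tokens[0], coords[0], coords[1], coords[2], source);
    }
}
```

Each parsed entry carries everything needed to build the download URL and decide which repository subfolder the jar lands in.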

I still need to do the work of figuring out what the dependencies are, but once I've got there, life is pretty peachy. You get the best of both worlds - simplicity and control.

I still need to do some tweaks to get the mechanism fully ship shape, but it's basically working pretty well. Another feature is that you can specify multiple source locations, including your local Maven repository, so if the artifact you need happens to be lurking on your file system from a previous download, you don't need to waste time trying to get it from the net.

Wednesday, September 5, 2007

Impala, ANT and transitive dependencies

Any project needs a build environment. For a project like Impala, which is trying to provide a simpler environment for Spring development, the choice of build tool is important.

ANT has been around for an awful long time, but it's not exactly the most sexy software around these days. For a while it was the only show in town. Nobody likes writing build scripts, and the one thing that ANT does make you do is write build scripts for your projects.

These days, the obvious other choice is Maven. Maven's two selling points are these: it saves you from having to write build scripts, and it is supposed to handle dependency management for you. Of course, it cannot do everything. You need to tell Maven what libraries you want to use. Maven in turn will find a whole bunch of other libraries that you don't necessarily know about, which the libraries that you do want to use themselves depend on. This is called transitive dependency management. The problem, though, is twofold. Maven relies on dependency information put in POM XML files. This information is only as good as the person editing the POM. It's not foolproof. Secondly, even if the POM is accurate, it may be more complete than it needs to be. How does it deal with optional dependencies? If these are included in the POM, but not needed for the parts of the library that you are using, then you get a bloated repository. And the one thing I can't stand is bloat.

There are other things I don't like about Maven. Frankly, it tries to do too much, without necessarily doing anything very well. Lots of people I speak to have complained about this. It entails a lack of control over your build environment, which scares me. The horror stories you hear about don't seem to happen with ANT. It's a pain, but one way or another, you know you are going to be able to get the job done.

A second option is Gant, based on Groovy. Gant is built on ANT too, but uses a Groovy DSL to make ANT scripting easier. It seems much better for complex scripting, because you can use proper language features such as iteration and conditionals much more easily than with ANT. Two problems with Gant, though. There's more setup involved, because it requires Groovy and Gant to be installed (as well as ANT, perhaps). Also, it's a bit slow. Groovy is much slower to compile than Java, which has a noticeable impact on build times.

For these reasons, I'm back to ANT. If I need to do anything complex, like handle dependency management, or iterate through projects, I'll write a custom task for that. It's easy enough. ANT is not pretty, but it is effective.