I’m writing a tool to verify requests/responses against RAML definitions. Basically, it’s a proxy that analyzes the traffic going through it. Starting it up always took several seconds.

It’s written in Java and does networking stuff, so I thought there was not much to do in this regard. But as I used it as a command line tool and wrote more and more tests, the wait times added up and got more and more annoying.

So I had a closer look at it.

A quick profiler session revealed that most of the time was spent in the first call to InetAddress.getLocalHost. Almost exactly 5 seconds. More than 80% of the startup time was spent in this single method! To fix this problem, there were two possibilities:

  • Find and fix the cause of the slowdown.
  • Avoid this method being called.
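The slow first call is easy to reproduce outside a profiler. The following is only a minimal sketch: the class and method names are made up for illustration, and the timings depend entirely on the machine and network setup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalHostTiming {

    // Times a single call to InetAddress.getLocalHost() in milliseconds.
    static long timeLookupMillis() {
        long start = System.nanoTime();
        try {
            InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            // A failed lookup still costs time, so it is measured as well.
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // On an affected machine, only the first call should be slow.
        System.out.println("first call:  " + timeLookupMillis() + " ms");
        System.out.println("second call: " + timeLookupMillis() + " ms");
    }
}
```

If the first number is in the range of seconds and the second one is close to zero, the lookup itself is the culprit and not the library calling it.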

I spent a whole (sunny) afternoon (on my balcony) trying both approaches.

Unfortunately, Google did not know a lot about this issue. The only hints were some IPv4 vs. IPv6 problems. The preferred IP version can be set using a JVM parameter. I tried it and indeed, the behavior was different: now it was another internal method call that took 5 seconds. I also found that some caching is used for the lookup, but this didn’t help either, because it was the first call that caused the problem. I still had no idea what was behind it.
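For reference, this is roughly what such an experiment looks like. I am assuming here that the parameter in question is the standard system property `java.net.preferIPv4Stack`; the class and method names are hypothetical. The property is normally passed on the command line as `-Djava.net.preferIPv4Stack=true`, but setting it at the very start of `main`, before any networking class is touched, should have the same effect.

```java
import java.net.InetAddress;

public class PreferIpv4 {

    // Sets the (assumed) property and returns its value for checking.
    static String enable() {
        System.setProperty("java.net.preferIPv4Stack", "true");
        return System.getProperty("java.net.preferIPv4Stack");
    }

    public static void main(String[] args) throws Exception {
        // Must happen before the first use of java.net classes,
        // otherwise the JVM may already have chosen its network stack.
        enable();

        long start = System.nanoTime();
        InetAddress local = InetAddress.getLocalHost();
        long millis = (System.nanoTime() - start) / 1_000_000;

        System.out.println(local + " resolved in " + millis + " ms");
    }
}
```

In my case this only moved the delay to a different place, so treat it as a diagnostic step and not as a fix.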

So why not avoid the method being called? These are the libraries that used it:

  • Logback to show debugging information.
  • Tomcat as a source of entropy for its secure random number generator.
  • SSL factory for the same reason.
  • Jetty.

I just patched the involved class in Logback and thought about how to do it in a clean and secure way. I ended up putting the exact version of Logback in the pom.xml together with a comment that whenever the version is changed, the patch must be reviewed. Good.

Tomcat has a configuration item to change the implementation of its secure random number generator. Changed it, and it worked. Good.

Now the SSL factory. I never understood why the Java security stuff has such an unintuitive API. But I was working through it and making progress, when suddenly…

the problem disappeared!

The method call was now fast and the total startup time was below one second. I was not aware of anything that changed. The issue just went away and has never come back since.

WAT.

Something is rotten in the state of Denmark. But I don’t know what. Is it a Java, a Mac OS or a network issue, or a strange combination of all three?

I suspect it could indeed be a network issue, because lately my mobile phone loses its WLAN connection after some time. The only thing that helps then is a complete restart. But then again, that could be because I updated to Android 5.0…

Argh, sometimes it would be easier to work with wood.