Why Build Tools Matter for a Well-Grounded Developer
From The Well-Grounded Java Developer, Second Edition by Benjamin Evans, Jason Clark, and Martijn Verburg
The JDK ships with a compiler to turn Java source code into class files. Despite that fact, few projects of any size rely on javac alone. Build tools are the norm for a number of reasons:
- Automating tedious operations
- Managing dependencies
- Ensuring consistency between developers
Although many options exist, two choices dominate the landscape today: Maven and Gradle. Understanding what these tools aim to solve, how they get their job done below the surface, how they differ from each other, and how to extend them will pay off for the well-grounded developer.
Automating tedious operations
javac can turn any Java source file into a class file, but there’s more to building a typical Java project than that. In a large project, just passing every source file to the compiler by hand would be tedious. Build tools provide defaults for finding code, and they let you easily configure a non-standard layout if you have one.
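As a rough illustration of the chore a build tool takes over, here is a minimal sketch that walks a source root such as src/main/java, collects every .java file, and passes the list to the compiler through the JDK’s javax.tools API. It assumes it runs on a full JDK (a bare JRE has no system compiler); the class name ManualBuild and the target/classes output directory are illustrative choices, not anything javac requires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ManualBuild {

    // Recursively collect every .java file under the source root, sorted for stable output.
    static List<Path> findSources(Path sourceRoot) {
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Run the JDK's in-process compiler; returns javac's exit code (0 means success).
    static int compile(List<Path> sources, Path outputDir) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system compiler: run this on a JDK, not a JRE");
        }
        List<String> args = new ArrayList<>();
        args.add("-d");
        args.add(outputDir.toString());
        for (Path source : sources) {
            args.add(source.toString());
        }
        // null streams mean javac uses System.in, System.out, and System.err
        return javac.run(null, null, null, args.toArray(new String[0]));
    }

    public static void main(String[] args) {
        Path sourceRoot = Paths.get("src", "main", "java");
        if (!Files.isDirectory(sourceRoot)) {
            System.out.println("No source root at " + sourceRoot);
            return;
        }
        List<Path> sources = findSources(sourceRoot);
        if (sources.isEmpty()) {
            System.out.println("Nothing to compile");
            return;
        }
        int exitCode = compile(sources, Paths.get("target", "classes"));
        System.out.println("Compiled " + sources.size() + " files, javac exit code " + exitCode);
    }
}
```

Even this toy version has to make decisions (where sources live, where output goes, what to do when nothing is found) that Maven and Gradle settle by convention.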
The default layout popularized by Maven and used by Gradle as well looks like this:
src
├── main ❶
│   └── java ❷
│       └── com ❸
│           └── wellgrounded
│               └── Main.java
└── test ❶
    └── java
❶ main and test separate our production code from our test code
❷ Multiple languages easily coexist within one project with this structure
❸ Further directory structure typically mirrors your package hierarchy
As you can see, testing is baked all the way into the layout of our code. Java’s come a long way since the times when folks used to ask whether they really needed to write tests for their code, and build tools have been key to making testing available in a consistent manner everywhere.
You probably already know how to unit test in Java with JUnit or another library.
While compiling to class files is the start of a Java program’s existence, generally it isn’t the end of the line. Fortunately, build tools also provide support for packaging your class files up into a JAR or other format for easier distribution.
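Packaging can be sketched with the JDK’s own java.util.jar API. This minimal example assumes the compiled classes already sit in a directory (Maven’s default is target/classes) and writes them into a JAR whose manifest names a main class; JarPackager and packageJar are hypothetical names used only here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarPackager {

    // Package every file under classesDir into jarFile (which should live outside classesDir),
    // recording mainClass in the manifest so the JAR can be started with java -jar.
    static void packageJar(Path classesDir, Path jarFile, String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set; without it the JDK omits the other main attributes.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

        try (Stream<Path> walk = Files.walk(classesDir);
             OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            for (Path file : files) {
                // JAR entry names always use '/' as the separator, whatever the platform.
                String name = classesDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling packageJar(Paths.get("target/classes"), Paths.get("target/app.jar"), "com.wellgrounded.Main") would produce a JAR that starts with java -jar target/app.jar, provided the class named in Main-Class is actually in the archive.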
Managing dependencies
In the early days of Java, if you wanted to use a library, you had to find its JAR somewhere, download the file, and put it on your application’s classpath. This caused several problems. In particular, the lack of a central, authoritative source for all libraries meant that finding the JARs for less-common dependencies sometimes turned into a treasure hunt.
That obviously wasn’t ideal, and so Maven (among other projects) gave the Java ecosystem repositories where tools could find and install dependencies for us. Maven Central remains to this day one of the most commonly used registries for Java dependencies on the internet.
Downloading all that code can be time-consuming too, so build tools have standardized on a few ways of reducing the pain by sharing artifacts between projects. With a local repository acting as a cache, a second project that needs the same library doesn’t have to download it again. This approach also saves disk space, of course, but having a single source of artifacts is the real win here.
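The caching idea rests on a predictable layout. Maven’s local repository (by default ~/.m2/repository) stores each artifact under a path derived from its coordinates, so checking the cache is just a file lookup. The sketch below assumes that Maven-style layout for plain JAR artifacts; Gradle keeps its own cache with a different structure, and the LocalRepository class is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalRepository {

    private final Path root;

    LocalRepository(Path root) {
        this.root = root;
    }

    // Maven-style layout: groupId dots become directories, then artifactId/version/artifactId-version.jar
    Path artifactPath(String groupId, String artifactId, String version) {
        Path dir = root;
        for (String segment : groupId.split("\\.")) {
            dir = dir.resolve(segment);
        }
        return dir.resolve(artifactId)
                  .resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    // A cache hit means no download is needed; the download itself is out of scope for this sketch.
    boolean isCached(String groupId, String artifactId, String version) {
        return Files.isRegularFile(artifactPath(groupId, artifactId, version));
    }

    public static void main(String[] args) {
        Path defaultRoot = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        LocalRepository repo = new LocalRepository(defaultRoot);
        String g = "org.junit.jupiter", a = "junit-jupiter-api", v = "5.10.0";
        System.out.println(repo.artifactPath(g, a, v));
        System.out.println(repo.isCached(g, a, v) ? "cached: reuse it" : "not cached: would download");
    }
}
```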
You might be wondering where modules fit in this dependency landscape. Modularized libraries are shipped as JAR files with the addition of the module-info.class file. A modularized JAR can be downloaded from the standard repositories. The real differences come into play when you start compiling and running with modules, not in the packaging and distribution.
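Because a modular JAR differs from a plain one only by that descriptor, you can tell them apart by looking inside the archive. This minimal sketch only checks whether a module-info.class entry is present (including the versioned copy in a multi-release JAR); it does not validate the descriptor, which the JDK’s java.lang.module.ModuleFinder can do properly. ModularJarCheck is an illustrative name.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarFile;
import java.util.zip.ZipFile;

public class ModularJarCheck {

    // True if the JAR carries a module-info.class entry, i.e. it was packaged as an explicit module.
    static boolean isModular(File jar) {
        // Opening with the runtime version makes getEntry see META-INF/versions/N/ entries in multi-release JARs.
        try (JarFile jf = new JarFile(jar, false, ZipFile.OPEN_READ, JarFile.runtimeVersion())) {
            return jf.getEntry("module-info.class") != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println(path + (isModular(new File(path)) ? ": modular JAR" : ": plain JAR"));
        }
    }
}
```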
More than just providing a central place to find and download dependencies, though, registries opened the door to better management of transitive dependencies.
In Java, we commonly see transitive dependencies when a library that our project uses itself depends on another library.
Recall that JAR files are just zips — they don’t have any metadata that describes the dependencies of the JAR. This means that the dependencies of a JAR are just the union of all of the dependencies of all the classes in the JAR.
To make matters worse, the classfile format does not describe which version of a class is needed to satisfy the dependency — all we have is a symbolic descriptor of the class or method name that the class requires in order to link.
This implies two things:
- An external source of dependency information is required
- As projects get larger, the transitive dependency graph will get increasingly complex
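Since the JAR itself can’t tell us what it needs, a build tool reads that information from separate metadata (Maven’s POM files, for example) and walks the resulting graph. Here is a minimal sketch of that walk, assuming the metadata has already been fetched into a simple map from library name to its direct dependencies; the library names are made up and versions are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveDeps {

    // Breadth-first walk from the project's direct dependencies, collecting everything reachable.
    // 'metadata' stands in for the external dependency information (such as a POM) that JARs lack.
    static Set<String> transitiveClosure(List<String> direct, Map<String, List<String>> metadata) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(direct);
        while (!queue.isEmpty()) {
            String dep = queue.removeFirst();
            // add() returns false for anything already visited, which also guards against cycles
            if (seen.add(dep)) {
                queue.addAll(metadata.getOrDefault(dep, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical libraries, purely for illustration.
        Map<String, List<String>> metadata = Map.of(
                "web-framework", List.of("http-server", "json"),
                "http-server", List.of("logging"),
                "json", List.of("logging"));
        System.out.println(transitiveClosure(List.of("web-framework"), metadata));
    }
}
```

Run as written, main prints [web-framework, http-server, json, logging]: one direct dependency pulls in three more, and logging is reached twice but recorded once. Real graphs attach a version to every edge, which is where conflicts begin.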
With the explosion of open source libraries and frameworks to support developers, the typical tree of transitive dependencies in a real project has only gotten larger.
Java does ship a runtime library (the JRE) that contains many commonly needed classes and is available in every Java environment. However, a real production application will require capabilities beyond those in the JRE, and will almost always have too many layers of dependencies to manage comfortably by hand. The only solution is to automate.
A conflict emerges
This automation is a boon for developers building on the rich ecosystem of available open source code, but upgrading dependencies often reveals problems as well. For instance, here’s a dependency tree that might set us up for trouble:
our-project
├── lib-a 2.0
└── lib-b
    └── lib-a 1.0
We’ve asked explicitly for version 2.0 of lib-a, but our dependency lib-b has asked for the older version 1.0. This is known as a dependency conflict, and depending on how it is resolved, it can cause a variety of other problems.
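How the conflict is resolved depends on the tool. Maven’s default is “nearest wins” (the declaration closest to your project in the tree is used, and at equal depth the first one declared wins), while Gradle by default picks the highest requested version. The sketch below models both rules on a simplified tree; the Dep type, the version given to lib-b, and the plain numeric version comparison are assumptions for illustration, and real resolvers handle much more (version ranges, exclusions, re-resolving the winner’s own dependencies).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConflictResolution {

    // A node in a simplified dependency tree: library name, requested version, its own dependencies.
    static final class Dep {
        final String name;
        final String version;
        final List<Dep> children;

        Dep(String name, String version, Dep... children) {
            this.name = name;
            this.version = version;
            this.children = List.of(children);
        }
    }

    // "Nearest wins": breadth-first from the project, so the shallowest declaration of each library
    // is kept and, at equal depth, the first declared. Subtrees of losing declarations are skipped.
    static Map<String, String> nearestWins(Dep project) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(project.children);
        while (!queue.isEmpty()) {
            Dep d = queue.removeFirst();
            if (chosen.putIfAbsent(d.name, d.version) == null) {
                queue.addAll(d.children);
            }
        }
        return chosen;
    }

    // "Highest wins": every requested version is considered and the newest one is kept.
    static Map<String, String> highestWins(Dep project) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(project.children);
        while (!queue.isEmpty()) {
            Dep d = queue.removeFirst();
            chosen.merge(d.name, d.version,
                    (current, candidate) -> compareVersions(candidate, current) > 0 ? candidate : current);
            queue.addAll(d.children);
        }
        return chosen;
    }

    // Compares dotted numeric versions such as "1.0" and "2.0.1"; missing parts count as 0.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // The conflict from the text; lib-b's own version (1.0) is a made-up placeholder.
        Dep project = new Dep("our-project", "1.0",
                new Dep("lib-a", "2.0"),
                new Dep("lib-b", "1.0", new Dep("lib-a", "1.0")));
        System.out.println("nearest wins: " + nearestWins(project));
        System.out.println("highest wins: " + highestWins(project));
    }
}
```

For this tree both strategies choose lib-a 2.0, so lib-b ends up running against a version it was not built for. Swap the versions (lib-a 1.0 declared directly, 2.0 via lib-b) and the strategies disagree: nearest wins keeps 1.0, highest wins keeps 2.0.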
What types of breakage can result from mismatched library versions? This depends on the nature of the changes between the versions. Changes fall into a few categories:
a) Stable APIs, where only behavior changes between versions
b) Added APIs, where new classes or methods appear between versions
c) Changed APIs, where method signatures or extended interfaces change between versions
d) Removed APIs, where classes or methods are removed between versions
In the case of a) or b), you may not even notice which version of the dependency your build tool has chosen.
The most common case of c) is a change to the signature of a method between library versions. In our example above, if lib-a 2.0 altered the signature of a method that lib-b relied upon, lib-b would receive a NoSuchMethodError when it tried to call that method.
Removed methods in case d) result in the same sort of NoSuchMethodError. This includes “renaming” a method, which at the bytecode level is no different from removing a method and adding a new one that just happens to have the same implementation.
Classes are also prone to d) when they are deleted or renamed, which causes a NoClassDefFoundError. It’s also possible that removing interfaces from a class will land you with an ugly IncompatibleClassChangeError.
This list of issues with conflicting transitive dependencies is by no means exhaustive. It all boils down to what actually changes between two versions of the same package.
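A changed signature breaks at runtime because a compiled call site records the method’s name and its exact descriptor (parameter and return types), and the JVM resolves that pair when it links the call. This sketch uses java.lang.invoke.MethodType to print descriptors, and reflection as a stand-in for link-time lookup; note that reflection reports a missing method with NoSuchMethodException, whereas the JVM’s linker throws NoSuchMethodError. LinkageDemo is an illustrative name.

```java
import java.lang.invoke.MethodType;

public class LinkageDemo {

    // The JVM-level descriptor for a method with these parameter and return types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    // Reflective analogue of linking: does a public method with exactly this name and these parameters exist?
    static boolean hasMethod(Class<?> owner, String name, Class<?>... params) {
        try {
            owner.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Widening a parameter from int to long yields a different descriptor.
        System.out.println(descriptor(String.class, int.class));   // (I)Ljava/lang/String;
        System.out.println(descriptor(String.class, long.class));  // (J)Ljava/lang/String;
        System.out.println(hasMethod(String.class, "substring", int.class));   // true
        System.out.println(hasMethod(String.class, "substring", long.class));  // false
    }
}
```

Changing a parameter from int to long turns (I)Ljava/lang/String; into (J)Ljava/lang/String;, so bytecode compiled against the old signature can no longer find its target, even though source code that passes an int would still compile against the new one.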
In fact, communicating the nature of changes between versions is a common problem across languages. One of the most broadly adopted approaches to handling it is Semantic Versioning, which gives us a vocabulary for stating the requirements of our transitive dependencies. That, in turn, allows machines to help us sort them out.
When using semantic versioning:
- MAJOR version increments (1.x → 2.x) on breaking changes to your API, like cases c) and d) above
- MINOR version increments (1.1 → 1.2) on backward-compatible additions, like case b)
- PATCH version increments (1.1.0 → 1.1.1) on backward-compatible bug fixes
While not foolproof, semantic versioning at least sets expectations about what level of change comes with a version update, and it is broadly used in open source.
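Those rules are simple enough to encode. The sketch below parses plain MAJOR.MINOR.PATCH strings and treats an upgrade as expected to be compatible when the major version is unchanged and the new version is not older. It deliberately ignores pre-release and build suffixes (such as -rc.1), and it treats 0.y.z versions as unstable, in line with the semver specification’s rule for initial development; SemVer here is a hypothetical class, not a library type.

```java
public class SemVer implements Comparable<SemVer> {
    final int major;
    final int minor;
    final int patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    // Parses plain MAJOR.MINOR.PATCH; suffixes such as "-rc.1" are not supported in this sketch.
    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got: " + text);
        }
        return new SemVer(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    // Numeric, field-by-field ordering, so 1.10.0 sorts after 1.9.0.
    @Override
    public int compareTo(SemVer other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(patch, other.patch);
    }

    // Same MAJOR and not older means "expected compatible". Under 0.y.z anything may change,
    // so only the identical version counts as safe there.
    boolean isCompatibleUpgradeFrom(SemVer current) {
        if (current.major == 0) {
            return compareTo(current) == 0;
        }
        return major == current.major && compareTo(current) >= 0;
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch;
    }

    public static void main(String[] args) {
        SemVer current = parse("1.1.0");
        for (String candidate : new String[] {"1.1.1", "1.2.0", "2.0.0"}) {
            SemVer next = parse(candidate);
            String verdict = next.isCompatibleUpgradeFrom(current) ? "expected compatible" : "may break";
            System.out.println(current + " -> " + next + ": " + verdict);
        }
    }
}
```

Running main reports the patch (1.1.1) and minor (1.2.0) upgrades as expected compatible and the major upgrade (2.0.0) as one that may break, which is exactly the expectation semantic versioning sets, not a guarantee.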
Having gotten a taste of why dependency management isn’t easy, rest assured that both Maven and Gradle provide tooling to help. Later in the article we’ll look in detail at what each tool provides to unravel problems when you hit dependency conflicts.
Ensuring consistency between developers
As projects grow in code volume and number of developers, they often get more complex and harder to work with. Your build tooling can lessen this pain, though. Built-in features, like ensuring everyone compiles and runs the same tests, are a start, but there are many additions beyond the basics to consider as well.
Tests are good, but how certain are you that all your code is tested? Code coverage tools are key for detecting what code is hit by your tests and what isn’t. While arguments swirl on the internet about the right target for code coverage, the line-level output coverage tools provide can save you from missing a test for that one extra special conditional.
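At its core, the line-level report a coverage tool produces is set arithmetic over two sets of line numbers: the lines that could execute and the lines your tests actually hit. A real tool (JaCoCo, for instance) gathers that data by instrumenting bytecode; this minimal sketch assumes both sets are already known and only computes the percentage and the missed lines. LineCoverage is a hypothetical name and the line numbers in main are invented.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class LineCoverage {

    // Executable lines that no test ever reached, in ascending order.
    static SortedSet<Integer> uncovered(Set<Integer> executable, Set<Integer> hit) {
        SortedSet<Integer> missed = new TreeSet<>(executable);
        missed.removeAll(hit);
        return missed;
    }

    // Percentage of executable lines exercised by at least one test.
    static double percentCovered(Set<Integer> executable, Set<Integer> hit) {
        if (executable.isEmpty()) {
            return 100.0;
        }
        int covered = executable.size() - uncovered(executable, hit).size();
        return 100.0 * covered / executable.size();
    }

    public static void main(String[] args) {
        // Invented data: in practice a coverage agent records these while the tests run.
        Set<Integer> executable = Set.of(10, 11, 12, 14, 15, 17);
        Set<Integer> hit = Set.of(10, 11, 12, 17);
        System.out.printf("%.1f%% covered, missed lines %s%n",
                percentCovered(executable, hit), uncovered(executable, hit));
    }
}
```

The list of missed lines, not the percentage, is what points you at that one untested conditional.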
Java as a language also lends itself well to a variety of static analysis tools. From detecting common pitfalls (e.g., overriding equals without overriding hashCode) to sniffing out unused variables, static analysis lets a computer flag code that is legal but will bite you in production.
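The equals/hashCode rule is a good example of code that compiles cleanly but misbehaves: an object that overrides equals but not hashCode can vanish from a HashSet or HashMap lookup. Real static analyzers examine source or bytecode without running anything; this minimal sketch instead applies the same rule with reflection, purely to show what such a check looks for. All class names here are illustrative.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class EqualsHashCodeCheck {

    // Legal Java, but unsafe in hash-based collections.
    static final class BrokenPoint {
        final int x, y;
        BrokenPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenPoint && ((BrokenPoint) o).x == x && ((BrokenPoint) o).y == y;
        }
        // hashCode() deliberately not overridden
    }

    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Flags a class that declares equals(Object) itself but does not declare hashCode().
    static boolean overridesEqualsWithoutHashCode(Class<?> cls) {
        return declares(cls, "equals", Object.class) && !declares(cls, "hashCode");
    }

    private static boolean declares(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getDeclaredMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Class<?> cls : new Class<?>[] {BrokenPoint.class, Point.class}) {
            boolean flagged = overridesEqualsWithoutHashCode(cls);
            System.out.println(cls.getSimpleName() + ": " + (flagged ? "equals without hashCode" : "ok"));
        }
        Set<BrokenPoint> set = new HashSet<>();
        set.add(new BrokenPoint(1, 2));
        // Almost always false: the two "equal" objects have different identity hash codes,
        // so HashSet never even calls equals on the stored one.
        System.out.println("found equal BrokenPoint? " + set.contains(new BrokenPoint(1, 2)));
    }
}
```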
Beyond the realms of correctness, though, there are style and formatting tools. Ever fought with someone about where the curly braces should go in a statement? How to indent your code? Agreeing once on a set of rules, even if they aren’t all perfectly to your taste, lets everyone on the project focus on the actual work from then on instead of nitpicking details about how the code looks.
Last and certainly not least, your build tool is a pivotal central point for providing custom functionality. Are there special setup or operational commands folks need to run periodically for your project? Validations your project should run after a build but before you deploy? All of these are excellent to consider wiring into the build tooling so they’re available to everyone working with the code. Both Maven and Gradle provide many ways to extend them for your own logic and needs.
Hopefully you’re now convinced that build tools aren’t just something to set up once on a project, but something worth investing time in understanding.
That’s all for this article. If you want to learn more about the book, check it out on Manning’s liveBook platform.