Tools to keep JavaDoc up-to-date
There are many projects where the documentation is not up-to-date. It is easy to forget to change the documentation after the code was changed. The reason is fairly understandable. There is a change in the code, then debug, then hopefully change in the tests (or the other way around in the reverse order if you are more TDD) and then the joy of a new functioning version and the happiness about the new release makes you forget to perform the cumbersome task updating the documentation.
In this article, I will show an example, how to ease the process and ensure that the documentation is at least more up-to-date.
1. The tool
The tool I use in this article is Java::Geci, which is a code generation framework. The original design aim of Java::Geci is to provide a framework in which, it is extremely easy to write code generators that inject code into already existing Java source code or generate new Java source files. Hence the name: GEnerate Code Inline or GEnerate Code, Inject.
What does a code generation support tool do when we talk about documentation?
On the highest level of the framework, the source code is just a text file. Documentation, like JavaDoc, is text. Documentation in the source directory structure, like markdown files, is text. Copying and transforming parts of the text to other location is a special form of code generation. This is exactly what we will do.
2. Two uses for documentation
There are several ways Java::Geci supports documentation. I will describe one of these in this article.
The way is to locate some lines in the unit tests and copy the content after possible transformation into the JavaDoc. I will demonstrate this using a sample from the apache.commons.lang
project current master version after release 3.9. This project is fairly well documented although there is room for improvement. This improvement has to be performed with as little human effort as possible. (Not because we are lazy, but rather because the human effort is error-prone.)
It is important to understand that Java::Geci is not a preprocessing tool. The code gets into the actual source code and it gets updated. Java::Geci does not eliminate the redundancy of copy-paste code and text. It manages it and ensures that the code remains copied and created over and over again whenever something inducing change in the result happens.
3. How Java::Geci works in general
If you have already heard about Java::Geci you can skip this chapter. For the others here is the brief structure of the framework.
Java::Geci generates code when the unit tests run. Java::Geci actually runs as one or more unit tests. There is a fluent API to configure the framework. This essentially means that a unit test that runs generators is one single assertion statement that creates a new Geci
object, calls the configuration methods and then calls generate()
. This method, generate()
returns true when it has generated something. If all the code it generated is exactly the same as it was already in the source files then it returns false
. Using an Assertion.assertFalse
around it will fail the test in case there was any change in the source code. Just run the compilation and the tests again.
The framework collects all the files that were configured to be collected and invokes the configured and registered code generators. The code generators work with abstract Source
and Segment
objects that represent the source files and the lines in the source files that may be overwritten by generated code. When all the generators have finished their work the framework collects all segments, inserts them into Source
objects and if any of them significantly changed then it updates the file.
Finally, the framework returns to the unit test code that started it. The return value is true
if there was any source code file updated and false
otherwise.
4. Examples into JavaDoc
The JavaDoc example is to automatically include examples into the documentation of the method org.apache.commons.lang3.ClassUtils.getAbbreviatedName()
in the Apache Commons Lang3 library. The documentation currently in the master
branch is:
/**
*
Gets the abbreviated class name from a {@code String}.
*
*
The string passed in is assumed to be a class name - it is not checked.
*
*
The abbreviation algorithm will shorten the class name, usually without
* significant loss of meaning.
*
The abbreviated class name will always include the complete package hierarchy.
* If enough space is available, rightmost sub-packages will be displayed in full
* length.
*
*
**
*
*
*
*
*
<table><caption>Examples</caption>
<tbody>
<tr>
<td>className</td>
<td>len</td>
<td>return</td>
<td>null</td>
<td>1</td>
<td>""</td>
<td>"java.lang.String"</td>
<td>5</td>
<td>"j.l.String"</td>
<td>"java.lang.String"</td>
<td>15</td>
<td>"j.lang.String"</td>
<td>"java.lang.String"</td>
<td>30</td>
<td>"java.lang.String"</td>
</tr>
</tbody>
</table>
* @param className the className to get the abbreviated name for, may be {@code null}
* @param len the desired length of the abbreviated name
* @return the abbreviated name or an empty string
* @throws IllegalArgumentException if len <= 0
* @since 3.4
*/
The problem we want to solve is to automatize the maintenance of the examples. To do that with Java::Geci we have to do three things:
-
Add Java::Geci as a dependency to the project
-
Create a unit test that runs the framework
-
Mark the part in the unit test, which is the source of the information
-
replace the manually copied examples text with a Java::Geci
Segment
so that Java::Geci will automatically copy the text from the test there
4.1. Dependency
Java::Geci is in the Maven Central repository. The current release is 1.2.0
. It has to be added to the project as a test dependency. There is no dependency for the final LANG library just as there is no dependency on JUnit or anything else used for the development. There are two explicit dependencies that have to be added:
com.javax0.geci
javageci-docugen
1.2.0
test
com.javax0.geci
javageci-core
1.2.0
test
The artifact javageci-docugen
contains the document handling generators. The artifact javageci-core
contains the core generators. This artifact also bring the javageci-engine
and javageci-api
artifacts. The engine is the framework itself, the API is, well the API.
4.2. Unit test
The second change is a new file, org.apache.commons.lang3.docugen.UpdateJavaDocTest
. This file is a simple and very conventional Unit tests:
/*
* Licensed to the Apache Software Foundation (ASF) ...
*/
package org.apache.commons.lang3.docugen;
import *;
public class UpdateJavaDocTest {
@Test
void testUpdateJavaDocFromUnitTests() throws Exception {
final Geci geci = new Geci();
int i = 0;
Assertions.assertFalse(geci.source(Source.maven())
.register(SnippetCollector.builder().files("\\.java$").phase(i++).build())
.register(SnippetAppender.builder().files("\\.java$").phase(i++).build())
.register(SnippetRegex.builder().files("\\.java$").phase(i++).build())
.register(SnippetTrim.builder().files("\\.java$").phase(i++).build())
.register(SnippetNumberer.builder().files("\\.java$").phase(i++).build())
.register(SnipetLineSkipper.builder().files("\\.java$").phase(i++).build())
.register(MarkdownCodeInserter.builder().files("\\.java$").phase(i++).build())
.splitHelper("java", new MarkdownSegmentSplitHelper())
.comparator((orig, gen) -> !orig.equals(gen))
.generate(),
geci.failed());
}
}
What we can see here is huge Assertions.assertFalse
call. First, we create a new Geci
object and then we tell it where the source files are. Without getting into the details, there are many different ways how the user can specify where the sources are. In this example, we just say that the source files are where they usually are when we use Maven as a build tool.
The next thing we do is that we register the different generators. Generators, especially code generators usually run independent and thus the framework does not guarantee the execution order. In this case, these generators, as we will see later, very much depend on the actions of each other. It is important to have them executed in the correct order. The framework let us achieve this via phases. The generators are asked how many phases they need and in each phase, they are also queried if they need to be invoked or not. Each generator object is created using a builder pattern and in this, each is told which phase it should run. When a generator is configured to run in phase i
(calling .phase(i)
) then it will tell the framework that it will need at least i
phases and for phases 1..i-1
it will be inactive. This way the configuration guarantees that the generators run in the following order:
-
SnippetCollector
-
SnippetAppender
-
SnippetRegex
-
SnippetTrim
-
SnippetNumberer
-
SnipetLineSkipper
-
MarkdownCodeInserter
Technically all these are generators, but they do not "generate" code. The SnippetCollector
collects the snippets from the source files. SnippetAppender
can append multiple snippets together, when some sample code needs the text from different parts of the program. SnippetRegex
can modify the snippets before using regular expressions and replaceAll functionality (we will see that in this example). SnippetTrim
can remove the leading tabs and spaces from the start of the lines. This is important when the code is deeply tabulated. In this case, simply importing the snipped into the documentation could easily push the actual characters off of the printable area on the right side. SnippetNumberer
can number snippet lines in case we have some code where the documentation refers to certain lines. SnipetLineSkipper
can skip certain lines from the code. For example, you can configure it so that the import statements will be skipped.
Finally, the real "generator" that may alter the source code is MarkdownCodeInserter
. It was created to insert the snippets into the Markdown-formatted files, but it works just as well for Java source files when the text needs to be inserted into a JavaDoc part.
The last two but one configuration calls tell the framework to use the MarkdownSegmentSplitHelper
and to compare the original lines and those that were created after the code generation using a simple equals
. SegmentSplitHelper
objects help the framework to find the segments in the source code. In Java files, the segments are usually and by default between
//
and
//
lines. This helps to separate the manual and the generated code. The editor-fold is also collapsible in all advanced editor so you can focus on the manually created code.
In this case, however, we insert into segments that are inside JavaDoc comments. These JavaDoc comments more like Markdown than Java in the sense that they may contain some markup but also HTML friendly. Very specifically, they may contain XML comments that will not appear in the output document. The segment start in this case, as defined by the MarkdownSegmentSplitHelper
object is between
<!-- snip snipName parameters ... -->
and
<!-- end snip -->
lines.
The comparator has to be specified for a very specific reason. The framework has two comparators built-in. One is the default comparator that compares the lines one by one and character by character. This is used for all file types except Java. In the case of Java, there is a special comparator used, which recognizes when only a comment was changed or when the code was only reformatted. In this case, we are changing the content of the comment in a Java file, so we need to tell the framework to use the simple comparator or else it will not relaize we updated anything. (It took 30 minutes to debug why it was not updating the files first.)
The final call is to generate()
that starts the whole process.
4.3. Mark the code
The unit test code that documents this method is org.apache.commons.lang3.ClassUtilsTest.test_getAbbreviatedName_Class()
. This should look like the following:
@Test
public void test_getAbbreviatedName_Class() {
// snippet test_getAbbreviatedName_Class
assertEquals("", ClassUtils.getAbbreviatedName((Class<?>) null, 1));
assertEquals("j.l.String", ClassUtils.getAbbreviatedName(String.class, 1));
assertEquals("j.l.String", ClassUtils.getAbbreviatedName(String.class, 5));
assertEquals("j.lang.String", ClassUtils.getAbbreviatedName(String.class, 13));
assertEquals("j.lang.String", ClassUtils.getAbbreviatedName(String.class, 15));
assertEquals("java.lang.String", ClassUtils.getAbbreviatedName(String.class, 20));
// end snippet
}
I will not present here the original, because the only difference is that the two snippet …
and end snippet
lines were inserted. These are the triggers for the SnippetCollector
to collect the lines between them and store them in the "snippet store" (nothing mysterious, practically a big hash map).
4.4. Define a segment
The really interesting part is how the JavaDoc is modified. At the start of the article, I already presented the whole code as it is today. The new version is:
/**
* Gets the abbreviated class name from a {@code String}.
*
* The string passed in is assumed to be a class name - it is not checked.
*
* The abbreviation algorithm will shorten the class name, usually without
* significant loss of meaning.
* The abbreviated class name will always include the complete package hierarchy.
* If enough space is available, rightmost sub-packages will be displayed in full
* length.
*
*
*
* you can write manually anything here, the code generator will update it when you start it up
*
<table><caption>Examples</caption>
<tbody>
<tr>
<td>className</td>
<td>len</td>
<td>return</td>
<!-- snip test_getAbbreviatedName_Class regex="
replace='/~s*assertEquals~((.*?)~s*,~s*ClassUtils~.getAbbreviatedName~((.*?)~s*,~s*(~d+)~)~);/*
</tr><tr>
<td>{@code $2}</td>
<td>$3</td>
<td>{@code $1}</td>
</tr>
/' escape='~'" --><!-- end snip -->
</tbody>
</table>
* @param className the className to get the abbreviated name for, may be {@code null}
* @param len the desired length of the abbreviated name
* @return the abbreviated name or an empty string
* @throws IllegalArgumentException if len <= 0
* @since 3.4
*/
The important part is where the lines 15…20 are. (You see, sometimes it is important to number the snippet lines.) The line 15 signals the segment start. The name of the segment is test_getAbbreviatedName_Class
and when there is nothing else defines it will also be used as the name of the snippet to insert into. However, before the snippet gets inserted it is transformed by the SnippetRegex
generator. It will replace every match of the regular expression
\s*assertEquals\((.*?)\s*,\s*ClassUtils\.getAbbreviatedName\((.*?)\s*,\s*(\d+)\)\);
with the string
*
{@code $2}$3{@code $1}
Since these regular expressions are inside a string that is also inside a string we would need \\\\
instead of a single \
. That would make our regular expressions look awful. Therefore the generator SnippetRegex
can be configured to use some other character of our choice, which is less fence-phenomenon prone. In this example, we use the tilde character and it usually works. What it finally results when we run it is:
<!-- snip test_getAbbreviatedName_Class regex="
replace='/~s*assertEquals~((.*?)~s*,~s*ClassUtils~.getAbbreviatedName~((.*?)~s*,~s*(~d+)~)~);/*
<tr>
<td>{@code $2}</td>
<td>$3</td>
<td>{@code $1}</td>
</tr>
/' escape='~'" -->
*
{@code (Class) null}1{@code ""}
*
{@code String.class}1{@code "j.l.String"}
*
{@code String.class}5{@code "j.l.String"}
*
{@code String.class}13{@code "j.lang.String"}
*
{@code String.class}15{@code "j.lang.String"}
*
{@code String.class}20{@code "java.lang.String"}
<!-- end snip -->
5. Summary / Takeaway
Document updating can be automatized. At first, it is a bit cumbersome. Instead of copying and reformatting the text the developer has to set up a new unit test, mark the snippet, mark the segment, fabricate the transformation using regular expressions. However, when it is done any update is automatic. It is not possible to forget to update the documentation after the unit tests changed.
This is the same approach that we follow when we create unit tests. At first, it is a bit cumbersome to create unit tests instead of just debugging and running the code in an ad-hoc way and see if it really behaves as we expected, looking at the debugger. However, when it is done any update is automatically checked. It is not possible to forget to check an old functionality when the code affecting that changes.
In my opinion documentation maintenance should be as automatized as testing. Generally: anything that can be automatized in software development has to be automatized to save effort and to reduce the errors.
Comments imported from Wordpress
Béla Újházi 2019-09-18 08:40:41
This tool naming never gets old :D
Peter Verhas 2019-09-18 11:08:53
The naming may be a source of fun, but it is not the important point.
Also, the release 1.2.0 has a JVM8 backport that is available in the maven central repo with the same coordinates as the main release replacing the
javax0
tojavax1
in thegroupId
.
Handling repeated code automatically | Java Deep 2019-09-25 15:00:13
[…] Keep JavaDoc up-to-date […]
Comments
Please leave your comments using Disqus, or just press one of the happy faces. If for any reason you do not want to leave a comment here, you can still create a Github ticket.