Regular expressions have a reputation for being hard to understand and maintain, and probably rightly so. Can you easily say what ^1?$|^(11+?)\1+$ matches?

But using some advanced features of the JVM’s regular expression implementation together with some of Kotlin’s language constructs can lead to quite elegant code.

As an example, we use a date parsing function.

import java.time.LocalDateTime

val date = """(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"""
val time = """T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?"""
val pattern = Regex("$date($time)?")

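// Returns null if s does not match; absent time fields default to 0.
fun parseDate(s: String): LocalDateTime? =
    pattern.matchEntire(s)?.let { match ->
        LocalDateTime.of(
            match.num("year"), match.num("month"), match.num("day"),
            match.num("hour"), match.num("min"), match.num("sec"),
            match.num("ms") * 1_000_000) // last parameter is nanoseconds
    }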

private fun MatchResult.num(name: String) = this.groups[name]?.value?.toInt() ?: 0
  • With Kotlin’s raw strings, delimited by """, it is no longer necessary to double every backslash.
    \d instead of \\d
  • String templates make composing sub expressions more readable.
    "$date($time)?" instead of date + "(" + time + ")?"
  • Named capturing groups such as (?<year>\d{4}) make it a lot easier to access parts of the match, and they document their meaning. Traditionally, capturing groups are accessed numerically by their position inside the regular expression. This raises the question of how optional or nested groups are counted. And when an expression is changed, the numbering can shift, so that existing code silently references the wrong groups.
  • The ?.let idiom is a concise form of null checking. The block after let is executed only if the expression to the left of ?. is not null; otherwise the whole expression evaluates to null.
  • The extension function MatchResult.num “adds” the method num to the class MatchResult which allows for nicer code.
    match.num("year") instead of num(match, "year")
  • The null safe call operator ?. avoids a lot of manual null checks.
    a?.b instead of a == null ? null : a.b
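
To make the point about group numbering concrete, here is a small sketch. The two patterns and the helpers hourByIndex and hourByName are hypothetical and exist only for this comparison; looking up groups by name as shown assumes a reasonably recent Kotlin standard library on the JVM.

```kotlin
// Hypothetical patterns for illustration: a date with an optional "Thh:mm" part.
val numbered = Regex("""(\d{4})-(\d{1,2})-(\d{1,2})(T(\d{1,2}):(\d{1,2}))?""")
val named = Regex("""(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})(T(?<hour>\d{1,2}):(?<min>\d{1,2}))?""")

// Numbered access: the wrapper group around the time part is group 4,
// so the hour is group 5, not 4. The index shifts whenever a group
// is added in front of it.
fun hourByIndex(s: String): Int? =
    numbered.matchEntire(s)?.groups?.get(5)?.value?.toInt()

// Named access: no counting, and the name documents the intent.
fun hourByName(s: String): Int? =
    named.matchEntire(s)?.groups?.get("hour")?.value?.toInt()

fun main() {
    println(hourByIndex("2024-02-29T13:45")) // 13
    println(hourByName("2024-02-29T13:45"))  // 13
    println(hourByName("2024-02-29"))        // null: the time part did not participate
    println(hourByName("not a date"))        // null: no match at all
}
```

Both helpers also show the safe call chain at work: every step may yield null, and no explicit null check is needed.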

The code presented here might look a little strange to the classically Java-trained eye, but I think it’s safer and easier to write and understand than a traditional Java solution would be.

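By the way, the regular expression from the beginning of this article tests for prime numbers! Applied to a number written in unary (a string of n ones), it matches exactly when n is not prime, so n is prime if the expression does not match.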
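
If you want to try it out, here is a minimal sketch. The helper isPrime is a hypothetical name chosen for this example, and it assumes n is not negative. The first alternative handles the lengths 0 and 1; the second matches any string that consists of two or more repetitions of a block of at least two ones, which means its length is composite.

```kotlin
// The regex from the introduction. It matches unary strings whose length is NOT prime.
val nonPrime = Regex("""^1?$|^(11+?)\1+$""")

// Hypothetical helper: n is prime if "1" repeated n times does not match.
// Assumes n >= 0 (String.repeat rejects negative counts).
fun isPrime(n: Int): Boolean = !nonPrime.containsMatchIn("1".repeat(n))

fun main() {
    println((0..20).filter(::isPrime)) // [2, 3, 5, 7, 11, 13, 17, 19]
}
```

This is a curiosity, not a practical primality test: the backtracking gets slow quickly as n grows.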

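Update: I just learned that it’s also possible to include comments in regular expressions. With RegexOption.COMMENTS, whitespace in the pattern is ignored and everything from # to the end of the line is treated as a comment. So you could write something like this: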

val pattern = Regex("""
    (?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2}) #date part
    (T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?)? #optional time part
    """, RegexOption.COMMENTS)
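
As a quick sanity check, the following sketch compares the commented pattern with the one-line version. The names compact, commented, and samples are assumptions made for this example.

```kotlin
// One-line version, composed as in the article.
val compact = Regex("""(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})(T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?)?""")

// Commented version: whitespace and everything after '#' is ignored.
val commented = Regex("""
    (?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2}) #date part
    (T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?)? #optional time part
    """, RegexOption.COMMENTS)

// Hypothetical sample inputs; the last one uses a space instead of 'T'.
val samples = listOf(
    "2024-02-29",
    "2024-02-29T13:45:10",
    "2024-02-29T13:45:10.123",
    "2024-02-29 13:45:10",
)

fun main() {
    for (s in samples) {
        // Both patterns should agree on every input.
        println("$s -> compact=${compact.matches(s)}, commented=${commented.matches(s)}")
    }
}
```

One consequence of this mode is that a literal space or # in the pattern no longer matches itself, so such characters have to be written in escaped form if they are part of the expression.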