Readable regular expressions with Kotlin
Regular expressions have a reputation for being hard to understand and maintain. Probably rightly so: can you easily say what ^1?$|^(11+?)\1+$ matches?
But combining some advanced features of the JVM’s regular expression implementation with some of Kotlin’s language constructs can lead to quite elegant code.
As an example, let’s look at a date parsing function.
import java.time.LocalDateTime

val date = """(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"""
val time = """T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?"""
val pattern = Regex("$date($time)?")

fun parseDate(s: String): LocalDateTime? =
    pattern.matchEntire(s)?.let { match ->
        LocalDateTime.of(
            match.num("year"), match.num("month"), match.num("day"),
            match.num("hour"), match.num("min"), match.num("sec"),
            match.num("ms") * 1_000_000) // LocalDateTime.of expects nanoseconds, not milliseconds
    }

// Value of a named group as Int, or 0 if the (optional) group did not participate in the match
private fun MatchResult.num(name: String) = this.groups[name]?.value?.toInt() ?: 0
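To try the function out, here is a minimal, self-contained sketch that repeats the definitions (converting the milliseconds to the nanoseconds that LocalDateTime.of expects) and adds a hypothetical main with a few sample inputs. It assumes a Kotlin/JVM standard library recent enough to look up named groups with groups[name].

```kotlin
import java.time.LocalDateTime

val date = """(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"""
val time = """T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?"""
val pattern = Regex("$date($time)?")

// Groups that did not participate in the match are null and fall back to 0.
private fun MatchResult.num(name: String) = groups[name]?.value?.toInt() ?: 0

fun parseDate(s: String): LocalDateTime? =
    pattern.matchEntire(s)?.let { match ->
        LocalDateTime.of(
            match.num("year"), match.num("month"), match.num("day"),
            match.num("hour"), match.num("min"), match.num("sec"),
            match.num("ms") * 1_000_000 // milliseconds -> nanoseconds
        )
    }

// Hypothetical usage with a date-only, a full, and a non-matching input.
fun main() {
    println(parseDate("2017-3-5"))                // 2017-03-05T00:00
    println(parseDate("2017-03-05T14:07:09.250")) // 2017-03-05T14:07:09.250
    println(parseDate("March 5th"))               // null
}
```

Note that a string that matches the pattern but describes an impossible date, such as 2017-13-40, does not yield null: LocalDateTime.of throws a DateTimeException instead.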
- With Kotlin’s raw strings delimited by """, it’s no longer necessary to double all backslashes: \d instead of \\d.
- String templates make composing subexpressions more readable: "$date($time)?" instead of date + "(" + time + ")?".
- Named capturing groups like (?<year>\d{4}) make it a lot easier to access parts of the match and document their meaning. Traditionally, capturing groups are accessed by number, according to their position inside the regular expression. This makes it hard to see at a glance how optional or nested groups are counted. And when the expression is changed, the numbering can shift, so that the wrong groups are referenced.
- The let function, called with the safe call operator as in matchEntire(s)?.let { ... }, is a more concise form of null checking: the block after let is only executed if the expression to the left of it is not null.
- The extension function MatchResult.num “adds” the method num to the class MatchResult, which allows for nicer code: match.num("year") instead of num(match, "year").
- The null-safe call operator ?. avoids a lot of manual null checks: a?.b instead of the Java-style a == null ? null : a.b.
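The points about raw strings, named groups and ?.let can be checked with a small sketch. The prepended era group is hypothetical, added only to show how numbered access breaks when the pattern changes while named access keeps working.

```kotlin
// A raw string keeps backslashes literally; a regular string needs them doubled.
val raw = """(?<year>\d{4})-(?<month>\d{1,2})"""
val escaped = "(?<year>\\d{4})-(?<month>\\d{1,2})"

// The same pattern with an extra optional group (hypothetical "era") prepended.
val shifted = Regex("""(?<era>AD )?$raw""")

fun main() {
    println(raw == escaped) // true

    val match = shifted.matchEntire("2017-03")!!
    // Group 1 is now "era", which did not participate, so numbered access breaks...
    println(match.groupValues[1].isEmpty()) // true
    // ...while named access is unaffected by the new group.
    println(match.groups["year"]?.value) // 2017

    // Safe call + let: the block only runs when there is a match.
    val month = Regex(raw).matchEntire("no date")?.let { it.groups["month"]?.value }
    println(month) // null
}
```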
The code presented here might look a little strange to the classically Java-trained eye, but I think it’s safer and easier to write and understand than a traditional Java solution would be.
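By the way, the regular expression from the beginning of this article matches the non-prime numbers written in unary, as a string of n ones: ^1?$ covers 0 and 1, and ^(11+?)\1+$ matches any length made of a block of two or more ones repeated at least twice. So a number is prime exactly when the expression does not match!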
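In Kotlin, the expression can be turned into a (very inefficient) primality test. This is a sketch; notPrime and isPrime are illustrative names, not part of any library.

```kotlin
// Matches unary strings of length 0 or 1, or of any composite length.
val notPrime = Regex("""^1?$|^(11+?)\1+$""")

// n is prime exactly when its unary representation does NOT match.
fun isPrime(n: Int): Boolean = !notPrime.matches("1".repeat(n))

fun main() {
    println((0..30).filter(::isPrime)) // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
}
```

Because the regex engine finds a divisor by backtracking through every candidate block length, this is a curiosity rather than a practical algorithm; the work grows quickly with n.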
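Update: I just learned that it’s also possible to include comments in regular expressions. With RegexOption.COMMENTS, whitespace in the pattern is ignored and # starts a comment that runs to the end of the line, so you could write something like this: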
val pattern = Regex("""
(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2}) #date part
(T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?)? #optional time part
""", RegexOption.COMMENTS)
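To check that the commented pattern behaves like the composed one from the first example, a sketch like the following compares the named groups both produce. The sample strings and the fields helper are hypothetical, chosen for illustration.

```kotlin
// Compact version, composed with string templates as in the first example.
val date = """(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"""
val time = """T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?"""
val compact = Regex("$date($time)?")

// Commented version: with COMMENTS, whitespace is ignored and # starts a comment.
val commented = Regex("""
    (?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})  # date part
    (T(?<hour>\d{1,2}):(?<min>\d{1,2}):(?<sec>\d{1,2})(\.(?<ms>\d{3}))?)?  # optional time part
""", RegexOption.COMMENTS)

val names = listOf("year", "month", "day", "hour", "min", "sec", "ms")

// Hypothetical inputs: date only, date and time, with milliseconds, and a non-match.
val samples = listOf("2017-3-5", "2017-03-05T14:07:09", "2017-03-05T14:07:09.250", "2017-03-05 14:07")

// Named group values of a full match, or null if the input does not match.
fun fields(regex: Regex, s: String): List<String?>? =
    regex.matchEntire(s)?.let { m -> names.map { m.groups[it]?.value } }

fun main() {
    for (s in samples) {
        println("$s -> ${fields(commented, s)}")
        check(fields(compact, s) == fields(commented, s)) // both patterns agree
    }
}
```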