This post was originally published on Variant’s blog.
So I was reading the C# 9 announcement after an extended break from coding. There is lots of cool stuff in the new version of course, and I am sure a lot of people are excited about Records, among other things.
But one thing especially caught my eye, and that was the new logical and relational operators for pattern matching.
Pattern matching is best known from functional languages, and it allows you to look at the “shape” of some value to see if it matches a “pattern”, and also extract information from that value. C# has gotten a lot of functional-esque goodness over the last few versions, and version 9 is no exception.
The new logical operators for pattern matching are and, or and not. In short, the first two let you combine multiple patterns into a new pattern, while not negates a pattern. Some probably won’t like these keywords, but I believe you can argue their place in the language.
The relational operators, on the other hand, are the same ones you would use in any if-statement to compare sizes, normally of numbers. That is to say, <, >, <=, and >= can now be used in patterns. With C# 9 you can switch on a number like num switch { > 10 => true, _ => false }, and in this case > 10 is the relational pattern in the switch expression.
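As a small illustration of how these operators combine, here is a minimal sketch. The class and method names (PatternSketch, Classify, IsLetter, HasValue) are made up for this example and are not taken from the announcement.

```csharp
using System;

public static class PatternSketch
{
    // Relational patterns (<, >, <=) combined with the logical 'and' pattern.
    public static string Classify(int num) => num switch
    {
        < 0 => "negative",
        0 => "zero",
        > 0 and <= 10 => "small",
        _ => "large"
    };

    // 'or' joins two range patterns, each built from relational patterns and 'and'.
    public static bool IsLetter(char c) =>
        c is (>= 'a' and <= 'z') or (>= 'A' and <= 'Z');

    // 'not' negates a pattern, here the constant pattern null.
    public static bool HasValue(object value) => value is not null;
}
```

Here Classify(7) gives "small" and Classify(42) gives "large", while IsLetter('x') is true and IsLetter('3') is false.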
The bleeding edge
Anyway, eschewing a discussion on how much people hate new keywords, or whether C# needs more of them, I still think more powerful patterns like these are a good thing for the language. Since these new features caught my eye, and I like living on the bleeding edge, I wanted to test them out. But how can I test features which haven’t even been included in the nightly build yet?
Since all development for the Roslyn-compiler is done in the open, you can actually compile the compiler, and play with some of the features yourself! I had trouble using the compiler directly though:
> dotnet ~/Dev/roslyn/artifacts/bin/csc/Debug/netcoreapp3.1/csc.dll /langVersion:preview Program.cs
Program.cs(2,7): error CS0246: The type or namespace name 'System' could not be found (are you missing a using directive or an assembly reference?)
It turns out that the Roslyn-compiler is not supposed to be used in this manner. Invoked directly like this, it does not automatically find and reference the DLLs your program needs.
But I could use the C# Interactive which is also built along with the compiler, and had a lot more success trying to run a C#-script with that:
> dotnet ~/Dev/roslyn/artifacts/bin/csi/Debug/netcoreapp3.1/csi.dll /langVersion:preview Program.csx
Hello from the script!
This enabled me to play with the new pattern matching features!
The use case
As soon as I read the C# 9 announcement, I began thinking about a solution I worked on a few years ago, where we had to parse Norwegian social security numbers (ssn) for a registration process.
In Norway we have a few convoluted rules for encoding information into the ssn, which involved a lot of ifs and elses in order to actually parse the information in a C# implementation.
Given the 11-digit ssn 28050389210, the first 6 digits (280503) encode the birth date, while the last 5 digits (89210) are called the personal number.
So if we want to get the birth date, it’s just a matter of parsing 280503, right? Well, which century do the final two digits belong to? Is it 1903, or 2003?
This is where the personal number (89210) comes in, because its first three digits (892) encode extra information about which year the person was born. The rules are as follows:
- 500–749 denotes the years 1854 till 1899
- 000–499 denotes the years 1900 till 1999
- 900–999 denotes the years 1940 till 1999
- 500–999 denotes the years 2000 till 2039
So looking at this list of rules, 892 is only covered by the fourth one, which means the person was indeed born in 2003.
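To make the layout of the number concrete, here is a minimal sketch that slices the example ssn into its parts. The SsnParts class and its Split method are hypothetical names for this illustration, and the sketch assumes the input has already been checked to be exactly 11 digits.

```csharp
using System;

public static class SsnParts
{
    // Splits an 11-digit ssn into day, month, two-digit year and the
    // first three digits of the personal number.
    // Assumption: the caller has already validated length and digits.
    public static (int Day, int Month, int TwoDigitYear, int IndividualNumber) Split(string ssn)
    {
        var day = int.Parse(ssn.Substring(0, 2));        // "28"
        var month = int.Parse(ssn.Substring(2, 2));      // "05"
        var year = int.Parse(ssn.Substring(4, 2));       // "03"
        var individual = int.Parse(ssn.Substring(6, 3)); // "892"
        return (day, month, year, individual);
    }
}
```

For 28050389210 this yields day 28, month 5, two-digit year 3 and the three digits 892 that the rules above are applied to.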
The implementation
In the olden days, I would probably code this something like:
public static int GetCenturyOffset(int individualNumber)
{
    if (individualNumber >= 500 && individualNumber <= 749)
        return 1800;
    else if (individualNumber >= 0 && individualNumber <= 499)
        return 1900;
    else if (individualNumber >= 900 && individualNumber <= 999)
        return 1900;
    else if (individualNumber >= 500 && individualNumber <= 999)
        return 2000;
    else
        throw new ArgumentException("Invalid individual number");
}
Which is fine, but it’s sort of difficult to read through and grasp the rules. With the new relational and logical operators for patterns, we could write this a bit more tersely:
public static int GetCenturyOffset(int individualNumber) => individualNumber switch
{
    >= 500 and <= 749 => 1800,
    >= 0 and <= 499 => 1900,
    >= 900 and <= 999 => 1900,
    >= 500 and <= 999 => 2000,
    _ => throw new ArgumentException("Invalid individual number")
};
I’m not arguing that terse code is better, far from it. But I do find it easier to read a pattern which states that individualNumber has to be >= 500 and <= 749, rather than an expression like individualNumber >= 500 && individualNumber <= 749. On top of that the expressions are nested in if-statements, which I find decreases readability. I’m quite sure not everybody would agree with me here though.
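One thing worth noting is that the ranges in the rule list overlap: 500–749 and 900–999 both sit inside 500–999, so the function above separates them purely by the order of its branches. Each rule also names a span of years, so a fuller version would presumably look at the two-digit year from the birth date as well. The following is a sketch under that assumption; the CenturySketch class and the twoDigitYear parameter are hypothetical additions, and the year bounds are read straight off the rule list.

```csharp
using System;

public static class CenturySketch
{
    // Matches on (individual number, two-digit year) so that overlapping
    // number ranges are told apart by the year span of each rule.
    public static int GetCenturyOffset(int individualNumber, int twoDigitYear) =>
        (individualNumber, twoDigitYear) switch
        {
            (>= 500 and <= 749, >= 54 and <= 99) => 1800, // 1854 till 1899
            (>= 0 and <= 499, >= 0 and <= 99) => 1900,    // 1900 till 1999
            (>= 900 and <= 999, >= 40 and <= 99) => 1900, // 1940 till 1999
            (>= 500 and <= 999, >= 0 and <= 39) => 2000,  // 2000 till 2039
            _ => throw new ArgumentException("Invalid individual number or year")
        };
}
```

With the example number, GetCenturyOffset(892, 3) falls through to the last rule and returns 2000, so the birth year is 2000 + 3 = 2003. A tuple pattern keeps each rule on a single line, which is where the relational and logical patterns pay off compared to four nested conditions.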
More than one kind
There are also several flavors of ssn. Apart from the regular kind, you have D-, H-, and FH-numbers. These have their own rules, depending on the value of the first or third digit in the ssn. These rules are not especially difficult, and you can express them quite elegantly with pattern matching:
public enum SsnType { Regular, DNumber, HNumber, FhNumber }
public static SsnType GetSsnType(string ssn)
{
    var first = int.Parse(ssn[0].ToString());
    var third = int.Parse(ssn[2].ToString());
    return (first, third) switch
    {
        (>= 8, _) => SsnType.FhNumber,
        (>= 4 and <= 7, _) => SsnType.DNumber,
        (_, >= 4 and <= 5) => SsnType.HNumber,
        _ => SsnType.Regular
    };
}
The above rules might not make much sense without more context about why you have these sorts of numbers. But that’s not important. The point is that the numbers encode more information than you might think, and they lend themselves well to pattern matching code.
More readable?
I’ve also written code for parsing ssn’s for other countries like Sweden, Denmark and Finland. And even though the Nordic countries are culturally similar and share borders, naturally each of them has its own convoluted rules for creating and parsing ssn’s.
Since these are distinct, there’s not much ability to share code between the different implementations, which usually makes them verbose and less maintainable.
Being able to express such rules in a succinct and readable manner is a strength, and I think the new pattern matching operators are a step in the right direction for problems such as these.