October 2, 2024

C# and regular expressions (regex) - part 2

For a change I don't intend to put any C# code here, just talk about regex in general.

Some people say there are two types, POSIX and Perl. This is by no means true; one page claims that there are 418 regex flavors.

A regular expression is a miniature, but very powerful, language for matching strings or parts of strings. My editor of choice (vim) does searches using regex. Knowing about them has utility outside of programming languages.

Quick regex sampler

The simplest case is just to match a word like "bill" or "console" or "return".

The next thing you can do is to add an anchor.
Putting a "^" at the start of a regex anchors it to the start of a line.
Putting a "$" at the end of a regex anchors it to the end of a line.

So the regex "^sam" will match "sam is here", but not "I see sam".
Likewise the regex "sam$" will match "I see sam", but not "sam is here".
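Here is a minimal sketch of the two anchors, written in Python only because it is easy to try out (the same patterns work in most flavors). The helper name "matches" is made up for this example. Note that in Python "^" and "$" anchor to the whole string unless you ask for multi-line mode, which makes no difference for one-line strings like these.

```python
import re

# Hypothetical helper for this example: True if the pattern
# is found anywhere in the text.
def matches(pattern, text):
    return pattern.search(text) is not None

starts_with_sam = re.compile(r"^sam")  # "sam" only at the start
ends_with_sam = re.compile(r"sam$")    # "sam" only at the end

print(matches(starts_with_sam, "sam is here"))  # True
print(matches(starts_with_sam, "I see sam"))    # False
print(matches(ends_with_sam, "I see sam"))      # True
print(matches(ends_with_sam, "sam is here"))    # False
```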

The next useful thing is "." which matches any character.
So "f..d" will match both "food" and "ford", but not "flood" or "fad".

And the next thing to know about is "*" which says to match zero or more of whatever character (or character "thing") comes before it.
So "f.*d" will match "food", "ford", "flood" and "fad".
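The difference between "f..d" and "f.*d" can be checked with a short Python sketch (the variable names are just for illustration). It runs both patterns over the four words used above.

```python
import re

words = ["food", "ford", "flood", "fad"]

two_any = re.compile(r"f..d")  # f, exactly two characters, then d
any_run = re.compile(r"f.*d")  # f, zero or more characters, then d

two_any_hits = [w for w in words if two_any.search(w)]
any_run_hits = [w for w in words if any_run.search(w)]

print(two_any_hits)  # ['food', 'ford']
print(any_run_hits)  # ['food', 'ford', 'flood', 'fad']
```

Because "*" allows zero repetitions, "f.*d" would also match the two-letter string "fd".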

The last thing I will mention in this sampler is character classes. You put stuff inside of square brackets and they match a single character.
So "^[Ss]" will match any line with a big or little "s" at the start.

The pattern "[abc]*" will match any number of a, b, or c letters.
So it will match "a" or "aaaaaaaaa" or "abc" or "aabbcc" or "abcabc" and so on.
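A small Python sketch of both character class patterns follows; the names are invented for the example. One thing to watch: since "*" also allows zero repetitions, a plain search for "[abc]*" succeeds on any string at all (it can match nothing). To ask "is the whole string made of a, b and c?" the sketch uses fullmatch instead.

```python
import re

leading_s = re.compile(r"^[Ss]")  # big or little s at the start
abc_run = re.compile(r"[abc]*")   # zero or more of a, b, or c

print(bool(leading_s.search("Sam is here")))  # True
print(bool(leading_s.search("sam is here")))  # True
print(bool(leading_s.search("I see sam")))    # False

# fullmatch requires the entire string to fit the pattern.
for text in ["a", "aaaaaaaaa", "abc", "aabbcc", "abcabc"]:
    print(text, bool(abc_run.fullmatch(text)))  # True for each

print(bool(abc_run.fullmatch("abd")))  # False, "d" is not in the class
```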

You can use a range inside of the brackets.
So "^[0-9]*" will match any number of digits (including none) at the start of a string.

We are just getting warmed up.

When I use vim to do searches, I often anchor a word to the start of a line and search for "^calc" if I have a function named "calc". This skips all the calls to calc in the middle of lines and in general takes me to the function definition. Vim can also do regex substitution over the whole file, or parts of it, which can be very handy. I can put a comment mark at the start of every line from 100 through 200 with something like:

:100,200s/^/#  /

At least I can if I am using python, where "#" is the comment character. The point is that having regular expressions available in your editor is very handy and powerful, and vim is by no means the only editor with such capability.
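For comparison, here is a rough Python sketch of what that vim substitution does. The function name "comment_lines" and its parameters are made up for this example; it is an illustration of the idea, not how vim works internally. Line numbers are 1-based and inclusive, as in vim.

```python
import re

def comment_lines(text, first, last, mark="#  "):
    """Prefix lines first..last (1-based, inclusive) with mark."""
    lines = text.split("\n")
    for i in range(first - 1, min(last, len(lines))):
        # "^" matches the empty spot at the start of the line,
        # so the substitution inserts the mark there.
        lines[i] = re.sub(r"^", mark, lines[i])
    return "\n".join(lines)

sample = "a = 1\nb = 2\nc = 3\nd = 4"
print(comment_lines(sample, 2, 3))
# a = 1
# #  b = 2
# #  c = 3
# d = 4
```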

More about regex

There are a myriad of resources on regex online. I used to have the book "Mastering Regular Expressions" by Jeffrey Friedl (published by O'Reilly) but I have either misplaced it or I gave it away. The above is just a sampler of some basic things.


Feedback? Questions? Drop me a line!

Tom's Computer Info / tom@mmto.org