February, 2012

Some Tcl Notes

I swore I wouldn't bash Tcl when I wrote this, but as I got more and more irritated writing about Tcl issues, I decided, "why not?"

It has been said, and I believe that it is absolutely true, that Tcl would have virtually no following at all today if it were not for Tk. Tk along with Tcl makes it seductively easy to throw together simple graphical interfaces. This is a trap, and will lead to suffering and misery.

I never intend to use Tcl to write any new code, and I encourage you to do likewise. I do need to modify and maintain large bodies of Tcl code, hence these notes.

Perhaps the most important thing to know about Tcl is that everything is stored as a string. This means that you often do not need to put quotes around a string (leading to endless confusion about whether something is a variable or a string literal). Also, all arguments to procedures are passed as strings, which makes passing arrays "interesting" (see below).
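
Here is a small sketch of what "everything is a string" looks like in practice. The variable names are made up purely for illustration.

```tcl
# A bare word, a quoted word, and a braced word all yield the same string.
set a hello
set b "hello"
set c {hello}

# Only the dollar sign distinguishes a variable reference from a literal.
set d a      ;# d holds the one-character string "a"
set e $a     ;# e holds the value of variable a, i.e. "hello"

# Numbers are strings too, so string commands work on them directly.
set n 42
set len [string length $n]   ;# the string "42" has two characters
```

Quotes and braces only matter for grouping words and controlling substitution; they do not create a different kind of value.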

The Tcl interpreter is very simple and stupid (or at least it originally was, and that was sort of a design feature because the interpreter was small and embeddable). Now the interpreter is complex and stupid, the worst of both worlds. A key symptom of the brain damage is the lack of infix notation. Here is a code sample that initializes a variable, then increments it:

set a 0
set a [expr $a + 10]
This lets you know what you are in for. Notice the square brackets, which indicate command substitution: the interpreter runs the enclosed command (here "expr") and substitutes its result. Also notice that the variable "a" is used without the dollar sign prefix when it is receiving a value, but when the value of the variable is needed, the prefix is required.
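
For what it is worth, the usual advice is to put braces around the argument to expr, and for a plain integer increment there is a built-in "incr" command. A minimal sketch of both, continuing the same example:

```tcl
set a 0

# Braces hand the expression to expr unsubstituted, so expr does the
# variable lookup itself; this is the commonly recommended form.
set a [expr {$a + 10}]

# For integer increments, incr avoids expr entirely.
incr a 10      ;# a is now 20
incr a         ;# the default increment is 1, so a is now 21
```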

Scoping, Upvar, Uplevel

Variable scoping in Tcl is an epic lesson in bad design. Inside a procedure, all variables are local by default. Each and every procedure that needs access to a global variable must have a "global" statement that lists that variable (or must spell out the fully qualified name, such as ::varname, at every use).
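
A sketch of how this plays out, assuming the script runs at the top level of the interpreter. The variable "counter" and the three procedure names are hypothetical.

```tcl
set counter 0            ;# a global variable

proc bump_with_global {} {
    global counter       ;# without this line, counter would be a new local
    incr counter
}

proc bump_qualified {} {
    incr ::counter       ;# fully qualified name, no global statement needed
}

proc bump_local_only {} {
    # No global statement: this sets a local variable named counter that
    # vanishes when the procedure returns. The global is untouched.
    set counter 99
}

bump_with_global
bump_qualified
bump_local_only
# The global counter has been incremented twice, not three times.
```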

This scoping disaster is a major contributor to two bizarre warts in the Tcl language: "upvar" and "uplevel". A sensible person will wonder, "what are these and why are they needed" - and a sensible person with any experience with other programming languages (note that I am being generous here in calling Tcl a programming language) will make the observation that "nothing like this exists in any other language I have used." They are symptomatic of fundamentally flawed language design.

Upvar is typically used to access arrays that are passed by name to a procedure. We can invoke "mysub myarray", which passes the name of an array (in this case the string "myarray") to the subroutine. Then within the subroutine we set up a local alias to this array via the statements:

proc mysub array_arg {
    upvar $array_arg local_name
}
Once this is done we can use "local_name" within the routine just as if it were any old array, and it will be referencing the array "myarray" in the routine that called the function. Of course any decent language should (and does) take care of all this nicely without burdening the programmer with nonsense like this.
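
A fuller sketch of the same pattern, reading and writing the caller's array through the alias. The procedures "sum_array" and "add_entry" and the array contents are invented for the example.

```tcl
# Hypothetical procedure: sums the values of an array passed by name.
proc sum_array {array_arg} {
    upvar $array_arg local_name     ;# local_name aliases the caller's array
    set total 0
    foreach key [array names local_name] {
        incr total $local_name($key)
    }
    return $total
}

# Hypothetical procedure: writes through the alias into the caller's array.
proc add_entry {array_arg key value} {
    upvar $array_arg local_name
    set local_name($key) $value
}

array set myarray {apples 3 pears 4}
add_entry myarray plums 5
set total [sum_array myarray]       ;# pass the name, not $myarray
```

Note that the call passes the bare word myarray. Writing $myarray there would be an error, since an array has no single string value to substitute.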

By default, upvar references variables one level up the stack (the immediate caller), but an optional first argument can be given to explicitly specify the context. A first argument of #0 indicates the global context (this is fairly common). But in order to keep the door open to utter chaos and insanity, other arguments are allowed; an argument of 2 indicates the caller of the caller for example, oh my!
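
To make the level argument concrete, here is a sketch with three nested calls. All the names are hypothetical, and I assume the script runs at the top level so that "setting" is a global.

```tcl
set setting "global value"

proc inner {} {
    upvar #0 setting g      ;# #0: the global context
    upvar 1 x callers_x     ;# 1: the immediate caller (same as the default)
    upvar 2 y grand_y       ;# 2: the caller of the caller
    return "$g / $callers_x / $grand_y"
}

proc middle {} {
    set x "middle's x"
    return [inner]
}

proc outer {} {
    set y "outer's y"
    return [middle]
}

set result [outer]
```

Here "inner" only works when it is reached through exactly this call chain, which is a fair illustration of why levels other than 1 and #0 make me nervous.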

Uplevel allows an argument (a "script") to be evaluated in the context of a different stack level. Like "upvar" the default context is the immediate caller (one level up the stack), but an optional first argument can specify an explicit level.
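
The usual excuse for uplevel is writing your own control structures. A minimal sketch, with a made-up helper called "repeat_script" that runs a script several times in its caller's context:

```tcl
# Hypothetical helper: evaluates a script n times in the caller's context.
proc repeat_script {n script} {
    for {set i 0} {$i < $n} {incr i} {
        uplevel 1 $script    ;# 1 is the immediate caller, the default level
    }
}

proc demo {} {
    set count 0
    repeat_script 3 {incr count}    ;# the script sees demo's local count
    return $count
}

set result [demo]
```

The loop variable "i" lives in repeat_script, while "count" lives in demo. The script crosses that boundary only because uplevel evaluates it one frame up.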

When used to access variables in the global context, upvar is pretty much like the global statement with the added opportunity to specify a local alias. When used to access variables in the default context of the calling routine, upvar is a patch on the brain-dead way that Tcl passes arguments as strings to procedures. Fundamentally harmless - albeit ugly. When used with some other argument, I can only imagine the bizarre side effects to an unsuspecting caller. Somebody thought this was clever, and no doubt some fools have written scripts that use this opportunity in some way, which of course means that this aspect of the language (which ought to be deprecated) must remain.

Strings and puts

All variables in Tcl store their values as strings. To inject the contents of variables into a string, do this:
set msg "I like $count dogs for $days days so far"
To print a string, use puts. This sounds simple, but Tcl does its own style of overloading on puts. If it has one argument, the output goes to stdout. If it has two arguments, the first is the file descriptor (a "channel" in Tcl terms) and the second is the string. This leads to mistaken usage like:
puts Hello World
Here puts sees two arguments, tries to use the first (the string Hello) as a file descriptor, and gets upset. Quotes are the answer here. It would be better if Tcl always required strings to be surrounded with quotes, but it just ain't so.
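
A sketch of the working forms next to the broken one. The values of "count" and "days" are made up, and the broken call is wrapped in catch so the script keeps running.

```tcl
set count 3
set days 10
set msg "I like $count dogs for $days days so far"

puts "Hello World"      ;# one argument: written to stdout
puts stdout $msg        ;# two arguments: channel first, then the string
puts {Hello World}      ;# braces group too, but suppress $ substitution

# The mistaken form raises an error, which catch traps. The variable
# failed is set to 1 and err receives the error message.
set failed [catch {puts Hello World} err]
```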

Tcl namespaces

Here we are making a silk purse out of a sow's ear. I have maintained (and still do) that any Tcl script longer than 100 lines of code is an out-of-control disaster. But there are people out there who are busy writing Tcl modules and worrying about code reuse. To try to bring some order into one aspect of the chaos, they have invented Tcl namespaces.

Tcl has a namespace scheme which, like all namespace schemes, is designed to avoid the collisions that result when people write packages with commonly (or not so commonly) used names. This is one step beyond the traditional practice of prefixing all functions in a collection with the collection name (a prefix like "math_" would be used for a math library).

The official Tcl namespace scheme supports a hierarchy of namespaces, with namespaces nested within namespaces to an arbitrary depth (fostering rather than avoiding confusion in my opinion). Most people intentionally use only one level of namespaces and nesting beyond this is incidental. The top level namespace in Tcl is denoted by an empty string and is known as the global namespace. Two colons "::" trigger the Tcl namespace mechanism, so that animals::dog invokes the function "dog" in the "animals" namespace. From the global namespace this is identical to ::animals::dog, which explicitly anchors to the global namespace at the top level. The leading "::" only matters when the call is made from inside some other namespace: a name without it is relative and is looked up in the current namespace first, then in the global namespace, so the anchored form is the one that cannot be hijacked by a nested namespace that happens to be called "animals".
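
A sketch of the difference between the relative and the anchored form. The "zoo" namespace and its nested "animals" namespace are invented to force a collision.

```tcl
namespace eval animals {
    proc dog {} { return "global animals::dog" }
}

namespace eval zoo {
    # A second, nested namespace that also happens to be called animals.
    namespace eval animals {
        proc dog {} { return "zoo::animals::dog" }
    }
    proc relative {} { return [animals::dog] }     ;# looked up in ::zoo first
    proc absolute {} { return [::animals::dog] }   ;# always the top-level one
}

set from_global [animals::dog]          ;# the top-level animals::dog
set from_zoo_relative [zoo::relative]   ;# the nested zoo::animals::dog
set from_zoo_absolute [zoo::absolute]   ;# the top-level animals::dog
```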