Example: Command Line Tool

Here is a little program to convert files that have Windows-style line endings (carriage-return, line-feed, or "CRLF") to UNIX style. The file format doesn't matter to Tcl, which can read either kind of file (and the Macintosh format too). However, sometimes you want to get rid of the extra carriage-return characters when working on a file that came from Windows.

First we present the program and a summary of the commands in it. After that we'll look at parts of the program in more detail.

Dos2unix

#!/usr/local/bin/tclsh

# Dos2Unix
#	Convert a file to Unix-style line endings
#	If the file is a directory, then recursively
#	convert all the files in the directory and below.
#
# Arguments
#	f	The name of a file or directory.
#
# Side Effects:
#	Rewrites the file to have LF line-endings

proc Dos2Unix {f} {
    puts $f
    if {[file isdirectory $f]} {
	foreach g [glob [file join $f *]] {
	    Dos2Unix $g
	}
    } else {
	set in [open $f]
	set out [open $f.new w]
	fconfigure $out -translation lf
	puts -nonewline $out [read $in]
	close $out
	close $in
	file rename -force $f.new $f
    }
}

# Process each command-line argument

foreach f $argv {
    Dos2Unix $f
}

Command Summary

proc name arguments body            Define a procedure.
if expression body else elsebody    Conditional statement.
foreach variable values body        Loop over a list of values.
puts string                         Print a string to standard output.
puts channel string                 Print a string to the output identified by channel.
glob pattern                        Return a list of file names that match pattern.
open file how                       Open a file for reading or writing.
fconfigure channel options          Control the way file I/O is done using channel.
close channel                       Close an I/O channel.
file isdirectory path               Test if path is a directory.
file join name1 name2               Create a new file name by joining name1 and name2.
file rename name1 name2             Rename a file from name1 to name2.

Running a Tcl script

In UNIX you can start a file with

#!/usr/local/bin/tclsh

and it will automatically launch Tcl to interpret the script. On Windows you can associate the ".tcl" file extension with Tcl. In practice, I often end up creating Windows shortcuts instead. For a shortcut you need to specify the path to the Tclsh program, e.g., "C:/Program Files/Tcl/bin/tclsh84.exe", and then give the script name as the first argument to Tclsh.

Tcl Procedures

The main part of the program is a procedure named Dos2Unix. This procedure takes one argument, f, which is the name of a file or directory.

proc Dos2Unix {f} {
    # Procedure body here
}

The f parameter is set when the Dos2Unix procedure is called. If you called Dos2Unix like this:

Dos2Unix myfile.txt

then the f parameter gets the value myfile.txt when executing inside Dos2Unix.
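
To see the parameter binding in isolation, here is a minimal sketch. ShowArg is a hypothetical procedure that is not part of the example program; it only reports the value its parameter received.

```tcl
# ShowArg is a hypothetical procedure used only for illustration.
proc ShowArg {f} {
    # Inside the procedure, f is a local variable holding the argument.
    return "f is $f"
}

set f outer.txt              ;# a global variable that happens to share the name
puts [ShowArg myfile.txt]    ;# the local f is myfile.txt
puts $f                      ;# the global f is still outer.txt
```

The global variable f and the parameter f are separate variables, so calling the procedure does not disturb the global one.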

Command Line Arguments

For our program, we want to pass the names of the files to process on the UNIX command line (i.e., when you are invoking the program from Bash or the C shell). Command line arguments are stored in the argv variable. The foreach loop at the end of the script calls Dos2Unix with each file given on the command line:

foreach f $argv {
    Dos2Unix $f
}
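
Besides argv, Tclsh also sets argc (the number of arguments) and argv0 (the name of the script). The following sketch uses them to print a usage message when no files are given. CheckArgs is a hypothetical helper, and the wording of the message is an assumption, not something the original program does.

```tcl
# CheckArgs is a hypothetical helper, not part of the original program.
# It returns a usage message if the argument list is empty,
# or an empty string otherwise.
proc CheckArgs {scriptName argList} {
    if {[llength $argList] == 0} {
        return "usage: $scriptName file ?file ...?"
    }
    return ""
}

# argv0 holds the script name and argv the list of arguments.
set msg [CheckArgs $argv0 $argv]
if {$msg ne ""} {
    puts stderr $msg
}
```

Passing the script name and argument list in as parameters keeps the check easy to try out from an interactive Tclsh session.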

Testing Conditions

The Dos2Unix procedure works differently on files and directories. If it is passed the name of a file, then it reads and writes that file to do the end-of-line conversions. If it is passed a directory, then it processes all the files in that directory. But first, it must test the file to see if it is a directory.

if {[file isdirectory $f]} {
    # Process the directory
} else {
    # Process one file
}

The if command tests the result of its expression. In this case the expression contains a call to another Tcl command, file isdirectory, which returns 1 if the file is a directory. Square brackets are used to delimit the nested command. Curly braces are used to group the expression and the two command bodies (the if-part and the else-part).
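
The file command has several related tests. The sketch below, built around a hypothetical Classify procedure, shows how file isdirectory and file isfile can be combined. Both return 0 for a path that does not exist, a case the Dos2Unix procedure does not check for.

```tcl
# Classify is a hypothetical helper used only for illustration.
proc Classify {path} {
    if {[file isdirectory $path]} {
        return directory
    } elseif {[file isfile $path]} {
        return file
    } else {
        # Neither test succeeded, e.g., the path does not exist.
        return other
    }
}

puts [Classify [pwd]]    ;# the current working directory
```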

Looping over a list of values

In the case of a directory, Dos2Unix loops over all the files in that directory. The glob command returns the list of files that match a file name pattern. The * pattern matches all files (except, on Unix, names that begin with a dot). The pattern is joined to the name of the directory in a cross-platform way with the file join command, so the script never hard-codes a pathname separator. Finally, we get to loop over the list of file names returned by glob:

foreach g [glob [file join $f *]] {
    Dos2Unix $g
}

Each time through the foreach loop the loop variable g takes on the next value from the list returned by glob. The easiest way to process the files is to call Dos2Unix recursively. In the recursive call, a whole new set of variables is allocated for Dos2Unix, so there is no conflict between the variables in the different instances of Dos2Unix. (This is standard recursion.)
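
One thing to watch for: glob raises an error when no files match, which happens with an empty directory. The sketch below shows one way around that, using the -nocomplain and -directory options of glob. ListDir is a hypothetical helper, and the scratch directory name is made up for the demonstration.

```tcl
# ListDir is a hypothetical helper, not part of the original program.
# -nocomplain makes glob return an empty list instead of raising an error,
# and -directory searches inside dir without treating its name as a pattern.
proc ListDir {dir} {
    return [lsort [glob -nocomplain -directory $dir *]]
}

# Demonstrate on a freshly created, empty scratch directory.
set dir [file join [pwd] d2u-demo-[pid]]
file mkdir $dir
puts "empty directory: [llength [ListDir $dir]] entries"    ;# 0 entries
close [open [file join $dir a.txt] w]
puts "after creating a.txt: [llength [ListDir $dir]] entries"    ;# 1 entry
file delete -force $dir
```

With an empty list the foreach body simply never runs, so the recursion ends quietly at an empty directory.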

Working with Files

The heart of the conversion done by Dos2Unix reads the file into memory and writes it back out again. First, we open the files with the open command:

set in [open $f]
set out [open $f.new w]

The extra w argument to open causes the file to be opened for writing. We open a different file named with a trailing ".new" suffix. Tcl lets us easily add text after the variable value. The "." in $f.new terminates the variable name and the ".new" is treated as a literal. So, if $f is myfile.txt, then $f.new is myfile.txt.new.
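
The "." works because it cannot be part of a variable name. A suffix that starts with a letter, digit, or underscore would be read as part of the name, so in that case you wrap the name in curly braces. Here is a small sketch; Names is a hypothetical procedure used only to show both forms side by side.

```tcl
# Names is a hypothetical helper used only for illustration.
proc Names {f} {
    # $f.new   : "." ends the variable name, so ".new" is literal text.
    # ${f}_new : braces are needed, otherwise Tcl looks for a variable f_new.
    return [list $f.new ${f}_new]
}

puts [Names myfile.txt]    ;# prints: myfile.txt.new myfile.txt_new
```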

Reading and writing the files is done in one combination of commands:

puts -nonewline $out [read $in]

This reads the whole file into memory and passes it to the puts command for output. By default, puts will append a trailing newline (\n) to its output, but we don't want that in this case so we pass a flag to turn off that behavior.

When we are done with the files, we must close the channels. An important side-effect of this is to flush any buffered data to disk:

close $in
close $out

The final step renames the new file to the original name. This effectively deletes the original. The -force flag is required when you are replacing an existing file with file rename.

file rename -force $f.new $f

End-of-Line Characters

The heart of the conversion done by Dos2Unix is simply to read the file into memory and write it back out again. Tcl does automatic end-of-line character conversions. In memory all line endings read in (e.g., UNIX-style line feeds (\n), Windows-style carriage-return, line-feed (\r\n), or Macintosh-style carriage-return (\r)) are converted to the newline character (\n). During output, Tcl converts newline characters to the native representation. So, simply by reading and writing the file you convert it to the local conventions. However, for our program we want to convert to Unix-style line endings regardless of the platform we run on, so we use the fconfigure command to tune the I/O channel:

fconfigure $out -translation lf
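
To check that the carriage returns really disappear, you can look at the file with translation turned off. The sketch below writes a small CRLF file in binary mode, converts it, and compares the sizes. ToLF is a hypothetical, non-recursive copy of the file branch of Dos2Unix, RawBytes is a hypothetical helper, and the scratch file name is made up.

```tcl
# ToLF is a hypothetical, non-recursive variant of the file branch of Dos2Unix.
proc ToLF {f} {
    set in [open $f]
    set out [open $f.new w]
    fconfigure $out -translation lf
    puts -nonewline $out [read $in]
    close $out
    close $in
    file rename -force $f.new $f
}

# RawBytes reads a file with no translation at all, so any \r stays visible.
proc RawBytes {f} {
    set ch [open $f]
    fconfigure $ch -translation binary
    set data [read $ch]
    close $ch
    return $data
}

# Write a small CRLF file in binary mode so the \r\n bytes are stored exactly.
set f [file join [pwd] d2u-eol-[pid].txt]
set ch [open $f w]
fconfigure $ch -translation binary
puts -nonewline $ch "one\r\ntwo\r\n"
close $ch

puts "before: [string length [RawBytes $f]] bytes"
ToLF $f
puts "after:  [string length [RawBytes $f]] bytes"
file delete $f
```

The file starts out at 10 bytes and should shrink to 8, because each of the two \r\n pairs becomes a single \n.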

Other Solutions

One-Liner

The shortest possible version of this program is simply:

puts -nonewline stdout [read stdin]

This reads from the standard input channel, stdin, and writes to the standard output channel, stdout. These standard channels are opened for you by Tclsh and Wish. In addition, there is a standard error output channel called stderr. Because Tcl automatically converts end-of-line characters on input and output, the above program will generate a file in the native format of your current system.

Reading in Blocks

One problem with the one-liner and the complete program is that they read the whole file into memory. For really large files this might be a problem. Here is a loop that reads the file in blocks of 32 Kbytes:

set blocksize [expr {32 * 1024}]
while {![eof stdin]} {
    puts -nonewline stdout [read stdin $blocksize]
}
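
Another option is the built-in fcopy command, which copies from one channel to another without your script holding the data in a variable. According to the fcopy manual page, end-of-line sequences are translated according to each channel's -translation setting, so the same conversion should apply. This is a sketch of an alternative, not what the program above does, and CopyLF is a hypothetical helper.

```tcl
# CopyLF is a hypothetical helper used only for illustration.
# It copies everything from channel in to channel out,
# writing LF line endings on the output side.
proc CopyLF {in out} {
    fconfigure $out -translation lf
    # Without -size, fcopy copies until end-of-file on the input channel.
    fcopy $in $out
}

# Used as a filter, the call would be:
#     CopyLF stdin stdout
```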

Optional File Arguments

Suppose you want the program to either take the name of a file on the command line, or work on the standard input channel. Here we check if the argument count is 0, in which case we operate on stdin:

if {$argc == 0} {
    Dos2Unix stdin
} else {
    foreach f $argv {
	Dos2Unix $f
    }
}

Then, inside Dos2Unix you'll have to check for a file named stdin and do something special. (Of course, there is a problem here if you want to convert a file whose name is actually "stdin".)

if {$f == "stdin"} {
    set in stdin
    set out stdout
} else {
    set in [open $f]
    set out [open $f.new w]
}

Later, when working on the standard channels, you skip the file rename, and you don't necessarily need to close the channels:

if {$in != "stdin"} {
    close $in
    close $out
    file rename -force $f.new $f
}
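
One way to avoid the special "stdin" file name is to split the work into a procedure that converts between two open channels and a procedure that handles a named file. The following is a sketch of that restructuring, not the original program: ConvertChannel, ConvertFile, and Main are hypothetical names, and the directory recursion is left out to keep it short.

```tcl
# Hypothetical restructuring, not the original author's code.
# ConvertChannel copies channel in to channel out with LF line endings.
proc ConvertChannel {in out} {
    fconfigure $out -translation lf
    puts -nonewline $out [read $in]
}

# ConvertFile applies ConvertChannel to one named file, replacing it in place.
proc ConvertFile {f} {
    set in [open $f]
    set out [open $f.new w]
    ConvertChannel $in $out
    close $out
    close $in
    file rename -force $f.new $f
}

# Main takes the argument list as a parameter instead of reading argv itself.
proc Main {argList} {
    if {[llength $argList] == 0} {
        # No file names: act as a filter on the standard channels.
        ConvertChannel stdin stdout
    } else {
        foreach f $argList {
            ConvertFile $f
        }
    }
}

# At the end of the script you would call:
#     Main $argv
```

Because the standard channels are never passed to ConvertFile, there is no need to test for them before closing or renaming, and a file that really is named "stdin" is treated like any other file.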