Legal characters in file names

My email signature has my opinion:

Today’s Advice: File names include letters, numbers, periods, dashes, and underscores. NEVER \ / ! * & ^ $ or ". And no spaces.

Spaces are bad; they make everything inconvenient. I’m not a big fan of ( and ) in file names, don’t want to see [ or ], and, generally, wish file names would just be letters, numbers, dashes, and underscores.
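Here is a minimal sketch of that advice as a shell test. The function name is_clean_name is my own invention for illustration, and the character ranges assume an ASCII-style locale (set LC_ALL=C if in doubt).

```shell
#!/bin/sh
# is_clean_name NAME  (hypothetical helper, not from any standard tool)
# Succeeds (exit status 0) only if NAME is non-empty and contains nothing
# but letters, digits, periods, dashes, and underscores.
# Note: A-Z and a-z are interpreted by the current locale; LC_ALL=C makes
# them plain ASCII ranges.
is_clean_name() {
    case $1 in
        '' | *[!A-Za-z0-9._-]*) return 1 ;;   # empty, or has a forbidden character
        *) return 0 ;;
    esac
}
```

Something like `is_clean_name "$f" || echo "please rename: $f"` could then flag offenders in a loop.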

When I was an MS-DOS user, of course, file names were limited to 8 characters plus a 3-letter extension; there could be at most one period and no spaces. When the Mac was introduced, it made spaces and long file names appear sexy, and Windows users used to dream eagerly of the day when they, too, could make life inconvenient for Unix programmers by putting spaces in file names. Nothing in scripting is easier with spaces, and some things are more difficult.

There’s a Microsoft page about this. Note how ugly their URL is because of unacceptable characters in the name:

http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx

Anyway, they say for Windows NTFS file systems, one can use any characters except:

The following reserved characters:

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)

It’s obvious why they don’t want the * in there, right? I’m interested to see they ban the colon; that’s new information.
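For anyone who shares files with Windows users, the reserved list above can be turned into a quick check. This is only a sketch covering the nine characters quoted here; the function name has_windows_reserved is made up for the example.

```shell
#!/bin/sh
# has_windows_reserved NAME  (hypothetical helper)
# Succeeds (exit status 0) if NAME contains any of the reserved characters
# listed above:  < > : " / \ | ? *
# Each character is quoted so the shell matches it literally.
has_windows_reserved() {
    case $1 in
        *'<'* | *'>'* | *':'* | *'"'* | *'/'* | *'\'* | *'|'* | *'?'* | *'*'*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

Pass it a single file name rather than a full path, since the forward slash is itself on the list.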

In shell scripts, we can usually work around these problems, but I would rather not create a crisis by assuming that all script writers are clever. Several characters have special meaning in scripts, and we’d rather not confuse ourselves by seeing them in file names. That’s a good reason not to put
|
[ ]
( )
*
&

in file names. I shudder to think what would happen if a user puts spaces on both sides of a period, actually. A period in a regular expression means “any character”. Think how exciting “ . ” would be in a file name.
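To see how a space bites in practice, here is a small sketch of what the shell does with an unquoted variable. The helper count_args and the sample names are made up for the demonstration, and it assumes the default IFS.

```shell
#!/bin/sh
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

name="annual report.txt"
count_args $name        # unquoted: the shell splits at the space, prints 2
count_args "$name"      # quoted: stays one argument, prints 1

# A name made of space, period, space collapses to a bare "." when it is
# left unquoted, and "." is how most tools spell the current directory.
odd=" . "
printf '[%s]\n' $odd    # prints [.]
printf '[%s]\n' "$odd"  # prints [ . ]
```

Quoting every variable avoids the problem, but that relies on every script writer remembering to do it every time.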

In the script directory in my home account, I keep a script that can replace bad characters, including spaces. It uses a time-tested Perl program which was distributed with Slackware Linux in the early days (now I call it rename-perl.pl to keep it separate from the horrible rename program they package up now. Honestly, some people …. :) ).

Here is rename-perl.pl in all its glory. I think this goes back to the inventors of Perl, actually:

#!/usr/bin/perl
#
# rename script examples from lwall:
#       rename 's/\.orig$//' *.orig
#       rename 'y/A-Z/a-z/ unless /^Make/' *
#       rename '$_ .= ".bad"' *.f
#       rename 'print "$_: "; s/foo/bar/ if <STDIN> =~ /^y/i' *

$op = shift;                  # first argument: a Perl expression that edits $_
for (@ARGV) {                 # remaining arguments: the file names
    $was = $_;
    eval $op;                 # run the expression against the current name
    die $@ if $@;
    rename($was,$_) unless $was eq $_;
}

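If you make that executable and put it in your path (say, ~/bin; the commands below assume it is installed there as rename-perl), then you could run this to replace all spaces with underscores in the file names in a directory:

$ rename-perl 's/ /_/g' *

The middle part looks like a sed script, but it is really a Perl substitution that the script evals against each name. The asterisk is a shell file selector (anything that is not a hidden file).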
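Since rename-perl renames immediately, it can be comforting to see what would change first. Below is a rough sketch of a dry run in plain shell; the function name preview_space_fix is hypothetical, and it only handles the space-to-underscore case in a single directory.

```shell
#!/bin/sh
# preview_space_fix [DIR]  (hypothetical helper)
# Prints "old -> new" for every non-hidden entry in DIR (default: .) whose
# name contains a space. Nothing is renamed.
preview_space_fix() {
    dir=${1:-.}
    for path in "$dir"/*; do
        # If the glob matched nothing, $path is the literal pattern: skip it.
        [ -e "$path" ] || [ -L "$path" ] || continue
        base=${path##*/}                      # strip the directory part
        case $base in
            *' '*)
                printf '%s -> %s\n' "$base" "$(printf '%s' "$base" | tr ' ' '_')"
                ;;
        esac
    done
}
```

Running `preview_space_fix ~/Documents` would list the proposed new names and leave the files alone.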

My script cleanFileNames.sh does that recursively. I have lots of other fixes in there for other characters I hate, but here’s the part you’d need to just kill spaces in file names.

#!/bin/sh

dirname="."
### if there is a command line argument, it has to be a directory

if [ $# -gt 0 ]; then
    dirname=$1
fi

# Each pass goes one level deeper, so by the time a file is reached its
# parent directories have already been fixed.
for i in 0 1 2 3 4; do
    find "$dirname" -maxdepth $i -name "* - *" -exec rename-perl 's/ - /-/g' {} \;
    echo "$dirname"
done

for i in 0 1 2 3 4; do
    find "$dirname" -maxdepth $i -name "* *" -exec rename-perl 's/ /_/g' {} \;
done

##snip details
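The loop above stops at five levels. A different approach, sketched here as an alternative and not as what cleanFileNames.sh does, is to let find walk depth-first and rename only the last path component. The function name fix_spaces_recursive is hypothetical, and -mindepth is a GNU/BSD find extension, just as -maxdepth is.

```shell
#!/bin/sh
# fix_spaces_recursive [DIR]  (hypothetical helper)
# Replaces spaces with underscores in every name below DIR (default: .).
# DIR itself is left alone.
fix_spaces_recursive() {
    top=${1:-.}
    # -depth reports a directory's contents before the directory itself,
    # so a parent is renamed only after everything inside it is done.
    find "$top" -mindepth 1 -depth -name '* *' -exec sh -c '
        for path do
            dir=${path%/*}       # everything before the last slash
            base=${path##*/}     # the last component, the only part we change
            new=$(printf "%s" "$base" | tr " " "_")
            if [ -e "$dir/$new" ]; then
                printf "skipping %s: %s already exists\n" "$path" "$dir/$new" >&2
            else
                mv -- "$path" "$dir/$new"
            fi
        done
    ' sh {} +
}
```

Because only the last component changes, the depth of the tree does not matter, and an existing file is never overwritten.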

Things worth gazing at:

Spaces in Filenames and Why You Should Avoid Them on the Web
http://www.blackbaudknowhow.com/blackbaud-sphere/spaces-in-filenames-and-why-you-should-avoid-them-on-the-web.htm

What technical reasons exist for not using space characters in file names?
http://superuser.com/questions/29111/what-technical-reasons-exist-for-not-using-space-characters-in-file-names

Work the Shell – Dealing with Spaces in Filenames
http://www.linuxjournal.com/article/10954

How to Create Good Filenames for Your Web Pages
http://www.thesitewizard.com/webdesign/create-good-filenames.shtml

UNIX File Names
http://cmgm.stanford.edu/classes/unix/filenames.html

About pauljohn

Paul E. Johnson is a Professor of Political Science at the University of Kansas. He is an avid Linux User, an adequate system administrator and C programmer, and humility is one of his greatest strengths.