Trailing slashes and directories

Contents of this page

Introduction and disclaimer
Non-root directories
- Recommendations
- Rationale
Root directory
Specific examples
- detox
- rsync
Excursion on URLs
Excursion on the motivation for the misuse

Introduction and disclaimer

Improper and harmful handling of a trailing slash (“/”) on a directory name is a recurring problem in the Unix-verse. A discussion follows below.

Other types of systems and contexts might have different circumstances, and I do not guarantee that the contents of this page will apply to these. However, when a roughly similar approach to file/entity paths is taken, the chance is high that a slightly modified version does apply. (E.g. by simply replacing “/” with whatever directory separator might be used.) Also see an excursion on URLs.

Non-root directories

Recommendations

A non-root directory with a trailing slash must be treated exactly as the same directory without a trailing slash. /a/b/c/ has the exact same implications and should be treated exactly as /a/b/c.

Side-note:

Here I assume that all components are directories. If c refers to a non-directory with an unusual name, /a/b/c/ is an outright error, while /a/b/c is not. (With reservations for some type of special file that might be navigable without being a directory. None occurs to me, but I might simply be overlooking one of the many special cases.)

It is an outright best practice to strip such trailing slashes before further processing, in order to ensure a consistent handling in code, reduce the risk of programming errors, avoid inadvertent differences in treatment, etc.
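As a minimal sketch of such stripping in POSIX shell (the function name strip_trailing_slashes is merely an illustration, not part of any standard tool), one might write the following. Note the special case for a path consisting only of slashes, which must remain / in order not to lose the root directory:

```sh
# strip_trailing_slashes PATH
# Prints PATH without trailing slashes. A path that consists only of
# slashes is reduced to "/", so that the root directory survives.
# Note: plain POSIX sh has no local variables, so "path" is global here;
# calling the function inside $(...) keeps that harmless.
strip_trailing_slashes() {
    path=$1
    case $path in
        '')
            printf '%s\n' '' ;;           # empty input stays empty
        *[!/]*)                           # at least one non-slash character
            while [ "${path%/}" != "$path" ]; do
                path=${path%/}            # remove one trailing slash per round
            done
            printf '%s\n' "$path" ;;
        *)
            printf '%s\n' / ;;            # only slashes: the root directory
    esac
}

strip_trailing_slashes /a/b/c/    # prints /a/b/c
strip_trailing_slashes /a/b/c     # prints /a/b/c
strip_trailing_slashes /          # prints /
```

A script would typically apply this once, directly after reading its arguments, e.g. dir=$(strip_trailing_slashes "$1"), and use only the normalized value afterwards.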

Rationale

The slashes are directory separators, not themselves part of the directory names. Above, the last directory has the name c, not c/ (or /c/, or /c). Likewise, its parent is named b, not b/ (or whatnot). The internal slashes simply tell where the name of the one directory ends and the next begins. As the final directory is not followed by another directory, a trailing slash has no natural meaning and is best ignored.

Looking at practical effects, the results of implementing different (and usually extremely arbitrary, unexpected, and, to boot, poorly documented) semantics for inputs with and without trailing slashes can be not just annoying but highly error-prone. This applies in particular when semi-replaceable tools are replaced.

Side-note:

Consider a script that (a) takes a directory as a command-line argument, (b) originally used cp internally to transfer files, (c) then is switched to use rsync. The former tool is agnostic to trailing slashes; the latter has a very odd semantic (cf. below). Someone used to calling this script with a trailing slash might now see very different results—while someone who calls it without a trailing slash might see the same result. If the latter is the developer, he might not even become aware that an issue exists during the modification and the ensuing tests.

As an aside, this is an example of why stripping trailing slashes when coding is a best practice: someone who does so is less likely to run into problems with one of these misbehaving tools.

A particular complication is that a strong command-line user takes advantage of “completion” to avoid unnecessary typing and errors. For instance, if a Bash user intends to enter a directory named “directoryNamedToIllustrateThePrinciple”, he would not type cd directoryNamedToIllustrateThePrinciple but e.g. cd dir<Tab> to have the shell expand the full name. (The exact details of approach will depend on what other files and directories are, or are suspected to be, present, what personal customizations have been made, and similar, but the preceding should give the right general idea.) However, per default, Bash would append a slash at the end of the newly expanded directory. Normally, this is either good (e.g. because the user wants to go into a subdirectory and is saved the bother to type the slash himself) or does no harm (because whatever command is called with the directory ignores trailing slashes)—but then some jackass decides that his tool should have an odd semantic difference. The user is now forced to pay much closer attention, might see himself forced to manually strip slashes because he does not know what tools can be trusted, or might inadvertently do something that requires considerable manual corrections in the wake of whatever the misbehaving tool did.

Root directory

The case of the root directory (“/”) is trickier, as the naming scheme is not logically consistent in this regard (likely for historical reasons that certainly go back further than my own first contacts with the Unix-verse in 1994; it might or might not relate to the role as a mount point).

Upfront, however, this is yet another argument against using special semantics through trailing slashes: How is a call involving the root directory to be handled in a consistent manner? Should [program] / be treated with the logic of [program] /etc or [program] /etc/? Should some third type of logic be imposed?

The complication is that “/” typically serves as both the name of a directory and as an implicit directory separator. For instance, /usr refers to the subdirectory usr in the root directory, while a plain usr would typically be seen as a directory (or other entry) by that name in the current working directory (equivalent to ./usr). In contrast, the second slash in /usr/share is only a directory separator. We also see that /usr/ does have a trailing slash, while the slash in / is not usually seen as trailing (because of its dual nature as name and separator).

Side-note:

The root directory is also special in some other regards. For instance, with reservations for differences between systems, the special link “..”, present in all directories, usually leads to a parent directory. Hence, /usr/share/.. is equivalent to /usr and /usr/share/../.. is equivalent to /. For the root directory, however, this entry points back to the root directory and /.. and / are equivalent—as are /usr/share/../../../../../../.. and /.
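The behavior of “..” in the root directory can be checked with a small sketch like the following. The helper name resolve is hypothetical; cd -P and pwd -P are used so that the file system, rather than the shell's own textual path handling, does the resolving. As above, results might differ on unusual systems:

```sh
# resolve PATH
# Changes into PATH inside a subshell (so the caller's working directory
# is untouched) and prints the physical directory that was reached.
resolve() (
    cd -P -- "$1" && pwd -P
)

resolve /..          # prints /
resolve /../../..    # prints /
```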

I have pondered some ways to make the treatment consistent, but have not found one that is entirely satisfactory. The least bad way might be to view the root directory as having a null name (in the sense of “null” used in e.g. Java), for which the slash is the separator, and to view / as a convenient way to say the-null-name-followed-by-a-slash. If (!) directory names with and without trailing slashes are treated consistently, we can now in good conscience use / to refer to the root directory. (While /usr has the implication “Root directory; directory separator; directory named ‘usr’.”, with results consistent with current use.)
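Incidentally, the shell's own field splitting already behaves in a manner compatible with the weaker “empty name” view: splitting an absolute path on the slash yields an empty first component, which can be read as the name of the root directory. The following is only an illustrative sketch (the function name print_components is made up for the purpose), not a claim about how any tool resolves paths internally:

```sh
# print_components PATH
# Splits PATH on "/" and prints each component in brackets, one per line.
# Runs in a subshell, so the changes to IFS and globbing do not leak out.
print_components() (
    set -f      # no filename expansion while the unquoted $1 is split
    IFS=/
    for component in $1; do
        printf '[%s]\n' "$component"
    done
)

print_components /usr/share
# prints:
# []
# [usr]
# [share]
```

A relative path like usr/share gives only [usr] and [share], with no leading empty component.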

There are practical problems with this, however:

Firstly, many or most shells and command-line tools do not have an equivalent of null. Ditto many programming languages (C, the traditional language of the Unix-verse, has something of the idea of a null, but it is very different from the Java idea, the ability to distinguish between something null and something empty is smaller, etc.). Thinking in terms of an empty name is conceptually less sound, but would partially remove that problem.

Secondly, it could be technically tricky to keep track of what is what (be it with a null name or an empty name). For instance, in Bash, an uninitialized (string/generic) variable expands to the same empty string as a variable explicitly assigned an empty value (and there is no native null). How do we keep the two apart without additional efforts? (Vice versa, languages that do have a null often use that as the default for an uninitialized variable.)
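To illustrate what such additional efforts look like in Bash (and POSIX sh in general): the expansion ${name+x} yields “x” when the variable is set, even to the empty string, and nothing when it is unset. The variable name target_dir below is arbitrary. The check works, but it must be made explicitly at every place where the difference matters:

```sh
unset target_dir

# Unset: ${target_dir+x} expands to nothing.
if [ -z "${target_dir+x}" ]; then
    echo "target_dir is unset"
fi

target_dir=''

# Set but empty: ${target_dir+x} expands to "x", the value itself is empty.
if [ -n "${target_dir+x}" ] && [ -z "$target_dir" ]; then
    echo "target_dir is set but empty"
fi
```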

Thirdly, existing tools are not necessarily compatible with this approach. For instance, a practical experiment with the Bash builtin cd on my local system shows that cd '' leaves me in the directory where I already was, while the “empty name” approach would require it to be equivalent to cd / and cd ''/ and take me to the root directory. (However, it is clear that some differentiation between empty and absent is made, as a plain cd, with no argument at all, follows the traditional behavior of moving me to the home directory of the current user: directory is not specified (as opposed to specified-but-empty); ergo, use default directory.)

Specific examples

detox

I often use the tool detox, which sanitizes (“detoxes”) file names. For instance, a file named "e f(g) h.txt" could cause problems or require special care when used in a command line environment. A detox "e f(g) h.txt" turns the name into e_f_g_h.txt, which is far less likely to cause problems. (Per default. The exact translations made are configurable.)

In a twist, however, detox is, itself, not truly safe for the command line: If it is called on a directory, instead of a file, the contents of the directory are always detoxed, but whether the name of the directory, itself, is detoxed depends on whether a trailing slash is (no detoxing) or is not (detoxing) present. Not only is this a clear violation of expectations, but it is also highly counter-intuitive. Moreover, should someone want that scenario (contents of directory detoxed, directory name left alone), this could (with sane semantics) easily be achieved by a call like detox -r [directory name]/*. With a mere one character more, we now have the same effect in a sane and predictable manner, unlikely to ever trip someone up.

Side-note:

To make matters worse, recursion is screwed up, as I noticed during some experiments for this text. (I normally truly want everything to be detoxed, so these complications had not previously appeared in my own experiences.) The flag “-r” is supposed to cause recursion, and the natural expectation would be that detox [directory name] (with or without a slash) would only alter the name of the directory, itself. In reality, the contents one step below the directory are always altered, while the directory, itself, is or is not altered depending on that trailing slash. Idiotic. (Here, we also see something of the arbitrariness of the semantic: if (!!!) some special semantic should be enforced based on a trailing slash, it would have made far more sense to always detox the directory name and to let the slash control whether the contents one level down are detoxed. As is, there is no obvious way to detox just the directory name without jumping through hoops.)

What does the “-r” flag actually do? It controls whether the recursion goes beyond the first level, i.e. whether the contents of sub-directories of sub-directories are recursively detoxed—a horribly misconceived approach.

rsync

rsync could be viewed as a cp on steroids, with the ability to continue interrupted transfers, to transfer files between servers per ssh, to make incremental backups, and quite a few other things. Unfortunately, it also has very odd, unexpected, counter-intuitive, and, here, potentially dangerous special semantics for trailing slashes: Call it without a trailing slash on the source directory and the entire directory is copied into the destination directory (as expected); call it with a trailing slash and ... only the contents are copied.

Assume, e.g., that we transfer a terabyte’s worth of data from the one server to another with a call like rsync source destination (I have cut the call down to what is needed for illustration; this is not a realistic full call). A week later, we wish to make an incremental update. A call of rsync source/ destination is made. Now, there is a trailing slash, and rsync assumes a copy equivalent to rsync source/* destination, fails to see the already transferred data (which is not where rsync would now expect it), and proceeds to transfer the entire terabyte again. (In contrast, rsync source destination would see the old data and merely update what has changed.)

Worse, any existing data with a collision in file name could now accidentally be over-written.

Side-note:

rsync also has functionality to automatically move and delete files (and, maybe, take other actions) based on transfer results. There might be some situation where such actions can be accidentally triggered by such faulty transfers. I have not investigated this.

Some other special semantics with trailing slashes are present, but I have never, myself, run afoul of them.

Excursion on URLs

An interesting similar case is given by http(s) URLs that use suffix-less end-components, and other URLs that follow a sufficiently similar scheme. (E.g. https://fake-domain.com/a/b/c over the more traditional https://fake-domain.com/a/b/c.html.)

If such a scheme is used, a trailing slash should have no effect whatever: https://fake-domain.com/a/b/c should be equivalent to https://fake-domain.com/a/b/c/. I might go as far as to consider the inclusion of a trailing slash an outright error of use (but one that should be silently tolerated for reasons of user-friendliness), as the last component typically has a significance more similar to a file than a directory when compared to a file path (and, indeed, often causes a specific file on the server to be accessed/processed/whatnot).

(Whether to use such a scheme is another point of debate, which would require a separate page, and where the results could depend on circumstances, e.g. to what degree, and how, contents are dynamically generated, what the likelihood of a technology change is, and similar.)

If the URL at hand does identify an underlying directory or file in a direct manner, the main discussion applies to a high degree. As of 2024, this is (almost always) the case with e.g. ftp and is often the case even with http(s). (And was almost always the case in the early days of the Web for any URL.) This website (barring later changes) is a good example of such a direct correspondence: files and directories are generated once based on markup files and then served statically based on what is found in the file system.

Side-note:

Note that in https://fake-domain.com/a/b/c there are two or three different types of slashes, and that I speak only of the virtual directory separators. The first two slashes, “//” signify that “fake-domain.com” is a domain (server, authority, whatnot; the correct word seems to have varied over time and context) and are not separators at all. The third can either be viewed as a separator between domain and the “domain local” resource path or as a virtual root directory. (But, yes, https://fake-domain.com and https://fake-domain.com/ should also be treated as being the same.) Only the remaining slashes are fully equivalent to directory separators.
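For suffix-less URLs of the kind discussed here, the equivalence could be enforced by normalizing before comparison or further processing. The following sketch uses a hypothetical helper, normalize_url, and assumes a URL without query string or fragment; it strips trailing slashes but never eats into the “//” that follows the scheme:

```sh
# normalize_url URL
# Hypothetical helper: removes trailing slashes from URL, stopping before
# the "//" after the scheme. Assumes no query string or fragment.
normalize_url() {
    url=$1
    while :; do
        case $url in
            *://) break ;;              # only "scheme://" left; leave it alone
            */)   url=${url%/} ;;       # drop one trailing slash
            *)    break ;;              # nothing left to strip
        esac
    done
    printf '%s\n' "$url"
}

normalize_url https://fake-domain.com/a/b/c/    # prints https://fake-domain.com/a/b/c
normalize_url https://fake-domain.com/          # prints https://fake-domain.com
```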

Excursion on the motivation for the misuse

A potential explanation for the misuse is that the slash cannot occur in a directory name. If someone were to try to impose similar semantics by e.g. /a/b/c@, it might be hard or impossible to tell whether the final directory was named “c” and had a trailing “@”, to impose semantics, or was simply named “c@”. If worst comes to worst, however, even that would be better, provided that some sufficiently rare sign, or combination of signs, is chosen, or that something more than a single slash is used. (The problems would be far smaller with e.g. /a/b/c/// than with /a/b/c/, including the risk of accidental trailing symbols and defeated expectations of sane behavior.)

The correct solution, however, is something entirely different, namely to treat trailing slashes as non-existent and to use some other and more standard means, e.g. flags, to govern behavior.
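As a rough sketch of what “flags instead of slashes” could look like for a caller stuck with rsync: a small wrapper takes an explicit mode and only then translates it into the slash convention that rsync expects for its source argument. The helper name rsync_source_arg and the mode names “dir” and “contents” are inventions for this illustration; the helper only builds the argument and does not call rsync itself.

```sh
# rsync_source_arg MODE DIR
# Hypothetical helper: MODE "dir" means "copy the directory itself",
# MODE "contents" means "copy only what is inside it". Whatever trailing
# slashes DIR arrives with (e.g. from tab completion) are ignored.
rsync_source_arg() {
    mode=$1
    dir=$2
    if [ -z "$dir" ]; then
        echo "empty directory argument" >&2
        return 2
    fi
    case $dir in
        *[!/]*) while [ "${dir%/}" != "$dir" ]; do dir=${dir%/}; done ;;
        *)      dir=/ ;;                      # only slashes: root directory
    esac
    case $mode in
        dir)      printf '%s\n' "$dir" ;;
        contents) printf '%s\n' "${dir%/}/" ;; # exactly one trailing slash
        *)        echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

rsync_source_arg dir source/         # prints source
rsync_source_arg contents source     # prints source/
```

A call could then look like rsync -a "$(rsync_source_arg contents "$src")" "$dest", where the intent is visible in the word “contents” and no longer depends on how the user happened to type the directory name.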
