Motivation Behind Some Bash Syntax that is Different from General-purpose Languages

Bash is different from general-purpose languages, and anyone who approaches it with habits from Java, C, or Python will get into trouble sooner or later.

Compared with general-purpose languages, bash as a shell language is centered around commands (ls, cd, echo, sed, etc.), while general-purpose languages are more about data structures and algorithms (in principle, although in practice modern programming is full of glue code[1]).

This fundamental difference is correct but oversimplified. So in this post, a few distinctive pieces of bash syntax will be examined, to be specific: strings, quoting, the if command, and the for command.

The analysis involves a little knowledge of bash internals. I hope you come away with a new way of thinking about bash after reading this post. Let's begin.

1 Everything is a string in bash by default

The first piece of syntax we'll examine is the string. In most general-purpose languages, a string is a sequence of characters enclosed in single or double quotes, for example:


String msg = "hello~";

However, quotation marks are optional for bash strings. In the following example, zero, single, and double quotes are all legal and equivalent. This is not always true; we'll talk about that later.


msg=hello~
msg='hello~'
msg="hello~"

The reason behind this syntax decision is straightforward. Since bash is all about commands, and both the command name and its arguments are strings, requiring every string to be enclosed in single or double quotes would be verbose and inconvenient. Imagine having to type "ls" "3.html" rather than ls 3.html (although, in fact, "ls" "3.html" works just fine in bash; you can try it in your own terminal).
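As a quick check of the claim above, the following sketch assigns the same text with each quoting style and then runs a command whose name is quoted. The variable names (plain, single, double, out) are made up for illustration.

```shell
# Three spellings of the same string (variable names are illustrative).
plain=hello~
single='hello~'
double="hello~"

# Quoting the command name is legal too: the shell removes the quotes
# before it looks up the command.
out=$("echo" "$plain")

if [ "$plain" = "$single" ] && [ "$single" = "$double" ]; then
    echo "all three assignments hold the same string: $out"
fi
```

The quotes never become part of the value; they only tell the shell how to read the characters between them.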

By the way, YAML[11] shares the same syntax decision with bash: quotation marks are also optional for strings in most cases.

In fact, everything in bash is treated as a string by default, even something that looks like an arithmetic expression, which is why a separate shell arithmetic syntax exists. Consider the following example:


$ a=1+1
$ echo $a
1+1
$ ((b=1+1))
$ echo $b
2

In bash, 1+1 is treated as a string and no evaluation happens unless it's enclosed in (( and )). Of course there are other ways to get shell arithmetic, which you can find in the bash reference manual[3], but the motivation behind the syntax decision remains the same.
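One of those other ways is arithmetic expansion, written $(( )), which substitutes the result of the expression into the command line. A minimal sketch, with variable names chosen only for illustration:

```shell
a=1+1            # plain assignment: the value is the string "1+1"
b=$((1+1))       # arithmetic expansion: the expression is evaluated
c=$((b * 3))     # variables can be used by name inside $(( ))

echo "a=$a b=$b c=$c"    # prints: a=1+1 b=2 c=6
```

The default stays "string"; evaluation happens only where the syntax asks for it explicitly.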

2 Quoting

Quotes are optional for bash strings, but what does it really mean when a string is enclosed in quotes? Actually it's more complicated than it seems at first glance.

2.1 Quoting disables special treatment of special characters

Let's begin with the easy one. There are many special characters in bash, and quoting can be used to disable their special treatment.

For example, roughly speaking, * represents all files and directories in the current directory (this is not fully accurate; the comprehensive definition can be found in the bash reference manual[4]).

So echo * will display the names of all files and directories in the current directory. If you want to display * itself, you must quote it in one of three ways: single quotes, double quotes, or the escape character.


$ echo *
1.md 2.md 3.md
$ echo '*'
*
$ echo "*"
*
$ echo \*
*

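The difference between single and double quotes is that single quotes preserve the literal value of every character inside them, while inside double quotes $, ` and \ keep their special meaning, so parameter expansion and command substitution still happen[2].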
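A minimal sketch of how the two kinds of quotes treat $ differently; the variable names are made up for illustration:

```shell
name=world
single='hello $name'   # single quotes: $name stays literal
double="hello $name"   # double quotes: $name is expanded

echo "$single"         # prints: hello $name
echo "$double"         # prints: hello world
```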

2.2 Quoting can avoid word splitting

In the first place, look at the following example to get a sense of splitting.


$ echo 1 2  3    4
1 2 3 4
$ echo '1 2  3    4'
1 2  3    4

Why is that? It's related to how bash processes user input before finally executing the command (echo here). After reading the input, bash breaks it into words and operators; the delimiters used here are called metacharacters.

Space is one of the metacharacters[5], so the unquoted input echo 1 2  3    4 is separated into five words: echo, 1, 2, 3, 4. The first is the command name and the others are arguments. The spaces between the digits have been removed, and echo joins its arguments with single spaces, so the output is just 1 2 3 4.

By contrast, the spaces in the quoted input echo '1 2  3    4' lose their special meaning. This input is separated into only two words, echo and '1 2  3    4', so the spaces remain in the output.

Things get frustrating here, because the splitting above is not what bash calls word splitting. It does a similar thing, but the real word splitting in bash refers to the separation of words during the shell expansions step.

The main difference between the two kinds of splitting is that they occur at different steps of input processing. Here is a brief description of the shell’s operation when it reads and executes a command[6]:

1. Reads its input
2. Breaks the input into words and operators
3. Parses the tokens into simple and compound commands
4. Performs the various shell expansions (word splitting happens here)
5. Performs any necessary redirections
6. Executes the command
7. Optionally waits for the command to complete

Word splitting does contribute to the output in the following example:


$ foo='1 2  3    4'
$ echo $foo
1 2 3 4

After step 2, the input echo $foo is broken into two words, echo and $foo. In step 4, $foo is subject to shell parameter expansion and expands to the string 1 2  3    4; word splitting then breaks it into four words, 1, 2, 3, 4, without any spaces.

However, quoting prevents word splitting.


$ foo='1 2  3    4'
$ echo "$foo"
1 2  3    4
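The effect is easier to see by counting arguments instead of looking at echo's output. In this sketch, count_args is a hypothetical helper that only reports how many arguments it received:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

foo='1 2  3    4'
unquoted=$(count_args $foo)     # word splitting applies
quoted=$(count_args "$foo")     # quotes suppress word splitting

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
# prints: unquoted: 4 argument(s), quoted: 1 argument(s)
```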

This is admittedly a little messy, and word splitting has more details that are not covered here, for example the IFS variable, which defines the characters used as delimiters.
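One such detail is the IFS variable, which lists the characters that word splitting treats as delimiters (space, tab, and newline by default). A minimal sketch, with made-up variable names, of splitting on colons instead:

```shell
line='a:b:c'

old_ifs=$IFS      # remember the current delimiters
IFS=:             # split on colons instead of whitespace
set -- $line      # unquoted on purpose, so word splitting applies
IFS=$old_ifs      # restore the previous delimiters

count=$#
first=$1
echo "$count fields, first field: $first"    # prints: 3 fields, first field: a
```

Note that set -- replaces the script's positional parameters, which is fine in a throwaway example but worth keeping in mind in a real script.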

2.3 Another example of word splitting

Input processing before command execution is the key to understanding many things, including quoting and word splitting. To get a better understanding, an interesting example from Stack Overflow[7] can be studied.


$ echo foo | wc -l
1
$ a="echo foo | wc -l"
$ $a
foo | wc -l

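Step 3 takes effect here. echo foo | wc -l is parsed into two commands, because | is the pipe operator.

By contrast, step 3 has nothing to do with the contents of $a. The | appears only after expansion in step 4, when parsing is already over, so the whole line stays one echo command with the arguments foo, |, wc, and -l.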
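If the intent really is to run the stored text as a pipeline, the text has to go through parsing again. One way to do that, sketched here as an illustration rather than a recommendation, is eval, which re-reads its argument as shell input:

```shell
a="echo foo | wc -l"

# Expanding $a directly: | arrives after parsing, so it is just an argument.
literal=$($a)

# eval feeds the expanded text back through all the steps, so | is
# parsed as a pipe this time. Only do this with text you trust.
piped=$(eval "$a")

echo "without eval: $literal"
echo "with eval:    $piped"
```

The first line shows foo | wc -l and the second shows the line count 1 (some wc implementations pad the number with spaces).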

2.4 Backslash curse

Before moving to the next section, I would like to say more about the escape character. Nested escape characters are often confusing. What does nested mean? It means two related systems use the same escape character.

For example, bash uses the backslash \ to escape special characters, and echo also uses the backslash \ as its escape character in -e mode. In that situation, four backslashes in the bash input produce only one backslash on screen: bash turns \\\\ into \\, and echo -e turns \\ into \.


$ echo -e \\\\
\

This is surprising and becomes annoying when you need to display more backslashes.
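One way to sidestep the double escaping, offered here as a suggestion rather than something taken from the references, is to switch off both layers: single quotes stop the shell from interpreting the backslashes, and printf with a %s format prints its argument without interpreting escapes in it.

```shell
one=$(printf '%s\n' '\')         # one backslash in, one backslash out
four=$(printf '%s\n' '\\\\')     # four in, four out

echo "one:  ${#one} character(s)"     # prints: one:  1 character(s)
echo "four: ${#four} character(s)"    # prints: four: 4 character(s)
```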

3 if command

Conditional statements in Java, C, and Python are very similar, and easy to learn and write. That's not the case in bash. When bash newcomers start writing if statements, some mistakes are very common: the then keyword or a space is forgotten, > is expected to trigger an integer comparison, or && is expected to be a logical AND operator.

Let's analyze them one by one. I'd call the design reasonable rather than easy, because there are many details here.

3.1 [ and test are equivalent

Let's start with a simple example: checking whether the variable x is greater than 3. The correct code is:


x=4
if [ "$x" -gt 3 ]
then
    echo x is greater than 3
fi

The first question is why spaces are required around [ and ]. The reason is that a space is needed between a command name and its arguments, and [ is a command.

In fact, [ is just another form of the test command. The syntax becomes easier to understand in the test form.


x=4
if test "$x" -gt 3
then
    echo x is greater than 3
fi

In summary, the form if [ ... ] merely looks like the if statement in other languages; the conditional expression is actually evaluated by the test command.
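Because the condition is an ordinary command, what if actually looks at is its exit status: 0 counts as true and anything else as false. The sketch below makes that visible; exit_status is a hypothetical helper that runs whatever command it is given and prints the status.

```shell
# exit_status is a hypothetical helper: it runs its arguments as a
# command and prints that command's exit status.
exit_status() {
    "$@" && echo 0 || echo "$?"
}

bracket=$(exit_status [ 4 -gt 3 ])       # [ is passed like any other command name
test_form=$(exit_status test 4 -gt 5)

echo "[ 4 -gt 3 ] exits with $bracket"        # prints: [ 4 -gt 3 ] exits with 0
echo "test 4 -gt 5 exits with $test_form"     # prints: test 4 -gt 5 exits with 1
```

That [ can be handed to a helper as a plain word is itself a hint that it is a command name and not punctuation.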

This also explains why the expression is "$x" -gt 3 and not "$x" > 3. In bash's context, the conditional expression is really the argument list of the test command. After parameter expansion, three arguments, 4, -gt, and 3, are passed to test. So it is the test command that chose -gt over >.

In the test command, -gt is called a primary operator, and 4 and 3 are its operands. Other primaries include -d, which checks whether a directory exists, and -f, which checks whether a regular file exists. So my guess is that -gt was chosen to follow this naming convention.

My other guess is that, since > is a metacharacter, avoiding it in test primaries makes sense.
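To see what goes wrong when > is used anyway, the sketch below runs both forms in a scratch directory. Every name in it (workdir, redirect_says, stray_file, gt_says) is made up for illustration, and it assumes mktemp -d is available.

```shell
workdir=$(mktemp -d)      # scratch directory so the stray file is easy to clean up
cd "$workdir" || exit 1

x=2
# Parsed as: run [ "$x" ] with its output redirected to a file named 3.
# [ "$x" ] only checks that $x is non-empty, so the test succeeds.
if [ "$x" > 3 ]; then redirect_says=true; else redirect_says=false; fi

# The redirection left a file behind.
if [ -f 3 ]; then stray_file=yes; else stray_file=no; fi

# -gt performs the intended integer comparison.
if [ "$x" -gt 3 ]; then gt_says=true; else gt_says=false; fi

cd - >/dev/null || exit 1
rm -rf "$workdir"

echo "> says $redirect_says, -gt says $gt_says, stray file: $stray_file"
# prints: > says true, -gt says false, stray file: yes
```

So with > the condition is true even though 2 is not greater than 3, and a file named 3 appears as a side effect.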

3.2 How to AND two expressions

If you want to check whether x is greater than 3 and less than 8, you have two options: 1. the test command with the -a primary; 2. a bash AND list (&&). POSIX recommends method 2 for portability[8]. Method 2 is also more readable in my opinion.


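# Method 1: a single test command with the -a primary
x=4
if test "$x" -gt 3 -a "$x" -lt 8
then
    echo x is greater than 3 and less than 8
fi

# or in [ form
if [ "$x" -gt 3 -a "$x" -lt 8 ]
then
    echo x is greater than 3 and less than 8
fi

# Method 2: two test commands joined by an AND list
x=4
if test "$x" -gt 3 && test "$x" -lt 8
then
    echo x is greater than 3 and less than 8
fi

# or in [ form
if [ "$x" -gt 3 ] && [ "$x" -lt 8 ]
then
    echo x is greater than 3 and less than 8
fi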

3.3 Other forms of if

According to the bash reference manual[9]:

Commands separated by a ‘;’ are executed sequentially; the shell waits for each command to terminate in turn.

Based on that, if and then can be put on one line, separated by ;. So the first example can also be written as:


if [ "$x" -gt 3 ]; then
    echo x is greater than 3
fi

Besides test and [ ... ], bash also supports [[ ... ]] as a conditional construct, but POSIX does not. You can find the reason in the RATIONALE section of the POSIX test command documentation[8].

4 for command

A simple example of for in bash is:


# print 1 to 5
for i in $(seq 1 5)
do
    echo $i
done

Once again, the for construct looks like the for statement in general-purpose languages, but the two are very different. The bash for command has nothing to do with collections, iterable interfaces, or anything similar; it's all about word splitting.

During command substitution, the command seq 1 5 is executed and $(seq 1 5) is replaced by its output, the numbers 1 to 5 separated by newlines. Word splitting then separates that output into five arguments: 1, 2, 3, 4, 5.

Passing these five arguments to the for command directly makes no difference.


# print 1 to 5
for i in 1 2 3 4 5
do
    echo $i
done

The output of many commands is separated by spaces, newlines, or tabs, so the word splitting mechanism in for makes sense in this regard.
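That for iterates over whatever words remain after splitting, and not over a collection, can be checked by looping over the same variable with and without quotes. A small sketch with illustrative variable names:

```shell
list='a b  c'

unquoted=0
for word in $list; do          # split into three words: a, b, c
    unquoted=$((unquoted + 1))
done

quoted=0
for word in "$list"; do        # one word: the whole string
    quoted=$((quoted + 1))
done

echo "unquoted: $unquoted iterations, quoted: $quoted iteration"
# prints: unquoted: 3 iterations, quoted: 1 iteration
```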

I intended to analyze why variable assignment does not allow spaces, but I'm sorry to say I couldn't find a convincing answer; there are some discussions on Stack Overflow[10].

Much more bash syntax could be analyzed, but I will stop here. You can explore it yourself if you find it interesting.

5 References

1. Antirez's homepage Fundamental things I believe about society and life
2. Bash Reference Manual 3.1.2 Quoting
3. Bash Reference Manual 6.5 Shell Arithmetic
4. Bash Reference Manual 3.5.8 Filename Expansion
5. Bash Reference Manual 2 Definitions
6. Bash Reference Manual 3.1.1 Shell Operation
7. Stack Overflow Bash word splitting mechanism
8. POSIX test command
9. Bash Reference Manual 3.2.4 Lists of Commands
10. Stack Overflow What is the rationale behind variable assignment without space in bash script
11. YAML homepage

Written by Songziyu @China Sep. 2023