Bash is different from general-purpose languages. Anyone who approaches it with habits carried over from Java, C, or Python will get into trouble sooner or later.
Compared with general-purpose languages, bash as a shell language is centered
around commands (ls, cd, echo, sed, etc.), while general-purpose languages are
more about data structures and algorithms (in principle, although in practice
modern programming is full of glue code[1]).
This fundamental difference is correct but oversimplified, so this post examines a few distinctive pieces of bash syntax, specifically quoting[2] in bash, the if command, and the for command.
The analysis involves a little knowledge of bash internals. I hope you come away with a new way of thinking about bash. Let's begin.
The first piece of syntax to look at is the string. In most general-purpose languages, a string is a sequence of characters enclosed in single or double quotes, for example:
String msg = "hello~";
However, quotation marks are optional for bash strings. In the following example, no quotes, single quotes, and double quotes are all legal and equivalent. That is not always true, as discussed later.
msg=hello~
msg='hello~'
msg="hello~"
The reason behind this syntax decision is straightforward. Bash is all about
commands, and both the command name and its arguments are strings, so requiring
every string to be enclosed in single or double quotes would be verbose and
inconvenient. Imagine having to type "ls" "3.html" rather than ls 3.html
(although, in fact, "ls" "3.html" works fine in bash; you can try it in your
own terminal).
By the way, YAML[11] makes the same syntax decision as bash: quotation marks are optional for strings in most cases.
In fact, everything in bash is treated as a string by default, even a string that looks like an arithmetic expression, which is why a separate shell arithmetic syntax exists. Consider the following example:
$ a=1+1
$ echo $a
1+1
$ ((b=1+1))
$ echo $b
2
In bash, 1+1 is treated as a string, and no evaluation happens unless it is
enclosed in (( and )). There are other ways to perform shell arithmetic,
described in the bash reference manual[3], but the motivation behind the
syntax decision remains the same.
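As a small sketch of one of those other ways, the block below puts arithmetic expansion, written $(( )), next to a plain assignment. The variable names a, b, and c are only illustrative.

```shell
# Plain assignment keeps the text as-is; no arithmetic is performed.
a=1+1

# Arithmetic expansion evaluates the expression and substitutes the result.
b=$((1+1))

# Variables can be referenced inside $(( )) without a leading $.
c=$((b * 3))

echo "a=$a b=$b c=$c"    # prints: a=1+1 b=2 c=6
```

Unlike ((b=1+1)), the $(( )) form produces a value, so it can appear anywhere a word is expected, for example as a command argument.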
Quotes are optional for bash strings, but what does it really mean when a string is enclosed in quotes? It is more complicated than it seems at first glance.
Let's begin with the easy part. There are many special characters in bash, and quoting can be used to disable their special treatment.
For example, roughly speaking, * represents all files and directories in the
current directory (this is not fully accurate; the comprehensive definition
can be found in the bash reference manual[4]).
So echo * displays the names of all files and directories in the current
directory. To display * itself, you must quote it in one of three ways:
single quotes, double quotes, or the escape character.
$ echo *
1.md 2.md 3.md
$ echo '*'
*
$ echo "*"
*
$ echo \*
*
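The difference between single and double quotes is that single quotes preserve
the literal value of every character, while some special characters ($, `, \)
still retain their special meaning within double quotes.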
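A minimal sketch of that difference, using an illustrative variable called name: the same text is assigned under single quotes, under double quotes, and under double quotes with the $ escaped.

```shell
name=world

single='hello $name'      # single quotes: $ is an ordinary character
double="hello $name"      # double quotes: $name is expanded
escaped="hello \$name"    # backslash still works inside double quotes

printf '%s\n' "$single" "$double" "$escaped"
# prints:
# hello $name
# hello world
# hello $name
```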
First, look at the following example to get a feel for splitting.
$ echo 1   2   3   4
1 2 3 4
$ echo '1   2   3   4'
1   2   3   4
Why is that? It comes down to how bash processes user input before finally
executing the command (echo here). After reading the input, bash breaks it
into words and operators; the delimiters used here are called metacharacters.
A space is one of the metacharacters[5], so the unquoted input
echo 1   2   3   4 is separated into five words: echo, 1, 2, 3, 4. The first
is the command name and the others are arguments. The spaces between the
digits have been discarded, and echo joins its arguments with single spaces,
so the output is just 1 2 3 4.
By contrast, the spaces in the quoted input echo '1   2   3   4' lose their
special meaning. This input is separated into only two words, echo and
'1   2   3   4', so the spaces are preserved in the output.
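To make the word count visible, here is a small sketch. count_args is a hypothetical helper, not part of bash; it only reports how many arguments it was given.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args 1   2   3   4)    # spaces delimit words: 4 arguments
quoted=$(count_args '1   2   3   4')    # quoted spaces are literal: 1 argument

echo "unquoted=$unquoted quoted=$quoted"    # prints: unquoted=4 quoted=1
```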
Things get frustrating here, because the splitting described above is not
what bash itself calls word splitting. It does something similar, but word
splitting in bash's own terminology refers to the separation of words that
takes place during the step called shell expansions.
The main difference between the two kinds of splitting is that they occur at different steps of input processing. Here is a brief description of the shell’s operation when it reads and executes a command[6]:
1. Reads its input
2. Breaks the input into words and operators
3. Parses the tokens into simple and compound commands
4. Performs the various shell expansions (word splitting happens here)
5. Performs any necessary redirections
6. Executes the command
7. Optionally waits for the command to complete
Word splitting does contribute to the output in the following example:
$ foo='1   2   3   4'
$ echo $foo
1 2 3 4
After step 2, the input echo $foo is broken into two words, echo and $foo.
In step 4, $foo undergoes shell parameter expansion and expands to the string
1   2   3   4, then word splitting breaks it into four words, 1, 2, 3, 4,
without any spaces.
However, quoting prevents word splitting.
$ foo='1   2   3   4'
$ echo "$foo"
1   2   3   4
This is a simplification: word splitting has more details not covered here,
for example the IFS variable, whose characters bash uses as the delimiters
for word splitting.
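The following sketch shows word splitting after expansion and the effect of IFS. count_args is a hypothetical helper that reports its argument count, and the variable names are illustrative.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

foo='1   2   3   4'
split_count=$(count_args $foo)      # unquoted: split on spaces into 4 words
quoted_count=$(count_args "$foo")   # quoted: stays 1 word

csv='a,b,c'
default_count=$(count_args $csv)    # default IFS has no comma: 1 word
old_ifs=$IFS
IFS=,
comma_count=$(count_args $csv)      # IFS=, splits on commas: 3 words
IFS=$old_ifs                        # restore the previous delimiters

echo "$split_count $quoted_count $default_count $comma_count"    # prints: 4 1 1 3
```

Restoring IFS right away keeps the change from affecting later expansions in the same shell.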
The input processing that happens before command execution is the key to
understanding many things, including quoting and word splitting. For a better
understanding, consider an interesting example from Stack Overflow[7].
$ echo foo | wc -l
1
$ a="echo foo | wc -l"
$ $a
foo | wc -l
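Step 3 makes the difference here. The input echo foo | wc -l is parsed into
two commands, because | is the pipe operator.
By contrast, step 3 sees $a as a single word with no operator in it, so the
line is treated as one simple command. By the time $a is expanded in step 4,
parsing is already done, and the | in the expanded text is passed to echo as
an ordinary argument.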
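One way to see that the missing ingredient is a second parsing pass is the eval builtin, which runs its arguments through the whole input process again. This is only a sketch to illustrate the ordering of the steps, not a recommendation; the variable names literal and reparsed are illustrative.

```shell
a="echo foo | wc -l"

# Expanded after parsing: echo receives foo, |, wc, -l as plain arguments.
literal=$($a)

# eval feeds the expanded text through the parser again, so | becomes a pipe.
reparsed=$(eval "$a")

echo "literal:  $literal"
echo "reparsed: $reparsed"
```

Here literal holds the text foo | wc -l, while reparsed holds the line count reported by wc -l. eval runs whatever text it is given, so it should not be applied to input you do not control.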
Before moving to the next section, a few more words about the escape character. Nested escape characters are often confusing. What does nested mean here? It means that two cooperating systems use the same escape character.
For example, bash uses the backslash \ to escape special characters, and echo
also uses the backslash \ as its escape character in -e mode. Under these
circumstances, four backslashes in the bash input produce only one backslash
on screen.
$ echo -e \\\\
\
This is surprising, and it becomes annoying when you need to display several backslashes.
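Quoting can peel the layers away one at a time. The sketch below uses printf instead of echo -e, because the behaviour of echo -e differs between shells; the printf format string plays the role of the second escaping layer. The variable names are illustrative.

```shell
# Unquoted: bash turns \\\\ into \\, then the printf format turns \\ into \.
two_layers=$(printf \\\\)

# Single-quoted: bash passes \\ through untouched, and only the format halves it.
one_layer=$(printf '\\')

# %s prints its argument verbatim, so the single backslash needs no doubling.
no_layer=$(printf '%s' '\')

printf '%s %s %s\n' "$two_layers" "$one_layer" "$no_layer"    # prints: \ \ \
```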
if command
Conditional statements in Java, C, and Python are very similar, and easy to
learn and write. That is not the case in bash. When bash newcomers start
writing if statements, some mistakes are very common: the then keyword or a
space is forgotten, > is expected to trigger an integer comparison, or && is
expected to behave as the logical AND operator it is in other languages.
Let's analyze them one by one. I would call the syntax reasonable rather than easy, because there are many details involved.
[ and test are equivalent
Let's start with a simple example: check whether the variable x is greater than 3. The correct code is:
x=4
if [ "$x" -gt 3 ]
then
    echo x is greater than 3
fi
The first question is why spaces are required around [ and ]. The reason is
that a space is needed between a command name and its arguments, and [ is a
command. In fact, [ is equivalent to the test command; it is just another
form of it (one that requires ] as its last argument). The syntax becomes
easier to understand in the test form.
x=4
if test "$x" -gt 3
then
    echo x is greater than 3
fi
In summary, the form if [ ... ] merely looks like the if statement of other
languages; the conditional expression is actually evaluated by a test
command.
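Because test and [ are ordinary commands, they can be run on their own, outside any if, and what they report is an exit status. A minimal sketch, with illustrative variable names:

```shell
x=4

test "$x" -gt 3;  test_status=$?       # 0: the comparison is true
[ "$x" -gt 3 ];   bracket_status=$?    # 0: same command, bracket spelling

# A false comparison yields a non-zero status, captured on the || branch.
[ "$x" -gt 10 ] || false_status=$?

echo "$test_status $bracket_status $false_status"    # prints: 0 0 1
```

The if command simply runs the command that follows it and takes the then branch when the exit status is 0.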
This also explains why the expression is "$x" -gt 3 and not "$x" > 3. In
bash's view, the conditional expression is just the arguments of the test
command. After parameter expansion and word splitting, three arguments, 4,
-gt, and 3, are passed to the test command. Basically, it is the test command
that chose -gt over >.
In the test command, -gt is called a primary operator, and 4 and 3 are its
operands. Other primaries include -d, which checks whether a directory
exists, and -f, which checks whether a regular file exists. So my guess is
that -gt was chosen to follow this naming convention.
The other reason, I guess, is that > is a metacharacter, so avoiding it in
test primaries makes sense.
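The sketch below shows what actually happens when > is used inside [ ]. It runs in a temporary directory because the mistaken form creates a file; workdir and the result variables are illustrative names.

```shell
# Work in a scratch directory, because the mistaken form creates a file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

x=2

# Intended as "x greater than 3", but > is parsed as a redirection:
# bash runs [ "2" ] with its output redirected to a file named 3.
if [ "$x" > 3 ]; then
    redirect_result=true     # reached: a non-empty string counts as true
else
    redirect_result=false
fi

# -gt performs the numeric comparison that was intended.
if [ "$x" -gt 3 ]; then
    numeric_result=true
else
    numeric_result=false
fi

echo "$redirect_result $numeric_result"    # prints: true false
ls                                         # lists the stray file named 3
```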
AND two expressions
If you want to check whether x is greater than 3 and less than 8, you have
two options: 1. the test command with the -a primary operator; 2. a bash AND
list of two test commands. Method 2 is recommended by POSIX because of
portability concerns[8]. Method 2 is also more readable, in my opinion.
test command with -a
x=4
if test "$x" -gt 3 -a "$x" -lt 8
then
    echo x is greater than 3 and less than 8
fi
# or in [ form
if [ "$x" -gt 3 -a "$x" -lt 8 ]
then
    echo x is greater than 3 and less than 8
fi
test commands separated by &&
x=4
if test "$x" -gt 3 && test "$x" -lt 8
then
    echo x is greater than 3 and less than 8
fi
# or in [ form
if [ "$x" -gt 3 ] && [ "$x" -lt 8 ]
then
    echo x is greater than 3 and less than 8
fi
if
According to bash reference manual[9]:
Commands separated by a ‘;’ are executed sequentially; the shell waits for each command to terminate in turn.
Based on that, if and then can be put on one line, separated by ;. So the
first example can also be written as:
if [ "$x" -gt 3 ]; then
    echo x is greater than 3
fi
Besides test and [ ... ], bash conditional constructs also support [[ ... ]],
which is not part of POSIX. You can find the reason in the RATIONALE section
of the POSIX documentation for the test command[8].
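For comparison, here is a minimal sketch of the range check written with [[ ... ]]. Unlike [, the [[ construct is shell syntax rather than a command, so && may appear inside it, and the expansion of $x is not subject to word splitting. The variable in_range is illustrative.

```shell
# Requires bash: [[ ... ]] is not available in a plain POSIX sh.
x=4
if [[ $x -gt 3 && $x -lt 8 ]]; then
    in_range=yes
else
    in_range=no
fi
echo "$in_range"    # prints: yes
```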
for command
A simple for example in bash is:
# print 1 to 5
for i in $(seq 1 5)
do
    echo $i
done
Once again, the for construct looks like the for statement of general-purpose
languages, but the two are very different. The bash for command has nothing
to do with collections, an iterable interface, or anything similar; it is all
about word splitting.
During command substitution, the command seq 1 5 is executed and $(seq 1 5)
is replaced by its output, the numbers 1 to 5, which word splitting then
separates into five words: 1, 2, 3, 4, 5.
Passing these five words to the for command directly makes no difference.
# print 1 to 5
for i in 1 2 3 4 5
do
    echo $i
done
The output of many commands is separated by spaces, newlines, or tabs, so the
word splitting mechanism in for makes sense in this regard.
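Since for only iterates over the words it is handed, quoting changes the number of iterations. A small sketch with an illustrative variable named files:

```shell
files='a.txt b.txt c.txt'

# Unquoted: word splitting turns the value into three words.
unquoted_iterations=0
for f in $files; do
    unquoted_iterations=$((unquoted_iterations + 1))
done

# Quoted: the value stays one word, so the body runs once.
quoted_iterations=0
for f in "$files"; do
    quoted_iterations=$((quoted_iterations + 1))
done

echo "$unquoted_iterations $quoted_iterations"    # prints: 3 1
```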
I had intended to analyze why variable assignment must be written without spaces, but I could not find a convincing answer. There are some discussions on Stack Overflow[10].
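One observation that fits the command-centric reading used throughout this post, offered as an illustration rather than as the historical rationale: with spaces, the line is parsed as a command name followed by arguments. In the sketch below, msg is a hypothetical function defined only to make that parse visible.

```shell
# A hypothetical function named msg, defined only to make the parse visible.
msg() { echo "msg called with $# arguments: $*"; }

msg=hello                # no spaces: a variable assignment
output=$(msg = hello)    # spaces: runs the command msg with arguments = and hello

echo "$msg"       # prints: hello
echo "$output"    # prints: msg called with 2 arguments: = hello
```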
There is much more bash syntax that could be analyzed, but I will stop here. You can explore the rest yourself if you find it interesting.
1. Antirez's homepage: Fundamental things I believe about society and life
2. Bash Reference Manual: 3.1.2 Quoting
3. Bash Reference Manual: 6.5 Shell Arithmetic
4. Bash Reference Manual: 3.5.8 Filename Expansion
5. Bash Reference Manual: 2 Definitions
6. Bash Reference Manual: 3.1.1 Shell Operation
7. Stack Overflow: Bash word splitting mechanism
8. POSIX: test command
9. Bash Reference Manual: 3.2.4 Lists of Commands
10. Stack Overflow: What is the rationale behind variable assignment without space in bash script
11. YAML homepage
Written by Songziyu @China Sep. 2023