07 September 2014

In my experience, quoting and escaping have long been one of the largest sources of confusion in shell scripting. Even veterans are sometimes caught by the pitfalls. However hard it might seem, it is actually easy to grasp once you understand the rules. The key is to keep in mind that the commands you write are interpreted in two steps: first by the shell, then by the command itself. In this long post, I explain that and also summarize some frequently encountered escaping rules.

Questions

Let's start with a couple of questions.

How do you replace the two-character string \A with the single character - using 'sed'?

Why do all the commands in the first group work while none in the second group does?

# Works
echo 'hi \A there' | sed 's/\\A/-/'
echo 'hi \A there' | sed "s/\\\A/-/"
echo 'hi \A there' | sed "s/\\\\A/-/"
echo 'hi \A there' | sed s/\\\\A/-/
# Oops
echo 'hi \A there' | sed 's/\A/-/'
echo 'hi \A there' | sed "s/\\A/-/"
echo 'hi \A there' | sed s/\\\A/-/

Why do things change when the string being processed changes? Say, if the string to be replaced is not \A but \$, none of the above commands works, even those that worked for \A. We have to add more backslashes.

echo 'hi \$ there' | sed 's/\\\$/-/'
echo 'hi \$ there' | sed "s/\\\\\\$/-/"
echo 'hi \$ there' | sed "s/\\\\\\\$/-/"

Why does adding one more backslash sometimes not matter, while at other times it does?

There are three steps

As I always say, whenever you type a command and press <enter>, the command line is processed in the following steps:

  1. The shell gets the command line and, if needed, does some "magic" stuff such as expansion;
  2. The shell invokes the command and passes the cooked string(s) to it as argument(s);
  3. The command runs.

Consequently, it is crucial to be very clear about what the shell will do with the input string and what input the command expects. For instance:

find . -name *.txt              # WRONG!
find . -name '*.txt'            # Correct
ls *.txt                        # Correct
ls '*.txt'                      # WRONG!

Why should we quote the operand of 'find' but not that of 'ls'? The reason is that we want to pass '*.txt' to 'find' literally, while having the shell do file name globbing on *.txt before passing the result to 'ls'. In other words,

  • 'find -name' expects a pattern string, which it matches against file names itself.
  • 'ls' expects a list of file names.
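
One way to see what the shell actually hands to a command is to replace the command with something that only prints its arguments. The sketch below assumes a hypothetical helper named show_args and a scratch directory holding two .txt files; neither is part of the examples above.

```shell
# show_args is a hypothetical helper: it prints each argument it receives
# on its own line, wrapped in <>, so we can see what the shell passed.
show_args() { printf '<%s>\n' "$@"; }

# Work in a scratch directory so the glob has something to match.
scratch=$(mktemp -d)
cd "$scratch"
touch a.txt b.txt

unquoted=$(show_args -name *.txt)    # the shell expands the glob first
quoted=$(show_args -name '*.txt')    # the pattern arrives untouched

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
```

With the unquoted form the command sees a.txt and b.txt as separate arguments, which is what 'ls' wants and what 'find -name' does not.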

Character escaping can happen at both steps

Character escaping can happen at both of these stages as well. Take 'echo' for example: when we type the following line on the command line, the four-character string a\nb (a, \, n, b) is first processed by the shell and then passed to 'echo', which processes it again.

echo -e 'a\nb'

Another example: both the shell and 'sed' handle character escaping.

sed 's/\A/nul/' input.txt

Escaping rules of 'bash'

Now that we have seen that character escaping can happen at different stages of command processing, the key to correctly writing commands that involve character escaping is to understand the escaping rules for each stage. In this section, let's take a look at bash.

Enclosing characters in single quotes (`'') preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.

Enclosing characters in double quotes (`"') preserves the literal value of all characters within the quotes, with the exception of `$', ``', `\'

Within double quotes, the backslash retains its special meaning only when followed by one of the following characters: `$', ``', `"', `\', or `newline'.

NOTE:

  • NOTHING within single quotes will be escaped.
    $ echo It\'s hard    # <--- works
    It's hard
    
    $ echo "It's hard"   # <--- looks better
    It's hard
    
    $ echo 'It\'s hard'' # <--- OOPS (also note the trailing quote)
    It\s hard
    
    $ echo 'It'\''s hard' # <--- if you do need a "nested" single quote
    It's hard
    
  • newline here means NOT \n but the newline character itself (i.e. what you get when you press the <enter> key).
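
The double-quote rule quoted above can be checked directly. This is a small sketch; the variable names kept and consumed are made up for illustration, and printf '%s\n' is used instead of 'echo' so that only the shell's own processing shows.

```shell
# Inside double quotes, a backslash before an ordinary character stays.
kept="\A \n \t"

# Before $, `, " or \ the backslash is consumed and the character is literal.
consumed="\$ \` \" \\"

printf '%s\n' "$kept" "$consumed"
```

The first line keeps all three backslashes. The second prints the four special characters themselves, with a single backslash left over from the escaped pair.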

Escaping rules of 'echo'

`echo'

… If the `-e' option is given, interpretation of the following backslash-escaped characters is enabled….

`\n' newline

`\\' backslash

`\a' alert (bell)

NOTE:

  • Escaping only happens when called with -e.
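
Since the -e behaviour belongs to a particular 'echo' implementation, the same escapes can also be tried with printf's %b format, which interprets backslash escapes in its argument. This is a stand-in swapped in for illustration, not something the text above relies on, and it is only assumed to match 'echo -e' for the escapes listed here; the variable names are hypothetical.

```shell
# Single quotes keep the shell out of the way; %b does the escaping.
two_lines=$(printf '%b' 'a\nb')      # \n becomes a real newline
one_slash=$(printf '%b' 'a\\nb')     # \\ becomes one backslash, n stays

printf '%s\n' "$two_lines" "$one_slash"
```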

Some examples

This section presents some examples to deepen your understanding of the aforementioned rules. If you are already familiar with them, feel free to skip this section.

How bash escapes characters

Note that since we are focused on examining how the backslash works in bash, we run 'echo' WITHOUT -e to avoid any confusion that might be caused by the escaping done by 'echo'.

$ echo a\
> b
ab

$ echo "a\
> b"
ab

$ echo 'a\
> b'
a\
b

$ echo a\\
a\

$ echo a\nb
anb

$ echo "a\nb"
a\nb

$ echo 'a\nb'
a\nb

$ echo a\\nb
a\nb

$ echo "a\\nb"
a\nb

$ echo 'a\\nb'
a\\nb

How echo escapes characters

Since bash escapes nothing within single quotes, we enclose the strings in SINGLE QUOTES to examine how escaping works in 'echo' without interference from the shell.

$ echo -e 'a\'
a\

$ echo -e 'a\\'
a\

$ echo -e 'a\nb'
a
b

$ echo -e 'a\\nb'
a\nb

$ echo -e 'a\\\nb'
a\
b

$ echo -e 'a\\\\nb'
a\\nb

A case study

Now, let's do an exercise. What should I do if I want to write a string like this into a file?

cmd 1 \
cmd 2

Before looking at the answer, check the output of the commands below and explain why.

$ echo -e "1- a \\ b"; echo -e "2- a \n b"; echo -e "3- a \\\n b"
1- a \ b
2- a
 b
3- a \n b

More confusing things:

$ echo -e "1: a \\\n b"; echo -e "2: a \\\\n b"; echo -e "3: a \\\\\n b"
1: a \n b
2: a \n b
3: a \
 b

Here is the answer:

$ cmd1=hi; cmd2=there

$ echo -e "$cmd1\\\\\n$cmd2"
hi\
there

# bash:    \\ \\ \n -> \\\n
# echo -e: \\ \n -> \ newline

$ echo -e "$cmd1\\\\n$cmd2"
hi\nthere

# bash:    \\ \\ n -> \\n
# echo -e: \\ n -> \ n

$ echo -e "$cmd1\\\n$cmd2"
hi\nthere

# bash:    \\ \n -> \\n
# echo -e: \\ n -> \ n
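
If the goal is only to get that two-line text into a file, a here-document with a quoted delimiter sidesteps both layers of escaping, because the shell leaves the body untouched. This is an alternative sketch, not the approach worked through above; the target is a hypothetical temporary file.

```shell
out=$(mktemp)    # hypothetical target file

# Quoting the delimiter ('EOF') turns off backslash and $ processing in
# the body, so the trailing backslash is written as-is.
cat > "$out" <<'EOF'
cmd 1 \
cmd 2
EOF

cat "$out"
```

The trade-off is that variables are not expanded either, so this fits fixed text better than text built from $cmd1 and $cmd2.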

Answer to the opening questions

The key is to understand the escaping rules of both 'bash' and 'sed'. That way, we can split the task into two steps and figure out the expected input for each step. Here we go!

  • For \A to -.
    1. The command (string) passed to 'sed'

      It should be s/\\A/-/.

      The command has the form s/from/to/, but s/\A/-/ is incorrect since 'sed' will interpret the \ as an escape character. Hence, we need an extra backslash to escape it.

    2. The string passed to 'bash', i.e. what to type on command line.

      This depends on quotes you choose.

      • If we use single quotes, nothing inside them will be escaped by bash, so we can simply type
        sed 's/\\A/-/'
        
      • If we use double quotes, we need to add one backslash to escape each backslash. Consequently, the number of backslashes is doubled.
        echo 'hi \A there' | sed "s/\\\\A/-/"
        
  • For \$ to -
    1. The command (string) passed to 'sed'

      It should be s/\\\$/-/. The "from" string is \$, but since both \ and $ must be escaped for 'sed', the expected string becomes \\\$.

    2. The string passed to 'bash', i.e. what to type on command line.

      Again, the type of quotes we choose decides if we may need to escape backslashes and dollar sign or not. Hence either of the following is correct:

      sed 's/\\\$/-/'
      sed "s/\\\\\\\$/-/"
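
To check the two-step reasoning, it helps to print the string that 'sed' will receive before running 'sed' itself. The sketch below does that for three working ways of quoting the \A command; the variable names are made up for illustration.

```shell
# Each assignment goes through the shell's escaping once, just as the
# argument to 'sed' would.
from_single='s/\\A/-/'
from_double="s/\\\\A/-/"
from_bare=s/\\\\A/-/

# All three lines should show the same text: s/\\A/-/
printf '%s\n' "$from_single" "$from_double" "$from_bare"

# Feed the cooked string to 'sed' to confirm it does the replacement.
result=$(printf '%s\n' 'hi \A there' | sed "$from_single")
printf '%s\n' "$result"
```

The same trick works for the \$ case: print the argument first, and compare it with the string 'sed' is supposed to get.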
      

Exercises:

  • Explain why the commands without quotes listed at the beginning of this post work or do not work.
  • Why does using one fewer backslash sometimes also work?
  • Notice that both double-quoted commands for \$ at the beginning of this post (six and seven backslashes) work. But if $ is followed by other character(s), as in the following command, using six backslashes will result in an error. Why?
    echo 'hi \$name there' | sed "s/\\\\\\\$name/-/"
    

Bash ANSI-C quoting

Just FYI: in addition to single quotes and double quotes, bash also supports ANSI-C quoting, which can be really handy sometimes.

Words of the form `$'STRING'' are treated specially. The word expands to STRING, with backslash-escaped characters replaced as specified by the ANSI C standard.

$ cmd1=hi; cmd2=there; escaped_newline=$'\\\n'

$ echo "$cmd1 $escaped_newline $cmd2"
hi \
 there

