22 May 2015

The Bash extension of "[", i.e. "[[", supports an additional operator, "=~", for regular-expression matching. It is convenient, but it comes with a pitfall. This post shows how to use the operator and demonstrates the pitfall.

Basic Usage

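It is used just like the other binary operators of "[[". PATTERN is a POSIX extended regular expression, and the test succeeds (exit status 0) if any part of STRING matches it and fails (exit status 1) otherwise: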

if [[ STRING =~ PATTERN ]]; then
    match
else
    does_not_match
fi

For example, to check that a slot number consists only of digits:

if [[ "$slot" =~ ^[0-9]+$ ]]; then
    # ^ and $ anchor the pattern to the beginning/end of the string
    echo "slot is a number"
fi
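Because "[[ ... =~ ... ]]" is itself a command with an exit status, it can serve directly as a function's return value. A minimal sketch in Bash; the helper name is_number and the sample strings are hypothetical, chosen only for illustration:

```bash
# is_number is a hypothetical helper used only for this illustration.
# [[ ... =~ ... ]] returns 0 on a match and 1 otherwise (2 if the regex
# is invalid), so the function simply returns that status.
is_number() {
    [[ $1 =~ ^[0-9]+$ ]]
}

for s in 42 4x2 ""; do
    if is_number "$s"; then
        echo "'$s' is a number"
    else
        echo "'$s' is not a number"
    fi
done
```

The "+" requires at least one digit, so the empty string is rejected as well.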

The Caveat

The one thing to keep in mind is NOT to quote the pattern, because, in the words of the Bash manual:

    Any part of the pattern may be quoted to force it to be matched as a string.

When composing long and complicated regular expressions, this restriction can force excessive use of backslashes for escaping, which makes the resulting regular expression hard to read. Worse, sometimes the backslashes do not help at all:

if [[ "$blade_list" =~ "\<1\>" ]]; then # OOPS, does NOT work
    echo "blade 1 is in the list"
fi

if [[ "$blade_list" =~ \\\<1\\\> ]]; then # does NOT work either
    echo "blade 1 is in the list"
fi
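The effect of quoting is easy to check with a metacharacter such as ".". A minimal sketch, assuming Bash 3.2 or later with compat31 off; the sample strings are hypothetical:

```bash
# Unquoted: "." is a regex metacharacter matching any single character.
[[ "abc" =~ a.c ]] && echo "unquoted a.c matches abc"

# Quoted: "a.c" is matched as the literal string a.c, so the dot
# loses its special meaning and abc no longer matches ...
[[ "abc" =~ "a.c" ]] || echo "quoted \"a.c\" does not match abc"

# ... while a string containing a literal "a.c" still does.
[[ "xa.cy" =~ "a.c" ]] && echo "quoted \"a.c\" matches xa.cy"
```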

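Both attempts fail for the same reason: double quotes and backslashes are both forms of quoting, so the quoted characters are matched literally and never reach the regex engine as the word anchors \< and \>.

A simple but effective way around this is to store the pattern in a variable and expand it unquoted:

pattern='\<1\>'    # \< and \> match the beginning/end of a word
if [[ "$blade_list" =~ $pattern ]]; then
    echo "blade 1 is in the list"
fi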
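To see that the word anchors do their job, here is a sketch with a hypothetical blade_list in which 1 appears only as part of 11. It assumes the system regex library supports the GNU \< and \> extensions (glibc does; other C libraries may not):

```bash
# Hypothetical sample list of blade slot numbers.
blade_list="2 11 3"

# Single quotes keep the backslashes; $pattern is expanded unquoted
# so its contents are treated as a regex.
pattern='\<1\>'
if [[ $blade_list =~ $pattern ]]; then
    echo "blade 1 is in the list"
else
    echo "blade 1 is not in the list"   # "11" is not the word "1"
fi

pattern='\<11\>'
if [[ $blade_list =~ $pattern ]]; then
    echo "blade 11 is in the list"
fi
```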

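Another way, which I do not recommend, is to run shopt -s compat31 (see Related Shell Options below), which restores the Bash 3.1 behavior of treating a quoted pattern as a regular expression.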

A Real-world Example

Given a number (in our application, the slot number of a blade), we need to tell whether it is in a given list. The "=~" operator lets us do that without a loop.

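function is_member_of {
    local slot="$1"          # e.g. 3
    local slot_list="$2"     # e.g. "1 3 5"
    # \< and \> restrict the match to a whole word, so 1 does not match 11
    local pattern="\<${slot}\>"

    if [[ "$slot_list" =~ $pattern ]]; then   # $pattern must stay unquoted
        return 0
    else
        return 1
    fi
}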
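Since "[[ ]]" already exits with 0 or 1, the if/else can be folded away. A condensed sketch of the same idea, with hypothetical sample lists; like the original, it assumes GNU-style \< and \> support and a slot argument free of regex metacharacters (true for numeric slots):

```bash
# Condensed variant: the status of [[ ]] becomes the function's return value.
is_member_of() {
    local pattern="\<${1}\>"   # whole-word match on the slot number
    [[ $2 =~ $pattern ]]
}

# Commas count as word boundaries too, so comma-separated lists work.
for slot in 1 3 11; do
    if is_member_of "$slot" "1,2,11"; then
        echo "slot $slot: member"
    else
        echo "slot $slot: not a member"
    fi
done
```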

Get Matched Substrings

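The substrings captured by a successful match are stored in the array variable "BASH_REMATCH": element 0 holds the part of the string that matched the whole pattern, and element n holds the part that matched the nth parenthesized subexpression, much like capture groups in other programming languages.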

pattern='(^[^-]*)-(.*)$'
if [[ "$name" =~ $pattern ]]; then
    echo "prefix:   ${BASH_REMATCH[1]}"
    echo "hostname: ${BASH_REMATCH[2]}"
fi
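To make the indices concrete, here is the same pattern applied to a hypothetical value of name, also printing element 0:

```bash
# Hypothetical input: a prefix, a dash, then the host name.
name="prod-web-01"
pattern='(^[^-]*)-(.*)$'
if [[ $name =~ $pattern ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # prod-web-01
    echo "prefix:      ${BASH_REMATCH[1]}"   # prod (stops at the first dash)
    echo "hostname:    ${BASH_REMATCH[2]}"   # web-01
fi
```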

Related Shell Options

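compat31
    If set, Bash changes its behavior to that of version 3.1 with respect to quoted arguments to the conditional command's "=~" operator.
nocasematch
    If set, Bash matches patterns in a case-insensitive fashion when performing matching while executing case or "[[" conditional commands.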
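A short sketch of nocasematch applied to "=~"; the sample string is hypothetical:

```bash
shopt -s nocasematch    # enable case-insensitive matching
if [[ "BLADE-7" =~ ^blade-[0-9]+$ ]]; then
    echo "matched ignoring case"
fi

shopt -u nocasematch    # restore the default, case-sensitive behavior
if ! [[ "BLADE-7" =~ ^blade-[0-9]+$ ]]; then
    echo "no match once nocasematch is off"
fi
```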

