RaminHAL9001 · November 18, 2017 15:54
diff --git a/bash-basics--tokenizing.html b/bash-basics--tokenizing.html
 <html><head>
 <style type="text/css">
 div.top-level {
 	width: 20cm;
 	margin-left: 0.5cm;
 	margin-right: 0.5cm;
 }
 p, ol, td, blockquote {
 	font-family: serif;
 	color: #000000;
 	line-height: 1.5em;
 }
 h1 {
 	font-family: serif;
 	color: #000000;
 }
 h2 {
 	font-family: serif;
 	color: #000000;
 }
 h3 {
 	font-family: serif;
 	color: #000000;
 }
 code {
 	background-color: #F0F0F0;
 	color: #400000;
 }
 pre {
 	background-color: #F0F0F0;
 	color: #000000;
 	line-height: 1.4em;
 }
 .vocabulary-word {
 	font-weight: bold;
 	font-style: oblique;
 	color: #008000;
 	vertical-align: top;
 }
 span.prompt {
 	color: #808080;
 }
 span.file-name, a.file-name {
 	font-family: monospace;
 	text-decoration: underline;
 	color: #204020
 }
 .user-input {
 	color: #400000;
 }
 a.section-link {
 	color: #000080;
 	text-decoration: none;
 }
 a:hover.section-link {
 	color: #0000FF;
 	text-decoration: underline;
 }
 code.single-char {
 	color: #400000;
 	border: 0.0625em solid;
 	border-radius: 0.25em;
 	background-color: #F0F0FF;
 	font-size: 1.4em;
 }
 .output {
 	color: #000040;
 }
 span.keystroke {
 	color: #000000;
 	border: 0.0625em solid;
 	border-radius: 0.25em;
 	font-family: sans-serif;
 	font-style: oblique;
 }
 code.token {
 	color: #0000F8;
 	border: 0.0625em solid;
 	border-radius: 0.25em;
 	background-color: #F0F0FF;
 }
 td.source-code {
 	background-color: #FFFFFF;
 	border: 1px solid black;
 	padding: 2px;
 }
 pre.source-code {
 	background-color: #FFFFFF;
 }
 table.source-code {
 	background-color: #F0F0F0;
 	border: 1px solid black;
 	margin: 20px;
 }
 </style>
 <title>Bash Basics: How The Command Shell "Sees" the Words it Reads</title>
 </head><body>
 <div class="top-level">

 <h1>Bash Basics: How The Command Shell "Sees" the Words it Reads</h1>

 <p>As someone who can read, you may take for granted that words are
 separated by spaces. But to an unintelligent computer program like Bash, the
 process of breaking input into individual words must be spelled out in computer
 code as a grammatical algorithm.</p>

 <p>It often helps to understand these rules. When you enter a command and Bash
 does not do what you expected, could it be because it is reading, or
 <q>understanding</q>, your command differently than you intended? Often this is
 exactly the problem, and a thorough understanding of how Bash reads and
 understands commands can make your life much easier as you become more skilled
 in using Ubuntu, Linux, or macOS.</p>

 <p>Fortunately, you don't need to be an expert to understand Bash's tokenizing
 grammar. In fact, most of the ordinary Bash token grammar is quite simple for
 anyone to understand. There are a few complicated rules that experts need to
 worry about, but we will worry about those another day. Let's keep things
 simple for now:</p>

 <em><b>When you type anything into Bash, the first thing it does is break down what
 you typed into a list of words called <q>tokens</q>.
 </b></em>

 <p>Once Bash has its list of words (tokens), it then <q>thinks</q> about each
 word one by one. In this lesson, we learn the six most basic rules Bash uses to
 read the command you typed and break it down into tokens that it can understand
 and think about individually. Bash has a few more rules than these six, but
 these are the ones to learn first.</p>

 <ol>
 <li> <a class="section-link" href="#hash_tags">Ignore hash tags.</a>
 <li> <a class="section-link" href="#space_separated">Tokens are separated by spaces (usually).</a>
 <li> <a class="section-link" href="#quotes">Tokens can have spaces in them if they are quoted.</a>
 <li> <a class="section-link" href="#join_adjacent">Tokens that are not separated by spaces are joined together.</a>
 <li> <a class="section-link" href="#special_punctuation">Some punctuation marks are special and are not joined together.</a>
 <li> <a class="section-link" href="#backslash">The backslash turns a special punctuation mark into an ordinary one.</a>
 </ol>

 <table class="glossary"><thead class="glossary"><tr><th colspan="2"><h3>Terminology</h3></th></tr></thead>
 <tr>
 <td class="vocabulary-word"><span class="vocabulary-word">Token</span>:</td><td>A single, atomic unit of computer code that is constructed
 from a sequence of <q>characters</q> in accordance with the grammatical rules
 of the computer language.</td></tr>
 <tr><td class="vocabulary-word"><span class="vocabulary-word">Character</span>:</td><td>A single letter, number, punctuation mark, or whitespace
 value; the smallest unit of human-readable information.</td></tr>
 <tr><td class="vocabulary-word"><span class="vocabulary-word">String</span>:</td><td>A unit of data containing a sequence of characters. A string
 is different from a token in that tokens are elements taken out of a computer
 program according to token grammar rules, whereas a string can contain any data
 without regard for token grammar. In the Bash programming language there is
 almost no practical difference between tokens and strings, but many other
 programming languages do not allow you to treat tokens as strings.</td>
 </tr>
 </table>

 <h2>Before we begin...</h2>

 <p>Bash is everywhere, so it is incredibly easy to open a Terminal window and
 just start experimenting. It takes no effort, and you can do it any time you
 want. So as you read, there is no need to take this lesson as mere computer
 science theory; you can actually put the theory into practice!</p>

 <p>So before we begin learning the rules, here is a two-line Bash program you
 can use right now to experiment with the various examples below. Enter this
 code into the TextEdit program (or any other plain-text editor; in TextEdit,
 choose Format &gt; Make Plain Text first) and save the file as <q><a href="#tokenizer.sh" class="file-name">tokenizer.sh</a></q> in your
 Home folder: </p>

 <table class="source-code">
 <thead>
 <tr><td>
 &#x1F5CE; <a name="tokenizer.sh"><span class="file-name">~/tokenizer.sh</span></a>
 </td></tr></thead>
 <tr><td class="source-code">
 <pre class="source-code">
 #!/bin/bash
 for x in "$@"; do printf '%s\n' "$x"; done | cat -n
 </pre>
 </td></tr>
 </table>

 <p>Now open Terminal and check if you did it right:</p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">cat tokenizer.sh</span>
 <span class="output">#!/bin/bash</span>
 <span class="output">for x in "$@"; do printf '%s\n' "$x"; done | cat -n</span>
 </pre>

 <p>If instead you get an error:</p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">cat tokenizer.sh</span>
 <span class="output">cat: tokenizer.sh: No such file or directory</span>
 </pre>

 <p>then please make sure you saved <q><a href="#tokenizer.sh" class="file-name">tokenizer.sh</a></q> in your Home
 folder, or else use the <code>cd</code> command to change to the directory in
 which you did save the <q><a href="#tokenizer.sh" class="file-name">tokenizer.sh</a></q> file.
 </p>

 <p>Let's try running <q><a href="#tokenizer.sh" class="file-name">tokenizer.sh</a></q> in the Terminal. Type
 <code class="user-input">bash tokenizer.sh This is an example.</code> as the
 command; the rest will be printed as soon as you press the Enter
 key. The whole interaction will look like this in your Terminal window:</p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">bash tokenizer.sh This is an example.</span>
 <span class="output">     1  This</span>
 <span class="output">     2  is</span>
 <span class="output">     3  an</span>
 <span class="output">     4  example.</span>
 <span class="prompt">YourName@ComputerName:~$ </span>
 </pre>

 <p>Did it work? Great! So from now on, if you see an example in the text below,
 which looks like this:</p>

 <div><code>This is an example.</code></div>

 <ol>
 <li><code class="token">This</code></li>
 <li><code class="token">is</code></li>
 <li><code class="token">an</code></li>
 <li><code class="token">example.</code></li>
 </ol>

 <p>don't be afraid to try the example out by typing it after <code class="user-input">bash tokenizer.sh</code> (see <q><span class="file-name">tokenizer.sh</span></q> above).</p>

 <h3>If you get stuck...</h3>

 <p>Never forget that <span class="keystroke">Ctrl C</span> will <B>C</b>ancel
 the command you were typing and let you start again from nothing.</p>

 <p>Occasionally you may mistype something and Bash will seem to freeze. For example,
 if you type an apostrophe <code class="single-char">'</code> all alone: </p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">bash tokenizer.sh Why won't this work?</span>
 <span class="prompt">&gt; </span>
 <span class="prompt">&gt; </span>
 <span class="prompt">&gt; </span><span class="user-input"> echo try again</span>
 <span class="prompt">&gt; </span>
 <span class="prompt">&gt; </span>
 <span class="prompt">&gt; </span><span class="user-input"> askdjaczxca</span>
 <span class="prompt">&gt; </span><span class="user-input"> jasiuhq fidjfn ZXzxucaccas asdasdf </span>
 <span class="prompt">&gt; </span>
 <span class="prompt">&gt; </span>
 <span class="prompt">&gt; </span><span class="user-input"> aaaaaaaaaaaaaaaaaaaaaaaaaaaa </span>
 <span class="prompt">&gt; </span>
 </pre>

 <p>I kept pressing <span class="keystroke">Enter</span> but the command prompt
 never came back; all I got was the <code class="single-char">&gt;</code>
 symbol (Bash's continuation prompt), and it wouldn't do anything! What is happening here is that the
 apostrophe is actually an opening single-quote character (discussed in <a
 href="#quotes" class="section-link">rule #3</a>) and Bash waits for you to write the closing
 single-quote character; it keeps waiting even after you press the Enter key.
 An unmatched double-quote <code class="single-char">&quot;</code> causes the same
 problem, as does an unclosed round bracket <code class="single-char">(</code> or
 curly bracket <code class="single-char">{</code> at the start of a command, as we will see with <a class="section-link"
 href="#special_punctuation">rule #5</a>.</p>

 <p>If you ever make this mistake, <span
 class="keystroke">Ctrl C</span> is your friend.
 </p>

 <h3>So let's get started learning about the Bash tokenizer rules!</h3>

 <hr />

 <h2><a name="hash_tags">Rule 1: Ignore hash tags</a></h2>

 <p>This is the simplest rule: when a token starts with the hash <code class="single-char">#</code>
 character, that hash and everything after it on the same line are ignored. You can use this to write comments to yourself:</p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">#This line starts with a hash tag. It does absolutely nothing.</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">#But you can use a hash tag in the middle of a command as well:</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo Some people #just don't</span>
 <span class="output">Some people</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo know how #useful Bash can be</span>
 <span class="output">know how</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo to use a computer #with a command line interface.</span>
 <span class="output">to use a computer</span>
 <span class="prompt">YourName@ComputerName:~$ </span>
 </pre>
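 <p>Here is a small sketch for trying this rule inside a script. Bash's built-in <code>set --</code> stores the tokens that follow it in <code>$1</code>, <code>$2</code>, and so on, and puts their count in <code>$#</code>. The script also shows a detail worth knowing: a hash only starts a comment when it is the first character of a token.</p>

```shell
#!/bin/bash
# set -- replaces $1, $2, ... with the tokens after it; $# counts them.
set -- Some people #just do not
echo "$#"    # 2: the hash starts a token, so the rest of the line is a comment
set -- issue#42 is open
echo "$#"    # 3: a hash in the middle of a token is an ordinary character
```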

 <hr />

 <h2><a name="space_separated">Rule 2: Words are separated by spaces (usually).</a></h2>

 <p>When you type <q><code>This is some text.</code></q> into Bash, what list of
 tokens does it see? Well, the first thing Bash does is tear the sentence apart
 at the spaces between the tokens:</p>

 <div class="user-input"><code>This is some text.</code></div>
 <ol>
 <li><code class="token">This</code></li>
 <li><code class="token">is</code></li>
 <li><code class="token">some</code></li>
 <li><code class="token">text.</code></li>
 </ol>

 <p>But do you see how <q><code class="token">text.</code></q> has a dot after it? This is
 because there is no space between the token "text" and the dot. That means the
 dot is part of the token. So what would happen if you wrote a space between the
 token <q><code>text</code></q> and the dot:
 <q><code>This is some text .</code></q>?
 Well then, Bash would see this list of tokens:</p>

 <div class="user-input"><code>This is some text .</code></div>
 <ol>
 <li><code class="token">This</code></li>
 <li><code class="token">is</code></li>
 <li><code class="token">some</code></li>
 <li><code class="token">text</code></li>
 <li><code class="token">.</code></li>
 </ol>

 <p>With a space between the token <q><code>text</code></q> and the dot, the dot
 becomes its own token.</p>

 <blockquote class="remember">
 <b>Good to remember:</b> while Bash does not think the lone dot <code
 class="single-char">.</code> is special, the dot <em>could</em> be special to
 commands Bash runs, like the <code class="user-input">ls</code> command. For
 some commands, a lone dot token means <q><b>right here</b>,</q> as in, <q>save a
 file <b>right here</b>.</q> Other times, a dot just means a dot, like when it is
 part of a file's name, e.g. <code class="output">photo.jpg</code>. But in the
 Bash language, dots (and also commas) have no special grammatical meaning; they
 are just part of ordinary tokens, and get mixed together with other tokens
 according to the usual tokenizer rules.
 </blockquote>

 <p>Words are made of letters, numbers and the non-special punctuation marks discussed below.</p>

 <div><code>This sentence has 7 tokens in it #and this is ignored.</code></div>

 <ol>
 <li><code class="token">This</code></li>
 <li><code class="token">sentence</code></li>
 <li><code class="token">has</code></li>
 <li><code class="token">7</code></li>
 <li><code class="token">tokens</code></li>
 <li><code class="token">in</code></li>
 <li><code class="token">it</code></li>
 </ol>
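 <p>If you want to count tokens without <q>tokenizer.sh</q>, one option is Bash's built-in <code>set --</code>. It stores the tokens after it in <code>$1</code>, <code>$2</code>, and so on, and puts their count in <code>$#</code>. This sketch checks the two example sentences from this rule:</p>

```shell
#!/bin/bash
# Count the tokens in the example sentences from this rule.
set -- This sentence has 7 tokens in it #and this is ignored.
echo "$#"    # 7
set -- This is some text .
echo "$#"    # 5
echo "$5"    # . (the lone dot is its own token)
```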

 <hr />

 <h2><a name="quotes">Rule 3: Quoted tokens can have spaces in them</a></h2>

 <p>It is often useful to tell Bash to treat several words, spaces included, as
 a single token. This comes in handy when a command needs a file whose name
 has spaces in it. To do this, we use a single-quote character, also known as
 the apostrophe, for example:</p>

 <div class="user-input"><code>His exact words were, 'Yes, I think so.'</code></div>
 <ol>
 <li><code class="token">His</code></li>
 <li><code class="token">exact</code></li>
 <li><code class="token">words</code></li>
 <li><code class="token">were,</code></li>
 <li><code class="token">Yes, I think so.</code></li>
 </ol>

 <p>The fifth token above is everything between the single-quotes. Notice that
 the single-quotes do not exist in the token itself. Dots and commas have no
 special grammatical meaning to Bash, but single-quotes do. A single-quote says
 to Bash, <b>take all the letters you see until the next single-quote and treat
 them as one big token,</b> and remove the single-quotes.</p>
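 <p>You can confirm that the quotes disappear with a short script. This is a sketch. <code>set --</code> is a Bash built-in that puts the tokens after it into <code>$1</code>, <code>$2</code>, and so on, and their count into <code>$#</code>.</p>

```shell
#!/bin/bash
# set -- stores the tokens in $1, $2, ...; $# is how many there are.
set -- His exact words were, 'Yes, I think so.'
echo "$#"    # 5
echo "$5"    # Yes, I think so.   (no single-quotes left in the token)
```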

 <p>Which character is more powerful, the hash <code
 class="single-char">#</code> or the single quote <code
 class="single-char">'</code>? The answer is: <u>whichever one comes first</u>
 is the one Bash uses. Without quotes, the command
 <code>echo ### Welcome! ###</code> prints only an empty line, because the first hash <code
 class="single-char">#</code> character turns everything after it into a comment. Let's try
 this again
 with the single-quote character:</p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo '### Welcome! ###'</span>
 <span class="output">### Welcome! ###</span>
 <span class="prompt">YourName@ComputerName:~$ </span>
 </pre>

 <p>Putting the <code class="token">### Welcome! ###</code> inside of the
 single-quotes made the hash characters into part of the token. But if we flipped
 it around and wrote the hash character before the single-quote, the hash would
 win:</p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo ### 'Welcome!' ###</span>
 <span class="output"></span>
 <span class="prompt">YourName@ComputerName:~$ </span>
 </pre>
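 <p>Here is the same comparison as a small script. It is a sketch that counts tokens with Bash's built-in <code>set --</code>, which leaves the count in <code>$#</code>.</p>

```shell
#!/bin/bash
set -- '### Welcome! ###'    # the quote comes first, so the hashes are inside the token
echo "$#"                    # 1
set -- ### 'Welcome!' ###    # the hash comes first, so everything after it is a comment
echo "$#"                    # 0
```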

 <p>It is also possible to use a double-quote <code class="single-char">&quot;</code>
 character to construct tokens with spaces, <b>but be careful!</b> Double-quote
 characters follow a different set of rules when constructing tokens. For simple
 tokens, they behave like single-quote <code class="single-char">'</code>
 characters. Let's retry the <q>Yes, I think so.</q> example above but with
 double-quotes <code class="single-char">&quot;</code> instead of single quotes:
 </p>

 <div class="user-input"><code>His exact words were, "Yes, I think so."</code></div>
 <ol>
 <li><code class="token">His</code></li>
 <li><code class="token">exact</code></li>
 <li><code class="token">words</code></li>
 <li><code class="token">were,</code></li>
 <li><code class="token">Yes, I think so.</code></li>
 </ol>

 <p><b>However</b>, things start to go wrong if you aren't careful when you make
 tokens using double-quotes <code class="single-char">&quot;</code>:</p>

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo "It costs less than $5 in the USA."</span>
 <span class="output">It costs less than  in the USA.</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo "Hello, world!"</span>
 <span class="output">Hello, world!</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo "Hello, world!!"</span>
 <span class="output">echo "Hello, worldecho "Hello, world!""</span>
 <span class="output">Hello, worldecho Hello, world!</span>
 </pre>

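 <p>What went wrong? Inside double-quotes, <code class="single-char">$</code>
 keeps its special meaning, so <code>$5</code> was read as a variable (the
 fifth argument), which is empty. And at the Terminal prompt, <code>!!</code>
 means <q>the previous command</q>, so Bash pasted the previous command into the
 text before running it. This is on purpose: double-quoted tokens are used to
 expand variables into character strings, a feature known as
 <q><a href="https://en.wikipedia.org/wiki/String_interpolation">String
 Interpolation</a></q>. We will talk more about string interpolation in the
 lesson about Bash variables, but here is a quick preview of what double-quote
 tokens can do when used correctly:
 </p>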

 <pre>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">ITEM='Apple Cinnamon Cappuccino'</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">COST=3.95</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo "You can buy a delicious ${ITEM} for only \$${COST}"\!</span>
 <span class="output">You can buy a delicious Apple Cinnamon Cappuccino for only $3.95!</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input"># Lets try the exact same thing with single quotes...</span>
 <span class="prompt">YourName@ComputerName:~$ </span><span class="user-input">echo 'You can buy a delicious ${ITEM} for only \$${COST}'\!</span>
 <span class="output">You can buy a delicious ${ITEM} for only \$${COST}!</span>
 </pre>
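 <p>Here is a sketch of how to avoid the <code>$5</code> surprise from earlier: three ways to write that sentence in a script. The variable names <code>plain</code>, <code>escaped</code>, and <code>single</code> are made up for this example. The script is assumed to run with no arguments, so <code>$5</code> is empty.</p>

```shell
#!/bin/bash
# Run this script with no arguments, so $5 (the fifth argument) is empty.
plain="It costs less than $5 in the USA."       # $5 is expanded (to nothing)
escaped="It costs less than \$5 in the USA."    # \$ keeps a literal dollar sign
single='It costs less than $5 in the USA.'      # nothing expands in single quotes
printf '%s\n' "$plain" "$escaped" "$single"
```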

 <hr />

 <h2><a name="join_adjacent">Rule 4: Tokens not separated by spaces are joined together into a single token.</a></h2>

 <p>How is this different from <a class="section-link" href="#space_separated">rule #2</a>? If you have two tokens, like a number
 and a word, right next to each other, for example <q><code>123hello</code></q>,
 it is probably obvious to you that Bash will treat those as a single
 token. But what about input like this:</p>

 <div><code>'Working hard?''Hardly working!'</code></div>

 <p>Will this be two tokens or just one token? The answer is: <u>one token</u>,
 because there is no space between the first and second single-quoted tokens.
 The two individual tokens:</p>

 <p>
 <code class="token">Working hard?</code> <code
 class="token">Hardly working!</code>
 </p>

 <p>are joined into a single token.</p>

 <ol>
 <li><code class="token">Working hard?Hardly working!</code>
 </ol>

 <p>Each token is a single-quoted token, but there is no space between the two
 single-quoted tokens, so Bash joins these two tokens into one big token. This
 is a very useful feature which will come up again in the lesson about
 variables.
 </p>
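 <p>Here is a short sketch to check this with Bash's built-in <code>set --</code>, which puts the tokens that follow it into <code>$1</code>, <code>$2</code>, and so on, and their count into <code>$#</code>. The second example is not from the text above. It shows that single-quoted, unquoted, and double-quoted pieces all join the same way.</p>

```shell
#!/bin/bash
set -- 'Working hard?''Hardly working!'
echo "$#"    # 1
echo "$1"    # Working hard?Hardly working!
# Single-quoted, unquoted, and double-quoted pieces join the same way.
set -- 'one two'three"four five"
echo "$#"    # 1
echo "$1"    # one twothreefour five
```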

 <p>But what happens if we type something like this:</p>

 <div><code>He won't do it because he doesn't even know how.</code></div>

 <ol>
 <li><code class="token">He</code>
 <li><code class="token">wont do it because he doesnt</code>
 <li><code class="token">even</code>
 <li><code class="token">know</code>
 <li><code class="token">how.</code>
 </ol>

 <p>Only 5 tokens. Why? Because Bash thinks the apostrophes in the tokens
 <q><i>won't</i></q> and <q><i>doesn't</i></q> are actually single-quotes, and
 all of the letters and spaces between those single-quotes, <q><code>'t do it
 because he doesn'</code></q> becomes one long token <q><code class="token">t do
 it because he doesn</code></q>. So the input <q><code>won't do it because he
 doesn't</code></q> is actually a three-part token:</p> <div><code
 class="token">won</code> <code class="token">t do it because he doesn</code>
 <code class="token">t</code></div> <p>And since there is no space between these
 tokens, they are all joined into one big token, as you can see above.</p>
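 <p>The same surprise can be reproduced in a script. This sketch uses the built-in <code>set --</code>, which stores the tokens in <code>$1</code>, <code>$2</code>, and so on, and their count in <code>$#</code>.</p>

```shell
#!/bin/bash
set -- He won't do it because he doesn't even know how.
echo "$#"    # 5
echo "$2"    # wont do it because he doesnt
```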

 <hr />

 <h2><a name="special_punctuation">Rule 5: Most punctuation marks have special grammatical meaning</a></h2>

 <p>We have seen how the hash <code class="single-char">#</code> and
 single-quote <code class="single-char">'</code> characters have special
 grammatical meaning to Bash. In fact, most punctuation
 marks have special grammatical meaning.</p>

 <p><b>That means, never use the following characters without quoting them</b>
 unless you know what special thing they do. (Listed for reference, don't worry
 about what this means for now)</p>

 <table>
 <tbody>
 	<tr><td valign="top"><code class="token">#</code></td><td valign="top">Hash</td><td valign="top">&mdash; Comment</td></tr>
 	<tr><td valign="top"><code class="token">'</code></td><td valign="top">Single Quote</td><td valign="top">&mdash; String delimiter</td></tr>
 	<tr><td valign="top"><code class="token">"</code></td><td valign="top">Double Quote</td><td valign="top">&mdash; Interpolating string delimiter</td></tr>
 	<tr><td valign="top"><code class="token">`</code></td><td valign="top">Back Quote</td><td valign="top">&mdash; Sub-process expansion</td></tr>
 	<tr><td valign="top"><code class="token">\</code></td><td valign="top">Backslash</td><td valign="top">&mdash; Escape special character</td></tr>
 	<tr><td valign="top"><code class="token">$</code></td><td valign="top">Dollar Sign</td><td valign="top">&mdash; Variable dereferencing</td></tr>
 	<tr><td valign="top"><code class="token">*</code></td><td valign="top">Asterisk</td><td valign="top">&mdash; Glob (a.k.a. Wildcard) pattern</td></tr>
 	<tr><td valign="top"><code class="token">&amp;</code></td><td valign="top">Ampersand</td><td valign="top">&mdash; Launch background command</td></tr>
 	<tr><td valign="top"><code class="token">&semi;</code></td><td valign="top">Semicolon</td><td valign="top">&mdash; Command delimiter</td></tr>
 	<tr><td valign="top"><code class="token">&lt;</code></td><td valign="top">Less Than</td><td valign="top">&mdash; Pull stream input from file</td></tr>
 	<tr><td valign="top"><code class="token">&gt;</code></td><td valign="top">Greater Than</td><td valign="top">&mdash; Push stream output to file</td></tr>
 	<tr><td valign="top"><code class="token">=</code></td><td valign="top">Equal Sign</td><td valign="top">&mdash; Assign variable</td></tr>
 	<tr><td valign="top"><code class="token">|</code></td><td valign="top">Pipe</td><td valign="top">&mdash; Command pipeline constructor</td></tr>
 	<tr><td valign="top"><code class="token">(</code></td><td valign="top">Open Round Bracket</td><td valign="top">&mdash; Sub-process command delimiter</td></tr>
 	<tr><td valign="top"><code class="token">)</code></td><td valign="top">Close Round Bracket</td><td valign="top">&mdash; Sub-process command delimiter</td></tr>
 	<tr><td valign="top"><code class="token">{</code></td><td valign="top">Open Curly Brackets</td><td valign="top">&mdash; Token choice pattern, or subroutine delimiter</td></tr>
 	<tr><td valign="top"><code class="token">}</code></td><td valign="top">Close Curly Brackets</td><td valign="top">&mdash; Token choice pattern, or subroutine delimiter</td></tr>
 </tbody>
 </table>
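 <p>The table lists characters to avoid <em>unquoted</em>. As a sketch of the other side of that advice, quoting these characters (<a class="section-link" href="#quotes">rule #3</a>) turns them back into ordinary parts of a token. Bash's built-in <code>set --</code> stores the resulting tokens in <code>$1</code>, <code>$2</code>, and so on, and their count in <code>$#</code>.</p>

```shell
#!/bin/bash
# Quoted, these special characters are just parts of ordinary tokens.
set -- 'fish & chips' 'x>y' 'a|b' '$5' '*' "(maybe)"
echo "$#"             # 6
printf '[%s]\n' "$@"  # one token per line, wrapped in [...]
```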

 <p>Some characters are <b>sometimes</b> special and sometimes not. Avoid using
 the following characters (again, unless you know what special thing they
 do):</p>

 <table>
 <tbody>
 	<tr><td valign="top"><code class="single-char">%</code></td><td valign="top">Percent</td><td valign="top">&mdash; Background process selector (only special when used alone or with a number)</td></tr>
 	<tr><td valign="top"><code class="single-char">~</code></td><td valign="top">Tilde</td><td valign="top">&mdash; Abbreviation for home directory (only special at the start of a non-quoted token)</td></tr>
 	<tr><td valign="top"><code class="single-char">!</code></td><td valign="top">Exclamation Point</td><td valign="top">&mdash; Command history expansion (only at an interactive prompt, and only when followed by something other than a space, as in <code>!!</code> or <code>!42</code>)</td></tr>
 	<tr><td valign="top"><code class="single-char">[</code></td><td valign="top">Open Square Bracket</td><td valign="top">&mdash; Character set pattern (only special when files matching the pattern exist)</td></tr>
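 	<tr><td valign="top"><code class="single-char">]</code></td><td valign="top">Close Square Bracket</td><td valign="top">&mdash; Character set pattern (only special when files matching the pattern exist)</td></tr>
 	<tr><td valign="top"><code class="single-char">?</code></td><td valign="top">Question Mark</td><td valign="top">&mdash; Single-character wildcard pattern (only special when files matching the pattern exist)</td></tr>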
 </tbody>
 </table>

 <p>All other punctuation marks are used as parts of tokens, or become their own
 token if they are separated by spaces. <b>So it is OK to use the following characters:</b></p>

 <table>
 <tbody>
 	<tr><td valign="top"><code class="single-char">_</code></td><td>Underscore</td></tr>
 	<tr><td valign="top"><code class="single-char">+</code></td><td>Plus Sign</td></tr>
 	<tr><td valign="top"><code class="single-char">-</code></td><td>Minus Sign</td></tr>
 	<tr><td valign="top"><code class="single-char">@</code></td><td>At Sign</td></tr>
 	<tr><td valign="top"><code class="single-char">/</code></td><td>Slash</td></tr>
 	<tr><td valign="top"><code class="single-char">:</code></td><td>Colon</td></tr>
 	<tr><td valign="top"><code class="single-char">.</code></td><td>Dot</td></tr>
 	<tr><td valign="top"><code class="single-char">,</code></td><td>Comma</td></tr>
 	<tr><td valign="top"><code class="single-char">^</code></td><td>Caret</td></tr>
 </tbody>
 </table>

 <p>So let's see the kind of tokens we can make with the non-special characters:</p>

 <div><code class="user-input">one/two/three four-five-six seven.eight.nine@ 10:11 p.m. me+my_date</code></div>
 <ol>
 <li><code class="token">one/two/three</code></li>
 <li><code class="token">four-five-six</code></li>
 <li><code class="token">seven.eight.nine@</code></li>
 <li><code class="token">10:11</code></li>
 <li><code class="token">p.m.</code></li>
 <li><code class="token">me+my_date</code></li>
 </ol>

 <p>The non-special characters are just a part of the token in which they
 appear, as if they were no different from a letter or number. But the spaces
 between the words still separate tokens according to <a class="section-link" href="#space_separated">rule #2</a>.</p>
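 <p>Here is a quick sketch that prints each token of this example between square brackets, so you can see exactly where each one starts and ends. <code>set --</code> is a Bash built-in that stores the tokens in <code>$1</code>, <code>$2</code>, and so on.</p>

```shell
#!/bin/bash
set -- one/two/three four-five-six seven.eight.nine@ 10:11 p.m. me+my_date
echo "$#"             # 6
printf '[%s]\n' "$@"  # each token on its own line, e.g. [10:11]
```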

 <h3>Rule #5.1: special tokens do not join with other tokens</h3>

 <p>In other words, <a class="section-link" href="#join_adjacent">rule #4</a> does not apply to special tokens.
 Let's take a quick look at how special tokens are tokenized.</p>

 <div><code>if(true);then echo yes;fi|cat -n;</code></div>

 <p>In this example, there are several special characters used:
 <code class="single-char">&semi;</code>, 
 <code class="single-char">|</code>, 
 <code class="single-char">(</code>, and
 <code class="single-char">)</code> (the hyphen
 <code class="single-char">-</code> is not special). So how do you think this will tokenize?
 Note that the curly brackets <code class="single-char">{</code> and
 <code class="single-char">}</code> from the table above do <em>not</em> work this way:
 they are only special as stand-alone tokens, so <code>then{echo</code> would be
 read as one ordinary token.
 </p>

 <blockquote class="notice">
 <b>BE AWARE</b> that this example will <b>NOT</b> work with the <q><a href="#tokenizer.sh" class="file-name">tokenizer.sh</a></q> program. If
 you do try it, it will report an error:
 <pre>
 <span class="output">bash: syntax error near unexpected token `('</span>
 </pre>
 </blockquote>

 <p>The answer is that it will tokenize like this (but again don't try this with
 <q><a href="#tokenizer.sh" class="file-name">tokenizer.sh</a></q>):</p>

 <ol>
 <li><code class="token">if</code></li>
 <li><code class="token">(</code></li>
 <li><code class="token">true</code></li>
 <li><code class="token">)</code></li>
 <li><code class="token">&semi;</code></li>
 <li><code class="token">then</code></li>
 <li><code class="token">echo</code></li>
 <li><code class="token">yes</code></li>
 <li><code class="token">&semi;</code></li>
 <li><code class="token">fi</code></li>
 <li><code class="token">|</code></li>
 <li><code class="token">cat</code></li>
 <li><code class="token">-n</code></li>
 <li><code class="token">&semi;</code></li>
 </ol>

 <p>However these tokens are swept up by Bash and <b>immediately</b> crunched
 into something else (in this case, a "conditional statement"), and this happens
 even before the tokens are handed off to other programs like our
 <q><a href="#tokenizer.sh" class="file-name">tokenizer.sh</a></q> program.  There is a more advanced Bash grammar that
 occurs after the tokenization step which allows you to control if and when
 certain commands are run, which we will learn more about in another lesson.</p>
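 <p>Here is a small sketch you can run as a script to see that <code class="single-char">&semi;</code>, <code class="single-char">|</code>, <code class="single-char">(</code>, and <code class="single-char">)</code> split tokens even with no spaces around them. The variable names are made up for this example. <code>tr a-z A-Z</code> is used only to make the pipeline visible by upper-casing its input.</p>

```shell
#!/bin/bash
# ';', '|', '(' and ')' split tokens even when no spaces surround them.
two_lines=$(echo one;echo two)       # ';' separates two echo commands
shout=$(echo three|tr a-z A-Z)       # '|' pipes "three" into tr
answer=$(if(true);then echo yes;fi)  # '(' and ')' split "if" from "true"
printf '%s\n' "$two_lines" "$shout" "$answer"
```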

 <hr />

 <h2><a name="backslash">Rule 6: Backslash makes a special punctuation mark ordinary</a></h2>

 <p>The last rule to remember is that all of the special characters mentioned
 above become ordinary tokens, or parts of tokens, if they follow a
 backslash <code class="single-char">\</code>. For example, if you want a token
 to contain an apostrophe without Bash thinking it is a single-quote, you could
 write this:</p>

 <div><code>I won\'t make that mistake again.</code></div>

 <ol>
 <li><code class="token">I</code></li>
 <li><code class="token">won't</code></li>
 <li><code class="token">make</code></li>
 <li><code class="token">that</code></li>
 <li><code class="token">mistake</code></li>
 <li><code class="token">again.</code></li>
 </ol>

 <div><code>He won\'t do it because he doesn\'t even know how.</code></div>

 <ol>
 <li><code class="token">He</code></li>
 <li><code class="token">won't</code></li>
 <li><code class="token">do</code></li>
 <li><code class="token">it</code></li>
 <li><code class="token">because</code></li>
 <li><code class="token">he</code></li>
 <li><code class="token">doesn't</code></li>
 <li><code class="token">even</code></li>
 <li><code class="token">know</code></li>
 <li><code class="token">how.</code></li>
 </ol>
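 <p>Here is a quick check of the escaped apostrophes. This is a sketch. The Bash built-in <code>set --</code> stores the tokens in <code>$1</code>, <code>$2</code>, and so on, and their count in <code>$#</code>.</p>

```shell
#!/bin/bash
set -- He won\'t do it because he doesn\'t even know how.
echo "$#"    # 10
echo "$2"    # won't
echo "$7"    # doesn't
```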

 <p>Almost any character, even spaces and hash tags, can be made part of a
 token with the backslash (the exception is the end of a line: a backslash
 there just continues the command on the next line):</p>

 <div><code>This\ is\ one\ long\ token. \# These are separate tokens.</code></div>

 <ol>
 <li><code class="token">This is one long token.</code></li>
 <li><code class="token">#</code></li>
 <li><code class="token">These</code></li>
 <li><code class="token">are</code></li>
 <li><code class="token">separate</code></li>
 <li><code class="token">tokens.</code></li>
 </ol>

 <p>Notice above that there is no backslash right after the <code
 class="token">token.</code> token, so the space that follows it is not
 "escaped", and the token ends there. Everything before that point, escaped
 spaces included, is joined into one large token.
 </p>
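 <p>Here is the same example as a small script. This sketch uses the Bash built-in <code>set --</code> to collect the tokens into <code>$1</code>, <code>$2</code>, and so on, with the count in <code>$#</code>.</p>

```shell
#!/bin/bash
set -- This\ is\ one\ long\ token. \# These are separate tokens.
echo "$#"    # 6
echo "$1"    # This is one long token.
echo "$2"    # #
```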

 <p>But a backslash is only good for one character:</p>

 <div><code>With single-quotes '# # #' but with a backslash \# # # all the rest is ignored.</code></div>

 <ol>
 <li><code class="token">With</code></li>
 <li><code class="token">single-quotes</code></li>
 <li><code class="token"># # #</code></li>
 <li><code class="token">but</code></li>
 <li><code class="token">with</code></li>
 <li><code class="token">a</code></li>
 <li><code class="token">backslash</code></li>
 <li><code class="token">#</code></li>
 </ol>

 <p>The backslash only worked its magic on the first hash <code
 class="single-char">#</code> character. The space after it was not escaped,
 so the next hash <code class="single-char">#</code> started a new token, and
 a token that starts with a hash begins a comment: everything after it was
 ignored. The single quotes, by contrast, protected all three hashes and the
 spaces between them.</p>
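 <p>A quick way to see how far one backslash reaches is to pass the tokens
 straight to <code>printf</code>. Its first token, the format
 <code>'[%s]\n'</code>, is reused for every remaining token, so each token is
 printed on its own line inside square brackets. This is only a sketch for
 checking the rule in Bash:</p>

 <pre>
 # The backslash escapes only the space right after "a", so "a b" is one
 # token, but the unescaped space before "c" still separates tokens.
 printf '[%s]\n' a\ b c          # prints [a b] then [c]

 # The backslash makes the first # ordinary. The space after it is not
 # escaped, so the next # starts a new token, which begins a comment.
 printf '[%s]\n' kept \# # ignored
 </pre>

 <p>The second command prints only <code>[kept]</code> and
 <code>[#]</code>.</p>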

 <p>How would we write an apostrophe in the middle of a single-quoted token? Like this:</p>
 <div><code>She said, 'Well isn'\''t that something!'</code></div>
 <ol>
 <li><code class="token">She</code></li>
 <li><code class="token">said,</code></li>
 <li><code class="token">Well isn't that something!</code></li>
 </ol>

 <p>Why? Because the string
 <code class="user-input">'Well isn'\''t that something!'</code>
 contains three tokens:</p>
 <div>
 <code class="token">Well isn</code>&nbsp;<code class="token">'</code>&nbsp;<code class="token">t that something!</code>
 </div>
 <p>The first single-quoted token ends right before the backslash, the escaped
 <code class="single-char">'</code> is an ordinary apostrophe, and a new
 single-quoted token begins right after it. These three tokens are not
 separated by whitespace, so they are joined into one according to <a
 class="section-link" href="#join_adjacent">rule #4</a>.</p>
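 <p>You can check this with <code>printf '[%s]\n'</code>, which reuses its
 format for every remaining token and so prints one bracketed token per line.
 Both commands in this sketch should print <code>[She]</code>,
 <code>[said,]</code> and <code>[Well isn't that something!]</code>. The
 second command is an alternative spelling of the same token that uses only
 backslashes, including one before the <code>!</code>, which is on the list of
 special characters:</p>

 <pre>
 # Three pieces with no whitespace between them are joined into one token:
 #   'Well isn'   \'   't that something!'
 printf '[%s]\n' She said, 'Well isn'\''t that something!'

 # The same token spelled with backslashes only:
 printf '[%s]\n' She said, Well\ isn\'t\ that\ something\!
 </pre>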

 <p>Backslashes also treat <em>themselves</em> as ordinary characters. That is
 to say, if one backslash is followed by a second backslash, the second
 backslash becomes an ordinary character. In a sequence of backslashes, every
 pair of backslashes
 <code class="single-char">\</code><code class="single-char">\</code>
 becomes a single backslash
 <code class="single-char">\</code> character.</p>
 <div><code>1 \\ 2 \\\\ 3 \\\\\\ 4 \\\\\\\\ 5 \\\\\\\\\\</code></div>
 <ol>
 <li><code class="token">1</code></li>
 <li><code class="token">\</code></li>
 <li><code class="token">2</code></li>
 <li><code class="token">\\</code></li>
 <li><code class="token">3</code></li>
 <li><code class="token">\\\</code></li>
 <li><code class="token">4</code></li>
 <li><code class="token">\\\\</code></li>
 <li><code class="token">5</code></li>
 <li><code class="token">\\\\\</code></li>
 </ol>
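 <p>Here is a sketch you can paste into Bash to check the pairing rule.
 <code>printf</code> reuses its format <code>'[%s]\n'</code> for every
 remaining token, and <code>%s</code> prints each token exactly as the shell
 passed it, without interpreting backslashes itself. The last command uses an
 odd number of backslashes: the first pair becomes one backslash, and the
 leftover backslash escapes the character after it.</p>

 <pre>
 # Each pair of backslashes becomes one ordinary backslash.
 printf '[%s]\n' 1 \\ 2 \\\\ 3 \\\\\\
 # prints [1], [\], [2], [\\], [3] and [\\\], one per line

 # Three backslashes: \\ becomes \, and the third \ escapes the #.
 printf '[%s]\n' \\\#
 # prints [\#]
 </pre>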

 <hr />

 <h2>Conclusion</h2>

 <p>So those are the most fundamental tokenizer rules for bash. There will be more
 rules, but these are the most important to remember. Usually, if we just never
 use file names with spaces or punctuation marks in them, we never have to worry
 about single-quoting or backslashes, and our life becomes easier. We can just
 write tokens as they are and Bash will work as we expect it to.</p>

 <p>This is why Linux and UNIX programmers like to name files like this:
 <q><code>a-file-name-should-never-have-spaces.txt</code></q>. They use Bash,
 and working with file names in Bash gets tedious when the names contain
 spaces or special characters.</p>
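 <p>Here is a small sketch of the problem, assuming a system with the common
 <code>mktemp</code>, <code>touch</code>, <code>ls</code> and <code>rm</code>
 commands. It works inside a fresh temporary directory, so no existing files
 are touched, and deletes that directory at the end. The <code>$(...)</code>
 and <code>"$dir"</code> syntax is not covered in this lesson; here it simply
 stores and reuses the temporary directory's name.</p>

 <pre>
 dir=$(mktemp -d) || exit 1      # create an empty temporary directory
 cd "$dir" || exit 1

 touch 'my notes.txt'            # one file whose name contains a space
 ls my notes.txt || echo 'ls got two tokens: my and notes.txt'
 ls my\ notes.txt                # works: the backslash keeps it one token
 ls 'my notes.txt'               # works: so do single quotes

 touch my-notes.txt              # no spaces, so no quoting is needed
 ls my-notes.txt

 cd /                            # leave the temporary directory
 rm -r "$dir"                    # and delete it
 </pre>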

 <p>So here are all the basic tokenizer rules in Bash in a handy table which you
 may want to keep in your notebook.</p>

 <table>
 <tr><td>1. Ignore hash tags.</td><td><code>token token token # ignored ignored ignored</code></td></tr>
 <tr><td>2. Tokens are separated by spaces.</td><td><code>this sentence has 7 tokens in it</code></td></tr>
 <tr><td>3. Tokens can contain spaces.</td><td><code>'this is just one token'</code> <code>this\ is\ also\ just\ one\ token</code></td></tr>
 <tr><td>4. Tokens not separated by spaces are joined together into a single token.</td><td><code>firsttoken</code> <code>second' token'</code> <code>'third''token'</code> <code>fourth\ 'token'</code></td></tr>
 <tr><td>5. Most punctuation marks have special meaning.</td><td>Special characters are: <code class="token">#</code> <code class="token">'</code> <code class="token">"</code> <code class="token">`</code> <code class="token">\</code> <code class="token">%</code> <code class="token">$</code> <code class="token">*</code> <code class="token">&amp;</code> <code class="token">&semi;</code> <code class="token">!</code> <code class="token">~</code> <code class="token">&lt;</code> <code class="token">&gt;</code> <code class="token">=</code> <code class="token">|</code> <code class="token">(</code> <code class="token">)</code> <code class="token">{</code> <code class="token">}</code></td></tr>
 <tr><td>6. The backslash makes special characters ordinary.</td><td>You can enter <q><code>\#</code></q> or <q><code>\'</code></q> to get an ordinary hash or apostrophe character.</td></tr>
 </table>
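 <p>As a final check, here is a sketch that exercises several rules from the
 table on a single line. It uses the standard <code>set --</code> command to
 store the tokens, <code>$#</code> to count them, and <code>printf '[%s]\n'
 "$@"</code> to print each one on its own line inside square brackets. These
 helpers rely on features this lesson has not covered.</p>

 <pre>
 # Rule 1: the unescaped # starts a comment. Rules 2 to 4: spaces separate
 # tokens, quoted or escaped spaces do not, and adjacent pieces are joined.
 # Rule 6: \# is an ordinary hash.
 set -- one 'two words' three\ words 'join'ed \# kept # this part is ignored
 echo "$#"                # prints 6
 printf '[%s]\n' "$@"     # [one] [two words] [three words] [joined] [#] [kept]
 </pre>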

 <hr />

 <h5>Copyright &copy; Ramin Honary 2017. This document is published under the <a
 href="https://creativecommons.org/license/by-nc/3.0/legalcode">Creative Commons
 Attribution-NonCommercial 3.0 Unported</a> license. The markup source code is
 available on GitHub at <a
 href="https://gist.github.com/RaminHAL9001/22bcb9c32786f089fb973444adfa0619">https://gist.github.com/RaminHAL9001/22bcb9c32786f089fb973444adfa0619</a>.</h5>

 </div>
 </body></html>