Last active
November 1, 2024 18:55
Revisions
glv revised this gist
Apr 5, 2018 . 1 changed file with 1 addition and 1 deletion.
@@ -18,7 +18,7 @@ documentation in different ways, so it's hard to even map between them easily. This
table is aimed at making it easier.

(The [Regular-Expressions.info][rei] site does support both of these flavors of regular expressions, and lets you do side-by-side comparisons, although [finding the Postgres flavor][reip] isn't obvious. But I find it helpful to have all of the information on one page, and the process of building this file taught me a lot about both flavors of regular expression.)
glv revised this gist
Apr 5, 2018 . 1 changed file with 8 additions and 0 deletions.
@@ -17,6 +17,14 @@ the syntaxes that are hard to remember. Furthermore, the two systems organize their regexp
documentation in different ways, so it's hard to even map between them easily. This table is aimed at making it easier.

(The [Regular-Expressions.info][rei] site does support both of these flavors of regular expressions, and lets you do side-by-side comparisons, although [finding the Postgres][reip] flavor isn't obvious. But I find it helpful to have all of the information on one page, and the process of building this file taught me a lot about both flavors of regular expression.)

[rei]: http://www.regular-expressions.info/
[reip]: http://www.regular-expressions.info/postgresql.html

## Operators and Functions

| Ruby | Postgres | Explanation |
glv revised this gist
Apr 5, 2018 . 1 changed file with 1 addition and 1 deletion.
@@ -12,7 +12,7 @@ in both environments, so that one can reliably query for data in a particular
format and then transform the results in Ruby using the same pattern.

In any case, if you're working in a Postgres-based Rails app, you occasionally need to work with both kinds of regular expression. Unfortunately, there are subtle differences between the syntaxes that are hard to remember. Furthermore, the two systems organize their regexp documentation in different ways, so it's hard to even map between them easily. This table is aimed at making it easier.
glv revised this gist
Apr 5, 2018 . 1 changed file with 2 additions and 2 deletions.
@@ -12,8 +12,8 @@ in both environments, so that one can reliably query for data in a particular
format and then transform the results in Ruby using the same pattern.

In any case, if you're working in a Postgres-based Rails app, you occasionally need to work with both kinds of regular expression, but there are subtle differences between the syntaxes that are hard to remember. Furthermore, the two systems organize their regexp documentation in different ways, so it's hard to even map between them easily. This table is aimed at making it easier.
glv revised this gist
Apr 5, 2018 . 1 changed file with 5 additions and 3 deletions.
@@ -11,9 +11,11 @@ And sometimes, it's very useful to have a single regular expression that works
in both environments, so that one can reliably query for data in a particular format and then transform the results in Ruby using the same pattern.

In any case, if you're working in a Postgres-based Rails app, you occasionally need to work with both kinds of regular expression, but there are subtle differences between the syntaxes that are hard to remember, and the two systems organize their regexp documentation in different ways, so it's hard to even map between them easily. This table is aimed at making it easier.

## Operators and Functions
glv revised this gist
Apr 5, 2018 . 1 changed file with 3 additions and 3 deletions.
@@ -287,11 +287,11 @@ Do not assume that an option means the same thing in Ruby and Postgres just because it has the same letter!
| e | | regexp is encoded as EUC-JP |
| s | | regexp is encoded as Windows-31J |
| n | | regexp is encoded as ASCII-8BIT |
| | \*\*\*: | (at beginning of pattern) the rest of the pattern is an ARE (Postgres advanced regular expression) |
| | \*\*\*= | (at beginning of pattern) the rest of the pattern is a literal string |
| | b | rest of RE is a [BRE](https://www.postgresql.org/docs/9.5/static/functions-matching.html#POSIX-BASIC-REGEXES) (POSIX basic regular expression) |
| | c | case-sensitive matching (overrides operator type) |
| | e | rest of RE is an ERE (POSIX extended regular expression) |
| | i | case-insensitive matching (overrides operator type) |
| | m | historical synonym for n |
| | n | newline-sensitive matching |
glv revised this gist
Apr 3, 2018 . 1 changed file with 5 additions and 0 deletions.
@@ -272,6 +272,11 @@ In Ruby, the single-letter version of an option can be specified after the closi
In Postgres, the options can be included at the very start of an expression (possibly after an initial `***:`) using the syntax <code>(?*opts*)</code>; the options included in the string *opts* are in effect for the entire regular expression. The options can also be included in a string passed as a parameter to various pattern-related functions.

Do not assume that an option means the same thing in Ruby and Postgres just because it has the same letter! Consult the details of [Ruby][rubyopts] and [Postgres][pgopts] options.

[rubyopts]: http://ruby-doc.org/core-2.4.0/Regexp.html#class-Regexp-label-Options "Ruby Regexp options"
[pgopts]: https://www.postgresql.org/docs/9.5/static/functions-matching.html#POSIX-MATCHING-RULES "PostgreSQL regular expression matching rules"

| Ruby | Postgres | Explanation |
|-|-|-|
| i or `Regexp::IGNORECASE` | | case-insensitive |
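The Ruby side of the options described in this revision can be sketched as follows. This is a minimal illustration, not part of the gist: the pattern `ruby` and the sample strings are made up, and the variable names are hypothetical. It shows the single-letter form after the closing delimiter, the equivalent constant passed to `Regexp.new`, and Ruby's meaning of `m` (it lets `.` match a newline), which is one reason not to read the same letter the same way in Postgres.

```ruby
# Single-letter option after the closing delimiter.
literal = /ruby/i

# The same option expressed with the constant from the table.
built = Regexp.new("ruby", Regexp::IGNORECASE)

literal_matches = literal.match?("RUBY")   # case-insensitive match
built_casefold  = built.casefold?          # true when IGNORECASE is set
plain_matches   = /ruby/.match?("RUBY")    # no option: case-sensitive

# In Ruby, the m option makes "." match a newline as well.
dot_default   = /a.b/.match?("a\nb")
dot_multiline = /a.b/m.match?("a\nb")

puts literal_matches, built_casefold, plain_matches, dot_default, dot_multiline
```

Options can also be combined with a bitwise OR, as in `Regexp.new("ruby", Regexp::IGNORECASE | Regexp::MULTILINE)`.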
glv revised this gist
Apr 3, 2018 . 1 changed file with 1 addition and 1 deletion.
@@ -165,7 +165,7 @@ In Postgres, these can be used in the second argument to `regexp_replace`.
| `$&`, `\&`, `md[0]` | `\&` | the complete matched text |
| <code>$\`</code>, <code>\\\`</code>, `md.pre_match` | | the text of the string preceding the match |
| `$'`, `\'`, `md.post_match` | | the text of the string after the match |
| `$1`, `\1`, `md[1]` | `\1` | the first capture group (and so on for other numbered capture groups) |
| `$+`, `\+` | | the last capture group |
| <code>\k<*name*></code>, <code>md[:*name*]</code> | | the named capture group with name *name* |
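The three Ruby forms listed in these rows (global variable, substitution reference, and `MatchData` call) can be compared side by side. The sketch below is only an illustration under assumed inputs: the string, the pattern, and all variable names are hypothetical and do not come from the gist.

```ruby
# Hypothetical input and pattern, for illustration only.
name    = "Dr. John Smith, PhD"
pattern = /(\w+) (\w+)/

md = pattern.match(name)
last_match   = $~   # the MatchData from the most recent match
first_global = $1   # global-variable form of md[1]

whole  = md[0]          # the complete matched text
before = md.pre_match   # text preceding the match
after  = md.post_match  # text following the match

# Substitution references go in the replacement string; single quotes keep
# the backslashes literal.
swapped = name.sub(pattern, '\2 \1')
wrapped = name.sub(pattern, '[\&]')

puts whole, before, after, swapped, wrapped
```

Of these forms, only `\&` and `\1` have a Postgres counterpart in the table, as the replacement argument of `regexp_replace`.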
glv revised this gist
Apr 3, 2018 . 1 changed file with 1 addition and 1 deletion.
@@ -165,7 +165,7 @@ In Postgres, these can be used in the second argument to `regexp_replace`.
| `$&`, `\&`, `md[0]` | `\&` | the complete matched text |
| <code>$\`</code>, <code>\\\`</code>, `md.pre_match` | | the text of the string preceding the match |
| `$'`, `\'`, `md.post_match` | | the text of the string after the match |
| `$*n*`, `\*n*`, `md[*n*]` | `\*n*` | numbered capture group *n* |
| `$+`, `\+` | | the last capture group |
| <code>\k<*name*></code>, <code>md[:*name*]</code> | | the named capture group with name *name* |
glv revised this gist
Apr 3, 2018 . 1 changed file with 21 additions and 15 deletions.
@@ -1,9 +1,12 @@
# Ruby and Postgres Regular Expression Syntaxes

[Ruby's regular expressions][rubyre] are unusually powerful. [Postgres' regular expressions][pgre] are not as powerful, but they come close; close enough that it's possible to do many pattern-based queries and string transformations entirely in a query.

[rubyre]: http://ruby-doc.org/core-2.4.0/Regexp.html "Ruby Regexp class reference"
[pgre]: https://www.postgresql.org/docs/9.5/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP "Postgres regular expression reference"

And sometimes, it's very useful to have a single regular expression that works in both environments, so that one can reliably query for data in a particular format and then transform the results in Ruby using the same pattern.

@@ -72,6 +75,9 @@ Atoms match a sequence of one or more characters (or *zero* or more characters,
Quantifiers can follow atoms, and they change the number of occurrences of the atom that can be matched. (Without a subsequent quantifier, an atom will always match *exactly one* consecutive occurrence.)

*Greedy* quantifiers match as many occurrences as possible while still allowing the overall match to succeed. By contrast, *lazy* quantifiers match as few occurrences as possible while still allowing the overall match to succeed. A *possessive* quantifier does not backtrack once it has matched. It behaves like a greedy quantifier, but having matched, it refuses to “give up” its match even if this jeopardizes the overall match. This is sometimes helpful for optimizing the performance of a regular expression.

| Ruby | Postgres | Explanation |
|-|-|-|
| * | * | zero or more (greedy) |
@@ -81,20 +87,20 @@ Quantifiers can follow atoms, and they change the number of occurrences of the atom that can be matched.
| {*m*,} | {*m*,} | at least *m* (greedy) |
| {,*m*} | | at most *m* (greedy) |
| {*m*,*n*} | {*m*,*n*} | at least *m* but no more than *n* (greedy) |
| *? | *? | zero or more (lazy) |
| +? | +? | one or more (lazy) |
| ?? | ?? | zero or one (lazy) |
| {*m*}? | {*m*}? | exactly *m* |
| {*m*,}? | {*m*,}? | at least *m* (lazy) |
| {,*m*}? | | at most *m* (lazy) |
| {*m*,*n*}? | {*m*,*n*}? | at least *m* but no more than *n* (lazy) |
| *+ | | zero or more (possessive) |
| ++ | | one or more (possessive) |
| ?+ | | zero or one (possessive) |
| {*m*}+ | | exactly *m* |
| {*m*,}+ | | at least *m* (possessive) |
| {,*m*}+ | | at most *m* (possessive) |
| {*m*,*n*}+ | | at least *m* but no more than *n* (possessive) |

## Constraints / Anchors
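The difference between greedy, lazy, and possessive matching described in this revision is easiest to see on one input. The following is a small Ruby sketch with a made-up string and hypothetical variable names; it uses only the `+`, `+?`, and `++` forms.

```ruby
# Hypothetical sample text, for illustration only.
text = "<a><b>"

# Greedy: ".+" takes as much as it can, then backs off just far enough
# for the final ">" to match.
greedy = text[/<.+>/]

# Lazy: ".+?" takes as little as it can while still letting ">" match.
lazy = text[/<.+?>/]

# Possessive: ".++" consumes through the end of the string and refuses to
# give anything back, so the final ">" can never match.
possessive = text[/<.++>/]

puts greedy.inspect      # "<a><b>"
puts lazy.inspect        # "<a>"
puts possessive.inspect  # nil
```

The possessive case returning `nil` is the "refuses to give up its match" behaviour: the overall match fails rather than backtracking.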
glv revised this gist
Apr 3, 2018 . 1 changed file with 25 additions and 24 deletions.
@@ -16,16 +16,17 @@ map between them easily. This table is aimed at making it easier.

| Ruby | Postgres | Explanation |
|-|-|-|
| =~ | ~ | matches |
| | ~* | matches (case-independent) |
| !~ | !~ | does not match |
| | !~* | does not match (case-independent) |
| *string*.match(*re*) | | returns a MatchData instance or nil if *re* does not match *string* |
| *re*.match(*string*) | | returns a MatchData instance or nil if *re* does not match *string* |
| | substring(*string* from *pattern*) | returns NULL if no match, the match text for the first capture group (if any), or the entire matched substring |
| | regexp_replace(*source*, *pattern*, *replacement* [, *flags* ]) | returns *source*, transformed per *pattern* and *replacement* (and *flags*, which can include `g`) |
| | regexp_matches(*string*, *pattern* [, *flags* ]) | returns no rows (if no match); a row containing an array of capture matches (if the pattern uses capture groups), or a row containing the entire matched substring. (The `g` flag, if supplied, results in one returned row for each match in *string*.) |
| | regexp_split_to_table(*string*, *pattern* [, *flags* ]) | splits *string* using *pattern* as a delimiter, returning a row for each fragment (ignoring zero-length fragments) |
| | regexp_split_to_array(*string*, *pattern* [, *flags* ]) | same as `regexp_split_to_table` except it returns an array of strings rather than rows |

## Alternation

@@ -73,20 +74,20 @@ Quantifiers can follow atoms, and they change the number of occurrences of the atom that can be matched.

| Ruby | Postgres | Explanation |
|-|-|-|
| * | * | zero or more (greedy) |
| + | + | one or more (greedy) |
| ? | ? | zero or one (greedy) |
| {*m*} | {*m*} | exactly *m* |
| {*m*,} | {*m*,} | at least *m* (greedy) |
| {,*m*} | | at most *m* (greedy) |
| {*m*,*n*} | {*m*,*n*} | at least *m* but no more than *n* (greedy) |
| *? | *? | zero or more (non-greedy) |
| +? | +? | one or more (non-greedy) |
| ?? | ?? | zero or one (non-greedy) |
| {*m*}? | {*m*}? | exactly *m* |
| {*m*,}? | {*m*,}? | at least *m* (non-greedy) |
| {,*m*}? | | at most *m* (non-greedy) |
| {*m*,*n*}? | {*m*,*n*}? | at least *m* but no more than *n* (non-greedy) |
| *+ | | |
| ++ | | |
| ?+ | | |
glv revised this gist
Apr 3, 2018 . 1 changed file with 4 additions and 17 deletions.
@@ -1,18 +1,5 @@
# Ruby and Postgres Regular Expression Syntaxes

Ruby's regular expressions are unusually powerful. Postgres' regular expressions are not as powerful, but they come close; close enough that it's possible to do many pattern-based queries and string transformations entirely in a query.

@@ -29,10 +16,10 @@ map between them easily. This table is aimed at making it easier.

| Ruby | Postgres | Explanation |
|-|-|-|
| `=~` | `~` | matches |
| | `~*` | matches (case-independent) |
| `!~` | `!~` | does not match |
| | `!~*` | does not match (case-independent) |
| `String#match(re)`, `Regexp#match(string)` | | returns a MatchData instance or nil if no match |
| | <code>substring(*string* from *pattern*)</code> | returns NULL if no match, the match text for the first capture group (if any), or the entire matched substring |
| | <code>regexp_replace(*source*, *pattern*, *replacement* [, *flags* ])</code> | returns *source*, transformed per *pattern* and *replacement* (and *flags*, which can include `g`) |
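The Ruby operators in this table differ mainly in what they return. The sketch below illustrates that under assumed inputs: the date pattern, the helper `match_summary`, and the sample strings are all hypothetical and exist only for this example.

```ruby
# Hypothetical pattern and inputs, chosen only for illustration.
DATE_RE = /(\d{4})-(\d{2})-(\d{2})/

def match_summary(string)
  {
    index: DATE_RE =~ string,          # offset of the first match, or nil
    no_match: DATE_RE !~ string,       # true when the pattern does not match
    match_data: DATE_RE.match(string)  # MatchData instance, or nil
  }
end

hit  = match_summary("released 2018-04-03")
miss = match_summary("no date here")

puts hit[:index]                # 9
puts hit[:match_data][1]        # 2018
puts miss[:match_data].inspect  # nil
```

Note that `=~` returns a position rather than a boolean, whereas the Postgres `~` operator in the same row yields a boolean.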
glv revised this gist
Apr 3, 2018 . 1 changed file with 7 additions and 0 deletions.
@@ -1,10 +1,17 @@
# Ruby and Postgres Regular Expression Syntaxes

<div>
<ul>
<li>one</li>
<li>two</li>
</ul>
<style type="text/css">
code {
  padding: unset;
}
</style>
</div>

Ruby's regular expressions are unusually powerful.
Postgres' regular expressions are not as powerful, but they come close;
glv revised this gist
Apr 3, 2018 . 1 changed file with 2 additions and 2 deletions.
@@ -1,11 +1,11 @@
# Ruby and Postgres Regular Expression Syntaxes

<style type="text/css">
code {
  padding: unset;
}
</style>

Ruby's regular expressions are unusually powerful. Postgres' regular expressions are not as powerful, but they come close; close enough that it's possible to do many pattern-based queries and string transformations entirely in a query.
glv revised this gist
Apr 3, 2018 . 1 changed file with 1 addition and 1 deletion.
@@ -1,4 +1,4 @@
<style type="text/css">
code {
  padding: unset;
}
glv revised this gist
Apr 3, 2018 . 1 changed file with 6 additions and 0 deletions.
@@ -1,3 +1,9 @@
<style>
code {
  padding: unset;
}
</style>

# Ruby and Postgres Regular Expression Syntaxes

Ruby's regular expressions are unusually powerful.
glv revised this gist
Apr 3, 2018 . 1 changed file with 2 additions and 2 deletions.
@@ -29,7 +29,7 @@ map between them easily. This table is aimed at making it easier.

## Alternation

<table>
<thead>
<tr>
<th>Ruby</th>
@@ -41,7 +41,7 @@ map between them easily. This table is aimed at making it easier.
<tr>
<td>|</td>
<td>|</td>
<td>combines two expressions into a single one that matches either of the expressions; each expression is an <em>alternative</em></td>
</tr>
</tbody>
</table>
glv revised this gist
Apr 3, 2018 . 1 changed file with 1 addition and 1 deletion.
@@ -29,7 +29,7 @@ map between them easily. This table is aimed at making it easier.

## Alternation

<table markdown="1">
<thead>
<tr>
<th>Ruby</th>
glv created this gist
Apr 3, 2018
@@ -0,0 +1,292 @@
# Ruby and Postgres Regular Expression Syntaxes

Ruby's regular expressions are unusually powerful. Postgres' regular expressions are not as powerful, but they come close; close enough that it's possible to do many pattern-based queries and string transformations entirely in a query.

And sometimes, it's very useful to have a single regular expression that works in both environments, so that one can reliably query for data in a particular format and then transform the results in Ruby using the same pattern.

But there are subtle differences between the syntaxes that are hard to remember, and the two systems organize their documentation in different ways, so it's hard to even map between them easily. This table is aimed at making it easier.

## Operators and Functions

| Ruby | Postgres | Explanation |
|-|-|-|
| =~ | ~ | matches |
| | ~* | matches (case-independent) |
| !~ | !~ | does not match |
| | !~* | does not match (case-independent) |
| `String#match(re)`, `Regexp#match(string)` | | returns a MatchData instance or nil if no match |
| | <code>substring(*string* from *pattern*)</code> | returns NULL if no match, the match text for the first capture group (if any), or the entire matched substring |
| | <code>regexp_replace(*source*, *pattern*, *replacement* [, *flags* ])</code> | returns *source*, transformed per *pattern* and *replacement* (and *flags*, which can include `g`) |
| | <code>regexp_matches(*string*, *pattern* [, *flags* ])</code> | returns no rows (if no match); a row containing an array of capture matches (if the pattern uses capture groups), or a row containing the entire matched substring. (The `g` flag, if supplied, results in one returned row for each match in *string*.) |
| | <code>regexp_split_to_table(*string*, *pattern* [, *flags* ])</code> | splits *string* using *pattern* as a delimiter, returning a row for each fragment (ignoring zero-length fragments) |
| | <code>regexp_split_to_array(*string*, *pattern* [, *flags* ])</code> | same as `regexp_split_to_table` except it returns an array of strings rather than rows |

## Alternation

<table>
<thead>
<tr>
<th>Ruby</th>
<th>Postgres</th>
<th>Explanation</th>
</tr>
</thead>
<tbody>
<tr>
<td>|</td>
<td>|</td>
<td>combines two expressions into a single one that matches either of the expressions; each expression is an <em>alternative</em></td>
</tr>
</tbody>
</table>

## Atoms

Atoms match a sequence of one or more characters (or *zero* or more characters, in the case of subpatterns).

| Ruby | Postgres | Explanation |
|-|-|-|
| (*re*) | (*re*) | a sub-pattern (capturing) |
| (?<*name*>*re*) | | a sub-pattern (named capture) |
| (?'*name*'*re*) | | a sub-pattern (named capture) |
| (?:*re*) | (?:*re*) | a sub-pattern (non-capturing) |
| (?>*re*) | | a sub-pattern (atomic, non-capturing) |
| . | . | any single character |
| [*chars*] | [*chars*] | a character class |
| [^*chars*] | [^*chars*] | a negated character class |
| \\*k* | \\*k* | where *k* is non-alphanumeric: matches *k* |
| \\*c* | \\*c* | where *c* is alphanumeric: an escape |
| { | { | if followed by a digit, introduces a bound quantifier; otherwise matches { |
| *x* | *x* | other characters match themselves |

(Additionally, backreferences and escapes function as atoms.)

## Quantifiers

Quantifiers can follow atoms, and they change the number of occurrences of the atom that can be matched. (Without a subsequent quantifier, an atom will always match *exactly one* consecutive occurrence.)

| Ruby | Postgres | Explanation |
|-|-|-|
| * | * | |
| + | + | |
| ? | ? | |
| {*m*} | {*m*} | |
| {*m*,} | {*m*,} | |
| {,*m*} | | |
| {*m*,*n*} | {*m*,*n*} | |
| *? | *? | |
| +? | +? | |
| ?? | ?? | |
| {*m*}? | {*m*}? | |
| {*m*,}? | {*m*,}? | |
| {,*m*}? | | |
| {*m*,*n*}? | {*m*,*n*}? | |
| *+ | | |
| ++ | | |
| ?+ | | |
| {*m*}+ | | |
| {*m*,}+ | | |
| {,*m*}+ | | |
| {*m*,*n*}+ | | |

## Constraints / Anchors

| Ruby | Postgres | Explanation |
|-|-|-|
| ^ | ^ | beginning of line |
| \\A | \\A | beginning of string |
| $ | $ | end of line |
| \\Z | \\Z | end of string (just before a terminating newline, if any) |
| \\z | | end of string |
| (?=*re*) | (?=*re*) | empty string when following characters match *re* |
| (?!*re*) | (?!*re*) | empty string when following characters do not match *re* |
| (?<=*re*) | | empty string when preceding characters match *re* |
| (?<!*re*) | | empty string when preceding characters do not match *re* |
| \\b | \\y | word boundary |
| \\B | \\Y | non-word-boundary |
| | \\m | beginning of word |
| | \\M | end of word |

## Character Classes

The following syntaxes are available *within* character class (bracket) expressions to enumerate the characters included (or excluded, in the case of negated classes).
| Ruby | Postgres | Explanation |
|-|-|-|
| *c* | *c* | a single character |
| *a*-*z* | *a*-*z* | (where *a* and *z* are any characters) the entire sequence of characters from *a* to *z* inclusive |
| [*class*] | | embedded class (OR'd with neighboring characters) |
| && | | intersection operator on embedded classes: `[a-w&&[^c-g]z]` means "(`[a-w]` AND (`[^c-g]` OR `z`))" (in other words, `[abh-w]`) |
| [:alnum:] | [:alnum:] | all alphanumeric characters |
| [:alpha:] | [:alpha:] | all alphabetic characters |
| [:blank:] | [:blank:] | all space and tab characters |
| [:cntrl:] | [:cntrl:] | all control characters |
| [:digit:] | [:digit:] | all decimal digits |
| [:graph:] | [:graph:] | all non-blank characters |
| [:lower:] | [:lower:] | all lowercase characters |
| [:print:] | [:print:] | all printable characters (like [:graph:], but includes space) |
| [:punct:] | [:punct:] | all punctuation characters |
| [:space:] | [:space:] | all whitespace characters (space, tab, newline, carriage return, etc.) |
| [:upper:] | [:upper:] | all uppercase characters |
| [:xdigit:] | [:xdigit:] | all hexadecimal digits (same as `[:digit:]A-Fa-f`) |
| [:word:] | | all characters in the following Unicode general categories: *Letter*, *Mark*, *Number*, *Connector_Punctuation* |
| [:ascii:] | | all ASCII characters |

## Backreferences

| Ruby | Postgres | Explanation |
|-|-|-|
| \\*nnn* | \\*nnn* | (up to three digits, with no leading zeroes) a backreference if not greater than the number of capturing groups; octal escape otherwise |
| \\k<*name*> | | reference to a named capture |
| \\g<*name*> | | reference to a named subpattern (re-evaluates the subpattern, rather than matching the same text) |

### Substitution Backreferences

In Ruby, information from the match may be used in any of three ways: as global variables (`$something`), as substitution references in the second argument of `sub` or `gsub` (`\\something`), or by calling into the MatchData object returned from `match` (`md.something` or `md[:something]`).

In Postgres, these can be used in the second argument to `regexp_replace`.

| Ruby | Postgres | Explanation |
|-|-|-|
| `$~`, `Regexp.last_match` | | the MatchData object from the most recent match |
| `$&`, `\&`, `md[0]` | `\&` | the complete matched text |
| <code>$\`</code>, <code>\\\`</code>, `md.pre_match` | | the text of the string preceding the match |
| `$'`, `\'`, `md.post_match` | | the text of the string after the match |
| `$1`, `\1`, `md[1]` | `\1` | the first capture group (and so on for other numbered capture groups) |
| `$+`, `\+` | | the last capture group |
| <code>\k<*name*></code>, <code>md[:*name*]</code> | | the named capture group with name *name* |

## Escapes

### Pattern Escapes

| Ruby | Postgres | Explanation |
|-|-|-|
| \\d | \\d | [[:digit:]] |
| \\D | \\D | [^[:digit:]] |
| \\h | | [[:xdigit:]] |
| \\H | | [^[:xdigit:]] |
| \\s | \\s | [[:space:]] |
| \\S | \\S | [^[:space:]] |
| \\w | \\w | [[:alnum:]_] (note underscore is included) |
| \\W | \\W | [^[:alnum:]_] (note underscore is not matched) |
| \p{Alnum} | | Alphabetic and numeric character |
| \p{Alpha} | | Alphabetic character |
| \p{Blank} | | Space or tab |
| \p{Cntrl} | | Control character |
| \p{Digit} | | Digit |
| \p{Graph} | | Non-blank character (excludes spaces, control characters, and similar) |
| \p{Lower} | | Lowercase alphabetical character |
| \p{Print} | | Like \p{Graph}, but includes the space character |
| \p{Punct} | | Punctuation character |
| \p{Space} | | Whitespace character ([:blank:], newline, carriage return, etc.) |
| \p{Upper} | | Uppercase alphabetical character |
| \p{XDigit} | | Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F) |
| \p{Word} | | A member of one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation |
| \p{ASCII} | | A character in the ASCII character set |
| \p{Any} | | Any Unicode character (including unassigned characters) |
| \p{Assigned} | | An assigned character |
| \p{L} | | Any character with Unicode *General Category* 'Letter' |
| \p{Ll} | | Any character with Unicode *General Category* 'Letter: Lowercase' |
| \p{Lm} | | Any character with Unicode *General Category* 'Letter: Modifier' |
| \p{Lo} | | Any character with Unicode *General Category* 'Letter: Other' |
| \p{Lt} | | Any character with Unicode *General Category* 'Letter: Titlecase' |
| \p{Lu} | | Any character with Unicode *General Category* 'Letter: Uppercase' |
| \p{M} | | Any character with Unicode *General Category* 'Mark' |
| \p{Mn} | | Any character with Unicode *General Category* 'Mark: Nonspacing' |
| \p{Mc} | | Any character with Unicode *General Category* 'Mark: Spacing Combining' |
| \p{Me} | | Any character with Unicode *General Category* 'Mark: Enclosing' |
| \p{N} | | Any character with Unicode *General Category* 'Number' |
| \p{Nd} | | Any character with Unicode *General Category* 'Number: Decimal Digit' |
| \p{Nl} | | Any character with Unicode *General Category* 'Number: Letter' |
| \p{No} | | Any character with Unicode *General Category* 'Number: Other' |
| \p{P} | | Any character with Unicode *General Category* 'Punctuation' |
| \p{Pc} | | Any character with Unicode *General Category* 'Punctuation: Connector' |
| \p{Pd} | | Any character with Unicode *General Category* 'Punctuation: Dash' |
| \p{Ps} | | Any character with Unicode *General Category* 'Punctuation: Open' |
| \p{Pe} | | Any character with Unicode *General Category* 'Punctuation: Close' |
| \p{Pi} | | Any character with Unicode *General Category* 'Punctuation: Initial Quote' |
| \p{Pf} | | Any character with Unicode *General Category* 'Punctuation: Final Quote' |
| \p{Po} | | Any character with Unicode *General Category* 'Punctuation: Other' |
| \p{S} | | Any character with Unicode *General Category* 'Symbol' |
| \p{Sm} | | Any character with Unicode *General Category* 'Symbol: Math' |
| \p{Sc} | | Any character with Unicode *General Category* 'Symbol: Currency' |
| \p{Sk} | | Any character with Unicode *General Category* 'Symbol: Modifier' |
| \p{So} | | Any character with Unicode *General Category* 'Symbol: Other' |
| \p{Z} | | Any character with Unicode *General Category* 'Separator' |
| \p{Zs} | | Any character with Unicode *General Category* 'Separator: Space' |
| \p{Zl} | | Any character with Unicode *General Category* 'Separator: Line' |
| \p{Zp} | | Any character with Unicode *General Category* 'Separator: Paragraph' |
| \p{C} | | Any character with Unicode *General Category* 'Other' |
| \p{Cc} | | Any character with Unicode *General Category* 'Other: Control' |
| \p{Cf} | | Any character with Unicode *General Category* 'Other: Format' |
| \p{Cn} | | Any character with Unicode *General Category* 'Other: Not Assigned' |
| \p{Co} | | Any character with Unicode *General Category* 'Other: Private Use' |
| \p{Cs} | | Any character with Unicode *General Category* 'Other: Surrogate' |
| \p{*script*} | | Any character from the Unicode *script*, where *script* is one of *Arabic*, *Armenian*, *Balinese*, *Bengali*, *Bopomofo*, *Braille*, *Buginese*, *Buhid*, *Canadian_Aboriginal*, *Carian*, *Cham*, *Cherokee*, *Common*, *Coptic*, *Cuneiform*, *Cypriot*, *Cyrillic*, *Deseret*, *Devanagari*, *Ethiopic*, *Georgian*, *Glagolitic*, *Gothic*, *Greek*, *Gujarati*, *Gurmukhi*, *Han*, *Hangul*, *Hanunoo*, *Hebrew*, *Hiragana*, *Inherited*, *Kannada*, *Katakana*, *Kayah_Li*, *Kharoshthi*, *Khmer*, *Lao*, *Latin*, *Lepcha*, *Limbu*, *Linear_B*, *Lycian*, *Lydian*, *Malayalam*, *Mongolian*, *Myanmar*, *New_Tai_Lue*, *Nko*, *Ogham*, *Ol_Chiki*, *Old_Italic*, *Old_Persian*, *Oriya*, *Osmanya*, *Phags_Pa*, *Phoenician*, *Rejang*, *Runic*, *Saurashtra*, *Shavian*, *Sinhala*, *Sundanese*, *Syloti_Nagri*, *Syriac*, *Tagalog*, *Tagbanwa*, *Tai_Le*, *Tamil*, *Telugu*, *Thaana*, *Thai*, *Tibetan*, *Tifinagh*, *Ugaritic*, *Vai*, and *Yi* |

(Any of the above escapes of the form \p{*something*} can be negated by using the `^` character, as \p{^*something*}.)

### Literal Character Escapes

| Ruby | Postgres | Explanation |
|-|-|-|
| \\\\ | \\\\ | backslash |
| \\a | \\a | alert (bell) character, as in C |
| \\b | | backspace (only in character class) |
| | \\b | backspace, as in C |
| | \\B | synonym for backslash (\\) to help reduce the need for backslash doubling |
| | \\c*X* | (where *X* is any character) the character whose low-order 5 bits are the same as those of *X*, and whose other bits are all zero |
| \\e | | the escape character |
| | \\e | the character whose collating-sequence name is ESC, or failing that, the character with octal value 033 |
| \\f | \\f | form feed, as in C |
| \\n | \\n | newline, as in C |
| \\r | \\r | carriage return, as in C |
| \\t | \\t | horizontal tab, as in C |
| | \\u*hhhh* | (where *hhhh* is exactly four hexadecimal digits) the character whose hexadecimal value is 0x*hhhh* |
| | \\U*hhhhhhhh* | (where *hhhhhhhh* is exactly eight hexadecimal digits) the character whose hexadecimal value is 0x*hhhhhhhh* |
| \\v | \\v | vertical tab, as in C |
| | \\x*hhh* | (where *hhh* is any sequence of hexadecimal digits) the character whose hexadecimal value is 0x*hhh* (a single character no matter how many hexadecimal digits are used) |
| | \\0 | the character whose value is 0 (the null byte) |
| | \\*oo* | (where *oo* is exactly two octal digits, and is not a back reference) the character whose octal value is 0*oo* |
| | \\*ooo* | (where *ooo* 
is exactly three octal digits, and is not a back reference) the character whose octal value is 0*ooo* | ## Options In Ruby, the single-letter version of an option can be specified after the closing delimiter of the regexp. Options `i`, `m`, and `x` can also be embedded within the expression using the <code markdown=true>(?*on*-*off*:*re*)</code> syntax, which turns on options *on* and turns off options *off* while interpreting subpattern *re*. The `Regexp::CONSTANT` version can be passed as the second parameter to `Regexp.new` (optionally combined with other constants with `|`). In Postgres, the options can be included at the very start of an expression (possibly after an initial `***=`) using the syntax <code>(?*opts*)</code>; the options included in the string *opts* are in effect for the entire regular expression. The options can also be included in a string passed as a parameter to various pattern-related functions. | Ruby | Postgres | Explanation | |-|-|-| | i or `Regexp::IGNORECASE` | | case-insensitive | | m or `Regexp::MULTILINE` | | multiline (treat newline as a character matched by `.`) | | x or `Regexp::EXTENDED` | | ignore whitespace and comments in pattern | | o | | perform `#{}` interpolation only once || | u | | regexp is encoded as UTF-8 | | e | | regexp is encoded as EUC-JP | | s | | regexp is encoded as Windows-31J | | n | | regexp is encoded as ASCII-8BIT | | | \*\*\*: | (at beginning of pattern) the rest of the pattern is an ARE | | | \*\*\*= | (at beginning of pattern) the rest of the pattern is a literal string | | | b | rest of RE is a BRE | | | c | case-sensitive matching (overrides operator type) | | | e | rest of RE is an ERE | | | i | case-insensitive matching (overrides operator type) | | | m | historical synonym for n | | | n | newline-sensitive matching | | | p | partial newline-sensitive matching | | | q | rest of RE is a literal ("quoted") string, all ordinary characters | | | s | non-newline-sensitive matching (default) | | | t | 
tight syntax (default) | | | w | inverse partial newline-sensitive ("weird") matching | | | x | expanded syntax | | | g | (with `regexp_replace` and `regexp_matches` only) operate on all matches, not just the first |
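
As a rough illustration of the Ruby side of the Options section, the sketch below shows the three ways of setting options: a trailing letter, a `Regexp::CONSTANT` passed to `Regexp.new`, and an embedded `(?on-off:re)` group. The variable names and sample strings are made up for this example and are not part of the original tables.

```ruby
# Trailing single-letter option after the closing delimiter.
trailing = /ruby/i

# The same option passed as a constant to Regexp.new.
constructed = Regexp.new("ruby", Regexp::IGNORECASE)

# Constants can be combined with |. With EXTENDED, the whitespace and the
# "# ..." comment inside the pattern are ignored; with MULTILINE, "." also
# matches a newline.
combined = Regexp.new("a . b  # any single character between a and b",
                      Regexp::MULTILINE | Regexp::EXTENDED)

# Embedded group: turn case-insensitivity ON for one subpattern only.
embedded_on = /(?i:ruby) on rails/

# Embedded group: turn case-insensitivity OFF inside an /i regexp.
embedded_off = /a(?-i:b)c/i

puts trailing.match?("RUBY")                  # => true
puts constructed.match?("RUBY")               # => true
puts combined.match?("a\nb")                  # => true
puts embedded_on.match?("RUBY on rails")      # => true
puts embedded_on.match?("RUBY ON RAILS")      # => false
puts embedded_off.match?("AbC")               # => true
puts embedded_off.match?("ABC")               # => false
```

The embedded form is the closest Ruby counterpart to Postgres's leading `(?opts)`, with the difference noted above: Ruby's group applies only to the enclosed subpattern, while the Postgres options apply to the entire expression.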