@leollon
Last active January 10, 2020 13:38
Hexo Markdown regular expression
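```python
import re

# Parse a Hexo post: front matter (title, date, categories, tags, in that
# order) followed by the post body, one capture group each.
# The classes spell out CJK ranges: \u4e00-\u9fcf Kanji, \u30a0-\u30ff Katakana,
# \u3040-\u309f Hiragana (in Python 3, \w already covers most of these letters).
pat = re.compile(
    r"\-*\s*"
    r"\w*:\s*([\w\d\-,.!?'\"&/<>:\ \u4e00-\u9fa5\u30a0-\u30ff\u3040-\u309f\u4e00-\u9fcf\(\)\[\]]*)\s*(?#title)"
    r"\w*:\s*(\d{4}\-\d{2}\-\d{2}\ *\d{2}:\d{2}:\d{2})\s*(?#date)"
    r"\w*:\s*([\w\d\-'\ \[\]/.\u4e00-\u9fa5\u30a0-\u30ff\u3040-\u309f\u4e00-\u9fcf]*)\s*(?#categories)"
    r"\w*:\s*\[([\w\d\ \-.\"',/\u4e00-\u9fa5\u30a0-\u30ff\u3040-\u309f\u4e00-\u9fcf]*)\]\s*(?#tags)"
    r"\-*\s?([\w\d\s\u4e00-\u9fa5\u30a0-\u30ff\u3040-\u309f\u4e00-\u9fcf#@$%^&,.:;!?\"'`。,、;:“”"
    r"——!?()\+\-*/=|\\\(\)\[\]{}<>]*)(?#content)")
```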
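A minimal usage sketch in Python, assuming a post whose front matter lists `title`, `date`, `categories` and `tags` in that order, with a `YYYY-MM-DD HH:MM:SS` date and bracketed tags (the pattern matches the fields by position and does not check the key names). The pattern is restated so the block runs on its own, with the shared CJK ranges factored into a `CJK` string; `SAMPLE_POST` and `parse_post` are hypothetical names for illustration.

```python
import re

# Same pattern as the gist, with every range spelled with an ASCII hyphen.
# CJK: Kanji (U+4E00..U+9FCF), Katakana (U+30A0..U+30FF), Hiragana (U+3040..U+309F);
# the gist's \u4e00-\u9fa5 range is already inside \u4e00-\u9fcf.
CJK = r"\u4e00-\u9fcf\u30a0-\u30ff\u3040-\u309f"

pat = re.compile(
    r"\-*\s*"
    r"\w*:\s*([\w\d\-,.!?'\"&/<>:\ " + CJK + r"\(\)\[\]]*)\s*(?#title)"
    r"\w*:\s*(\d{4}\-\d{2}\-\d{2}\ *\d{2}:\d{2}:\d{2})\s*(?#date)"
    r"\w*:\s*([\w\d\-'\ \[\]/." + CJK + r"]*)\s*(?#categories)"
    r"\w*:\s*\[([\w\d\ \-.\"',/" + CJK + r"]*)\]\s*(?#tags)"
    r"\-*\s?([\w\d\s" + CJK + r"#@$%^&,.:;!?\"'`。,、;:“”"
    r"\u2014\u2014!?()\+\-*/=|\\\(\)\[\]{}<>]*)(?#content)"  # \u2014 is the em dash
)

# Hypothetical sample post; the field order must match the pattern.
SAMPLE_POST = """---
title: Hello World
date: 2020-01-10 13:38:00
categories: Notes
tags: [hexo, regex]
---
This is the body.
"""


def parse_post(text):
    """Return the captured fields as a dict, or None if text does not match."""
    m = pat.match(text)
    if m is None:
        return None
    title, date, categories, tags, content = m.groups()
    return {
        "title": title,
        "date": date,
        "categories": categories,
        # tags come back as one string such as "hexo, regex"; split it here
        "tags": [t.strip() for t in tags.split(",") if t.strip()],
        "content": content,
    }


if __name__ == "__main__":
    print(parse_post(SAMPLE_POST))
```

Splitting `tags` on commas is an assumption about how the bracketed list is written; the regex itself returns it as a single string.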
# References
https://gist.github.com/oanhnn/9043867
Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9fcf) ~ The Big Kahuna!
([一-龯])
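A short Python check of the Kanji class (Python's `re` accepts the literal characters in the range). `kanji_runs` is a hypothetical helper. Note that the iteration mark 々 (U+3005) lies outside the range, so 日々 yields only 日.

```python
import re

# [一-龯]: Kanji from 一 (U+4E00) up to 龯, as listed above.
KANJI_RUN = re.compile(r"[一-龯]+")


def kanji_runs(text):
    """Return every run of consecutive Kanji found in text."""
    return KANJI_RUN.findall(text)


if __name__ == "__main__":
    print(kanji_runs("日本語の漢字"))  # ['日本語', '漢字']
```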
Regex for matching Hiragana or Katakana (*)
([ぁ-んァ-ン])
Regex for matching characters that are neither Hiragana nor Katakana
([^ぁ-んァ-ン])
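A sketch of the Hiragana/Katakana class and its negation with `re.findall`, assuming Python 3 strings. Because the Katakana range stops at ン (U+30F3), the long-vowel mark ー (U+30FC) counts as non-kana, so a word like コーヒー is split apart.

```python
import re

KANA = re.compile(r"[ぁ-んァ-ン]+")       # Hiragana or (full-width) Katakana
NON_KANA = re.compile(r"[^ぁ-んァ-ン]+")  # everything else: Kanji, ASCII, ー, ...

if __name__ == "__main__":
    text = "東京はトウキョウです"
    print(KANA.findall(text))       # ['はトウキョウです']
    print(NON_KANA.findall(text))   # ['東京']
    print(KANA.findall("コーヒー"))  # ['コ', 'ヒ']
```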
Regex for matching Hiragana, Katakana, word characters (\w), or basic punctuation (、。’)
([ぁ-んァ-ン\w、。’])
Regex for matching Hiragana, Katakana, or the extra characters !, : and /
([ぁ-んァ-ン!:/])
Regex for matching Hiragana
([ぁ-ん])
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
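The single-script classes can be combined into a per-character classifier. This is a hypothetical helper (`script_of` is not from the source) that tries the Hiragana, full-width Katakana and Kanji classes in a fixed order and falls back to `"other"`, which is where ー and ASCII end up.

```python
import re

# Hypothetical helper: label a character with the first class it matches.
CLASSES = (
    ("hiragana", re.compile(r"[ぁ-ん]")),
    ("katakana", re.compile(r"[ァ-ン]")),  # full-width only
    ("kanji", re.compile(r"[一-龯]")),
)


def script_of(ch):
    """Return 'hiragana', 'katakana', 'kanji' or 'other' for one character."""
    for name, rx in CLASSES:
        if rx.fullmatch(ch):
            return name
    return "other"


if __name__ == "__main__":
    print([script_of(c) for c in "漢字とカナ"])
    # ['kanji', 'kanji', 'hiragana', 'katakana', 'katakana']
```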
Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])
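A Python sketch for the half-width class, written with `\u` escapes so the half-width characters survive copying (Unicode normalization such as NFKC turns them into full-width forms). The range starts at ｧ (U+FF67), so ｦ (U+FF66) is not matched; the codespace class further down includes it. `sample` is an illustrative string.

```python
import re

# [ｧ-ﾝﾞﾟ] spelled with escapes: U+FF67..U+FF9D, then U+FF9E and U+FF9F
# (the half-width voiced and semi-voiced sound marks).
HANKAKU_KATAKANA = re.compile(r"[\uff67-\uff9d\uff9e\uff9f]+")

# Half-width "gakkou" followed by the full-width ガッコウ, which must not match.
sample = "\uff76\uff9e\uff6f\uff7a\uff73 / ガッコウ"

if __name__ == "__main__":
    print(HANKAKU_KATAKANA.findall(sample))  # one match: the half-width run
```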
Regex for matching full-width Numbers (zenkaku 全角)
([０-９])
Regex for matching full-width Letters (zenkaku 全角)
([Ａ-Ｚａ-ｚ])
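A sketch of the full-width digit and letter classes in Python, using escapes (０-９ is U+FF10 to U+FF19; Ａ-Ｚ and ａ-ｚ are U+FF21 to U+FF3A and U+FF41 to U+FF5A). The letters are written as two ranges because a single range from Ａ to ｚ would also include ［＼］＾＿｀, which sit between the two cases. Note also that in Python 3, `\d` on `str` patterns already matches full-width digits, so it cannot tell the two kinds apart on its own.

```python
import re

ZENKAKU_DIGITS = re.compile(r"[\uff10-\uff19]+")                # full-width 0-9
ZENKAKU_LETTERS = re.compile(r"[\uff21-\uff3a\uff41-\uff5a]+")  # full-width A-Z, a-z

# Full-width "ABC", ASCII "123", full-width "123"
sample = "\uff21\uff22\uff23 123 \uff11\uff12\uff13"

if __name__ == "__main__":
    print(ZENKAKU_DIGITS.findall(sample))   # only the full-width digits
    print(ZENKAKU_LETTERS.findall(sample))  # only the full-width letters
    print(re.findall(r"\d+", sample))       # \d finds both digit runs
```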
Regex for matching Hiragana codespace characters (includes non-phonetic characters)
([ぁ-ゞ])
Regex for matching full-width (zenkaku) Katakana codespace characters (includes non-phonetic characters)
([ァ-ヶ])
Regex for matching half-width (hankaku) Katakana codespace characters (this is an old character set so the order is inconsistent with the hiragana)
([ｦ-ﾟ])
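The codespace classes are wider than the phonetic ones. A Python comparison, with the half-width range written as escapes (ｦ is U+FF66, ﾟ is U+FF9F): いすゞ needs the Hiragana codespace class because ゞ (U+309E) comes after ん, and ヶ (U+30F6) needs the Katakana codespace class because it comes after ン.

```python
import re

HIRAGANA = re.compile(r"[ぁ-ん]+")            # phonetic letters only
HIRAGANA_CODESPACE = re.compile(r"[ぁ-ゞ]+")  # adds ゔゕゖ, sound marks, ゝ and ゞ
KATAKANA = re.compile(r"[ァ-ン]+")
KATAKANA_CODESPACE = re.compile(r"[ァ-ヶ]+")  # adds ヴ, ヵ and ヶ
HANKAKU_CODESPACE = re.compile(r"[\uff66-\uff9f]+")  # half-width ｦ through ﾟ

if __name__ == "__main__":
    print(HIRAGANA.findall("いすゞ"), HIRAGANA_CODESPACE.findall("いすゞ"))
    print(KATAKANA.findall("ヶ月"), KATAKANA_CODESPACE.findall("ヶ月"))
```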
Regex for matching Japanese Post Codes
/^\d{3}-\d{4}$/
/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
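The post code patterns are JavaScript regex literals. A possible Python port, assuming the same semantics: `fullmatch()` replaces the `^...$` anchors (in Python, `$` also matches just before a trailing newline), and `re.ASCII` keeps `\d` from accepting full-width digits. `is_post_code` is a hypothetical helper name.

```python
import re

POST_CODE = re.compile(r"\d{3}-\d{4}", re.ASCII)  # e.g. 100-0001
POST_CODE_LOOSE = re.compile(r"\d{3}-\d{4}|\d{3}-\d{2}|\d{3}", re.ASCII)


def is_post_code(s, loose=False):
    """True if s is a post code like 100-0001 (loose also allows 100-00 and 100)."""
    rx = POST_CODE_LOOSE if loose else POST_CODE
    return rx.fullmatch(s) is not None


if __name__ == "__main__":
    print(is_post_code("100-0001"), is_post_code("100-00", loose=True))
```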
Regex for matching Japanese mobile phone numbers (keitai bangou)
/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
/^0\d0-\d{4}-\d{4}$/
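The same approach for the mobile number patterns; the second pattern only accepts the hyphenated form with a prefix of the form 0X0 (for example 090). `is_mobile` and its `strict` flag are hypothetical names.

```python
import re

MOBILE = re.compile(r"\d{3}-\d{4}-\d{4}|\d{11}", re.ASCII)  # hyphenated or 11 digits
MOBILE_0X0 = re.compile(r"0\d0-\d{4}-\d{4}", re.ASCII)      # hyphenated, 0X0 prefix


def is_mobile(s, strict=False):
    """True if s matches the loose pattern (or the 0X0 pattern when strict)."""
    rx = MOBILE_0X0 if strict else MOBILE
    return rx.fullmatch(s) is not None


if __name__ == "__main__":
    print(is_mobile("090-1234-5678", strict=True), is_mobile("09012345678"))
```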
Regex for matching Japanese fixed line phone numbers
/^[0-9-]{6,9}$|^[0-9-]{12}$/
/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/
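For the fixed-line patterns, note that the first one only checks the length (6 to 9, or 12 characters) and the allowed characters `0-9` and `-`, so even a string of six hyphens passes; the second checks the digit groups around the hyphens. A Python sketch with a hypothetical `is_fixed_line` helper:

```python
import re

# First form: only length and the characters 0-9 and "-".
FIXED_LOOSE = re.compile(r"[0-9-]{6,9}|[0-9-]{12}")
# Second form: digit groups separated by hyphens, ending in four digits.
FIXED = re.compile(r"\d{1,4}-\d{4}|\d{2,5}-\d{1,4}-\d{4}", re.ASCII)


def is_fixed_line(s, loose=False):
    """True if s fully matches the chosen fixed-line pattern."""
    rx = FIXED_LOOSE if loose else FIXED
    return rx.fullmatch(s) is not None


if __name__ == "__main__":
    print(is_fixed_line("03-1234-5678"), is_fixed_line("------", loose=True))
```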
Note: "Katakana" without "full-width" or "half-width" means full-width Katakana.