Python Regex Quick Reference
Special cases
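'\' : either escapes a special character or signals a special sequence
'\\' : matches a literal backslash
r'...' : Python's raw string notation for regular expression patterns (backslashes are not processed by the string literal)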
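'\number' : in a non-raw string literal Python reads this as a character escape (e.g. '\0' is '\x00'); write r'\number' to refer to a group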
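A minimal sketch of why raw strings are worth the habit. The sample text and variable names are only illustrative.

```python
import re

# A Windows-style path containing one literal backslash.
text = "C:\\temp"          # the actual string is C:\temp (7 characters)

# Without a raw string the backslash is doubled twice:
# once for the Python string literal, once for the regex engine.
plain = re.search("\\\\", text)

# With a raw string the pattern is written as the regex engine sees it.
raw = re.search(r"\\", text)

print(plain.group() == raw.group() == "\\")   # True

# In a normal string literal '\1' is the character with code 1,
# not a backreference; r'\1' keeps the backslash for the regex engine.
print("\1" == "\x01")                              # True
print(re.sub(r"(\w+) \1", r"\1", "the the cat"))   # the cat
```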
Special characters:
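• '.' : any character except a newline (with DOTALL, also matches a newline)
• '^' : the start of the string (with MULTILINE, also immediately after each newline)
• '$' : the end of the string or just before a newline at the end of the string (with MULTILINE, also just before each newline)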
• '*' : 0 or more repetitions of the preceding RE
• '+' : 1 or more repetitions of the preceding RE
• '?' : 0 or 1 repetitions of the preceding RE
• '*?', '+?', '??' : non-greedy versions of '*', '+', '?' that match as few characters as possible
• {m} : specifies that exactly m copies of the previous RE should be matched
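• {m,n} : match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. Either bound may be omitted: {m,} means m or more, {,n} means 0 to n. Do not put a space after the comma, or the braces are matched literally
• {m,n}? : the same range, but attempting to match as few repetitions as possible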
• '\' : either escapes special characters, or signals a special sequence
• [] : used to indicate a set of characters
○ [amk] matches 'a', 'm', or 'k'
○ Ranges are written with '-': [a-z], [0-9], [0-9A-Fa-f]
○ Special characters lose their special meaning inside sets. [(*+)] matches any of the literal characters '(', ')', '+', '*'
○ Character classes like \w or \S are also accepted inside a set
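○ [^5] matches any character except '5' ('^' negates the set only when it comes first)
○ To match a literal ']' inside a set, escape it or place it first: [()[\]{}] and []()[{}] both match any parenthesis, square bracket, or brace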
• '|' : A|B matches either A or B (alternatives are tried from left to right, and the first one that matches wins)
• (…) : matches whatever regular expression is inside the parentheses (can be retrieved after a match)
• (?...) : extension notation
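○ (?iLmsux) : one or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'; the group matches the empty string, and the letters set the corresponding flags (re.I, re.L, re.M, re.S, re.U, re.X) for the entire regular expression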
○ (?:…) : a non-capturing version of regular parentheses (the substring matched by the group cannot be retrieved after performing a match)
○ (?P<name>…) : like regular parentheses, but the matched substring is also accessible by the group name; each group name must be defined only once within a regular expression
○ (?P=name) : a backreference to a named group
○ (?#...) : a comment
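○ (?=…) : lookahead assertion; matches if … matches next, without consuming any of the string
○ (?!...) : negative lookahead assertion; matches if … doesn't match next
○ (?<=…) : positive lookbehind assertion; matches if the current position is preceded by a match for …
○ (?<!...) : negative lookbehind assertion; matches if the current position is not preceded by a match for …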
○ (?(id/name)yes-pattern|no-pattern) : try to match with yes-pattern if the group with the given id or name took part in the match, and with no-pattern if it didn't (no-pattern is optional)
• \number : matches the contents of the group of the same number
• \A : matches only at the start of the string
• \b : matches the empty string, but only at the beginning or end of a word
• \B : matches the empty string, but only when it is not at the beginning or end of a word
• \d : decimal digit
• \D : non-digit character
• \s : any whitespace character
• \S : any non-whitespace character
• \w : alphanumeric character and the underscore
• \W : any character that is not a word character (neither alphanumeric nor the underscore)
• \Z : matches only at the end of the string
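A few of the entries above, sketched as short runnable checks. The sample strings and variable names are made up for illustration.

```python
import re

# Greedy vs. non-greedy: '*' takes as much as possible, '*?' as little.
html = "<a><b>"
print(re.findall(r"<.*>", html))    # ['<a><b>']
print(re.findall(r"<.*?>", html))   # ['<a>', '<b>']

# {m,n} is written without spaces to act as a quantifier.
print(re.findall(r"\d{2,3}", "1 22 333 4444"))  # ['22', '333', '444']

# Named group plus a backreference to it: find a doubled word.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it is is fine")
print(doubled.group("word"))        # is

# Lookahead and lookbehind check context without consuming it.
print(re.findall(r"\w+(?=\.txt)", "a.txt b.py c.txt"))   # ['a', 'c']
print(re.findall(r"(?<=\$)\d+", "cost $15, not 20"))     # ['15']

# MULTILINE lets '^' match at the start of each line; \A is unaffected.
lines = "one\ntwo"
print(re.findall(r"^\w+", lines, re.M))    # ['one', 'two']
print(re.findall(r"\A\w+", lines, re.M))   # ['one']
```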
Module contents
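• re.compile(pattern, flags=0): compile a regular expression pattern into a pattern object whose search() and match() methods can be reused; flags can be combined with '|'
○ re.I (re.IGNORECASE) : ignore case
○ re.L (re.LOCALE) : make \w, \W, \b, \B locale dependent
○ re.M (re.MULTILINE) : '^' and '$' also match at the start and end of each line
○ re.S (re.DOTALL) : '.' matches any character, including a newline
○ re.U (re.UNICODE) : make \w, \W, \b, \B, \d, \D, \s, \S Unicode dependent
○ re.X (re.VERBOSE) : allow whitespace and comments inside the pattern
○ re.DEBUG : display debug information about the compiled expression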
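• re.search(): scan through the string and return a match object for the first location where the pattern matches, or None
• re.match(): match only at the beginning of the string; return a match object or None
• re.split(): split the string by the occurrences of the pattern
• re.findall(): return all non-overlapping matches of the pattern in the string, as a list
• re.finditer(): return an iterator yielding match objects over all non-overlapping matches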
• re.sub(pattern, repl, string): return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl
• re.subn(): same as re.sub(), but return a tuple (new_string, number_of_subs_made)
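• re.escape(): return the string with special characters backslashed, so it can be matched literally (before Python 3.7, all non-alphanumerics are backslashed)
• re.purge(): clear the regular expression cache
• re.error: exception raised when a string passed to one of the functions is not a valid regular expression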
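The module functions above, exercised on one sample sentence. This is a rough sketch; the text and names are illustrative only.

```python
import re

# Compile once, reuse many times.
word = re.compile(r"[a-z]+", re.I)

text = "Spam, eggs and SPAM"

# match() anchors at the start of the string; search() scans forward.
print(word.match(text).group())           # Spam
print(re.match(r"eggs", text))            # None
print(re.search(r"eggs", text).span())    # (6, 10)

# findall() returns strings; finditer() yields match objects.
print(word.findall(text))                 # ['Spam', 'eggs', 'and', 'SPAM']
print([m.start() for m in word.finditer(text)])  # [0, 6, 11, 15]

# split() cuts the string wherever the pattern matches.
print(re.split(r",?\s+", text))           # ['Spam', 'eggs', 'and', 'SPAM']

# sub() returns the new string; subn() also returns the replacement count.
print(re.sub(r"spam", "ham", text, flags=re.I))   # ham, eggs and ham
print(re.subn(r"spam", "ham", text, flags=re.I))  # ('ham, eggs and ham', 2)

# escape() makes arbitrary text safe to embed in a pattern.
literal = re.escape("1+1=2")
print(re.search(literal, "so 1+1=2 holds").group())  # 1+1=2

# An invalid pattern raises re.error.
try:
    re.compile(r"(unclosed")
except re.error as exc:
    print("bad pattern:", exc)
```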