How to implement \p{L} in Python regex


I am trying to match strings that contain a single word in any language. My search led me to \p{...}, which is absent from Python's re module. I then found https://pypi.python.org/pypi/regex, which should support \p{...} properties, but it doesn't seem to work.

I tried parsing the following lines:

    7652167371  apéritif
    78687   attaché
    78687   époque
    78678   kunngjøre
    78678   ærbødig
    7687    vår
    12312   dfsdf
    23123   322432
    1321    23123
    2312    привер
    32211   оипвыола

with:

    # Python 2
    import sys
    import regex

    def pattern_compile(pattern_array):
        regexes = [regex.compile(p) for p in pattern_array]
        return regexes

    def main():
        for line in sys.stdin:
            for regexp in pattern_compile(p_a):
                if regexp.search(line):
                    print line.strip('\n')

    if __name__ == '__main__':
        p_a = ['^\d+\t(\p{L}|\p{M})+$', ]
        main()

The result is only the line with the plain Latin-character word:

    12312   dfsdf

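You should pass Unicode for both the regular expression and the string. In Python 2, sys.stdin yields byte strings, so the pattern is matched against raw UTF-8 bytes rather than characters, and \p{L} cannot recognize the non-ASCII letters. Decode each line and use a Unicode pattern: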

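    # Python 2
    import sys
    import regex

    def main(patterns):
        # Compile once, outside the loop over input lines.
        patterns = [regex.compile(p) for p in patterns]
        for line in sys.stdin:
            # Turn the UTF-8 byte string into a unicode object.
            line = line.decode('utf8')
            for regexp in patterns:
                if regexp.search(line):
                    print line.strip('\n')

    if __name__ == '__main__':
        # ur'' makes the pattern a raw unicode literal.
        main([ur'^\d+\t(\p{L}|\p{M})+$', ])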
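For readers who cannot install the regex module, the same check can be approximated with the standard library alone. The sketch below is not the method from the answer above: it assumes Python 3 (where str is already Unicode) and swaps \p{L} and \p{M} for a lookup of each character's general category via unicodedata.category. The names LINE_RE, is_letters_or_marks, and matching_lines are made up for illustration.

```python
import re
import unicodedata

# Matches "<digits><TAB><rest>"; the letter/mark check is done separately.
LINE_RE = re.compile(r'^(\d+)\t(.+)$')


def is_letters_or_marks(word):
    """Return True if word is non-empty and every character has a
    Unicode general category starting with L (letter) or M (mark)."""
    return bool(word) and all(
        unicodedata.category(ch)[0] in ('L', 'M') for ch in word
    )


def matching_lines(lines):
    """Yield lines of the form <digits><TAB><single word>."""
    for line in lines:
        line = line.rstrip('\n')
        m = LINE_RE.match(line)
        if m and is_letters_or_marks(m.group(2)):
            yield line


# Hypothetical sample input, tab-separated like the data in the question.
sample = [
    '78687\tattaché\n',
    '12312\tdfsdf\n',
    '23123\t322432\n',
    '2312\tпривер\n',
]

for line in matching_lines(sample):
    print(line)
```

With this sample, the all-digit line is skipped and the other three are printed. To process real input, pass sys.stdin to matching_lines instead of the sample list, provided the input encoding is set up correctly for your environment.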
