i trying match string contain 1 word in language. search led me \p{...} absent in python's re module. found https://pypi.python.org/pypi/regex. should work \p{...} commands. although doesn't.
i tried parsing lines:
7652167371 apéritif 78687 attaché 78687 époque 78678 kunngjøre 78678 ærbødig 7687 vår 12312 dfsdf 23123 322432 1321 23123 2312 привер 32211 оипвыола
with:
def pattern_compile(pattern_array): regexes = [regex.compile(p) p in pattern_array] return regexes def main(): line in sys.stdin: regexp in pattern_compile(p_a): if regexp.search (line): print line.strip('\n') if __name__ == '__main__': p_a = ['^\d+\t(\p{l}|\p{m})+$', ] main()
the result latin-character word:
12312 dfsdf
you should pass unicode. (both regular expression , string)
import sys import regex def main(patterns): patterns = [regex.compile(p) p in patterns] line in sys.stdin: line = line.decode('utf8') regexp in patterns: if regexp.search (line): print line.strip('\n') if __name__ == '__main__': main([ur'^\d+\t(\p{l}|\p{m})+$', ])
Comments
Post a Comment