linux - Scalable way of deleting all lines from a file where the line starts with one of many values
Given an input file of variable values (for example):

    a
    b
    d

what script will remove lines from another file that start with one of the above values? For example, a file with the contents:

    a
    b
    c
    d

would end up being:

    c

The input file is on the order of 100,000 variable values; the file to be mangled is on the order of several million lines.
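For concreteness, the examples below assume the values live in file1 and the file to be filtered is file2 (hypothetical names, matching the answer that follows); a quick way to reproduce the sample data:

    printf 'a\nb\nd\n'    > file1
    printf 'a\nb\nc\nd\n' > file2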
    awk '
    NR==FNR {     # IF this is the first file in the arg list THEN
        list[$0]  #     store the contents of the current record as an index of the array "list"
        next      #     skip the rest of the script and move on to the next input record
    }             # ENDIF

    {                               # this must be the second file in the arg list
        for (i in list)             # for each index "i" in the array "list"
            if (index($0,i) == 1)   #     IF "i" starts at the 1st char of the current record THEN
                next                #         move on to the next input record
    }

    1  # specify a true condition and so invoke the default action of printing the current record
    ' file1 file2
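With the sample file1 and file2 above, the same logic condensed to one line should print just the surviving record:

    $ awk 'NR==FNR{list[$0]; next} {for (i in list) if (index($0,i)==1) next} 1' file1 file2
    c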
An alternative approach to building up an array and doing a string comparison on each element would be to build up a regular expression, e.g.:

    ... list = list "|" $0 ...

and then doing an RE comparison:

    ... if ($0 ~ list) next ...

but I'm not sure that would be any faster than the loop, and you'd have to worry about RE metacharacters appearing in file1.
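A rough sketch of that regex-based variant, anchored at the start of the line and assuming the values in file1 contain no RE metacharacters (otherwise they would need escaping first):

    awk '
    NR==FNR {                                   # first file: build an alternation of all values
        list = (list == "") ? $0 : list "|" $0
        next
    }
    $0 !~ ("^(" list ")")                       # second file: print lines that do not start with any value
    ' file1 file2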
If all of the values in file1 are single characters, though, then this approach of creating a character list to use in an RE comparison might work for you:
    awk 'NR==FNR{list = list $0; next} $0 !~ "^[" list "]"' file1 file2
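With the sample data, list ends up as "abd", so the test is effectively $0 !~ /^[abd]/ and only the line c is printed:

    $ awk 'NR==FNR{list = list $0; next} $0 !~ "^[" list "]"' file1 file2
    c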