linux - Scalable way of deleting all lines from a file where the line starts with one of many values


Given an input file of variable values (example):

a
b
d

what script will remove all lines from another file that start with one of the above values? For example, given this file's contents:

a
b
c
d

it would end up being:

c

The input file is on the order of 100,000 variable values. The file to be mangled is on the order of several million lines.

awk '
    NR==FNR {     # IF this is the first file in the arg list THEN
        list[$0]  #    store the current record as an index of the array "list"
        next      #    skip the rest of the script and move on to the next input record
    }             # ENDIF

    {                                # this must be the second file in the arg list
        for (i in list)              # FOR each index "i" in the array "list" DO
            if (index($0,i) == 1)    #    IF "i" starts at the 1st char of the current record THEN
                next                 #       move on to the next input record
    }

    1  # a true condition, which invokes the default action of printing the current record
' file1 file2
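For concreteness, here is that script run against the question's example data, assuming one value per line in file1 (the file names and contents below are just the question's example, laid out as separate lines):

```shell
printf 'a\nb\nd\n' > file1     # values whose matching lines should be removed
printf 'a\nb\nc\nd\n' > file2  # file to filter

awk '
    NR==FNR { list[$0]; next }                      # first file: store values as array indexes
    { for (i in list) if (index($0, i) == 1) next } # skip lines starting with any value
    1                                               # otherwise print the line
' file1 file2
```

This should print just `c`, since `a`, `b`, and `d` each start with a stored value.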

An alternative to building the array and doing a string comparison on each element is to build a regular expression, e.g.:

... list = list "|" $0 ... 

and then doing an RE comparison:

... if ($0 ~ list)     next ... 

but I'm not sure that'd be any faster than the loop, and you'd have to worry about RE metacharacters appearing in file1.
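A minimal sketch of that RE-based variant, anchored so values only match at the start of a line (hypothetical file contents as in the question's example, and assuming file1 contains no RE metacharacters):

```shell
printf 'a\nb\nd\n' > file1     # values to filter on, one per line
printf 'a\nb\nc\nd\n' > file2  # file to filter

awk '
    NR==FNR {                                 # first file: build one big alternation
        list = (list == "" ? $0 : list "|" $0)
        next
    }
    $0 ~ "^(" list ")" { next }               # anchored: match prefixes only
    1                                         # print lines matching no prefix
' file1 file2
```

With the example data, `list` becomes `a|b|d` and the dynamic regexp `^(a|b|d)` skips every line but `c`.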

If all of the values in file1 are single characters, though, then this approach of creating a character list to use in an RE comparison might work for you:

awk 'NR==FNR{list = list $0; next} $0 !~ "^[" list "]"' file1 file2
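Using the question's example data again (hypothetical files, each value in file1 a single character), the character-class version behaves the same way:

```shell
printf 'a\nb\nd\n' > file1     # single-character values
printf 'a\nb\nc\nd\n' > file2  # file to filter

# The characters concatenate into "abd", so the test becomes
# "print lines that do NOT match ^[abd]".
awk 'NR==FNR{ list = list $0; next } $0 !~ "^[" list "]"' file1 file2
```

Since regular RE metacharacters lose their meaning inside a bracket expression, this sidesteps most (though not all) of the escaping worries of the alternation approach.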
