linux - Scalable way of deleting all lines from a file where the line starts with one of many values
Given an input file of variable values, one per line, for example:

a
b
d

what script would remove, from another file, every line that starts with one of the above values? For example, a file with the contents:

a
b
c
d

would end up being:

c

The input file of values is on the order of 100,000 lines, and the file to be mangled is on the order of several million lines.
awk '
NR==FNR {                       # if this is the first file in the arg list
    list[$0]                    # store the contents of the current record as an index of array "list"
    next                        # skip the rest of the script, move on to the next input record
}                               # endif
{                               # this must be the second file in the arg list
    for (i in list)             # for each index "i" in array "list"
        if (index($0, i) == 1)  # if "i" starts at the 1st char of the current record
            next                # move on to the next input record
}
1                               # specify a true condition and so invoke the default action of printing the current record
' file1 file2
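For instance, with the two sample files from the question saved as file1 and file2 (one value per line), a quick run looks like this:

$ printf 'a\nb\nd\n'    > file1
$ printf 'a\nb\nc\nd\n' > file2
$ awk 'NR==FNR{list[$0]; next} {for (i in list) if (index($0,i)==1) next} 1' file1 file2
c

Note that the for loop runs over every list entry for every record of the second file, so with 100,000 values and several million lines it does a lot of string comparisons in the worst case.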
An alternative to building an array and doing a string comparison on each element would be to build a regular expression, e.g.:
... list = list "|" $0 ...
and then doing an RE comparison:
... if ($0 ~ list) next ...
but I'm not sure that'd be any faster than the loop, and you'd have to worry about RE metacharacters appearing in file1.
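Putting those fragments together, a sketch of the RE version might look like the following; note that the guard against a leading "|" and the "^( ... )" anchoring (to keep the "starts with" requirement) are additions here, not part of the fragments above:

awk '
NR==FNR {                                   # first file: build one big alternation
    list = (list == "") ? $0 : list "|" $0  # e.g. a|b|d
    next
}
$0 ~ ("^(" list ")") { next }               # skip records starting with any listed value
1                                           # otherwise print the record
' file1 file2

With 100,000 values the resulting pattern is very large, so whether this beats the loop depends on the awk implementation, and it still breaks if file1 contains RE metacharacters.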
If all of the values in file1 are single characters, though, the approach of creating a character list to use in an RE comparison might work for you:
awk 'NR==FNR{list = list $0; next} $0 !~ "^[" list "]"' file1 file2
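With the sample files above (all single-character values, so list becomes "abd"), that one-liner produces the same result:

$ awk 'NR==FNR{list = list $0; next} $0 !~ "^[" list "]"' file1 file2
c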