c# SortedList<string, TValue>.ContainsKey for successfully added key returns false -


check update 3 below found out issue ran related known serious problem c# string comparers .net 4.0, 4.0 client , 4.5, lead inconsistent sort order of lists of strings (causing output depend on order in input , sort algorithm used). problem reported microsoft in dec 2012 , closed "won't fixed". work around available, slower hardly practical large collections.

while implementing immutable patriciatrie, wanted compare performance system.collections.generic.sortedlist. used following file https://github.com/rkapsi/patricia-trie/blob/master/src/test/resources/org/ardverk/collection/hamlet.txt create input wordlist testing.

when inserting each of words in c# sortedlist, using either comparer<string>.default or stringcomparer.invariantculture key comparer, number of entries inserted cannot retrieved using normal search methods (e.g. containskey returns false) key present in list observed iterating list.

even more curious, comparer returns value '0' when comparing key retrieved sorted list search key not found using containskey.

the complete example below demonstrates issue on system.

using system; using system.io; using system.linq; using system.collections.generic;  class program {     static void main(string[] args)     {         // problem possibly related comparison.         var fail = true;         var comparer = fail ? stringcomparer.invariantculture : stringcomparer.ordinal;          // read hamlet (contains duplicate words)         var words = file             .readalllines("hamlet.txt")             .selectmany(l => l.split(new[] { ' ', '\t' }, stringsplitoptions.removeemptyentries))             .select(w => w.trim())             .where(w => !string.isnullorempty(w))             .distinct(comparer)             .toarray();          // insert hamlet's words in sorted list.         var list = new sortedlist<string, int>(comparer);         var ndx = 0;         foreach (var word in words)             list[word] = ndx++;          // search each of added words.         foreach (var keytosearch in words)         {             if (!list.containskey(keytosearch))             {                 // inserted, cannot retrieved.                 console.writeline("error - key not found: \"{0}\"", keytosearch);                 // when iterate on list, see entry present                 var prefix = keytosearch.substring(0, math.min(keytosearch.length, 3));                 foreach (var wordclosetosearchkey in list.keys.where(s => s.startswith(prefix)))                 {                     // , using sortedlist's supplied comparison returns 0, signaling equality                     var comparisonresult = list.comparer.compare(wordclosetosearchkey, keytosearch);                     console.writeline("{0} - comparison result = {1}", wordclosetosearchkey, comparisonresult);                 }             }         }          // check sort order of list.keys correct          var keys = list.keys.toarray();         binarysearchall("list.keys", keys, list.comparer);         checkcorrectsortorder("list.keys", keys, list.comparer);          // check sort order of array.sort(list.keys) correct          var arraysortedkeys = copysortsearchandcheck("array.sort(list.keys)", keys, list.comparer);          // check sort order of array.sort(input words) correct          var sortedinput = copysortsearchandcheck("array.sort(input words)", words, list.comparer);          console.readline();     }      static string[] copysortsearchandcheck(string arraydesc, string[] input, icomparer<string> comparer)     {         // copy input         var sortedinput = new string[input.length];         array.copy(input, sortedinput, sortedinput.length);          // sort         array.sort(sortedinput, comparer);          // check can find keys in array using bin. search         binarysearchall(arraydesc, sortedinput, comparer);          // check sort order correct         checkcorrectsortorder(arraydesc, sortedinput, comparer);          return sortedinput;     }      static void binarysearchall(string arraydesc, string[] sortedinput, icomparer<string> comparer)     {         // check each key in input can found using bin. search         foreach (var word in sortedinput)         {             var ix = array.binarysearch(sortedinput, word, comparer);             if (ix < 0)                 // , appears cannot!                 console.writeline("error - {0} - key not found: \"{1}\"", arraydesc, word);         }     }      static void checkcorrectsortorder(string arraydesc, string[] sortedkeys, icomparer<string> comparer)     {         (int n = 0; n < sortedkeys.length; n++)         {             (int = n + 1; < sortedkeys.length; up++)             {                 var cmp = comparer.compare(sortedkeys[n], sortedkeys[up]);                 if (cmp >= 0)                 {                     console.writeline(                         "{0}[{1}] = \"{2}\" not < {0}[{3}] = \"{4}\"  - cmp = {5}",                         arraydesc, n, sortedkeys[n], up, sortedkeys[up], cmp);                 }             }             (int down = n - 1; down > 0; down--)             {                 var cmp = comparer.compare(sortedkeys[n], sortedkeys[down]);                 if (cmp <= 0)                 {                     console.writeline(                         "{0}[{1}] = \"{2}\" not > {0}[{3}] = \"{4}\"  - cmp = {5}",                         arraydesc, n, sortedkeys[n], down, sortedkeys[down], cmp);                 }             }         }     } } 

does have explanation unexpected , odd behaviour?

when changing comparer used sortedlist stringcomparer.ordinal (by e.g. changing fail false in above example) problem disappears, seems point comparison issue, don't quite understand why.

updated noted sébastien issue described here not show on .net 3.5 , 3.5 client profiles. on .net 4.0, 4.0 client , 4.5.

after more digging, noticed if take sorted keys list , run array.binarysearch on keys, returns negative (not found) values same keys not found using sortedlist.containskey. suggest sort order of keys not correct.

if take sorted keys list , sort them using array.sort sort order of output different keys problematic.

so added function check (using list's comparer) if sort order of given array correct (i.e. preceding key smaller, succeeding key larger) , restricted input words distinct according comparer. applied function on 3 different inputs (all using same comparer):

  1. the sortedlist's keys collection.
  2. the output of array.sort on these keys.
  3. the output of array.sort on input file.

the output of (2) , (3) identical , different (1). performing array.binarysearch on array.sort outputs of (2) , (3) again fails find same keys (returning < 0). function checks correct sort order indicates 3 cases, sort order around involved problematic keys not correct.

at point hoping did incredibly stupid , there simple explanation. can point out me.

the example code updated additional troubleshooting experiments, screenshot of output can found here http://imgur.com/du8scsa.

update 2 ok, have narrowed problem down seems me serious problem c# string comparers introduced of .net 4.0.

in summary, suppose have 3 values: a1, a2 , a3. kind of sorting work correctly, expect if a1 < a2 , a2 < a3 in order comparison consistent, consequence following holds: a1 < a3.

this not case c# string comparers (at least comparer<string>.default , stringcomparer.invariantculture).

the little program below illustrates exact problem:

class program {     static void main(string[] args)     {         var comparer = stringcomparer.invariantculture;         var a1 = "a";         var a2 = "a\'";         var a3 = "\'a";         printcomparison("a1", a1, "a2", a2, comparer);         printcomparison("a2", a2, "a3", a3, comparer);         printcomparison("a1", a1, "a3", a3, comparer);         console.readline();     }     public static void printcomparison(string firstsymbol, string first, string secondsymbol, string second, icomparer<string> comparer)     {         var cmp = comparer.compare(first, second);         var result = cmp == 0 ? "=" : cmp < 0 ? "<" : ">";         console.writeline("{0} {1} {2}   ({3} {1} {4})", firstsymbol, result, secondsymbol, first, second);     } } 

this output:

a1 < a2   (a < a') a2 < a3   (a' < 'a) a1 > a3   (a > 'a) 

the conclusion seem not safe rely on sort order determined using c# string comperators, or missing something?

update 3 issue seems have been reported ms in december 2012, , closed status "won't fixed", rather disappointing; refer link posted in comments below (it appears can't post in here due limited reputation points). lists workaround, have implemented , used verify indeed resolves problems observed standard comparers.

public class workaroundstringcomparer : stringcomparer {     private static readonly func<compareinfo, string, compareoptions, int> _gethashcodeofstring;     private readonly compareinfo _compareinfo;     private readonly compareoptions _compareoptions;      static workaroundstringcomparer()     {         // need internal method compute hashcode         // iequalitycomparer implementation.         _gethashcodeofstring = buildgethashcodeofstringdelegate();     }      static func<compareinfo, string, compareoptions, int> buildgethashcodeofstringdelegate()     {         var compareinfotype = typeof(compareinfo);         var argtypes = new[] { typeof(string), typeof(compareoptions) };         var flags = bindingflags.nonpublic | bindingflags.instance;         var methods = compareinfotype.getmethods(flags).toarray(); ;         var method = compareinfotype.getmethod("gethashcodeofstring", flags, null, argtypes, null);          var instance = expression.parameter(compareinfotype, "instance");         var stringarg = expression.parameter(typeof(string), "string");         var optionsarg = expression.parameter(typeof(compareoptions), "options");         var methodcall = expression.call(instance, method, stringarg, optionsarg);         var expr = expression.lambda<func<compareinfo, string, compareoptions, int>>(methodcall, instance, stringarg, optionsarg);         return expr.compile();     }      public workaroundstringcomparer()         : this(cultureinfo.invariantculture)     {     }      public workaroundstringcomparer(cultureinfo cultureinfo, compareoptions compareoptions = compareoptions.none)     {         if (cultureinfo == null)             throw new argumentnullexception("cultureinfo");         this._compareinfo = cultureinfo.compareinfo;         this._compareoptions = compareoptions;     }      public override int compare(string x, string y)     {         if (referenceequals(x, y))             return 0;         if (referenceequals(x, null))             return -1;         if (referenceequals(y, null))             return 1;          var sortkeyfor_x = _compareinfo.getsortkey(x, _compareoptions);         var sortkeyfor_y = _compareinfo.getsortkey(y, _compareoptions);         return sortkey.compare(sortkeyfor_x, sortkeyfor_y);     }      public override bool equals(string x, string y)     {         return compare(x, y) == 0;     }      public override int gethashcode(string obj)     {         return _gethashcodeofstring(_compareinfo, obj, _compareoptions);     } } 

the problem workaround is hardly practical sizable collections, because order of magnitude slower e.g. stringcomparer.invariantculture.

time taken when sorting given word list 1000 times using both comparers:

stringcomparer.invariantculture : 00:00:15.3120013 workaroundstringcomparer        : 00:01:35.8322409 

so still hoping either microsoft reconsiders or knows viable alternative. otherwise option remains falling on using stringcomparer.ordinal.

could related .net framework 4/4.5? have adapted example .net 3.5 this:

var words = readfile("hamlet.txt");  //...  private static string[] readfile(string path) {     list<string> lines = new list<string>();     using (streamreader sr = new streamreader(path))     {         string text = sr.readtoend();         lines.add(text);     }      return lines.selectmany(l => l.split(new[] { ' ', '\t' }, stringsplitoptions.removeemptyentries).select(w => w.trim()))         .where(w => !(w.tochararray().all(c => c == ' ')))         .toarray(); } 

and both comparers work fine on xp using .net 3.5.


Comments