Fast and safe weak value dictionary
AUTHORS:
Python’s weakref module provides WeakValueDictionary. This behaves similar to a dictionary, but it does not prevent its values from garbage collection. Hence, it stores the values by weak references with callback functions: The callback function deletes a key-value pair from the dictionary, as soon as the value becomes subject to garbage collection.
However, a problem arises if hash and comparison of the key depend on the value that is being garbage collected:
sage: import weakref
sage: class Vals(object): pass
sage: class Keys:
....: def __init__(self, val):
....: self.val = weakref.ref(val)
....: def __hash__(self):
....: return hash(self.val())
....: def __eq__(self, other):
....: return self.val() == other.val()
sage: ValList = [Vals() for _ in range(10)]
sage: D = weakref.WeakValueDictionary()
sage: for v in ValList:
....: D[Keys(v)] = v
sage: len(D)
10
sage: del ValList, v
Exception KeyError: (<__main__.Keys instance at ...>,) in <function remove at ...> ignored
Exception KeyError: (<__main__.Keys instance at ...>,) in <function remove at ...> ignored
Exception KeyError: (<__main__.Keys instance at ...>,) in <function remove at ...> ignored
Exception KeyError: (<__main__.Keys instance at ...>,) in <function remove at ...> ignored
...
sage: len(D) > 1
True
Hence, there are scary error messages, and moreover the defunct items have not been removed from the dictionary.
Therefore, Sage provides an alternative implementation sage.misc.weak_dict.WeakValueDictionary, using a callback that removes the defunct item not based on hash and equality check of the key (this is what fails in the example above), but based on comparison by identity. This is possible, since references with callback function are distinct even if they point to the same object. Hence, even if the same object O occurs as value for several keys, each reference to O corresponds to a unique key. We see no error messages, and the items get correctly removed:
sage: ValList = [Vals() for _ in range(10)]
sage: import sage.misc.weak_dict
sage: D = sage.misc.weak_dict.WeakValueDictionary()
sage: for v in ValList:
....: D[Keys(v)] = v
sage: len(D)
10
sage: del ValList
sage: len(D)
1
sage: del v
sage: len(D)
0
Another problem arises when iterating over the items of a dictionary: If garbage collection occurs during iteration, then the content of the dictionary changes, and the iteration breaks for weakref.WeakValueDictionary:
sage: class Cycle:
....: def __init__(self):
....: self.selfref = self
sage: C = [Cycle() for n in range(10)]
sage: D = weakref.WeakValueDictionary(enumerate(C))
sage: import gc
sage: gc.disable()
sage: del C[:5]
sage: len(D)
10
sage: for k in D.iterkeys():
....: gc.enable()
....: _ = gc.collect()
Traceback (most recent call last):
...
RuntimeError: dictionary changed size during iteration
With WeakValueDictionary, the behaviour is safer. Note that iteration over a WeakValueDictionary is non-deterministic, since the lifetime of values (and hence the presence of keys) in the dictionary may depend on when garbage collection occurs. The method implemented here will at least postpone dictionary mutations due to garbage collection callbacks. This means that as long as there is at least one iterator active on a dictionary, none of its keys will be deallocated (which could have side-effects). Which entries are returned is of course still dependent on when garbage collection occurs. Note that when a key gets returned as “present” in the dictionary, there is no guarantee one can actually retrieve its value: it may have been garbage collected in the mean time.
Note that Sage’s weak value dictionary is actually an instance of dict, in contrast to weakref‘s weak value dictionary:
sage: issubclass(weakref.WeakValueDictionary, dict)
False
sage: issubclass(sage.misc.weak_dict.WeakValueDictionary, dict)
True
See trac ticket #13394 for a discussion of some of the design considerations.
Bases: object
Erases items from a sage.misc.weak_dict.WeakValueDictionary when a weak reference becomes invalid.
This is of internal use only. Instances of this class will be passed as a callback function when creating a weak reference.
EXAMPLES:
sage: from sage.misc.weak_dict import WeakValueDictionary
sage: v = frozenset([1])
sage: D = WeakValueDictionary({1 : v})
sage: len(D)
1
sage: del v
sage: len(D)
0
AUTHOR:
- Nils Bruin (2013-11)
Bases: dict
IMPLEMENTATION:
The WeakValueDictionary inherits from dict. In its implementation, it stores weakrefs to the actual values under the keys. All access routines are wrapped to transparently place and remove these weakrefs.
NOTE:
In contrast to weakref.WeakValueDictionary in Python’s weakref module, the callback does not need to assume that the dictionary key is a valid Python object when it is called. There is no need to compute the hash or compare the dictionary keys. This is why the example below would not work with weakref.WeakValueDictionary, but does work with sage.misc.weak_dict.WeakValueDictionary.
EXAMPLES:
sage: import weakref
sage: class Vals(object): pass
sage: class Keys:
....: def __init__(self, val):
....: self.val = weakref.ref(val)
....: def __hash__(self):
....: return hash(self.val())
....: def __eq__(self, other):
....: return self.val() == other.val()
sage: ValList = [Vals() for _ in range(10)]
sage: import sage.misc.weak_dict
sage: D = sage.misc.weak_dict.WeakValueDictionary()
sage: for v in ValList:
....: D[Keys(v)] = v
sage: len(D)
10
sage: del ValList
sage: len(D)
1
sage: del v
sage: len(D)
0
TESTS:
The following reflects the behaviour of the callback on weak dict values, as discussed on trac ticket #13394.
sage: from sage.misc.weak_dict import WeakValueDictionary
sage: V = [set(range(n)) for n in range(5)]
sage: D = WeakValueDictionary(enumerate(V))
The line V[k] = None triggers execution of the callback functions of the dictionary values. However, the actual deletion is postponed till after the iteration over the dictionary has finished. Hence, when the callbacks are executed, the values which the callback belongs to has already been overridded by a new value. Therefore, the callback does not delete the item:
sage: for k in D.iterkeys(): # indirect doctest
....: V[k] = None
....: D[k] = ZZ
sage: len(D)
5
sage: D[1]
Integer Ring
The following is a stress test for weak value dictionaries:
sage: class C(object):
....: def __init__(self, n):
....: self.n = n
....: def __cmp__(self, other):
....: return cmp(type(self),type(other)) or cmp(self.n, other.n)
sage: B = 100
sage: L = [None]*B
sage: D1 = WeakValueDictionary()
sage: D2 = WeakValueDictionary()
sage: for i in range(10000):
....: ki = floor(random()*B)
....: vi = C(floor(random()*B))
....: D1[ki] = vi
....: D2[ki] = vi
....: L[ki] = vi
....: del vi
....: ko = floor(random()*B)
....: if ko in D1:
....: del D1[ko]
....: L[ko] = None
....: assert D1 == D2
Return the stored value for a key, or a default value for unkown keys.
The default value defaults to None.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: L = [GF(p) for p in prime_range(10^3)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(enumerate(L))
sage: 100 in D
True
sage: 200 in D
False
sage: D.get(100, "not found")
Finite Field of size 547
sage: D.get(200, "not found")
'not found'
sage: D.get(200) is None
True
Returns True, if the key is known to the dictionary.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: class Vals(object): pass
sage: L = [Vals() for _ in range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(enumerate(L))
sage: D.has_key(3)
True
As usual, keys are compared by equality and not by identity:
sage: D.has_key(int(3))
True
This is a weak value dictionary. Hence, the existence of the dictionary does not prevent the values from garbage collection, thereby removing the corresponding key-value pairs:
sage: del L[3]
sage: D.has_key(3)
False
The key-value pairs of this dictionary.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: class Vals:
....: def __init__(self, n):
....: self.n = n
....: def __repr__(self):
....: return "<%s>"%self.n
....: def __cmp__(self, other):
....: c = cmp(type(self),type(other))
....: if c:
....: return c
....: return cmp(self.n,other.n)
sage: class Keys(object):
....: def __init__(self, n):
....: self.n = n
....: def __hash__(self):
....: if self.n%2:
....: return 5
....: return 3
....: def __repr__(self):
....: return "[%s]"%self.n
....: def __cmp__(self, other):
....: c = cmp(type(self),type(other))
....: if c:
....: return c
....: return cmp(self.n,other.n)
sage: L = [(Keys(n), Vals(n)) for n in range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
We remove one dictionary item directly. Another item is removed by means of garbage collection. By consequence, there remain eight items in the dictionary:
sage: del D[Keys(2)]
sage: del L[5]
sage: sorted(D.items())
[([0], <0>),
([1], <1>),
([3], <3>),
([4], <4>),
([6], <6>),
([7], <7>),
([8], <8>),
([9], <9>)]
Iterate over the items of this dictionary.
Warning
Iteration is unsafe, if the length of the dictionary changes during the iteration! This can also happen by garbage collection.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: class Vals:
....: def __init__(self, n):
....: self.n = n
....: def __repr__(self):
....: return "<%s>"%self.n
....: def __cmp__(self, other):
....: c = cmp(type(self),type(other))
....: if c:
....: return c
....: return cmp(self.n,other.n)
sage: class Keys(object):
....: def __init__(self, n):
....: self.n = n
....: def __hash__(self):
....: if self.n%2:
....: return 5
....: return 3
....: def __repr__(self):
....: return "[%s]"%self.n
....: def __cmp__(self, other):
....: c = cmp(type(self),type(other))
....: if c:
....: return c
....: return cmp(self.n,other.n)
sage: L = [(Keys(n), Vals(n)) for n in range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
We remove one dictionary item directly. Another item is removed by means of garbage collection. By consequence, there remain eight items in the dictionary:
sage: del D[Keys(2)]
sage: del L[5]
sage: for k,v in sorted(D.iteritems()):
....: print k, v
[0] <0>
[1] <1>
[3] <3>
[4] <4>
[6] <6>
[7] <7>
[8] <8>
[9] <9>
Iterate over the keys of this dictionary.
Warning
Iteration is unsafe, if the length of the dictionary changes during the iteration! This can also happen by garbage collection.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: class Vals(object): pass
sage: L = [Vals() for _ in range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(enumerate(L))
sage: del L[4]
One item got deleted from the list L and hence the corresponding item in the dictionary got deleted as well. Therefore, the corresponding key 4 is missing in the list of keys:
sage: list(sorted(D.iterkeys()))
[0, 1, 2, 3, 5, 6, 7, 8, 9]
Iterate over the values of this dictionary.
Warning
Iteration is unsafe, if the length of the dictionary changes during the iteration! This can also happen by garbage collection.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: class Vals:
....: def __init__(self, n):
....: self.n = n
....: def __repr__(self):
....: return "<%s>"%self.n
....: def __cmp__(self, other):
....: c = cmp(type(self),type(other))
....: if c:
....: return c
....: return cmp(self.n,other.n)
sage: L = [Vals(n) for n in range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(enumerate(L))
We delete one item from D and we delete one item from the list L. The latter implies that the corresponding item from D gets deleted as well. Hence, there remain eight values:
sage: del D[2]
sage: del L[5]
sage: for v in sorted(D.itervalues()):
....: print v
<0>
<1>
<3>
<4>
<6>
<7>
<8>
<9>
The list of keys.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: class Vals(object): pass
sage: L = [Vals() for _ in range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(enumerate(L))
sage: del L[4]
One item got deleted from the list L and hence the corresponding item in the dictionary got deleted as well. Therefore, the corresponding key 4 is missing in the list of keys:
sage: sorted(D.keys())
[0, 1, 2, 3, 5, 6, 7, 8, 9]
Return the value for a given key, and delete it from the dictionary.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: L = [GF(p) for p in prime_range(10^3)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(enumerate(L))
sage: 20 in D
True
sage: D.pop(20)
Finite Field of size 73
sage: 20 in D
False
sage: D.pop(20)
Traceback (most recent call last):
...
KeyError: 20
Return and delete some item from the dictionary.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: D = sage.misc.weak_dict.WeakValueDictionary()
sage: D[1] = ZZ
The dictionary only contains a single item, hence, it is clear which one will be returned:
sage: D.popitem()
(1, Integer Ring)
Now, the dictionary is empty, and hence the next attempt to pop an item will fail with a KeyError:
sage: D.popitem()
Traceback (most recent call last):
...
KeyError: 'popitem(): weak value dictionary is empty'
Return the stored value for a given key; return and store a default value if no previous value is stored.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: L = [(p,GF(p)) for p in prime_range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
sage: len(D)
4
The value for an existing key is returned and not overridden:
sage: D.setdefault(5, ZZ)
Finite Field of size 5
sage: D[5]
Finite Field of size 5
For a non-existing key, the default value is stored and returned:
sage: D.has_key(4)
False
sage: D.setdefault(4, ZZ)
Integer Ring
sage: D.has_key(4)
True
sage: D[4]
Integer Ring
sage: len(D)
5
Return the list of values.
EXAMPLES:
sage: import sage.misc.weak_dict
sage: class Vals:
....: def __init__(self, n):
....: self.n = n
....: def __repr__(self):
....: return "<%s>"%self.n
....: def __cmp__(self, other):
....: c = cmp(type(self),type(other))
....: if c:
....: return c
....: return cmp(self.n,other.n)
sage: L = [Vals(n) for n in range(10)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(enumerate(L))
We delete one item from D and we delete one item from the list L. The latter implies that the corresponding item from D gets deleted as well. Hence, there remain eight values:
sage: del D[2]
sage: del L[5]
sage: sorted(D.values())
[<0>, <1>, <3>, <4>, <6>, <7>, <8>, <9>]
This function helps testing some cdef function used to delete dictionary items.
INPUT:
The underlying cdef function deletes an item from D that is in the hash bucket determined by h and whose value is identic with value. Of course, this only makes sense if the pairs (h, value) corresponding to items in D are pair-wise distinct.
If a matching item can not be found, the function does nothing and silently returns.
TESTS:
See trac ticket #13394 for a discussion.
sage: from sage.misc.weak_dict import test_del_dictitem_by_exact_value
sage: B=1000
sage: L=list(range(B))
sage: D1=dict()
sage: D2=dict()
sage: for i in range(100000): # long time
....: ki=L[floor(random()*B)]
....: vi=L[floor(random()*B)]
....: D1[ki]=vi
....: D2[ki]=vi
....: ko=L[floor(random()*B)]
....: if ko in D1:
....: vo=D1[ko]
....: del D1[ko]
....: test_del_dictitem_by_exact_value(D2,vo,hash(ko))
....: assert D1 == D2
No action is taken if the item prescribed by key hash and value does not exist in the dictionary:
sage: D = {1: ZZ}
sage: test_del_dictitem_by_exact_value(D, ZZ, 2)
sage: D
{1: Integer Ring}
sage: test_del_dictitem_by_exact_value(D, QQ, 1)
sage: D
{1: Integer Ring}