NOTES ON DICTIONARIES
================================

Principal Use Cases for Dictionaries
------------------------------------

Passing keyword arguments
    Typically, one read and one write for 1 to 3 elements.
    Occurs frequently in normal python code.

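The keyword-argument pattern above can be illustrated with a minimal sketch (the function and parameter names are hypothetical, for illustration only):

```python
# Hypothetical helper: CPython builds a fresh dict for **options
# on each call; the callee then does a handful of reads.
def connect(**options):
    # Typically 1 to 3 keys, each written once and read once.
    host = options.get("host", "localhost")
    port = options.get("port", 5432)
    return host, port

assert connect(host="db1", port=6000) == ("db1", 6000)
assert connect() == ("localhost", 5432)
```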
Class method lookup
    Dictionaries vary in size with 8 to 16 elements being common.
    Usually written once with many lookups.
    When base classes are used, there are many failed lookups
    followed by a lookup in a base class.

Instance attribute lookup and Global variables
    Dictionaries vary in size. 4 to 10 elements are common.
    Both reads and writes are common.

Builtins
    Frequent reads. Almost never written.
    About 150 interned strings (as of Py3.3).
    A few keys are accessed much more frequently than others.

Uniquification
    Dictionaries of any size. Bulk of work is in creation.
    Repeated writes to a smaller set of keys.
    Single read of each key.
    Some use cases have two consecutive accesses to the same key.

    * Removing duplicates from a sequence.
        dict.fromkeys(seqn).keys()

    * Counting elements in a sequence.
        for e in seqn:
            d[e] = d.get(e, 0) + 1

    * Accumulating references in a dictionary of lists:

        for pagenumber, page in enumerate(pages):
            for word in page:
                d.setdefault(word, []).append(pagenumber)

    Note, the second example is a use case characterized by a get and set
    to the same key. There are similar use cases with a __contains__
    followed by a get, set, or del to the same key. Part of the
    justification for d.setdefault is combining the two lookups into one.

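The accumulation pattern above can also be written with collections.defaultdict, which moves the insert-on-miss step into __missing__ and likewise avoids a second lookup; a minimal runnable sketch (the sample data is invented):

```python
from collections import defaultdict

pages = [["to", "be"], ["or", "not", "to", "be"]]

# d.setdefault(word, []) performs one lookup that either finds
# the existing list or inserts a fresh one.
d = {}
for pagenumber, page in enumerate(pages):
    for word in page:
        d.setdefault(word, []).append(pagenumber)

# defaultdict produces the same index via __missing__.
dd = defaultdict(list)
for pagenumber, page in enumerate(pages):
    for word in page:
        dd[word].append(pagenumber)

assert d == dict(dd)
assert d["to"] == [0, 1]
```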
Membership Testing
    Dictionaries of any size. Created once and then rarely changes.
    Single write to each key.
    Many calls to __contains__() or has_key().
    Similar access patterns occur with replacement dictionaries
    such as with the % formatting operator.

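A minimal sketch of the replacement-dictionary pattern with the % operator: one write per key, followed by repeated reads during formatting and membership tests (the sample keys are invented):

```python
# Replacement dictionary: written once, then read by each
# %-style format operation that names its keys.
replacements = {"user": "alice", "host": "example.org"}

msg = "%(user)s@%(host)s" % replacements
assert msg == "alice@example.org"

# Membership testing follows the same write-once, read-many pattern.
assert "user" in replacements
assert "password" not in replacements
```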
Dynamic Mappings
    Characterized by deletions interspersed with adds and replacements.
    Performance benefits greatly from the re-use of dummy entries.

Data Layout
-----------

Dictionaries are composed of 3 components:
    The dictobject struct itself
    A dict-keys object (keys & hashes)
    A values array


Tunable Dictionary Parameters
-----------------------------

See comments for PyDict_MINSIZE_SPLIT, PyDict_MINSIZE_COMBINED,
USABLE_FRACTION and GROWTH_RATE in dictobject.c

Tune-ups should be measured across a broad range of applications and
use cases. A change to any parameter will help in some situations and
hurt in others. The key is to find settings that help the most common
cases and do the least damage to the less common cases. Results will
vary dramatically depending on the exact number of keys, whether the
keys are all strings, whether reads or writes dominate, and the exact
hash values of the keys (some sets of values have fewer collisions than
others). Any one test or benchmark is likely to prove misleading.

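The combined effect of USABLE_FRACTION and GROWTH_RATE can be observed indirectly with sys.getsizeof, which reports the allocation jumping at each resize; a minimal sketch (the exact sizes and resize points vary by CPython version and platform, so only the shape of the curve is checked):

```python
import sys

d = {}
sizes = [sys.getsizeof(d)]
for i in range(64):
    d[i] = i
    sizes.append(sys.getsizeof(d))

# The reported size is flat between resizes and jumps when the
# usable fraction of slots is exhausted.
growths = [i for i in range(1, len(sizes)) if sizes[i] > sizes[i - 1]]
assert len(growths) >= 2        # several resizes over 64 inserts
assert sizes == sorted(sizes)   # never shrinks during insertion
```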
While making a dictionary more sparse reduces collisions, it impairs
iteration and key listing. Those methods loop over every potential
entry. Doubling the size of a dictionary results in twice as many
non-overlapping memory accesses for keys(), items(), values(),
__iter__(), iterkeys(), iteritems(), itervalues(), and update().
Also, every dictionary iterates at least twice, once for the memset()
when it is created and once by dealloc().

Dictionary operations involving only a single key can be O(1) unless
resizing is possible. By checking for a resize only when the
dictionary can grow (and may *require* resizing), other operations
remain O(1), and the odds of resize thrashing or memory fragmentation
are reduced. In particular, an algorithm that empties a dictionary
by repeatedly invoking .pop will see no resizing, which might
not be necessary at all because the dictionary is eventually
discarded entirely.

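The point about .pop can be demonstrated with sys.getsizeof: in CPython, deletions reuse dummy entries and a resize is checked only on growth, so emptying a dictionary one entry at a time leaves its allocation untouched; a minimal sketch of that behavior:

```python
import sys

d = {i: i for i in range(1000)}
before = sys.getsizeof(d)

# Each removal stays O(1): no reallocation is ever triggered
# because the table can only resize while growing.
while d:
    d.popitem()

after = sys.getsizeof(d)
assert len(d) == 0
assert after == before   # the table was never resized downward
```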
The key differences between this implementation and earlier versions are:
    1. The table can be split into two parts, the keys and the values.

    2. There is an additional key-value combination: (key, NULL).
       Unlike (<dummy>, NULL) which represents a deleted value, (key, NULL)
       represents a yet-to-be-inserted value. This combination can only occur
       when the table is split.

    3. No small table is embedded in the dict,
       as this would make sharing of key-tables impossible.


These changes have the following consequences.
    1. General dictionaries are slightly larger.

    2. All object dictionaries of a single class can share a single key-table,
       saving about 60% memory for such cases.

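The key-sharing consequence can be observed with sys.getsizeof: instances of one class initialized the same way share a single keys table, so each instance __dict__ carries only its values array; a minimal sketch (exact sizes vary by CPython version, so only relative sizes are compared):

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

a = Point(1, 2)
b = Point(3, 4)

# Both split-table dicts reference the same shared keys object,
# so their reported sizes are identical.
assert sys.getsizeof(a.__dict__) == sys.getsizeof(b.__dict__)

# A plain (combined) dict with the same contents must carry
# its own keys table and hashes.
combined = dict(a.__dict__)
assert sys.getsizeof(a.__dict__) <= sys.getsizeof(combined)
```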
Results of Cache Locality Experiments
-------------------------------------

Experiments on an earlier design of dictionary, in which all tables were
combined, showed the following:

When an entry is retrieved from memory, several adjacent entries are also
retrieved into a cache line. Since accessing items in cache is *much*
cheaper than a cache miss, an enticing idea is to probe the adjacent
entries as a first step in collision resolution. Unfortunately, the
introduction of any regularity into collision searches results in more
collisions than the current random chaining approach.

Exploiting cache locality at the expense of additional collisions fails
to pay off when the entries are already loaded in cache (the expense
is paid with no compensating benefit). This occurs in small dictionaries
where the whole dictionary fits into a pair of cache lines. It also
occurs frequently in large dictionaries which have a common access pattern
where some keys are accessed much more frequently than others. The
more popular entries *and* their collision chains tend to remain in cache.

To exploit cache locality, change the collision resolution section
in lookdict() and lookdict_string(). Set i^=1 at the top of the
loop and move the i = (i << 2) + i + perturb + 1 to an unrolled
version of the loop.

For split tables, the above will apply to the keys, but the value will
always be in a different cache line from the key.