uint32_t Sets
The multi handle tracks added easy handles via a uint32_t it calls a
mid. There are four data structures for uint32_t values, optimized for the
multi use case.
uint32_tbl
uint32_tbl, implemented in uint-table.[ch], manages an array of void *.
The uint32_t is the index into this array. It is created with a capacity
which can be resized. The table assigns the index when a void * is
added. It keeps track of the last assigned index and uses the next available
larger index for a subsequent add. When it reaches capacity, it wraps around.
The table cannot store NULL values. The largest possible index
is UINT32_MAX - 1.
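The index assignment described above can be sketched as follows. This is a minimal illustration, not the libcurl implementation: the struct layout and every demo_ name are hypothetical, and resizing is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the table; not the real struct uint32_tbl. */
struct demo_tbl {
  void **rows;        /* rows[i] == NULL means index i is free */
  uint32_t nrows;     /* capacity */
  uint32_t last_key;  /* index assigned by the most recent add */
  bool any_added;     /* false until the first successful add */
};

static bool demo_tbl_init(struct demo_tbl *t, uint32_t nrows)
{
  assert(nrows > 0);
  t->rows = calloc(nrows, sizeof(void *));
  t->nrows = nrows;
  t->last_key = 0;
  t->any_added = false;
  return t->rows != NULL;
}

/* Store entry at the first free index after the last assigned one,
 * wrapping around at capacity. Fails for NULL entries and full tables. */
static bool demo_tbl_add(struct demo_tbl *t, void *entry, uint32_t *pkey)
{
  uint32_t start, i;

  if(!entry)
    return false; /* NULL marks a free row and cannot be stored */
  start = t->any_added ? (t->last_key + 1) % t->nrows : 0;
  for(i = 0; i < t->nrows; ++i) {
    /* 64-bit arithmetic avoids overflow for very large capacities */
    uint32_t key = (uint32_t)(((uint64_t)start + i) % t->nrows);
    if(!t->rows[key]) {
      t->rows[key] = entry;
      t->last_key = key;
      t->any_added = true;
      *pkey = key;
      return true;
    }
  }
  return false; /* no free index left */
}

static void demo_tbl_remove(struct demo_tbl *t, uint32_t key)
{
  if(key < t->nrows)
    t->rows[key] = NULL;
}
```

In this sketch, an index freed by a remove is not reused immediately. It is only handed out again after the search has wrapped around.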
The table is iterated over by asking for the first existing index, meaning the smallest number that has an entry, if the table is not empty. To get the next entry, one passes the index of the previous iteration step. It does not matter if the previous index is still in the table. Sample code for a table iteration would look like this:
uint32_t mid;
void *entry;
if(Curl_uint32_tbl_first(tbl, &mid, &entry)) {
  do {
    /* operate on entry with index mid */
  }
  while(Curl_uint32_tbl_next(tbl, mid, &mid, &entry));
}
This iteration has the following properties:
- entries in the table can be added/removed safely.
- all entries that are not removed during the iteration are visited.
- the table may be resized to a larger capacity without affecting visited entries.
- entries added with a larger index than the current are visited.
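These properties follow from "next" being defined purely in terms of the previous index, not of the previous entry. A minimal sketch over a fixed-size array, with hypothetical demo_ names that are not part of libcurl, could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ROWS 8 /* fixed capacity, for illustration only */

/* Find the smallest index >= from that holds a non-NULL entry. */
static bool demo_scan(void *const *rows, uint32_t from, uint32_t *pkey)
{
  uint32_t i;

  assert(rows && pkey);
  for(i = from; i < DEMO_ROWS; ++i) {
    if(rows[i]) {
      *pkey = i;
      return true;
    }
  }
  return false;
}

static bool demo_first(void *const *rows, uint32_t *pkey)
{
  return demo_scan(rows, 0, pkey);
}

/* Only the numeric value of last matters. Whether rows[last] still
 * holds an entry is irrelevant, so removing it mid-iteration is safe. */
static bool demo_next(void *const *rows, uint32_t last, uint32_t *pkey)
{
  if(last >= DEMO_ROWS - 1)
    return false; /* nothing can follow the last row */
  return demo_scan(rows, last + 1, pkey);
}
```

Because each step rescans from last + 1, an entry added at a larger index is picked up, while one added at a smaller index is not seen by the running iteration.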
Memory
For storing 1000 entries, the table would allocate one block of 8KB on
a 64-bit system, plus the 2 pointers and 3 uint32_t in its base struct
uint32_tbl. A resize allocates a completely new pointer array, copies
the existing entries and frees the previous one.
Performance
Lookups of entries are only an index into the array, O(1) with a tiny 1. Adding entries and iterations are more work:
- adding an entry means "find the first free index larger than the previously
  assigned one". Worst case for this is a table with only a single free index
  where capacity - 1 checks on NULL values would be performed, O(N). If the
  single free index is randomly distributed, this would be O(N/2).
- iterating a table scans for the first non-NULL entry after the start index.
  This makes a complete iteration O(N) work.
In the multi use case, point 1 is remedied by growing the table so that a good chunk of free entries always exists.
Point 2 is less of an issue for a multi, since the cost does not really matter when the number of transfers is relatively small. A multi managing a larger set needs to operate event based anyway, and table iterations are rarely needed.
For these reasons, the simple implementation was preferred. Should this become a concern, there are options like "free index lists" or, alternatively, an internal bitset that scans better.
uint32_bset
A bitset for uint32_t values, allowing fast add/remove operations. It is
initialized with a capacity, meaning it can store only the numbers in the
range [0, capacity-1]. It can be resized and safely iterated.
uint32_bset is designed to operate in combination with uint32_tbl.
The bitset keeps an array of uint64_t. The first array entry keeps the
numbers 0 to 63, the second 64 to 127 and so on. A bitset with capacity 1024
would therefore allocate an array of 16 64-bit values (128 bytes). Operations
for a uint32_t divide it by 64 for the array index and then
check/set/clear the bit of the remainder.
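The divide-and-remainder addressing can be sketched as below. The fixed capacity of 1024 and all demo_ names are assumptions made for this illustration and are not the libcurl API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 16                    /* 16 * 64 bits = capacity 1024 */
#define DEMO_CAPACITY (DEMO_SLOTS * 64u)

/* Hypothetical fixed-size bitset; zero-initialize before use. */
struct demo_bset {
  uint64_t slots[DEMO_SLOTS];
};

static bool demo_bset_add(struct demo_bset *s, uint32_t n)
{
  assert(s != NULL);
  if(n >= DEMO_CAPACITY)
    return false; /* outside [0, capacity-1] */
  /* n / 64 selects the array slot, n % 64 the bit inside it */
  s->slots[n / 64] |= (uint64_t)1 << (n % 64);
  return true;
}

static void demo_bset_remove(struct demo_bset *s, uint32_t n)
{
  if(n < DEMO_CAPACITY)
    s->slots[n / 64] &= ~((uint64_t)1 << (n % 64));
}

static bool demo_bset_contains(const struct demo_bset *s, uint32_t n)
{
  if(n >= DEMO_CAPACITY)
    return false;
  return (s->slots[n / 64] & ((uint64_t)1 << (n % 64))) != 0;
}
```

The cast to uint64_t before shifting matters: shifting a plain int by 32 or more bits is undefined behavior in C.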
Iteration works the same as with uint32_tbl: ask the bitset for the first
number present and then use that to get the next higher number present. Like
the table, this is safe for adds/removes and growing the set while iterating.
Memory
The set only needs 1 bit for each possible number. A bitset for 40000 transfers occupies 5KB of memory.
Performance
Operations for add/remove/check are O(1). Iteration needs to scan for the next bit set. The number of scans is small (see memory footprint) and, for checking bits, many compilers offer primitives for special CPU instructions.
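A scan for the next set bit might look like the sketch below. It operates on a bare uint64_t array. The demo_ names are hypothetical, and the use of the GCC/Clang builtin __builtin_ctzll with a portable fallback is a choice made for this example, not a statement about what libcurl does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count trailing zero bits of a non-zero value. */
static unsigned demo_ctz64(uint64_t v)
{
  assert(v != 0);
#if defined(__GNUC__) || defined(__clang__)
  return (unsigned)__builtin_ctzll(v); /* typically one CPU instruction */
#else
  unsigned n = 0;
  while(!(v & 1)) {
    v >>= 1;
    ++n;
  }
  return n;
#endif
}

/* Smallest number present in the set, if any. */
static bool demo_bits_first(const uint64_t *slots, uint32_t nslots,
                            uint32_t *pfirst)
{
  uint32_t i;
  for(i = 0; i < nslots; ++i) {
    if(slots[i]) {
      *pfirst = i * 64 + demo_ctz64(slots[i]);
      return true;
    }
  }
  return false;
}

/* Smallest number present that is larger than last, if any. */
static bool demo_bits_next(const uint64_t *slots, uint32_t nslots,
                           uint32_t last, uint32_t *pnext)
{
  uint32_t i = last / 64;
  uint32_t bit = last % 64;
  uint64_t v;

  if(i >= nslots)
    return false;
  /* keep only bits above last; a shift by 64 would be undefined */
  v = (bit == 63) ? 0 : (slots[i] & (~(uint64_t)0 << (bit + 1)));
  for(;;) {
    if(v) {
      *pnext = i * 64 + demo_ctz64(v);
      return true;
    }
    if(++i >= nslots)
      return false;
    v = slots[i];
  }
}
```

Empty slots are skipped with a single comparison each, so a full scan over a capacity of 40000 touches at most 625 64-bit words.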
uint32_spbset
While the memory footprint of uint32_bset is good, it still needs 5KB to
store the single number 40000. This is not optimal when many such sets are
needed. For example, in event based processing, each socket needs to keep
track of the transfers involved. There are potentially many sockets, but each
one mostly tracks a single transfer or a few (on an HTTP/2 connection, in the
extreme, up to 100).
For such use cases, the uint32_spbset is intended: track a small number of
uint32_t values, potentially rather "close" together. It keeps "chunks" with
an offset and has no capacity limit.
Example: adding the number 40000 to an empty sparse bitset would create one chunk with offset 39936, keeping track of the numbers 39936 to 40191 (a chunk has 4 64-bit values, so it covers 256 numbers). The numbers in that range can be handled without further allocations.
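The chunk lookup from this example can be sketched as follows. The struct layout, the choice to keep the linked list sorted by offset, and all demo_ names are assumptions for illustration and do not describe the actual uint32_spbset internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_CHUNK_SLOTS 4
#define DEMO_CHUNK_BITS (DEMO_CHUNK_SLOTS * 64u) /* 256 numbers per chunk */

struct demo_chunk {
  struct demo_chunk *next;
  uint64_t slots[DEMO_CHUNK_SLOTS];
  uint32_t offset; /* smallest number this chunk can hold */
};

struct demo_spbset {
  struct demo_chunk *head; /* kept sorted by offset in this sketch */
};

/* Find the chunk covering n, optionally allocating it. */
static struct demo_chunk *demo_get_chunk(struct demo_spbset *s, uint32_t n,
                                         bool create)
{
  uint32_t offset = n - (n % DEMO_CHUNK_BITS); /* 40000 -> 39936 */
  struct demo_chunk **anchor, *c;

  assert(s != NULL);
  anchor = &s->head;
  while(*anchor && (*anchor)->offset < offset)
    anchor = &(*anchor)->next;
  if(*anchor && (*anchor)->offset == offset)
    return *anchor;
  if(!create)
    return NULL;
  c = calloc(1, sizeof(*c));
  if(!c)
    return NULL;
  c->offset = offset;
  c->next = *anchor;
  *anchor = c;
  return c;
}

static bool demo_spbset_add(struct demo_spbset *s, uint32_t n)
{
  struct demo_chunk *c = demo_get_chunk(s, n, true);
  if(!c)
    return false; /* out of memory */
  n -= c->offset;
  c->slots[n / 64] |= (uint64_t)1 << (n % 64);
  return true;
}

static bool demo_spbset_contains(struct demo_spbset *s, uint32_t n)
{
  struct demo_chunk *c = demo_get_chunk(s, n, false);
  if(!c)
    return false;
  n -= c->offset;
  return (c->slots[n / 64] & ((uint64_t)1 << (n % 64))) != 0;
}

static void demo_spbset_free(struct demo_spbset *s)
{
  while(s->head) {
    struct demo_chunk *c = s->head;
    s->head = c->next;
    free(c);
  }
}
```

With this layout, adding 39936 or 40191 after 40000 reuses the existing chunk, while 40192 is the first number that needs a new one.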
The worst case is then storing 100 numbers that lie in separate intervals. Then 100 chunks would need to be allocated and linked, resulting in roughly 4 KB of memory used overall.
Iterating a sparse bitset works the same as for bitset and table.
uint32_hash
Finally, there are places in libcurl, such as the HTTP/2 and HTTP/3 protocol
implementations, that need to store their own data related to a transfer.
uint32_hash allows them to associate a uint32_t, e.g. the transfer's
mid, with their own data.