<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# `uint32_t` Sets

The multi handle tracks added easy handles via a `uint32_t` called the `mid`.
There are four data structures for `uint32_t` values, optimized for the multi
use case.

## `uint32_tbl`

`uint32_tbl`, implemented in `uint-table.[ch]`, manages an array of `void *`.
The `uint32_t` is the index into this array. The table is created with a
*capacity*, which can be *resized*. The table assigns the index when a
`void *` is *added*. It keeps track of the last assigned index and uses the
next available larger index for a subsequent add. On reaching *capacity*, it
wraps around.

The table *cannot* store `NULL` values. The largest possible index
is `UINT32_MAX - 1`.

The table is iterated over by asking for the *first* existing index, meaning
the smallest number that has an entry, if the table is not empty. To get
the *next* entry, one passes the index of the previous iteration step. It does
not matter if the previous index is still in the table. Sample code for a table
iteration would look like this:

```c
uint32_t mid;
void *entry;

if(Curl_uint32_tbl_first(tbl, &mid, &entry)) {
  do {
    /* operate on entry with index mid */
  }
  while(Curl_uint32_tbl_next(tbl, mid, &mid, &entry));
}
```

This iteration has the following properties:

* entries in the table can be added/removed safely.
* all entries that are not removed during the iteration are visited.
* the table may be resized to a larger capacity without affecting visited
  entries.
* entries added with a larger index than the current are visited.

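The add and iteration rules described above can be sketched with a toy table.
This is a minimal illustration only: `struct toy_tbl` and all `toy_tbl_*`
functions are hypothetical names invented for this example and are not the
implementation in `uint-table.[ch]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy table for illustration; not curl's struct uint32_tbl. */
struct toy_tbl {
  void **rows;       /* rows[i] == NULL means index i is free */
  uint32_t capacity; /* number of rows, must be > 0 */
  uint32_t last;     /* most recently assigned index */
};

static bool toy_tbl_init(struct toy_tbl *t, uint32_t capacity)
{
  uint32_t i;
  assert(capacity > 0);
  t->rows = malloc((size_t)capacity * sizeof(void *));
  if(!t->rows)
    return false;
  for(i = 0; i < capacity; ++i)
    t->rows[i] = NULL;
  t->capacity = capacity;
  t->last = capacity - 1; /* makes the first add try index 0 */
  return true;
}

/* Store a non-NULL entry at the first free index after `last`, wrapping
   around at capacity. Returns false when no index is free. */
static bool toy_tbl_add(struct toy_tbl *t, void *entry, uint32_t *pidx)
{
  uint32_t n;
  assert(entry != NULL);
  for(n = 1; n <= t->capacity; ++n) {
    uint32_t i = (uint32_t)(((uint64_t)t->last + n) % t->capacity);
    if(!t->rows[i]) {
      t->rows[i] = entry;
      t->last = i;
      *pidx = i;
      return true;
    }
  }
  return false;
}

static void toy_tbl_remove(struct toy_tbl *t, uint32_t idx)
{
  if(idx < t->capacity)
    t->rows[idx] = NULL;
}

/* Find the smallest used index that is >= start. */
static bool toy_tbl_scan(const struct toy_tbl *t, uint32_t start,
                         uint32_t *pidx, void **pentry)
{
  uint32_t i;
  for(i = start; i < t->capacity; ++i) {
    if(t->rows[i]) {
      *pidx = i;
      *pentry = t->rows[i];
      return true;
    }
  }
  return false;
}

/* `prev` is only a number: it does not need to be in the table any more. */
static bool toy_tbl_next(const struct toy_tbl *t, uint32_t prev,
                         uint32_t *pidx, void **pentry)
{
  return (prev < UINT32_MAX) && toy_tbl_scan(t, prev + 1, pidx, pentry);
}

/* Add three entries to a 4-row table, then remove each one while iterating.
   Returns the number of entries visited, or 0 if allocation failed. */
uint32_t toy_tbl_demo(void)
{
  static int payload[3];
  struct toy_tbl t;
  uint32_t mid, visited = 0;
  void *entry;
  int i;

  if(!toy_tbl_init(&t, 4))
    return 0;
  for(i = 0; i < 3; ++i)
    (void)toy_tbl_add(&t, &payload[i], &mid);
  if(toy_tbl_scan(&t, 0, &mid, &entry)) { /* "first": scan from index 0 */
    do {
      toy_tbl_remove(&t, mid); /* safe: next() only needs the number */
      ++visited;
    } while(toy_tbl_next(&t, mid, &mid, &entry));
  }
  free(t.rows);
  return visited;
}
```

In this sketch, `toy_tbl_demo()` visits all three entries even though each one
is removed during the loop, because the "next" step works from the previous
index as a plain number and never dereferences the removed entry.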
### Memory

For storing 1000 entries, the table would allocate one block of 8KB on
a 64-bit system, plus the 2 pointers and 3 `uint32_t` in its base
`struct uint32_tbl`. A resize allocates a completely new pointer array, copies
the existing entries and frees the previous one.

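The memory figure and the resize steps above can be written out as a short
sketch. `toy_rows_bytes` and `toy_rows_grow` are hypothetical helpers made up
for this illustration; they only mirror the "allocate new, copy, free old"
sequence the text describes and are not curl functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes used by the pointer array alone, e.g. 1000 * 8 on a 64-bit system. */
size_t toy_rows_bytes(uint32_t capacity)
{
  return (size_t)capacity * sizeof(void *);
}

/* Hypothetical helper, not a curl function: grow as the text describes
   (allocate a new array, copy the entries, free the old array).
   Returns NULL on allocation failure, leaving the old array untouched. */
void **toy_rows_grow(void **rows, uint32_t old_cap, uint32_t new_cap)
{
  void **bigger;
  uint32_t i;

  assert(new_cap > 0 && new_cap >= old_cap);
  bigger = malloc(toy_rows_bytes(new_cap));
  if(!bigger)
    return NULL;
  for(i = 0; i < new_cap; ++i)
    bigger[i] = (i < old_cap) ? rows[i] : NULL; /* new rows start free */
  free(rows);
  return bigger;
}
```

Since existing entries keep their index in the new array, a resize to a larger
capacity does not disturb an iteration that is in progress.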
### Performance

Lookups of entries are only an index into the array, O(1) with a tiny constant.
Adding entries and iterations are more work:

1. adding an entry means "find the first free index larger than the previous
   assigned one". Worst case for this is a table with only a single free index
   where `capacity - 1` checks on `NULL` values would be performed, O(N). If
   the single free index is randomly distributed, this would be about N/2
   checks on average, which is still O(N).
2. iterating a table scans for the first not `NULL` entry after the start
   index. This makes a complete iteration O(N) work.

In the multi use case, point 1 is remedied by growing the table so that a good
chunk of free entries always exists.

Point 2 is less of an issue for a multi, since it does not really matter when
the number of transfers is relatively small. A multi managing a larger set
needs to operate event based anyway and table iterations are rarely needed.

For these reasons, the simple implementation was preferred. Should this become
a concern, there are options like "free index lists" or, alternatively, an
internal bitset that scans better.

## `uint32_bset`

A bitset for `uint32_t` values, allowing fast add/remove operations. It is
initialized with a *capacity*, meaning it can store only the numbers in the
range `[0, capacity-1]`. It can be *resized* and safely *iterated*.
`uint32_bset` is designed to operate in combination with `uint32_tbl`.

The bitset keeps an array of `uint64_t`. The first array entry keeps the
numbers 0 to 63, the second 64 to 127 and so on. A bitset with capacity 1024
would therefore allocate an array of 16 64-bit values (128 bytes). Operations
on a number divide it by 64 to get the array index and then check/set/clear
the bit given by the remainder.

Iteration works the same as with `uint32_tbl`: ask the bitset for the *first*
number present and then use that to get the *next* higher number present. Like
the table, this is safe for adds/removes and growing the set while iterating.

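The divide-by-64 indexing and the first/next iteration can be sketched as
follows. `struct toy_bset` and the `toy_bset_*` functions are hypothetical
names for this example only. As a simplification that the real bitset may not
share, the sketch rounds the capacity up to a multiple of 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy bitset for illustration; not curl's struct uint32_bset. */
struct toy_bset {
  uint64_t *slots;  /* slots[k] holds the numbers 64*k to 64*k + 63 */
  uint32_t nslots;
};

/* Capacity is rounded up to a multiple of 64 in this sketch. */
static bool toy_bset_init(struct toy_bset *b, uint32_t capacity)
{
  assert(capacity > 0);
  b->nslots = (capacity / 64) + ((capacity % 64) ? 1 : 0);
  b->slots = calloc(b->nslots, sizeof(uint64_t));
  return b->slots != NULL;
}

static bool toy_bset_add(struct toy_bset *b, uint32_t i)
{
  if(i / 64 >= b->nslots)
    return false; /* outside the capacity */
  b->slots[i / 64] |= (uint64_t)1 << (i % 64);
  return true;
}

static void toy_bset_remove(struct toy_bset *b, uint32_t i)
{
  if(i / 64 < b->nslots)
    b->slots[i / 64] &= ~((uint64_t)1 << (i % 64));
}

static bool toy_bset_contains(const struct toy_bset *b, uint32_t i)
{
  return (i / 64 < b->nslots) && ((b->slots[i / 64] >> (i % 64)) & 1);
}

/* Find the smallest number >= start that is in the set. */
static bool toy_bset_scan(const struct toy_bset *b, uint32_t start,
                          uint32_t *pnum)
{
  uint32_t k = start / 64;
  uint64_t bits;

  if(k >= b->nslots)
    return false;
  bits = b->slots[k] & (~(uint64_t)0 << (start % 64)); /* drop bits < start */
  while(!bits) {
    if(++k >= b->nslots)
      return false;
    bits = b->slots[k];
  }
  *pnum = k * 64;
  while(!(bits & 1)) { /* portable count of trailing zero bits */
    bits >>= 1;
    ++*pnum;
  }
  return true;
}

/* Sum all members while removing each one during the iteration.
   Returns 0 if allocation failed. */
uint64_t toy_bset_demo(void)
{
  struct toy_bset b;
  uint32_t n;
  uint64_t sum = 0;

  if(!toy_bset_init(&b, 1024))
    return 0;
  toy_bset_add(&b, 3);
  toy_bset_add(&b, 64);
  toy_bset_add(&b, 1000);
  if(toy_bset_scan(&b, 0, &n)) { /* "first" */
    do {
      assert(toy_bset_contains(&b, n));
      toy_bset_remove(&b, n);
      sum += n;
    } while(n < UINT32_MAX && toy_bset_scan(&b, n + 1, &n)); /* "next" */
  }
  free(b.slots);
  return sum;
}
```

With the three numbers added here, `toy_bset_demo()` returns 1067 (3 + 64 +
1000): each member is visited once although it is removed inside the loop.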
### Memory

The set only needs 1 bit for each possible number.
A bitset for 40000 transfers occupies 5KB of memory.

### Performance

Operations for add/remove/check are O(1). Iteration needs to scan for the next
bit set. The number of scans is small (see memory footprint) and, for checking
bits, many compilers offer primitives for special CPU instructions.

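As one example of such a primitive, GCC and Clang provide `__builtin_ctzll`,
which counts the trailing zero bits of a non-zero 64-bit value. The helper
name `toy_lowest_bit` below is hypothetical, and whether curl's bitset uses
this particular builtin is not stated here; the sketch only shows the general
idea with a portable fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0 to 63) of the lowest set bit in a non-zero 64-bit value. */
unsigned int toy_lowest_bit(uint64_t bits)
{
  assert(bits != 0); /* the builtin is undefined for 0 */
#if defined(__GNUC__) || defined(__clang__)
  return (unsigned int)__builtin_ctzll(bits);
#else
  {
    /* portable fallback: shift until the lowest bit is set */
    unsigned int n = 0;
    while(!(bits & 1)) {
      bits >>= 1;
      ++n;
    }
    return n;
  }
#endif
}
```

Finding the next number in a 64-bit slot then becomes one masked load plus one
such call, instead of testing up to 64 bits individually.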
## `uint32_spbset`

While the memory footprint of `uint32_bset` is good, it still needs 5KB to
store the single number 40000. This is not optimal when many such sets are
needed. For example, in event based processing, each socket needs to keep
track of the transfers involved. There are many sockets potentially, but each
one mostly tracks a single transfer or a few (up to about 100 in the extreme
case of an HTTP/2 connection).

For such use cases, the `uint32_spbset` is intended: track a small number of
`uint32_t` values, potentially rather "close" together. It keeps "chunks" with
an offset and has no capacity limit.

Example: adding the number 40000 to an empty sparse bitset would have one
chunk with offset 39936, keeping track of the numbers 39936 to 40191 (a chunk
has 4 64-bit values, covering 256 numbers). The numbers in that range can be
handled without further allocations.

The worst case is then storing 100 numbers that lie in separate intervals.
Then 100 chunks would need to be allocated and linked, resulting in roughly
4KB of memory used overall.

Iterating a sparse bitset works the same as for the bitset and the table.

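The chunk arithmetic from the example (40000 lands in the chunk at offset
39936) can be sketched like this. The struct layout and every `toy_*` name are
assumptions made for illustration. The text only states that a chunk has an
offset, holds 4 64-bit values and is linked to other chunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_CHUNK_SLOTS 4                      /* 4 x 64 bits, as in the text */
#define TOY_CHUNK_BITS  (TOY_CHUNK_SLOTS * 64) /* 256 numbers per chunk */

/* Hypothetical layout for illustration; the real struct may differ. */
struct toy_chunk {
  struct toy_chunk *next;
  uint64_t slots[TOY_CHUNK_SLOTS];
  uint32_t offset; /* first number covered, a multiple of 256 */
};

struct toy_spbset {
  struct toy_chunk *head; /* chunks sorted by offset; NULL when empty */
};

/* Offset of the chunk responsible for number i, e.g. 40000 -> 39936. */
static uint32_t toy_chunk_offset(uint32_t i)
{
  return i - (i % TOY_CHUNK_BITS);
}

/* Add i, allocating a chunk only when none covers it yet. */
bool toy_spbset_add(struct toy_spbset *s, uint32_t i)
{
  uint32_t off = toy_chunk_offset(i);
  struct toy_chunk **anchor;
  struct toy_chunk *c;

  assert(s != NULL);
  anchor = &s->head;
  while(*anchor && (*anchor)->offset < off)
    anchor = &(*anchor)->next;
  c = *anchor;
  if(!c || c->offset != off) {
    c = calloc(1, sizeof(*c)); /* all bits in the new chunk start cleared */
    if(!c)
      return false;
    c->offset = off;
    c->next = *anchor; /* link in before the first chunk with larger offset */
    *anchor = c;
  }
  c->slots[(i - off) / 64] |= (uint64_t)1 << (i % 64);
  return true;
}

bool toy_spbset_contains(const struct toy_spbset *s, uint32_t i)
{
  uint32_t off = toy_chunk_offset(i);
  const struct toy_chunk *c = s->head;

  while(c && c->offset < off)
    c = c->next;
  return c && (c->offset == off) &&
         ((c->slots[(i - off) / 64] >> (i % 64)) & 1);
}

/* Number of chunks currently allocated. */
uint32_t toy_spbset_chunks(const struct toy_spbset *s)
{
  const struct toy_chunk *c;
  uint32_t n = 0;
  for(c = s->head; c; c = c->next)
    ++n;
  return n;
}

void toy_spbset_clear(struct toy_spbset *s)
{
  while(s->head) {
    struct toy_chunk *c = s->head;
    s->head = c->next;
    free(c);
  }
}
```

In this sketch, adding 40000 and 40191 shares one chunk, while adding 40192
allocates a second one. The worst case from the text corresponds to 100
numbers that each need a chunk of their own, costing
`100 * sizeof(struct toy_chunk)` bytes plus allocator overhead.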
## `uint32_hash`

Lastly, there are places in libcurl such as the HTTP/2 and HTTP/3 protocol
implementations that need to store their own data related to a transfer.
`uint32_hash` allows them to associate a `uint32_t`, e.g. the transfer's
`mid`, with their own data.

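To illustrate the idea of keying private data by `mid`, here is a minimal
chained hash sketch. The text does not describe how `uint32_hash` is built, so
everything here is an assumption: the `toy_hash_*` names, the fixed slot count
and the modulo hashing are chosen only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HASH_SLOTS 61 /* arbitrary small prime for this sketch */

struct toy_hentry {
  struct toy_hentry *next; /* next entry in the same slot */
  uint32_t id;             /* e.g. a transfer's mid */
  void *value;             /* data owned by the caller */
};

/* Hypothetical hash for illustration; initialize with { { NULL } }. */
struct toy_hash {
  struct toy_hentry *slots[TOY_HASH_SLOTS];
};

/* Returns the link that points at the entry for id, or the NULL link at
   the end of the slot's chain when id is not present. */
static struct toy_hentry **toy_hash_find(struct toy_hash *h, uint32_t id)
{
  struct toy_hentry **pe;
  assert(h != NULL);
  pe = &h->slots[id % TOY_HASH_SLOTS];
  while(*pe && (*pe)->id != id)
    pe = &(*pe)->next;
  return pe;
}

/* Associate value with id, replacing any previous value. */
bool toy_hash_set(struct toy_hash *h, uint32_t id, void *value)
{
  struct toy_hentry **pe = toy_hash_find(h, id);
  if(!*pe) {
    struct toy_hentry *e = malloc(sizeof(*e));
    if(!e)
      return false;
    e->next = NULL;
    e->id = id;
    *pe = e;
  }
  (*pe)->value = value;
  return true;
}

void *toy_hash_get(struct toy_hash *h, uint32_t id)
{
  struct toy_hentry *e = *toy_hash_find(h, id);
  return e ? e->value : NULL;
}

/* Forget id. The value itself is not freed; the caller owns it. */
bool toy_hash_remove(struct toy_hash *h, uint32_t id)
{
  struct toy_hentry **pe = toy_hash_find(h, id);
  struct toy_hentry *e = *pe;
  if(!e)
    return false;
  *pe = e->next;
  free(e);
  return true;
}
```

A protocol implementation would, under these assumptions, call
`toy_hash_set(&h, mid, stream_data)` when a transfer starts using a
connection and `toy_hash_remove(&h, mid)` when it is done.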