BIP draft: UTXO set sharing#2137
Conversation
802cfce to
4bc39c8
Compare
| | `merkle_root` | `uint256` | 32 | Root of the Merkle tree computed over chunk hashes. | | ||
|
|
||
| A requesting node MUST ignore entries whose `serialized_hash` does not match a known | ||
| utxo set hash for the corresponding height. |
There was a problem hiding this comment.
I think it would be better for a client that supports this mechanism to hardcode the merkle root instead of the straight serialized hash, and to drop serialized_hash from this message.
There was a problem hiding this comment.
Hm, yeah, I think that makes sense, I was struggling with finding the right path between building on top of the existing assumeutxo data we already have and extending it. I am adding the merkle root to the assumeutxo data in my PR, so checking the serialized_hash is a belts and suspenders there, so it makes sense to make this explicit here as well.
There was a problem hiding this comment.
If you wanted to change the contents of the utxo dump as well, it would be nice to include the header chain for the block where the snapshot was taken. Then you could do an import without needing to connect to the network at all, I think.
There was a problem hiding this comment.
I think it would be better for a client that supports this mechanism to hardcode the merkle root instead of the straight serialized hash, and to drop serialized_hash from this message.
Dropped serialized_hash mentions from the BIP completely in this new version.
it would be nice to include the header chain for the block where the snapshot was taken
Hm, I find this a bit odd. For the purpose of this BIP we are connected to the network necessarily, so we can just as well fetch the headers over the network in the way we usually do and that should be simpler code-wise. For loading the file directly it would make more sense to me but still we are connecting to the network afterwards. I will need to think about this more but thankfully we have the versioning of the dump file so we can extend the format easily in the future and it wouldn't even need to affect this BIP I think.
| | `block_hash` | `uint256` | 32 | Block hash this data corresponds to. | | ||
| | `chunk_index` | `uint32_t` | 4 | Zero-based index of this chunk. | | ||
| | `proof_length` | `compact_size` | 1–9 | Number of hashes in the Merkle proof. | | ||
| | `proof_hashes` | `uint256[]` | 32 × `proof_length` | Sibling hashes from leaf to root. | |
There was a problem hiding this comment.
Rather than a proof, it might be better to just request getutxoset <height> <hash> 0xFFFFFFFF once to get the full list of chunk hashes -- that should be about 74kB for a 9GB utxo set, and should stay under 4MB until the utxo set is >450GB. Then each chunk is just <height> <hash> <number> <data>.
There was a problem hiding this comment.
Huh, interesting idea to get all the chunk hashes first, I didn't think of that. It might make sense to do this with a separate message type even instead of hacking getutxoset as you described. I will have to think about it a little more.
There was a problem hiding this comment.
Specified the fetching of the chunk hash list now with new message type getutxotree/utxotree messages. I liked this a lot more when combining it with dropping the discovery approach and asking for the specific UTXO set directly. Let me know what you think.
|
I was going to complain that I haven’t seen a discussion about this proposal on the mailing list… but you did that already for me. If you already know that you should send it to the mailing list first, I don’t know why you opened the PR first, though. 😛 |
It wasn't clear to me that a ML post was a prerequisit to open the PR. I just thought it was necessary to do this at some point before the bip could get merged/assigned a number. I think that having a place for more detail oriented commentary makes sense to have in addition to the high level discussion happening on the ML, if ML readers have such feedback but would rather use the more convenient inline commenting/suggestion features in GitHub. I was also looking for feedback on my assumeutxo related question, e.g. can I assume knowledge of a feature that is only implemented in Bitcoin Core or should I define this in the BIP. The ML doesn't seem like the right place to ask about this. I will close this for now and re-open when I have made the ML post and given it some reasonable time for discussion. |
|
Thanks, and sorry, I might have come off as more gruff than intended — tone is hard in written text. Obviously, you’ve been around the block and your proposal reads well-considered, but we have been getting a lot of premature submissions out of the blue here, where then BIP Editors become the first line of feedback. Personally, it’s been taking a growing part of my work hours to even just get through all submissions to the repository. So, we have become a bit more insistent on proposals actually being posted to the list first, and I’d like to avoid giving the impression that I’m playing favorites. A hypothetical optimal order might be:
In your case it sounds like you’d be able to skip directly to 5, and we can of course reopen this PR, when there has been an ML discussion. |
| 1. The requesting node identifies peers advertising `NODE_UTXO_SET`. | ||
| 2. The requesting node sends `getutxosetinfo` to one or more of these peers. | ||
| 3. Each peer responds with `utxosetinfo`. The requesting node verifies that the advertised | ||
| `serialized_hash` matches a known UTXO set hash, compares `merkle_root` values across peers, | ||
| and selects a UTXO set whose Merkle root has agreement among multiple peers. |
There was a problem hiding this comment.
I think this process defeats the point of the service bit. Having "at least one" UTXO set only "works" while there are only a small number of UTXO set options to download. If it's not sufficient to just try peers until you find the UTXO set you want, then we probably need a way to advertise specific UTXO sets.
There was a problem hiding this comment.
Thanks, I do think it should be sufficient to try peers until we find the UTXO set we want and I dropped the discovery approach, see my other comment as well.
| before initiating the process. Including the serialized hash in the advertisement lets the requester | ||
| immediately filter out peers claiming a different UTXO set state before downloading any data. | ||
|
|
||
| **Discovery before download:** The `getutxosetinfo`/`utxosetinfo` exchange lets the requesting node |
There was a problem hiding this comment.
I'm not sure discovery is a useful feature. It enables a misguided implementation to blindly trust whatever nodes provide, and harms node privacy by adding a way to fingerprint nodes.
If you know what you'll accept in advance, you can simply request that snapshot, and the node will either provide it or (potentially) say it can't.
There was a problem hiding this comment.
I dropped the discovery approach, I think fingerprinting potential is still there with some probing but it's still better than with discovery and I think this matches the approach we follow in the network generally, like in BIP157 for example, better as well.
| for buffering and verifying a single chunk. Smaller chunks would increase protocol overhead; larger | ||
| chunks would increase memory pressure on constrained devices commonly used to run Bitcoin nodes. | ||
| Together with the additional message overhead, the `utxoset` message including the chunk data also | ||
| sits just below the theoretical maximum block size which means any implementation should be able to |
There was a problem hiding this comment.
It also happens to sit just below the maximum P2P message size MAX_PROTOCOL_MESSAGE_LENGTH, so it may be clearer to refer to that instead of block size
There was a problem hiding this comment.
Yeah, but this was a contious decision actually. MAX_PROTOCOL_MESSAGE_LENGTH is a Bitcoin Core implementation specific value. A different implementation may have a higher value for this. But every implementation will at least need to be able to receive the biggest possible block. So I think it's better to anchor it to that.
|
|
||
| Sent to discover which UTXO sets a peer can serve. This message has an empty payload. | ||
|
|
||
| A node that has advertised `NODE_UTXO_SET` MUST respond with `utxosetinfo`. A node that has not |
There was a problem hiding this comment.
nit: I think it would be clearer to repeat here that we should disconnect a node that doesn't respond to our getutxoset, as stated above:
A node signaling
NODE_UTXO_SETMUST respond togetutxosetinfomessages and MUST be capable of serving all UTXO sets it advertises in itsutxosetinforesponse. A node that fails to meet these
obligations SHOULD be disconnected.
There was a problem hiding this comment.
With the latest changes I felt it would be more appropriate to remove this general statement instead:
A node that fails to meet these obligations SHOULD be disconnected.
Without discovery it is just harder for us to decide if the peer is misbehaving or not. It is more precise anyway to clarify this explicitly for each situation.
|
|
||
| After all chunks have been received, the node MUST compute the serialized hash and compare it against a | ||
| known UTXO set hash. If this check fails, the node MUST discard all data and | ||
| SHOULD disconnect all peers that advertised the corresponding Merkle root. |
There was a problem hiding this comment.
I'm trying to understand why we "should" disconnect here, but we "must" disconnect if the peer sends us a utxoset message that fails the proof verification (L176). Is it because in the second case, the peer is surely trying to trick us, while in the first case, we probably have a bug on our end?
I'm trying to understand how it can happen that our chunks pass the individual verifications, but the final utxo set hash mismatches, and whether in this case it would be a problem on our end, or peers trying to trick us.
There was a problem hiding this comment.
Yeah, the intention was that we are stricter with a clear inconsistency in what the peer has given us vs. something that is probably a problem on our side. But have revisited the disconnection policies and I think it is a lot more consistent now. If I got everything right we must disconnect if the peer sent us something clearly inconsistent, we should disconnect if the peer sent us we didn't ask for and if evidence points to the issue being on our side I am now not saying anything about disconnecting.
|
Since this is already getting more discussion here than many submissions to the mailing list, I’m going to reopen it and turn it into draft. Please set it to “Ready for Review” when you have posted to the mailing list. |
|
Concept NACK. It's bad enough that nodes are formalizing this off network, but incorporating it into p2p is another level of awful. |
This BIP draft describes the sharing of a full UTXO set via the p2p network.
Design summary:
The one part I am not so sure about yet: This references Bitcoin Core and it’s features or RPCs in a few places now. I am aware that this is not ideal for specification that targets a wider audience but the reality is that assumeutxo seems to be only implemented in Bitcoin Core and mentioning RPCs from the workflow there seems the most clear way to describe this. But I am happy to generalize this further, I would be very happy to receive some guidance what level of referencing assumutxo is acceptable here since it is obviously the main current motivation. One concrete example: The Bitcoin Core PR that I will use as a reference implementation will rely on assumeutxo params rather than comparing multiple peer values of the merkle root. Is this discrepancy an issue?
Mailing List post: https://groups.google.com/g/bitcoindev/c/rThmyI8ZN3Q/m/TJBc1xRbAQAJ
Proposed implementation: bitcoin/bitcoin#35054