feat(format): partitioned bulk ingest API#4317
Conversation
|
Do you want to target this against the spec-1.2.0 branch instead? |
| struct AdbcConnection* connection, const char* target_catalog, | ||
| const char* target_db_schema, const char* target_table, const char* mode, |
There was a problem hiding this comment.
I'd rather we use the existing options here instead of duplicating them into the signature
There was a problem hiding this comment.
one caveat there is that existing options are statement-level in practice rather than connection-level. all these new functions are connection-level. are you also suggesting moving all (or some) to statement-level?
| /// write; that happens at Commit (commit it) or Abort (drop it). | ||
| /// | ||
| /// \since ADBC API revision 1.2.0 | ||
| struct AdbcIngestReceipt { |
There was a problem hiding this comment.
It would be good to explain the receipt/handle explicitly here instead of leaving it implicit from the below definitions.
Additionally I think the comments could generally be cleaned up.
| /// Abort is best-effort. If cleanup is incomplete, the driver | ||
| /// returns a warning status and orphaned storage may remain; it is |
There was a problem hiding this comment.
Perhaps this would leverage ConnectionSetWarningHandler?
There was a problem hiding this comment.
I meant some sort of an "okish" adbc status, but warning handler sounds like a more appropriate solution. unless you think it's not necessary at all.. just documenting it as best-effort could be enough as well.
| /// the driver's responsibility to provide housekeeping (e.g. TTL, | ||
| /// background GC, or documented manual cleanup). Callers may also |
There was a problem hiding this comment.
This sounds more like the semantics of the backend system
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Some initial thoughts.
| AdbcStatusCode (*ConnectionWriteIngestPartition)( | ||
| struct AdbcConnection*, const uint8_t*, size_t, struct ArrowArrayStream*, | ||
| struct AdbcIngestReceipt*, struct AdbcError*); | ||
| AdbcStatusCode (*ConnectionCommitIngestPartitions)( |
There was a problem hiding this comment.
Repeating my comment from the mailing list, is this use of Commit going to be misleading around its relationship with a database transactional commit?
There was a problem hiding this comment.
wdyt about "complete" or maybe "apply"? I'm fine with changing to "complete" if you agree.
| /// Abort is best-effort. If cleanup is incomplete, the driver | ||
| /// returns a warning status and orphaned storage may remain; it is |
There was a problem hiding this comment.
Perhaps this would leverage ConnectionSetWarningHandler?
| /// call Abort if the coordinator crashed and was restarted without | ||
| /// the original receipts. |
There was a problem hiding this comment.
Is it worth distinguishing the case of "Abort an unknown handle" from the case of "something went wrong during Abort" with a specific error code for the former?
95d027c to
310e138
Compare
|
@lidavidm should I leave only the spec changes in the PR against spec branch? or postgres impl as well? |
|
Impl is fine as well, but the branch seems to have picked up a lot of extra commits |
Adds the spec document and adbc.h declarations for partitioned bulk ingest — the write-side mirror of ExecutePartitions/ReadPartition.
310e138 to
c264c82
Compare
|
some of the core cpp adbc manager codebase differs between main and spec-1.2.0. so I left only spec changes here. kept the impl in another branch on my repo. |
demo PR for a new partitioned ingest API.