docs/content/append-table/blob.md
+49 -48 lines changed: 49 additions & 48 deletions
Original file line number
Diff line number
Diff line change
@@ -94,7 +94,19 @@ For details about the blob file format structure, see [File Format - BLOB]({{< r
94
94
<td>No</td>
95
95
<td style="word-wrap: break-word;">false</td>
96
96
<td>Boolean</td>
97
-
<td>When set to true, the blob field input is treated as a serialized BlobDescriptor. Paimon reads from the descriptor's URI and streams the data into Paimon's blob files in small chunks, avoiding loading the entire blob into memory. This is useful for writing very large blobs that cannot fit in memory. When reading, if set to true, returns the BlobDescriptor bytes; if false, returns actual blob bytes.</td>
97
+
<td>Controls read output format for blob fields. When set to true, queries return serialized BlobDescriptor bytes; when false, queries return actual blob bytes. This option is dynamic and can be changed with <code>ALTER TABLE ... SET</code>.</td>
98
+
</tr>
99
+
<tr>
100
+
<td><h5>blob.stored-descriptor-fields</h5></td>
101
+
<td>No</td>
102
+
<td style="word-wrap: break-word;">(none)</td>
103
+
<td>String</td>
104
+
<td>
105
+
Comma-separated BLOB field names stored as serialized <code>BlobDescriptor</code> bytes inline in normal data files.
106
+
By default, all blob fields store blob bytes in separate <code>.blob</code> files.
107
+
If configured, one table can mix:
108
+
some BLOB fields in <code>.blob</code> files and some as descriptor references.
109
+
</td>
98
110
</tr>
99
111
<tr>
100
112
<td><h5>blob.target-file-size</h5></td>
@@ -217,31 +229,18 @@ SELECT id, name FROM image_table;
217
229
SELECT * FROM image_table WHERE id = 1;
218
230
```
219
231
220
-
### Blob Descriptor Mode
232
+
### Blob Read Output Mode (`blob-as-descriptor`)
221
233
222
-
When you want to store references from external blob data (stored in object storage) without loading the entire blob into memory, you can use the `blob-as-descriptor` option:
234
+
`blob-as-descriptor` only controls how blob values are returned when reading.
-- Paimon will read from the descriptor's URI and stream data into Paimon's blob files in small chunks, avoiding loading the entire blob into memory
240
-
INSERT INTO blob_table VALUES (1, 'photo', X'<serialized_blob_descriptor_hex>');
237
+
-- Return descriptor bytes
238
+
ALTER TABLE blob_table SET ('blob-as-descriptor' = 'true');
239
+
SELECT image FROM blob_table;
241
240
242
-
-- Toggle this setting to control read output format:
241
+
-- Return actual blob bytes
243
242
ALTER TABLE blob_table SET ('blob-as-descriptor' = 'false');
244
-
SELECT * FROM blob_table; -- Returns actual blob bytes from Paimon storage
243
+
SELECT image FROM blob_table;
245
244
```
246
245
247
246
## Java API Usage
@@ -442,17 +441,13 @@ long offset = descriptor.offset(); // Starting position in the file
442
441
long length = descriptor.length(); // Length of the blob data
443
442
```
444
443
445
-
### Blob Descriptor Mode
444
+
### Descriptor-Aware Write Behavior
446
445
447
-
The `blob-as-descriptor` option enables **memory-efficient writing** for very large blobs. When enabled, you provide a `BlobDescriptor` pointing to external data, and Paimon streams the data from the external source into Paimon's `.blob` files without loading the entire blob into memory.
446
+
The Paimon write path is automatically descriptor-aware:
448
447
449
-
**How it works:**
450
-
1. **Writing**: You provide a serialized `BlobDescriptor` (containing URI, offset, length) as the blob field value
451
-
2. **Paimon copies the data**: Paimon reads from the descriptor's URI in small chunks (e.g., 1024 bytes at a time) and writes to Paimon's `.blob` files
452
-
3. **Data is stored in Paimon**: The blob data IS copied to Paimon storage, but in a streaming fashion
453
-
454
-
**Key benefit:**
455
-
- **Memory efficiency**: For very large blobs (e.g., gigabyte-sized videos), you don't need to load the entire file into memory. Paimon streams the data incrementally.
448
+
1. For blob fields stored in `.blob` files, input can be either blob bytes or a `BlobDescriptor`.
449
+
2. For fields configured in `blob.stored-descriptor-fields`, Paimon stores descriptor bytes inline in data files (no `.blob` files for those fields), and input must be a descriptor.
450
+
3. This behavior does not depend on `blob-as-descriptor`.
456
451
457
452
```java
458
453
import org.apache.paimon.catalog.Catalog;
@@ -484,21 +479,21 @@ public class BlobDescriptorExample {
**Reading blob data with different output modes:**
563
555
564
-
The `blob-as-descriptor` option also affects how data is returned when reading:
556
+
The `blob-as-descriptor` option affects only read output:
565
557
566
558
```sql
567
559
-- When blob-as-descriptor = true: Returns BlobDescriptor bytes (reference to Paimon blob file)
@@ -573,21 +565,30 @@ ALTER TABLE video_table SET ('blob-as-descriptor' = 'false');
573
565
SELECT * FROM video_table; -- Returns actual blob bytes from Paimon storage
574
566
```
575
567
568
+
### Descriptor Fields: Reuse by Descriptor (No Copy)
569
+
570
+
If you want downstream tables to **reuse** upstream blob files (no copying and no new <code>.blob</code> files), configure the target blob field(s):
571
+
572
+
```sql
573
+
'blob.stored-descriptor-fields'='image'
574
+
```
575
+
576
+
For these configured fields, Paimon stores only serialized <code>BlobDescriptor</code> bytes in normal data files. Reading the blob follows the descriptor URI to access bytes, and writing requires descriptor input for those fields.
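As a rough illustration of the configuration described above, a downstream table that keeps only descriptor references for its `image` field might be declared as follows. This is a sketch under assumptions, not a confirmed recipe: the table name `image_refs`, the column names, the Flink-style `BYTES` column type, and the `blob-field` option are assumptions not shown in this diff. Only `blob.stored-descriptor-fields`, `row-tracking.enabled`, and `data-evolution.enabled` come from the text.

```sql
-- Hypothetical downstream table; table and column names are illustrative.
CREATE TABLE image_refs (
    id INT,
    name STRING,
    image BYTES  -- assumed column type for a blob field (Flink SQL style)
) WITH (
    'row-tracking.enabled' = 'true',           -- required per the Limitations section
    'data-evolution.enabled' = 'true',         -- required per the Limitations section
    'blob-field' = 'image',                    -- assumed option that marks `image` as a BLOB field
    'blob.stored-descriptor-fields' = 'image'  -- store descriptor bytes inline; no .blob files for this field
);
```

With such a table, values written to `image` would need to be serialized `BlobDescriptor` bytes, since fields listed in `blob.stored-descriptor-fields` require descriptor input.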
577
+
576
578
## Limitations
577
579
578
-
1. **Single Blob Field**: Currently, only one blob field per table is supported.
579
-
2. **Append Table Only**: Blob type is designed for append-only tables. Primary key tables are not supported.
580
-
3. **No Predicate Pushdown**: Blob columns cannot be used in filter predicates.
581
-
4. **No Statistics**: Statistics collection is not supported for blob columns.
582
-
5. **Required Options**: `row-tracking.enabled` and `data-evolution.enabled` must be set to `true`.
580
+
1. **Append Table Only**: Blob type is designed for append-only tables. Primary key tables are not supported.
581
+
2. **No Predicate Pushdown**: Blob columns cannot be used in filter predicates.
582
+
3. **No Statistics**: Statistics collection is not supported for blob columns.
583
+
4. **Required Options**: `row-tracking.enabled` and `data-evolution.enabled` must be set to `true`.
583
584
584
585
## Best Practices
585
586
586
587
1. **Use Column Projection**: Always select only the columns you need. Avoid `SELECT *` if you don't need blob data.
587
588
588
589
2. **Set Appropriate Target File Size**: Configure `blob.target-file-size` based on your blob sizes. Larger values mean fewer files but larger individual files.
589
590
590
-
3. **Consider Descriptor Mode**: For very large blobs that cannot fit in memory, use `blob-as-descriptor` mode to stream data from external sources into Paimon without loading the entire blob into memory.
591
+
3. **Use Descriptor Fields When Reusing External Blob Files**: Configure `blob.stored-descriptor-fields` for fields that should keep descriptor references instead of writing new `.blob` files.
591
592
592
593
4. **Use Partitioning**: Partition your blob tables by date or other dimensions to improve query performance and data management.