{dsbulk} is a utility that you can use to load, unload, and count data in your database tables.
The {product} provides embedded {dsbulk-short} support by downloading, installing, configuring, and wrapping the `dsbulk` utility.

The {product} exposes {dsbulk-short} functionality through the following commands:

- xref:commands:astra-db-dsbulk-load.adoc[]
- xref:commands:astra-db-dsbulk-unload.adoc[]
- xref:commands:astra-db-dsbulk-count.adoc[]

The first time you use one of these commands, the {product} downloads and installs the `dsbulk` utility in the {product} home directory (`~/.astra`).
The {product} also downloads the {scb} for each database that you connect to, and stores the {scb-short} zip files in `~/.astra/scb`.

Use the xref:commands:astra-db-dsbulk-load.adoc[] command to load data from a file into a database table:

```shell
astra db dsbulk load DB_ID -k KEYSPACE_NAME -t TABLE_NAME --url FILE_LOCATION
```

Replace `DB_ID` with your database ID or name, `KEYSPACE_NAME` and `TABLE_NAME` with the target keyspace and table, and `FILE_LOCATION` with the location of the file to load.

Result:

```console
[INFO] Downloading Dsbulk, please wait...
[INFO] Installing archive, please wait...
[INFO] RUNNING: /Users/USERNAME/.astra/dsbulk-1.11.0/bin/dsbulk load -u token -p AstraCS:FZm... -b /Users/USERNAME/.astra/scb/scb_91b35105-a5aa-4cd5-a93b-900ac58452ba_us-east1.zip -k dsbulk_demo_keyspace -t cities_by_country -logDir ./logs --log.verbosity normal --schema.allowMissingFields true -maxConcurrentQueries AUTO -delim , -url cities.csv -header true -encoding UTF-8 -skipRecords 0 -maxErrors 100
[INFO] DSBulk is starting please wait ...
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
Operation directory: /Users/USERNAME/logs/LOAD_20250123-020734-995267
Setting executor.maxPerSecond not set when connecting to DataStax Astra: applying a limit of 27,000 ops/second based on the number of coordinators (9).
If your Astra database has higher limits, please define executor.maxPerSecond explicitly.
total | failed | rows/s | p50ms | p99ms | p999ms | batches
148,266 | 0 | 19,588 | 240.82 | 771.75 | 964.69 | 30.72
Operation LOAD_20250123-020734-995267 completed successfully in 7 seconds.
Checkpoints for the current operation were written to checkpoint.csv.
To resume the current operation, re-run it with the same settings, and add the following command line flag:
--dsbulk.log.checkpoint.file=/Users/USERNAME/logs/LOAD_20250123-020734-995267/checkpoint.csv
```
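
The `Operation directory` line in the result is where {dsbulk-short} writes the logs for the run, and the `--dsbulk.log.checkpoint.file` path shows that `checkpoint.csv` is stored there as well. If you save the command output to a file, a small shell function can extract that path. The following is a minimal sketch that assumes the output format shown above; the function name `dsbulk_operation_dir` is hypothetical and is not part of the {product}.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Astra CLI command): print the DSBulk operation
# directory from saved `astra db dsbulk ...` output. It looks for lines that
# start with "Operation directory: ", the format shown in the result above.
# Reads the files given as arguments, or standard input if there are none.
# If the output of several runs was captured, only the last path is printed.
dsbulk_operation_dir() {
  sed -n 's/^Operation directory: //p' "$@" | tail -n 1
}
```

For example, append `2>&1 | tee load-output.txt` to the load command, and then run `dsbulk_operation_dir load-output.txt` to print the directory that holds the logs and `checkpoint.csv` for that run.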

[TIP]
====
To confirm that the data loaded, you can use the `astra db cqlsh exec` command to query the table:

```shell
astra db cqlsh exec DB_ID "SELECT * FROM KEYSPACE_NAME.TABLE_NAME LIMIT 20;"
```

Result:

```console
[INFO] Cqlsh is starting, please wait for connection establishment...
country_name | name | country_code | country_id | id | latitude | longitude | state_code | state_id | state_name | wikidataid
--------------+---------------------+--------------+------------+------+----------+-----------+------------+----------+---------------------+------------
Bangladesh | Azimpur | BD | 19 | 8454 | 23.7298 | 90.3854 | 13 | 771 | Dhaka District | null
Bangladesh | Badarganj | BD | 19 | 8455 | 25.67419 | 89.05377 | 55 | 759 | Rangpur District | null
Bangladesh | Bagerhat | BD | 19 | 8456 | 22.4 | 89.75 | 27 | 811 | Khulna District | null
Bangladesh | Bandarban | BD | 19 | 8457 | 22 | 92.33333 | B | 803 | Chittagong Division | null
Bangladesh | Baniachang | BD | 19 | 8458 | 24.51863 | 91.35787 | 60 | 767 | Sylhet District | null
Bangladesh | Barguna | BD | 19 | 8459 | 22.13333 | 90.13333 | 06 | 818 | Barisal District | null
Bangladesh | Barisal | BD | 19 | 8460 | 22.8 | 90.5 | 06 | 818 | Barisal District | null
Bangladesh | Bera | BD | 19 | 8462 | 24.07821 | 89.63262 | 54 | 813 | Rajshahi District | null
Bangladesh | Bhairab Bāzār | BD | 19 | 8463 | 24.0524 | 90.9764 | 13 | 771 | Dhaka District | null
Bangladesh | Bherāmāra | BD | 19 | 8464 | 24.02452 | 88.99234 | 27 | 811 | Khulna District | null
Bangladesh | Bhola | BD | 19 | 8465 | 22.36667 | 90.81667 | 06 | 818 | Barisal District | null
Bangladesh | Bhāndāria | BD | 19 | 8466 | 22.48898 | 90.06273 | 06 | 818 | Barisal District | null
Bangladesh | Bhātpāra Abhaynagar | BD | 19 | 8467 | 23.01472 | 89.43936 | 27 | 811 | Khulna District | null
Bangladesh | Bibir Hat | BD | 19 | 8468 | 22.68347 | 91.79058 | B | 803 | Chittagong Division | null
Bangladesh | Bogra | BD | 19 | 8469 | 24.78333 | 89.35 | 54 | 813 | Rajshahi District | null
Bangladesh | Brahmanbaria | BD | 19 | 8470 | 23.98333 | 91.16667 | B | 803 | Chittagong Division | null
Bangladesh | Burhānuddin | BD | 19 | 8471 | 22.49518 | 90.72391 | 06 | 818 | Barisal District | null
Bangladesh | Bājitpur | BD | 19 | 8472 | 24.21623 | 90.95002 | 13 | 771 | Dhaka District | null
Bangladesh | Chandpur | BD | 19 | 8474 | 23.25 | 90.83333 | B | 803 | Chittagong Division | null
Bangladesh | Chapai Nababganj | BD | 19 | 8475 | 24.68333 | 88.25 | 54 | 813 | Rajshahi District | null
(20 rows)
```
====
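
If you load data often, you might wrap the load command and the verification query in one shell function. The following minimal sketch uses only the commands and flags shown in this section. The function name `load_and_preview` is hypothetical, and the sketch assumes that `FILE_LOCATION` is a local file path and that `astra` exits with a non-zero status when a command fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not an Astra CLI command): load a local CSV file with
# `astra db dsbulk load`, then preview rows with `astra db cqlsh exec`.
# Usage: load_and_preview DB_ID KEYSPACE_NAME TABLE_NAME FILE_LOCATION [LIMIT]
load_and_preview() {
  if [ "$#" -lt 4 ]; then
    echo "usage: load_and_preview DB_ID KEYSPACE_NAME TABLE_NAME FILE_LOCATION [LIMIT]" >&2
    return 2
  fi
  local db="$1" keyspace="$2" table="$3" file="$4" limit="${5:-20}"

  # Stop before starting DSBulk if the local input file does not exist.
  if [ ! -f "$file" ]; then
    echo "load_and_preview: file not found: $file" >&2
    return 1
  fi

  # Same command and flags as the load example above. If it fails (assumed
  # non-zero exit status), return that status and skip the preview query.
  astra db dsbulk load "$db" -k "$keyspace" -t "$table" --url "$file" || return

  # Same query as the tip above, with a configurable row limit (default 20).
  astra db cqlsh exec "$db" "SELECT * FROM ${keyspace}.${table} LIMIT ${limit};"
}
```

For example, `load_and_preview dsbulk_demo_db dsbulk_demo_keyspace cities_by_country cities.csv` runs the same load command and verification query as the end-to-end example.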

Use the xref:commands:astra-db-dsbulk-unload.adoc[] command to unload rows from a database table into CSV files:

```shell
astra db dsbulk unload DB_ID -k KEYSPACE_NAME -t TABLE_NAME --url FILE_LOCATION
```

Result:

```console
[INFO] RUNNING: /Users/USERNAME/.astra/dsbulk-1.11.0/bin/dsbulk unload -u token -p AstraCS:FZm... -b /Users/USERNAME/.astra/scb/scb_91b35105-a5aa-4cd5-a93b-900ac58452ba_us-east1.zip -k dsbulk_demo_keyspace -t cities_by_country -logDir ./logs --log.verbosity normal --schema.allowMissingFields true -maxConcurrentQueries AUTO -delim , -url unloaded_data -header true -encoding UTF-8 -skipRecords 0 -maxErrors 100
[INFO] DSBulk is starting please wait ...
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
Operation directory: /Users/USERNAME/logs/UNLOAD_20250123-021231-557959
total | failed | rows/s | p50ms | p99ms | p999ms
134,574 | 0 | 29,767 | 281.25 | 591.40 | 591.40
Operation UNLOAD_20250123-021231-557959 completed successfully in 4 seconds.
Checkpoints for the current operation were written to checkpoint.csv.
To resume the current operation, re-run it with the same settings, and add the following command line flag:
--dsbulk.log.checkpoint.file=/Users/USERNAME/logs/UNLOAD_20250123-021231-557959/checkpoint.csv
```

Use the xref:commands:astra-db-dsbulk-count.adoc[] command to count the rows in a database table:

```shell
astra db dsbulk count DB_ID -k KEYSPACE_NAME -t TABLE_NAME
```

Result:

```console
[INFO] RUNNING: /Users/USERNAME/.astra/dsbulk-1.11.0/bin/dsbulk count -u token -p AstraCS:FZm... -b /Users/USERNAME/.astra/scb/scb_91b35105-a5aa-4cd5-a93b-900ac58452ba_us-east1.zip -k dsbulk_demo_keyspace -t cities_by_country -logDir ./logs --log.verbosity normal --schema.allowMissingFields true -maxConcurrentQueries AUTO
[INFO] DSBulk is starting please wait ...
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
Operation directory: /Users/USERNAME/logs/COUNT_20250123-021120-127216
total | failed | rows/s | p50ms | p99ms | p999ms
134,574 | 0 | 65,674 | 213.18 | 329.25 | 329.25
Operation COUNT_20250123-021120-127216 completed successfully in 1 second.
Checkpoints for the current operation were written to checkpoint.csv.
To resume the current operation, re-run it with the same settings, and add the following command line flag:
--dsbulk.log.checkpoint.file=/Users/USERNAME/logs/COUNT_20250123-021120-127216/checkpoint.csv
134574
```

The last line of the result, `134574`, is the number of rows in the table.

The following end-to-end example shows how to use the {product}'s built-in {dsbulk-short} support to load data into a database, count the loaded rows, and unload the data into CSV files:

. Create an {astra-db} {db-serverless} database:
+
```shell
astra db create dsbulk_demo_db -r us-east1 -k dsbulk_demo_keyspace
```
+
Result:
+
```console
Database 'dsbulk_demo_db' has been created with id '8b8fea68-404e-4f12-9a79-02079060adfa'. It is now active after waiting 433 seconds.
```

. Download the `cities.csv` file and move it to the directory where you run {product} commands.
+
The `cities.csv` dataset contains information about cities around the world:
+
.cities.csv
```csv
id,name,state_id,state_code,state_name,country_id,country_code,country_name,latitude,longitude,wikiDataId
52,Ashkāsham,3901,BDS,Badakhshan,1,AF,Afghanistan,36.68333000,71.53333000,Q4805192
68,Fayzabad,3901,BDS,Badakhshan,1,AF,Afghanistan,37.11664000,70.58002000,Q156558
...
```

. Create a table in your database to store your data.
.. Start `cqlsh` in interactive mode:
+
```shell
astra db cqlsh start dsbulk_demo_db -k dsbulk_demo_keyspace
```
+
Result:
+
```console
Connected to cndb at 127.0.0.1:9042.
[cqlsh 6.8.0 | Cassandra 4.0.0.6816 | CQL spec 3.4.5 | Native protocol v4]
Use HELP for help.
token@cqlsh:dsbulk_demo_keyspace>
```

.. Copy and paste the following CQL statement into the `cqlsh` prompt, and then press kbd:[Enter]:
+
```cql
CREATE TABLE cities_by_country (
  country_name text,
  name text,
  id int,
  state_id text,
  state_code text,
  state_name text,
  country_id text,
  country_code text,
  latitude double,
  longitude double,
  wikiDataId text,
  PRIMARY KEY ((country_name), name)
);
```
+
This CQL statement creates a table named `cities_by_country` with the appropriate schema for the `cities.csv` dataset.

.. Type `exit` or `quit;`, and then press kbd:[Enter] to exit `cqlsh`.

. Load the data from the `cities.csv` file into the `cities_by_country` table that you just created in your database:
+
```shell
astra db dsbulk load dsbulk_demo_db -k dsbulk_demo_keyspace -t cities_by_country --url cities.csv
```
+
Result:
+
```console
[INFO] RUNNING: /Users/USERNAME/.astra/dsbulk-1.11.0/bin/dsbulk load -u token -p AstraCS:FZm... -b /Users/USERNAME/.astra/scb/scb_91b35105-a5aa-4cd5-a93b-900ac58452ba_us-east1.zip -k dsbulk_demo_keyspace -t cities_by_country -logDir ./logs --log.verbosity normal --schema.allowMissingFields true -maxConcurrentQueries AUTO -delim , -url cities.csv -header true -encoding UTF-8 -skipRecords 0 -maxErrors 100
[INFO] DSBulk is starting please wait ...
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
Operation directory: /Users/USERNAME/logs/LOAD_20250123-020734-995267
Setting executor.maxPerSecond not set when connecting to DataStax Astra: applying a limit of 27,000 ops/second based on the number of coordinators (9).
If your Astra database has higher limits, please define executor.maxPerSecond explicitly.
total | failed | rows/s | p50ms | p99ms | p999ms | batches
148,266 | 0 | 19,588 | 240.82 | 771.75 | 964.69 | 30.72
Operation LOAD_20250123-020734-995267 completed successfully in 7 seconds.
Checkpoints for the current operation were written to checkpoint.csv.
To resume the current operation, re-run it with the same settings, and add the following command line flag:
--dsbulk.log.checkpoint.file=/Users/USERNAME/logs/LOAD_20250123-020734-995267/checkpoint.csv
```

. Confirm that the data loaded successfully:
+
```shell
astra db cqlsh exec dsbulk_demo_db "select * from dsbulk_demo_keyspace.cities_by_country LIMIT 20;"
```
+
Result:
+
```console
[INFO] Cqlsh is starting, please wait for connection establishment...
country_name | name | country_code | country_id | id | latitude | longitude | state_code | state_id | state_name | wikidataid
--------------+---------------------+--------------+------------+------+----------+-----------+------------+----------+---------------------+------------
Bangladesh | Azimpur | BD | 19 | 8454 | 23.7298 | 90.3854 | 13 | 771 | Dhaka District | null
Bangladesh | Badarganj | BD | 19 | 8455 | 25.67419 | 89.05377 | 55 | 759 | Rangpur District | null
Bangladesh | Bagerhat | BD | 19 | 8456 | 22.4 | 89.75 | 27 | 811 | Khulna District | null
Bangladesh | Bandarban | BD | 19 | 8457 | 22 | 92.33333 | B | 803 | Chittagong Division | null
Bangladesh | Baniachang | BD | 19 | 8458 | 24.51863 | 91.35787 | 60 | 767 | Sylhet District | null
Bangladesh | Barguna | BD | 19 | 8459 | 22.13333 | 90.13333 | 06 | 818 | Barisal District | null
Bangladesh | Barisal | BD | 19 | 8460 | 22.8 | 90.5 | 06 | 818 | Barisal District | null
Bangladesh | Bera | BD | 19 | 8462 | 24.07821 | 89.63262 | 54 | 813 | Rajshahi District | null
Bangladesh | Bhairab Bāzār | BD | 19 | 8463 | 24.0524 | 90.9764 | 13 | 771 | Dhaka District | null
Bangladesh | Bherāmāra | BD | 19 | 8464 | 24.02452 | 88.99234 | 27 | 811 | Khulna District | null
Bangladesh | Bhola | BD | 19 | 8465 | 22.36667 | 90.81667 | 06 | 818 | Barisal District | null
Bangladesh | Bhāndāria | BD | 19 | 8466 | 22.48898 | 90.06273 | 06 | 818 | Barisal District | null
Bangladesh | Bhātpāra Abhaynagar | BD | 19 | 8467 | 23.01472 | 89.43936 | 27 | 811 | Khulna District | null
Bangladesh | Bibir Hat | BD | 19 | 8468 | 22.68347 | 91.79058 | B | 803 | Chittagong Division | null
Bangladesh | Bogra | BD | 19 | 8469 | 24.78333 | 89.35 | 54 | 813 | Rajshahi District | null
Bangladesh | Brahmanbaria | BD | 19 | 8470 | 23.98333 | 91.16667 | B | 803 | Chittagong Division | null
Bangladesh | Burhānuddin | BD | 19 | 8471 | 22.49518 | 90.72391 | 06 | 818 | Barisal District | null
Bangladesh | Bājitpur | BD | 19 | 8472 | 24.21623 | 90.95002 | 13 | 771 | Dhaka District | null
Bangladesh | Chandpur | BD | 19 | 8474 | 23.25 | 90.83333 | B | 803 | Chittagong Division | null
Bangladesh | Chapai Nababganj | BD | 19 | 8475 | 24.68333 | 88.25 | 54 | 813 | Rajshahi District | null
(20 rows)
```

. Count the loaded data:
+
```shell
astra db dsbulk count dsbulk_demo_db -k dsbulk_demo_keyspace -t cities_by_country
```
+
Result:
+
```console
[INFO] RUNNING: /Users/USERNAME/.astra/dsbulk-1.11.0/bin/dsbulk count -u token -p AstraCS:FZm... -b /Users/USERNAME/.astra/scb/scb_91b35105-a5aa-4cd5-a93b-900ac58452ba_us-east1.zip -k dsbulk_demo_keyspace -t cities_by_country -logDir ./logs --log.verbosity normal --schema.allowMissingFields true -maxConcurrentQueries AUTO
[INFO] DSBulk is starting please wait ...
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
Operation directory: /Users/USERNAME/logs/COUNT_20250123-021120-127216
total | failed | rows/s | p50ms | p99ms | p999ms
134,574 | 0 | 65,674 | 213.18 | 329.25 | 329.25
Operation COUNT_20250123-021120-127216 completed successfully in 1 second.
Checkpoints for the current operation were written to checkpoint.csv.
To resume the current operation, re-run it with the same settings, and add the following command line flag:
--dsbulk.log.checkpoint.file=/Users/USERNAME/logs/COUNT_20250123-021120-127216/checkpoint.csv
134574
```

. Unload the data into CSV files:
+
```shell
astra db dsbulk unload dsbulk_demo_db -k dsbulk_demo_keyspace -t cities_by_country --url unloaded_data
```
+
Result:
+
```console
[INFO] RUNNING: /Users/USERNAME/.astra/dsbulk-1.11.0/bin/dsbulk unload -u token -p AstraCS:FZm... -b /Users/USERNAME/.astra/scb/scb_91b35105-a5aa-4cd5-a93b-900ac58452ba_us-east1.zip -k dsbulk_demo_keyspace -t cities_by_country -logDir ./logs --log.verbosity normal --schema.allowMissingFields true -maxConcurrentQueries AUTO -delim , -url unloaded_data -header true -encoding UTF-8 -skipRecords 0 -maxErrors 100
[INFO] DSBulk is starting please wait ...
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
Operation directory: /Users/USERNAME/logs/UNLOAD_20250123-021231-557959
total | failed | rows/s | p50ms | p99ms | p999ms
134,574 | 0 | 29,767 | 281.25 | 591.40 | 591.40
Operation UNLOAD_20250123-021231-557959 completed successfully in 4 seconds.
Checkpoints for the current operation were written to checkpoint.csv.
To resume the current operation, re-run it with the same settings, and add the following command line flag:
--dsbulk.log.checkpoint.file=/Users/USERNAME/logs/UNLOAD_20250123-021231-557959/checkpoint.csv
```
+
This command unloads the rows from the `cities_by_country` table and stores them as CSV files in a subdirectory named `unloaded_data`, inside the directory where you ran the command.
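
In this example, the load result reports 148,266 records with no failures, while the count and unload results both report 134,574 rows. A likely explanation is the table's primary key, `((country_name), name)`: CSV records that share a country name and a city name map to the same row, so later writes overwrite earlier ones instead of adding rows. The following minimal sketch helps you check this and the unloaded output. The function names are hypothetical, and both functions assume simple CSV files with one header line per file, one record per line, and no quoted fields that contain commas or line breaks. The second function also assumes that the unloaded files have a `.csv` extension and that each file starts with one header line, consistent with the `-header true` setting in the unload result.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not Astra CLI commands) for comparing row counts.
# Assumption: simple CSV with one header line, one record per line, and no
# quoted fields that contain commas or line breaks.

# Print the number of distinct (country_name, name) pairs in cities.csv.
# In the cities.csv header, name is field 2 and country_name is field 8.
count_distinct_keys() {
  awk -F, 'NR > 1 && NF > 0 && !seen[$8 FS $2]++ { n++ } END { print n + 0 }' "$1"
}

# Print the total number of data rows in the *.csv files in a directory,
# such as unloaded_data, skipping one header line per file.
count_unloaded_rows() {
  local dir="$1" total=0 lines f
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # no .csv files: the pattern stays unexpanded
    lines=$(awk 'END { print NR }' "$f")
    if [ "$lines" -gt 0 ]; then
      total=$((total + lines - 1))
    fi
  done
  echo "$total"
}
```

If `count_distinct_keys cities.csv` and `count_unloaded_rows unloaded_data` both print the same number as the count command, that supports the explanation that the difference from the load total comes from overwritten rows rather than failed writes.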