What happens?
Original issue here: https://github.com/duckdblabs/duckdb-internal/issues/6733
duckdb/duckdb#20171 introduced a new table function, sql_tokenize, which takes a query as input and returns its tokens together with their categories.
The Python client currently uses a custom implementation located near PyTokenize (see here). I think this can now be switched over to the table function, which would make the client code a bit simpler and make it possible to expose more information about tokenization in the future.
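To illustrate the proposed direction, here is a minimal sketch of calling the table function from Python. The helper name `tokenize_with_table_function` is hypothetical, and the sketch assumes that sql_tokenize accepts the query as a single bound parameter. The column layout of the result is not specified in this issue, so the rows are returned unchanged rather than mapped onto the existing token types.

```python
def tokenize_with_table_function(con, query: str) -> list:
    """Return the raw rows that sql_tokenize produces for `query`.

    `con` is assumed to behave like a duckdb.DuckDBPyConnection, i.e. it
    offers execute(sql, parameters) followed by fetchall().
    """
    # Bind the query text as a parameter so that quotes inside it do not
    # need to be escaped by hand. Passing the argument this way is an
    # assumption about how sql_tokenize can be called.
    result = con.execute("SELECT * FROM sql_tokenize(?)", [query])
    return result.fetchall()


# Example (requires a DuckDB build that includes sql_tokenize):
#   import duckdb
#   rows = tokenize_with_table_function(duckdb.connect(), "SELECT 42")
```

Mapping these rows back to the values that the current Python tokenizer returns would be the remaining piece of work, and it depends on the actual output columns of sql_tokenize.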
To Reproduce
See above
OS:
macOS
DuckDB Package Version:
latest
Python Version:
3.14?
Full Name:
Daniel ten Wolde
Affiliation:
DuckDB Labs
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration to reproduce the issue?