Skip to content

MDEV-39397: Wrong result for GROUP_CONCAT(BIT(8)) in a window function#5028

Open
DaveGosselin-MariaDB wants to merge 2 commits into10.11from
10.11-MDEV-39397-view-wrong-result
Open

MDEV-39397: Wrong result for GROUP_CONCAT(BIT(8)) in a window function#5028
DaveGosselin-MariaDB wants to merge 2 commits into10.11from
10.11-MDEV-39397-view-wrong-result

Conversation

@DaveGosselin-MariaDB
Copy link
Copy Markdown
Member

This PR consists of two commits. One commit implements the fix for the bug and another commit documents and cleans up the implementation of dump_leaf_key. There should be no functional change in the latter commit.

MDEV-39397: Wrong result for GROUP_CONCAT(BIT(8)) in a window function

When DISTINCT or ORDER BY is in play, GROUP_CONCAT builds an
internal temp table and converts BIT arguments to integers so records
can be compared by memcmp.  Calling val_str() on that field
then returned the decimal form ("0", "1", "3") instead of the
packed bit bytes that Field_bit::val_str() emits everywhere else,
including the "plain" GROUP_CONCAT path.

Override get_str_from_field() for Item_func_group_concat to detect
when the argument is BIT, read the value as an integer from the
temp table field, and pack it into (max_length + 7) / 8 bytes,
mirroring Field_bit::val_str().  This makes DISTINCT and ORDER BY
agree with the plain path.

Before this change, BIT fields in group_concat were rendered
as ASCII which was incorrect and produced the wrong result as
described in the ticket.  With this fix, we also update the
existing test cases to wrap BIT columns with '+0' so the recorded
output stays in ASCII format.  Now, with this patch and the test
case from the ticket, MariaDB and MySQL produce identical resultset
of a single row, single column having value 08.

By default, a group_concat including BIT fields on MySQL returns a
hexified value for its resultset whereas MariaDB requires the
hex() function to wrap group_concat for the same result:

Setup:
  create table t1(a bit(2), b varchar(10), c bit);
  insert into t1 values (1, 'a', 0), (0, 'b', 1);

MySQL:
  mysql> select group_concat(a, c) from t1;
  +----------------------------------------+
  | group_concat(a, c)                     |
  +----------------------------------------+
  | 0x01002C0001                           |
  +----------------------------------------+
  1 row in set (0.001 sec)

MariaDB:
  MariaDB [test]> select hex(group_concat(a, c)) from t1;
  +-------------------------+
  | hex(group_concat(a, c)) |
  +-------------------------+
  | 01002C0001              |
  +-------------------------+
  1 row in set (0.001 sec)

The cleanup commit:

MDEV-39397: Document and clean up dump_leaf_key

Add a docblock for dump_leaf_key that describes each parameter
and the leaf payload layout, and documents the return values and
side effects.

Sprinkle inline comments through the body to explain the LIMIT and
OFFSET bookkeeping, the dual purpose of result_finalized, why
borrowing table->record[1] as scratch space is safe, the offset
translation for skipped null bytes, and why the blob_storage
truncation flag is cleared after raising ER_CUT_VALUE_GROUP_CONCAT.

Implement other code cleanups, too, like dropping the unused tmp2
String object and rename old_length to starting_len so the cut_max_length
call reads more directly.  Other changes include narrowing loop
variables to the loop scope, move declarations next to their
first use, and replacing the C casts with static_cast.

There should be no behavior changes at this commit.

When DISTINCT or ORDER BY is in play, GROUP_CONCAT builds an
internal temp table and converts BIT arguments to integers so records
can be compared by memcmp.  Calling val_str() on that field
then returned the decimal form ("0", "1", "3") instead of the
packed bit bytes that Field_bit::val_str() emits everywhere else,
including the "plain" GROUP_CONCAT path.

Override get_str_from_field() for Item_func_group_concat to detect
when the argument is BIT, read the value as an integer from the
temp table field, and pack it into (max_length + 7) / 8 bytes,
mirroring Field_bit::val_str().  This makes DISTINCT and ORDER BY
agree with the plain path.

Before this change, BIT fields in group_concat were rendered
as ASCII which was incorrect and produced the wrong result as
described in the ticket.  With this fix, we also update the
existing test cases to wrap BIT columns with '+0' so the recorded
output stays in ASCII format.  Now, with this patch and the test
case from the ticket, MariaDB and MySQL produce identical resultset
of a single row, single column having value 08.

By default, a group_concat including BIT fields on MySQL returns a
hexified value for its resultset whereas MariaDB requires the
hex() function to wrap group_concat for the same result:

Setup:
  create table t1(a bit(2), b varchar(10), c bit);
  insert into t1 values (1, 'a', 0), (0, 'b', 1);

MySQL:
  mysql> select group_concat(a, c) from t1;
  +----------------------------------------+
  | group_concat(a, c)                     |
  +----------------------------------------+
  | 0x01002C0001                           |
  +----------------------------------------+
  1 row in set (0.001 sec)

MariaDB:
  MariaDB [test]> select hex(group_concat(a, c)) from t1;
  +-------------------------+
  | hex(group_concat(a, c)) |
  +-------------------------+
  | 01002C0001              |
  +-------------------------+
  1 row in set (0.001 sec)
Add a docblock for dump_leaf_key that describes each parameter
and the leaf payload layout, and documents the return values and
side effects.

Sprinkle inline comments through the body to explain the LIMIT and
OFFSET bookkeeping, the dual purpose of result_finalized, why
borrowing table->record[1] as scratch space is safe, the offset
translation for skipped null bytes, and why the blob_storage
truncation flag is cleared after raising ER_CUT_VALUE_GROUP_CONCAT.

Implement other code cleanups, too, like dropping the unused tmp2
String object and rename old_length to starting_len so the cut_max_length
call reads more directly.  Other changes include narrowing loop
variables to the loop scope, move declarations next to their
first use, and replacing the C casts with static_cast.

There should be no behavior changes at this commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

1 participant