
feat: Add support for Run-End Encoded arrays (REE) in Arrow .NET #260

Open
JorgeCandeias wants to merge 2 commits into apache:main from JorgeCandeias:feature/run-end-encoding



@JorgeCandeias

This PR attempts to add support for Run-End Encoded arrays by following established codebase patterns.

Notably:

  • New ArrowTypeId added.
  • New array type RunEndEncodedArray added.
  • New visitor method to handle the new array type.
  • New entry in the IPC serializer field type switch.
  • New RunEndEncodedType nested type.
  • Basic feature tests.
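For readers new to the layout: in the Arrow columnar format, a run-end encoded array has no buffers of its own and two children, `run_ends` (Int16, Int32 or Int64) and `values`. Each entry of `run_ends` is the exclusive logical end of a run. The following is a minimal Python sketch of how the two children map back to logical values. It is not the C# code in this PR, the function name is hypothetical, and it ignores slice offsets.

```python
from typing import Any, List, Sequence


def decode_run_end_encoded(run_ends: Sequence[int], values: Sequence[Any]) -> List[Any]:
    """Hypothetical helper: expand the two REE children into plain logical values.

    Run i covers logical indices [run_ends[i-1], run_ends[i]), with an
    implicit run_ends[-1] of 0. Slice offsets are ignored in this sketch.
    """
    if len(run_ends) != len(values):
        raise ValueError("run_ends and values must have the same length")
    out: List[Any] = []
    start = 0
    for end, value in zip(run_ends, values):
        # Each run must cover at least one logical slot.
        if end <= start:
            raise ValueError("run_ends must be positive and strictly increasing")
        out.extend([value] * (end - start))
        start = end
    return out


# The logical array [1, 1, 1, None, None, 7] stored as three runs.
print(decode_run_end_encoded([3, 5, 6], [1, None, 7]))  # → [1, 1, 1, None, None, 7]
```

Nulls live in the `values` child, so a run of nulls costs one run-end entry and one null value regardless of its length.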

This PR does not include targeted performance benchmarks, as I could not find an established structure to add them to. Please let me know if, where, and how you would like me to add them. They could help measure how performance varies across different data sparsity patterns and decide what to optimise.

Please let me know if this PR is missing anything else.

Introduced RunEndEncodedType and RunEndEncodedArray classes to represent run-end encoded arrays, including validation and logical length calculation. Integrated REE support into ArrowArrayFactory and IPC serialization/deserialization (ArrowStreamWriter, ArrowReaderImplementation, ArrowTypeFlatbufferBuilder, MessageSerializer). Added unit tests for REE array creation, validation, serialization, and indexing. This enables efficient handling of consecutive runs of the same value in Arrow .NET.
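The validation, logical length and indexing described above can be sketched as follows. This is a hypothetical Python stand-in (`ReeSketch` is not a name from the PR), assuming the Arrow format rules that the two children have equal length and that run ends are positive and strictly increasing. It ignores slice offsets and run-end integer overflow. Whether the PR's `RunEndEncodedArray` locates runs by binary search is an assumption here.

```python
from bisect import bisect_right
from typing import Any, Sequence


class ReeSketch:
    """Hypothetical stand-in for a run-end encoded array (not the PR's C# API)."""

    def __init__(self, run_ends: Sequence[int], values: Sequence[Any]) -> None:
        # Validation: equal child lengths, run ends positive and strictly increasing.
        if len(run_ends) != len(values):
            raise ValueError("run_ends and values must have the same length")
        previous = 0
        for end in run_ends:
            if end <= previous:
                raise ValueError("run_ends must be positive and strictly increasing")
            previous = end
        self.run_ends = list(run_ends)
        self.values = list(values)

    @property
    def length(self) -> int:
        # The logical length is the last run end (0 for an empty array).
        return self.run_ends[-1] if self.run_ends else 0

    def physical_index(self, logical_index: int) -> int:
        # The containing run is the first one whose end is greater than logical_index.
        if not 0 <= logical_index < self.length:
            raise IndexError(logical_index)
        return bisect_right(self.run_ends, logical_index)

    def __getitem__(self, logical_index: int) -> Any:
        return self.values[self.physical_index(logical_index)]


arr = ReeSketch([3, 5, 6], ["a", "b", "c"])
print(arr.length)                             # → 6
print([arr[i] for i in range(arr.length)])    # → ['a', 'a', 'a', 'b', 'b', 'c']
```

Binary search makes each random access O(log r) for r runs, independent of the logical length. A sequential scan could instead advance a run cursor for O(1) per element, which is one of the trade-offs the benchmarks mentioned above could measure.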