Make parquet/datafusion memory settings configurable #21547

Open
gaurav-amz wants to merge 2 commits into opensearch-project:main from gaurav-amz:configurable-settings

Conversation

@gaurav-amz
Contributor

  • Add parquet.min_native_allocation / max_native_allocation_ceiling companions for the existing parquet.max_native_allocation percentage setting; ArrowBufferPool now clamps the resolved value between them when the setting is configured as a percentage.
  • Add parquet.arrow.child_allocation, replacing the hardcoded root/10 per-VSR cap in ArrowBufferPool.
  • Convert datafusion.memory_pool_limit_bytes and spill_memory_limit_bytes from heap-based longSetting(maxMemory/N) defaults to simpleString settings expressed as a percentage of non-heap memory, with .min / .max companions for floor/ceiling clamping (see the sketch after this list).
  • Mirrors the indices.memory.native_index_buffer_size pattern in IndexingMemoryController, the only existing OpenSearch precedent for non-heap percentage memory settings.
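
For reference, here is a minimal sketch of the percentage-with-clamp resolution these companions feed. The shape and names (resolveBytes, floor, ceiling) follow the snippets quoted later in this thread; the import paths and the RatioValue#getAsRatio accessor are assumptions that may differ by branch.

import org.opensearch.common.unit.RatioValue;
import org.opensearch.core.common.unit.ByteSizeValue;
import org.opensearch.monitor.jvm.JvmInfo;
import org.opensearch.monitor.os.OsProbe;

final class MemorySettingSketch {
    // Resolve "25%" against non-heap memory (physical RAM minus the configured
    // JVM heap), clamped between the .min floor and the optional .max ceiling;
    // absolute values such as "512mb" bypass the percentage path entirely.
    static long resolveBytes(String configured, ByteSizeValue floor, ByteSizeValue ceiling, String settingKey) {
        if (configured.endsWith("%")) {
            long totalAvailableMemory = OsProbe.getInstance().getTotalPhysicalMemorySize()
                - JvmInfo.jvmInfo().getConfiguredMaxHeapSize();
            if (totalAvailableMemory <= 0) {
                return floor.getBytes(); // probe failed; fall back to the .min companion
            }
            double ratio = RatioValue.parseRatioValue(configured).getAsRatio(); // "25%" -> 0.25
            long bytes = (long) (totalAvailableMemory * ratio);
            bytes = Math.max(bytes, floor.getBytes()); // never below the floor
            if (ceiling.getBytes() > 0) {              // non-positive ceiling means "no ceiling"
                bytes = Math.min(bytes, ceiling.getBytes());
            }
            return bytes;
        }
        return ByteSizeValue.parseBytesSizeValue(configured, settingKey).getBytes();
    }
}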

Description

Makes the parquet and datafusion native-memory limits configurable rather than hardcoded: percentage-of-non-heap defaults with .min/.max companions for floor/ceiling clamping, plus a configurable per-VSR child allocation (see the summary above).

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@gaurav-amz gaurav-amz requested a review from a team as a code owner May 7, 2026 19:27
@github-actions
Contributor

github-actions Bot commented May 7, 2026

PR Reviewer Guide 🔍

(Review updated until commit a40037f)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

When totalAvailableMemory is zero or negative, the code logs a warning and falls back to the floor value. However, it does not handle the case where totalAvailableMemory is positive but very small (e.g., 1 byte). In such cases, applying a percentage could yield a value smaller than the floor, which is then clamped. This is correct behavior, but if totalAvailableMemory is unexpectedly small due to misconfiguration or measurement error, the resulting memory allocation may be inadequate for normal operation, yet no warning is logged. Consider logging a warning when totalAvailableMemory is positive but below a reasonable threshold (e.g., 1GB) to alert operators to potential configuration issues.

long totalAvailableMemory = OsProbe.getInstance().getTotalPhysicalMemorySize() - JvmInfo.jvmInfo().getConfiguredMaxHeapSize();
if (totalAvailableMemory <= 0) {
    logger.warn("Non-heap memory not measurable while resolving [{}]; falling back to {}", settingKey, floor);
    return floor.getBytes();
}
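
A minimal sketch of the below-threshold warning suggested above; the 1GB threshold and the constant name are illustrative, not part of the PR:

// Hypothetical sanity check, placed after the <= 0 guard above.
private static final long SANITY_FLOOR_BYTES = new ByteSizeValue(1, ByteSizeUnit.GB).getBytes();

if (totalAvailableMemory < SANITY_FLOOR_BYTES) {
    logger.warn(
        "Non-heap memory unusually small ({} bytes) while resolving [{}]; check host RAM and heap sizing",
        totalAvailableMemory,
        settingKey
    );
}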
Possible Issue

The same gap exists in the parquet resolution path (ArrowBufferPool, per the snippet below): a zero-or-negative totalAvailableMemory falls back to the floor with a warning, but a positive yet implausibly small measurement is clamped to the floor silently. The same below-threshold warning would help operators here too.

long totalAvailableMemory = OsProbe.getInstance().getTotalPhysicalMemorySize() - JvmInfo.jvmInfo().getConfiguredMaxHeapSize();
ByteSizeValue floor = ParquetSettings.MIN_NATIVE_ALLOCATION.get(settings);
if (totalAvailableMemory <= 0) {
    logger.warn(
        "Non-heap memory not measurable; falling back to [{}] = {}",
        ParquetSettings.MIN_NATIVE_ALLOCATION.getKey(),
        floor
    );
    return floor.getBytes();
}

@github-actions
Contributor

github-actions Bot commented May 7, 2026

PR Code Suggestions ✨

Latest suggestions up to a40037f

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Prevent memory isolation bypass

When childAllocation exceeds maxAllocationInBytes, the child allocator is capped to
the root allocation. This could lead to a single child monopolizing all available
memory, defeating the isolation purpose. Consider enforcing a minimum ratio or
rejecting the configuration entirely.

sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/memory/ArrowBufferPool.java [50-61]

 long maxAllocationInBytes = resolveMaxAllocationBytes(settings);
 long childAllocation = ParquetSettings.CHILD_ALLOCATION.get(settings).getBytes();
 if (childAllocation > maxAllocationInBytes) {
-    logger.warn(
-        "[{}] ({}) is larger than [{}] ({}); capping child allocation to root allocation",
-        ParquetSettings.CHILD_ALLOCATION.getKey(),
-        new ByteSizeValue(childAllocation),
-        ParquetSettings.MAX_NATIVE_ALLOCATION.getKey(),
-        new ByteSizeValue(maxAllocationInBytes)
+    throw new IllegalArgumentException(
+        String.format(
+            "[%s] (%s) exceeds [%s] (%s); child allocation must not exceed root allocation",
+            ParquetSettings.CHILD_ALLOCATION.getKey(),
+            new ByteSizeValue(childAllocation),
+            ParquetSettings.MAX_NATIVE_ALLOCATION.getKey(),
+            new ByteSizeValue(maxAllocationInBytes)
+        )
     );
-    childAllocation = maxAllocationInBytes;
 }
Suggestion importance[1-10]: 6


Why: The suggestion raises a valid concern about memory isolation. However, the current approach of capping and warning is more flexible than rejecting the configuration entirely. The improved code would prevent startup with misconfigured settings, which could be too strict for operational flexibility.

Low
Handle invalid memory configuration explicitly

When totalAvailableMemory is zero or negative, the method falls back to the floor
value. However, this condition indicates a critical system configuration issue that
should be handled more explicitly to prevent unexpected behavior in production
environments.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [323-328]

 if (configured.endsWith("%")) {
     long totalAvailableMemory = OsProbe.getInstance().getTotalPhysicalMemorySize() - JvmInfo.jvmInfo().getConfiguredMaxHeapSize();
     if (totalAvailableMemory <= 0) {
-        logger.warn("Non-heap memory not measurable while resolving [{}]; falling back to {}", settingKey, floor);
-        return floor.getBytes();
+        logger.error("Non-heap memory not measurable while resolving [{}]; system configuration may be invalid", settingKey);
+        throw new IllegalStateException("Cannot resolve percentage-based memory setting: non-heap memory is " + totalAvailableMemory);
     }
     ...
 }
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that totalAvailableMemory <= 0 indicates a system issue. However, throwing an exception would prevent the node from starting in environments where memory probing fails. The current fallback to floor with a warning is a reasonable defensive approach.

Low
Fail on invalid memory measurement

Similar to the DataFusion plugin, falling back to the floor value when
totalAvailableMemory is invalid masks a critical system configuration problem. This
should fail explicitly rather than silently using a fallback that may be inadequate
for the workload.

sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/memory/ArrowBufferPool.java [95-105]

 if (configured.endsWith("%")) {
     long totalAvailableMemory = OsProbe.getInstance().getTotalPhysicalMemorySize() - JvmInfo.jvmInfo().getConfiguredMaxHeapSize();
-    ByteSizeValue floor = ParquetSettings.MIN_NATIVE_ALLOCATION.get(settings);
     if (totalAvailableMemory <= 0) {
-        logger.warn(
-            "Non-heap memory not measurable; falling back to [{}] = {}",
-            ParquetSettings.MIN_NATIVE_ALLOCATION.getKey(),
-            floor
+        throw new IllegalStateException(
+            String.format(
+                "Cannot resolve percentage-based [%s]: non-heap memory is %d bytes",
+                ParquetSettings.MAX_NATIVE_ALLOCATION.getKey(),
+                totalAvailableMemory
+            )
         );
-        return floor.getBytes();
     }
     ...
 }
Suggestion importance[1-10]: 5


Why: Similar to suggestion 3, this identifies a potential system configuration issue. However, throwing an exception would prevent initialization in edge cases where memory probing fails but the system is otherwise functional. The fallback approach with warning is more resilient.

Low
Prevent silent initialization failures

The null check for memoryPoolFloor and memoryPoolCeiling may allow silent failures
during dynamic updates if these fields are not properly initialized. Consider
throwing an exception or logging at a higher severity level to ensure initialization
issues are caught early rather than silently ignored.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [350-357]

 private void applyMemoryPoolLimitSetting(String newValue) {
     if (memoryPoolFloor == null || memoryPoolCeiling == null) {
-        logger.debug("Min/max companions not yet snapshotted; ignoring memory pool limit update");
-        return;
+        logger.error("Min/max companions not yet snapshotted; cannot apply memory pool limit update");
+        throw new IllegalStateException("Memory pool bounds not initialized");
     }
     long newLimitBytes = resolveBytes(newValue, memoryPoolFloor, memoryPoolCeiling, DATAFUSION_MEMORY_POOL_LIMIT.getKey());
     updateMemoryPoolLimit(newLimitBytes);
 }
Suggestion importance[1-10]: 4


Why: While the suggestion correctly identifies that memoryPoolFloor and memoryPoolCeiling could be null during early updates, throwing an exception would break dynamic updates during initialization. The current debug logging is appropriate for this transient state, though a warning might be more visible.

Low

Previous suggestions

Suggestions up to commit feaa5f0
Category | Suggestion | Impact
General
Improve visibility of skipped updates

The null check for memoryPoolFloor and memoryPoolCeiling may allow silent failures
during dynamic updates if initialization fails. Consider logging at WARN level
instead of DEBUG to ensure visibility of skipped updates, as this could lead to
unexpected behavior where configuration changes are silently ignored.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [350-357]

 private void applyMemoryPoolLimitSetting(String newValue) {
     if (memoryPoolFloor == null || memoryPoolCeiling == null) {
-        logger.debug("Min/max companions not yet snapshotted; ignoring memory pool limit update");
+        logger.warn("Min/max companions not yet snapshotted; ignoring memory pool limit update for value: {}", newValue);
         return;
     }
     long newLimitBytes = resolveBytes(newValue, memoryPoolFloor, memoryPoolCeiling, DATAFUSION_MEMORY_POOL_LIMIT.getKey());
     updateMemoryPoolLimit(newLimitBytes);
 }
Suggestion importance[1-10]: 6


Why: Changing from DEBUG to WARN level improves visibility of skipped configuration updates. However, this scenario should be rare (only during initialization race conditions), and the suggestion adds the newValue parameter which provides useful context for debugging.

Low
Validate concurrent child allocator capacity

When childAllocation exceeds maxAllocationInBytes, capping it to the root allocation
may still be problematic if multiple child allocators are created. Consider
validating that the capped child allocation allows for at least a minimum number of
concurrent children to prevent resource starvation.

sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/memory/ArrowBufferPool.java [49-65]

 public ArrowBufferPool(Settings settings) {
     long maxAllocationInBytes = resolveMaxAllocationBytes(settings);
     long childAllocation = ParquetSettings.CHILD_ALLOCATION.get(settings).getBytes();
     if (childAllocation > maxAllocationInBytes) {
         logger.warn(
             "[{}] ({}) is larger than [{}] ({}); capping child allocation to root allocation",
             ParquetSettings.CHILD_ALLOCATION.getKey(),
             new ByteSizeValue(childAllocation),
             ParquetSettings.MAX_NATIVE_ALLOCATION.getKey(),
             new ByteSizeValue(maxAllocationInBytes)
         );
         childAllocation = maxAllocationInBytes;
     }
+    if (maxAllocationInBytes / childAllocation < 2) {
+        logger.warn("Child allocation ({}) allows fewer than 2 concurrent children from root ({})", childAllocation, maxAllocationInBytes);
+    }
     logger.debug("ArrowBufferPool sized: root={} bytes, perChild={} bytes", maxAllocationInBytes, childAllocation);
     this.rootAllocator = new RootAllocator(maxAllocationInBytes);
     this.maxChildAllocation = childAllocation;
 }
Suggestion importance[1-10]: 5


Why: Adding a warning when fewer than 2 concurrent children can be allocated provides useful operational insight. However, the check is somewhat arbitrary (why 2?) and the system may still function correctly with a single child allocator depending on usage patterns.

Low
Handle unmeasurable memory more safely

When totalAvailableMemory is negative or zero, falling back to the floor value may
result in inadequate memory allocation. Consider throwing an exception or using a
more conservative default to prevent potential out-of-memory issues in production
environments where memory probing fails.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [322-340]

 private static long resolveBytes(String configured, ByteSizeValue floor, ByteSizeValue ceiling, String settingKey) {
     if (configured.endsWith("%")) {
         long totalAvailableMemory = OsProbe.getInstance().getTotalPhysicalMemorySize() - JvmInfo.jvmInfo().getConfiguredMaxHeapSize();
         if (totalAvailableMemory <= 0) {
-            logger.warn("Non-heap memory not measurable while resolving [{}]; falling back to {}", settingKey, floor);
-            return floor.getBytes();
+            throw new IllegalStateException(
+                "Cannot resolve percentage-based setting [" + settingKey + "]: non-heap memory not measurable. " +
+                "Consider using an absolute byte size instead."
+            );
         }
         ...
     }
     return ByteSizeValue.parseBytesSizeValue(configured, settingKey).getBytes();
 }
Suggestion importance[1-10]: 4

__

Why: While throwing an exception is more explicit than falling back to floor, the current approach of logging a warning and using a safe default is reasonable for production resilience. The suggestion would make the system less fault-tolerant in edge cases where memory probing fails.

Low
Suggestions up to commit 1b1eb34
Category | Suggestion | Impact
Possible issue
Fix fragile negative sentinel for ceiling check

The ByteSizeValue constructor new ByteSizeValue(-1) creates a value with -1 bytes,
but ByteSizeValue may not reliably support negative values as a sentinel. The check
ceiling.getBytes() != -1 is fragile because ByteSizeValue could normalize or reject
negative inputs. Consider using a dedicated sentinel constant (e.g., Long.MAX_VALUE
or 0) or checking the raw setting string instead to represent "no ceiling".

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [322]

-if (ceiling.getBytes() != -1 && bytes > ceiling.getBytes()) {
+if (ceiling.getBytes() > 0 && bytes > ceiling.getBytes()) {
Suggestion importance[1-10]: 6


Why: The use of new ByteSizeValue(-1) as a sentinel and checking ceiling.getBytes() != -1 is potentially fragile if ByteSizeValue normalizes or rejects negative values. Using > 0 as the check is a more robust approach, though the actual behavior depends on the ByteSizeValue implementation.

Low
General
Add validation for malformed percentage settings

RatioValue.parseRatioValue may throw an exception if the percentage string is
malformed (e.g., "abc%"). Since DATAFUSION_MEMORY_POOL_LIMIT is a simpleString
setting without format validation, invalid values will cause an unhandled runtime
exception. Add input validation or wrap the parse call in a try-catch to fall back
gracefully.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [311-318]

 long totalAvailableMemory = OsProbe.getInstance().getTotalPhysicalMemorySize()
     - JvmInfo.jvmInfo().getConfiguredMaxHeapSize();
 if (totalAvailableMemory <= 0) {
     logger.warn("Non-heap memory not measurable while resolving [{}]; falling back to {}", settingKey, floor);
     return floor.getBytes();
 }
-RatioValue ratio = RatioValue.parseRatioValue(configured);
+RatioValue ratio;
+try {
+    ratio = RatioValue.parseRatioValue(configured);
+} catch (Exception e) {
+    logger.warn("Invalid percentage value [{}] for [{}]; falling back to {}", configured, settingKey, floor, e);
+    return floor.getBytes();
+}
Suggestion importance[1-10]: 4


Why: While adding error handling for malformed percentage strings is a valid concern, registering Setting.simpleString with a validator could be a better solution (see the sketch after this suggestion). The suggestion is valid but addresses a minor edge case, and the improved code correctly wraps the parse call in a try-catch.

Low
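
A minimal sketch of the validator approach mentioned above, assuming the Setting.simpleString overload that accepts a Setting.Validator; the default value and other properties of the real setting are omitted here:

// Sketch: reject malformed values at update time by attempting the same
// parses resolveBytes performs, so "abc%" and "10xb" fail fast.
public static final Setting<String> DATAFUSION_MEMORY_POOL_LIMIT = Setting.simpleString(
    "datafusion.memory_pool_limit_bytes",
    value -> {
        if (value.endsWith("%")) {
            RatioValue.parseRatioValue(value);
        } else {
            ByteSizeValue.parseBytesSizeValue(value, "datafusion.memory_pool_limit_bytes");
        }
    },
    Setting.Property.NodeScope,
    Setting.Property.Dynamic
);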
Ensure min/max settings resolve from correct settings source

cs.getSettings() returns the node-level settings, but
DATAFUSION_MEMORY_POOL_LIMIT_MIN and DATAFUSION_MEMORY_POOL_LIMIT_MAX are NodeScope
settings. The min/max companions should be retrieved from the cluster settings or
the original node settings used at startup, not from ClusterService.getSettings(),
which may not contain node-scoped settings. This could cause the min/max to resolve
to their defaults unexpectedly.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [334-346]

 private void applyMemoryPoolLimitSetting(String newValue) {
     ClusterService cs = clusterService;
     if (cs == null) {
         logger.debug("ClusterService not yet initialized; ignoring memory pool limit update");
         return;
     }
+    Settings nodeSettings = cs.getClusterSettings().get(DATAFUSION_MEMORY_POOL_LIMIT_MIN) != null
+        ? cs.getSettings()
+        : cs.getSettings();
     long newLimitBytes = resolveBytes(
         newValue,
         DATAFUSION_MEMORY_POOL_LIMIT_MIN.get(cs.getSettings()),
         DATAFUSION_MEMORY_POOL_LIMIT_MAX.get(cs.getSettings()),
         DATAFUSION_MEMORY_POOL_LIMIT.getKey()
     );
Suggestion importance[1-10]: 1


Why: The improved code is essentially identical to the existing code in behavior: it adds a ternary that always evaluates to cs.getSettings() regardless of the condition, making the suggestion a no-op. The concern about the settings source is valid conceptually, but the proposed fix doesn't actually address it.

Low

@github-actions
Contributor

github-actions Bot commented May 7, 2026

❌ Gradle check result for 1b1eb34: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions Bot commented May 8, 2026

Persistent review updated to latest commit feaa5f0

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

❌ Gradle check result for feaa5f0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

gaurav-amz added 2 commits May 8, 2026 11:32
- Add parquet.min_native_allocation / max_native_allocation_ceiling
  companions for the existing parquet.max_native_allocation percentage
  setting; ArrowBufferPool now clamps the resolved value when configured
  as a percentage.
- Add parquet.arrow.child_allocation, replacing the hardcoded root/10
  per-VSR cap in ArrowBufferPool.
- Convert datafusion.memory_pool_limit_bytes and spill_memory_limit_bytes
  from longSetting(maxMemory/N) heap-based defaults to simpleString
  percentage of non-heap memory, with .min / .max companions for
  floor/ceiling clamping.
- Mirrors the indices.memory.native_index_buffer_size pattern in
  IndexingMemoryController, the only existing OpenSearch precedent for
  non-heap percentage memory settings.

Signed-off-by: Gaurav Singh <snghsvn@amazon.com>
- Apply spotless formatting to ArrowBufferPool and DataFusionPlugin
  to fix the gradle check failure (lines that exceeded the column
  budget were re-flowed by spotless).
- Add a setting-time validator on parquet.max_native_allocation,
  datafusion.memory_pool_limit_bytes, and
  datafusion.spill_memory_limit_bytes so malformed values
  (e.g. "abc%", "10xb") are rejected at update time instead of
  surfacing as parser exceptions on the next read.
- Snapshot the static memory_pool_limit min/max companions at plugin
  startup; the dynamic-update listener now resolves new limits
  against this snapshot rather than re-reading
  ClusterService#getSettings(), which can return stale or default
  values during transient cluster-state propagation.

Signed-off-by: Gaurav Singh <snghsvn@amazon.com>
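
A minimal sketch of the snapshot approach the commit above describes; the field and consumer names follow the snippets quoted earlier in this thread (memoryPoolFloor, applyMemoryPoolLimitSetting), while the wiring method itself is assumed:

// Called once during plugin startup with the node's Settings; the dynamic
// listener afterwards resolves new values against these snapshots instead
// of re-reading ClusterService#getSettings().
private volatile ByteSizeValue memoryPoolFloor;
private volatile ByteSizeValue memoryPoolCeiling;

private void snapshotMemoryPoolBounds(Settings nodeSettings, ClusterSettings clusterSettings) {
    this.memoryPoolFloor = DATAFUSION_MEMORY_POOL_LIMIT_MIN.get(nodeSettings);
    this.memoryPoolCeiling = DATAFUSION_MEMORY_POOL_LIMIT_MAX.get(nodeSettings);
    clusterSettings.addSettingsUpdateConsumer(DATAFUSION_MEMORY_POOL_LIMIT, this::applyMemoryPoolLimitSetting);
}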
@gaurav-amz gaurav-amz force-pushed the configurable-settings branch from feaa5f0 to a40037f Compare May 8, 2026 06:02
@github-actions
Contributor

github-actions Bot commented May 8, 2026

Persistent review updated to latest commit a40037f

@github-actions
Contributor

github-actions Bot commented May 8, 2026

❌ Gradle check result for a40037f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

