Managed CRD Feedback #4607

@jack-berg

Component(s)

auto-instrumentation

Describe the issue you're reporting

I've been involved in a number of different otel projects that are / will be coming together in the operator:

  • OpenTelemetry java agent
  • Declarative config project
  • OpenTelemetry injector project

@atoulme recently pointed me to the Managed CRD RFC and the initial draft PR #4475. I like the idea here, and have some feedback on the configuration data model.

The configuration data model of this CRD is arguably the most important aspect of this project. IMO, a successful configuration data model design would have the following characteristics:

  • It doesn't invent a new SDK configuration schema. The spec defines the env var and declarative config schemes. The operator should not be in the business of inventing a new one. Take it from someone who has spent years of their life on declarative config - you don't want this burden!
  • It meets the various SDKs where they are in terms of config interfaces. The env var scheme is widely supported, terse, but has limited expressiveness. Declarative configuration is up-and-coming, more verbose, and more expressive. The operator should allow users to express config using whatever interface they see fit based on requirements and language implementation status.
  • It should be simple to express simple things, and possible to express complex things. In an ideal world, you write one config applicable for all languages and all apps. Reality is messier, often requiring language specific, namespace specific, or app specific configuration. The operator config data model should reflect this reality.
  • It should be evolvable. We don't know all the requirements right now and shouldn't need to. With some care and foresight, we should be able to design the data model to be more expressive over time without becoming a big ball of mud.

With those criteria in mind, here's a rough sketch of a config data model which I think points us in the right direction:

  • .spec.sdkConfigurationRules (object[]): An array of selector / configuration pairs. For each pod created, the operator evaluates these rules in order and applies the first matching rule. If none match, the operator does not install instrumentation.
    • .spec.sdkConfigurationRules[].selector (object): Selection criteria for a rule. Conditions are ANDed together.
      • .spec.sdkConfigurationRules[].selector.languages (string[]): Selector matches if a pod's language is in the list. If omitted, match all languages.
      • .spec.sdkConfigurationRules[].selector.namespace (string[]): Selector matches if a pod's namespace is in the list. If omitted, match all namespaces.
      • .spec.sdkConfigurationRules[].selector.labels (object[]): Selector matches if a pod's labels match ALL the key value pairs in the list. If omitted, match all labels.
    • .spec.sdkConfigurationRules[].envConfig (object[]): Env variables to be injected into the pods matching this rule.
    • .spec.sdkConfigurationRules[].declarativeConfig (object): Declarative config to be applied to pods matching this rule. The content of this config is mounted to a file in the filesystem (e.g. /path/to/otel-config.yaml) and the standard declarative config env var is set to reference this path: OTEL_EXPERIMENTAL_CONFIG_FILE=/path/to/otel-config.yaml.
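
To make the selector semantics above concrete, here's a minimal Go sketch of first-match evaluation. This is not operator code: the type names (Selector, Rule, PodInfo) and the FirstMatch function are hypothetical, and labels are modeled as a map rather than a list of key/value pairs for brevity. How the operator actually determines a pod's language is out of scope here.

```go
package main

import "fmt"

// Selector mirrors .spec.sdkConfigurationRules[].selector from the sketch.
// An empty field places no restriction on that dimension.
type Selector struct {
	Languages  []string
	Namespaces []string
	Labels     map[string]string // simplification of the proposed object[]
}

// EnvVar is a name/value pair to inject into a pod.
type EnvVar struct{ Name, Value string }

// Rule pairs a selector with the config to apply when it matches.
type Rule struct {
	Selector          Selector
	EnvConfig         []EnvVar
	DeclarativeConfig string // raw declarative config content; empty if unset
}

// PodInfo is a hypothetical summary of what the operator knows about a pod.
type PodInfo struct {
	Language  string
	Namespace string
	Labels    map[string]string
}

func contains(list []string, v string) bool {
	for _, s := range list {
		if s == v {
			return true
		}
	}
	return false
}

// Matches reports whether every condition in the selector holds (AND).
func (s Selector) Matches(p PodInfo) bool {
	if len(s.Languages) > 0 && !contains(s.Languages, p.Language) {
		return false
	}
	if len(s.Namespaces) > 0 && !contains(s.Namespaces, p.Namespace) {
		return false
	}
	for k, want := range s.Labels {
		if got, ok := p.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// FirstMatch returns the first rule whose selector matches the pod,
// or nil if no rule matches (meaning: do not instrument).
func FirstMatch(rules []Rule, p PodInfo) *Rule {
	for i := range rules {
		if rules[i].Selector.Matches(p) {
			return &rules[i]
		}
	}
	return nil
}

func main() {
	rules := []Rule{
		{Selector: Selector{Languages: []string{"python"}},
			EnvConfig: []EnvVar{{Name: "OTEL_PROPAGATORS", Value: "baggage,tracecontext"}}},
		{Selector: Selector{Languages: []string{"java"}},
			DeclarativeConfig: "file_format: \"1.0-rc.3\"\n"},
	}
	for _, lang := range []string{"python", "java", "ruby"} {
		r := FirstMatch(rules, PodInfo{Language: lang, Namespace: "default"})
		fmt.Printf("%s matched a rule: %v\n", lang, r != nil)
	}
}
```

Because evaluation stops at the first match, more specific rules (e.g. a namespace or label selector) would need to come before broader ones, and a trailing rule with an empty selector acts as a catch-all.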

And here's an example YAML snippet to demonstrate this proposal. Here I configure the CRD to install opentelemetry python instrumentation on python applications, configuring the SDK using the env var config scheme, and to install the opentelemetry java agent on java applications, configuring the SDK using the declarative config scheme. All other language types are ignored. I could just as easily choose to have a single entry for spec.sdkConfigurationRules, or have as many as needed to suit my needs.

apiVersion: opentelemetry.io/v1alpha1
kind: ClusterObservability
metadata:
  name: cluster-observability
  namespace: opentelemetry-operator-system
spec:
  sdkConfigurationRules:
    - selector:
        languages:
          - python
      envConfig:
        - name: OTEL_TRACES_SAMPLER
          value: parentbased_always_on
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://collector:4318
        - name: OTEL_PROPAGATORS
          value: baggage,tracecontext
    - selector:
        languages:
          - java
      declarativeConfig:
        file_format: "1.0-rc.3"
        resource:
          detection/development:
            detectors:
              - service:
              - host:
              - process:
              - container:
        propagator:
          composite:
            - tracecontext:
            - baggage:
        tracer_provider:
          sampler:
            parent_based:
              root:
                always_on:
          processors:
            - batch:
                exporter:
                  otlp_http:
                    endpoint: http://localhost:4318/v1/traces
        meter_provider:
          readers:
            - periodic:
                exporter:
                  otlp_http:
                    endpoint: http://localhost:4318/v1/metrics
        logger_provider:
          processors:
            - batch:
                exporter:
                  otlp_http:
                    endpoint: http://localhost:4318/v1/logs

A key part of the design is that .spec.sdkConfigurationRules is an array of predicate / config pairs. This results in simple config for simple cases, while still allowing you to express complex cases. Ideally, we'll wake up one day in a couple of years and every language will support declarative configuration and every language will have quality stable instrumentation. In this future, you could express a single rule for all languages and all applications.

For now, we want to meet SDKs where they are in terms of config interface support. For languages that support declarative config, users should be able to leverage the increased expressiveness. For languages that don't, you should still be able to use env vars.
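
As a rough illustration of how both interfaces could flow through the same rule, here's a Go sketch that turns a matched rule into the env vars and files to inject. It follows the data model sketch earlier in this issue (env vars passed through; declarative config written to a file and referenced via OTEL_EXPERIMENTAL_CONFIG_FILE). The InjectionPlan type, the Plan function, and the mount path are hypothetical, and allowing both envConfig and declarativeConfig on one rule is my assumption, not something the proposal settles.

```go
package main

import "fmt"

// EnvVar is a name/value pair to inject into a pod.
type EnvVar struct{ Name, Value string }

// Rule holds only the config half of a rule; selectors are omitted here.
type Rule struct {
	EnvConfig         []EnvVar
	DeclarativeConfig string // raw declarative config content; empty if unset
}

// InjectionPlan is a hypothetical description of what to add to a pod.
type InjectionPlan struct {
	Env   []EnvVar
	Files map[string]string // mount path -> file content
}

// The standard env var that points an SDK at a declarative config file.
const configFileEnv = "OTEL_EXPERIMENTAL_CONFIG_FILE"

// Plan converts a rule into env vars and files. Env vars are passed through
// unchanged; declarative config, if present, becomes a file at mountPath
// plus the env var that references it.
func Plan(r Rule, mountPath string) InjectionPlan {
	plan := InjectionPlan{Files: map[string]string{}}
	plan.Env = append(plan.Env, r.EnvConfig...)
	if r.DeclarativeConfig != "" {
		plan.Files[mountPath] = r.DeclarativeConfig
		plan.Env = append(plan.Env, EnvVar{Name: configFileEnv, Value: mountPath})
	}
	return plan
}

func main() {
	java := Rule{DeclarativeConfig: "file_format: \"1.0-rc.3\"\n"}
	plan := Plan(java, "/path/to/otel-config.yaml")
	for _, e := range plan.Env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```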

And even without the declarative config / env var config interface distinction, this pattern is still useful for handling the messy reality that not all instrumentation config will be the same.

That's my piece. To repeat the important bits:

  • The operator shouldn't invent a new SDK config interface.
  • The operator should meet the various SDKs where they are in terms of config interface support.
  • It should be simple to express simple things, and possible to express complex things.
  • The operator data model should be evolvable.

I'm not attached to the data model scheme I proposed above (in particular, I don't care about the names of properties), but I do think that these high-level evaluation criteria are a good list.

cc @atoulme who wrote the "Managed CRD" RFC, @jinja2 who is working on it in #4475.

Labels: area:auto-instrumentation, discuss-at-sig
