Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WFLY-20296 - Add support for more OpenTelemetry Config options #682

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jasondlee
Copy link
Contributor

@jasondlee jasondlee commented Jan 27, 2025

https://issues.redhat.com/browse/WFLY-20296

Add analysis document

Resolves #683

@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Jan 27, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Jan 27, 2025
* Additional configuration options should include:
** General
*** TLS server certificate (`otel.exporter.otlp.certificate`)
*** TLS client certificate (`otel.exporter.otlp.client.certificate`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please discuss with the security team how this kind of thing should be represented; i.e. if we should provide deeper integration with existing Elytron services.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type of data would attributes for these take?

In general we are trying to centralise configuration related to SSLContexts into the Elytron subsystem so users and administrators don't need to learn how to confiure TLS related resources independently on each separate subsystem,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value passed to the underlying library is the full path to the certificates (and keys) used by the OpenTelemetry Collector to configure TLS on its end. I guess it's possible that we switch the perspective here: rather than pointing WF to what the Collector is using to configure itself, we point the Collector to what WF is using to configure itself?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I am following the inversion.

Where you say OpenTelemetry Collector I assume you are talking about some remote process we are establishing a connection to?

Copy link
Contributor Author

@jasondlee jasondlee Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's a process that runs in another process (or even data center). In my head, the plan is currently to point the OpenTelemetry subsystem at the certificate and key files known to and use by the OpenTelemetry Collector. What I mean by the inversion is that rather than point from inside the app servers config (and FS area) to files owned/managed by the Collector, that could instead point the Collector to cert/key files owned and managed by WildFly. Perhaps I was being too cute. :P I was just thinking "out loud" about how a WildFly SSL context might used to configure an external service. Ultimately, it's going to come down to each system sharing one or more files, regardless of who "owns" those files.

@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Jan 27, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Feb 10, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Feb 12, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Feb 12, 2025
@github-actions github-actions bot removed the stability-level/community "Community" stability level label Feb 12, 2025
@github-actions github-actions bot added the stability-level/community "Community" stability level label Feb 12, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Feb 12, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Feb 19, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Feb 24, 2025
@github-actions github-actions bot added stability-level/community "Community" stability level and removed stability-level/community "Community" stability level labels Feb 26, 2025

=== User Stories

- As an application developer, I would like greater control in how OpenTelemetry is configured in the server without having to depend on an external library (e.g., MicroProfile Config).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, it's not clear to me how exactly it affects the developer to rely on MP Config to configure opentelemetry.

My understanding is that the MP Config way simply means that I would e.g. define a system property via -Dotel.service.name=myname (tried and it works), or I could create "permanent" system property in standalone.xml. Perhaps I can also package an application.properties file into my deployment (haven't tried that).

Nothing else is needed on the application side (no special dependency, etc). Am I correct? The "external library" is a detail not visible to the app developer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if I wanted to configure via programmatic API, I would need to add a dependency to the app?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I more had in mind the server administrator, as the MP Config extension/subsystem would have to be enabled to make it work. I'll try to clean up that wording there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I get what you meant now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to focusing on an administrator.

Beyond not 'depending on a library' (or setting system properties) in terms of a programming API, someone using this is depending on us for a robust, long-term-compatible API. They are depending on us to insulate them from underlying changes in OTel. If OTel renames one of their props, or whatever, in a future release, that is our problem, not the user's problem. They count on us to shield them from such things. That is what the WildFly management API does for people who use the many libraries we integrate.

So, think very hard about what you expose, as you'll be living with it for a long time. :-)

@TomasHofman
Copy link
Contributor

You probably have reasons not to do it that way, but I will ask :):

Was it considered to introduce a "properties" section in the opentelemetry config model, which would allow users to specify otel properties directly as they are in the otel spec (same keys & values), rather than introducing custom named attributes?

The advantages would be:

  • the administrator would not need to investigate how Wildfly attributes map to OTel properties,
  • it would (maybe?) allow to configure full set of OTel properties, without extensive changes to the management model,
  • it might be more accommodating to potential new attributes added in future.

@jasondlee
Copy link
Contributor Author

You probably have reasons not to do it that way, but I will ask :):

Was it considered to introduce a "properties" section in the opentelemetry config model, which would allow users to specify otel properties directly as they are in the otel spec (same keys & values), rather than introducing custom named attributes?

I have not thought about that, to be honest, but, having done so a bit now, I'm not sure I'm real excited about that. You're probably right the current approach puts some onus on the admin to map attributes to properties, but I hope the attribute names make it clear enough what they do so the admin doesn't need to know about the mapping. I think more importantly, depending too heavily on an assortment of properties defined elsewhere makes the management interface vague and opaque. Using clearly defined management model attributes for clarity coupled with the properties-based approach via MP Config (or sysprops) gives us a nice middle ground, albeit at the expense of some complexity (like this PR) in adding new, explicit attribute support. I can be persuaded otherwise, but I think that's my take. :)

*** Attribute groups will be used to group properties by signal type to make the XML file and admin console more user-friendly
** Shared properties (e.g., the OpenTelemetry Collector endpoint) should be exposed on the root element.
*** Signal-specific overrides need not be supported
* All new properties added will have, as defaults, values described in the relevant OpenTelemetry documentation (see the official docs https://opentelemetry.io/docs/languages/java/configuration/[here])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/as defaults/as initial defaults/g

Once we set a default, it's our default. It's no longer determined based on what some outside party decides. Changing it is a semi-breaking or breaking change, depending on what the attribute configures. (We can change accept that and change the default but we should not promise that we will always do what some outside party said.)

|Metrics|Enabled|`otel.metrics.exporter`|Are metrics enabled?|`metrics-enabled`|true|If true, the value is set to `otlp`. If not, the value is set to `none`.
||Export interval|`otel.metrics.export.interval`|The interval, in milliseconds, between the start of two export attempts.|`metrics-export-interval`|60000|
||Exemplar filter|`otel.metrics.exemplar.filter`|The filter for exemplar sampling.|`exemplar-filter`|`trace_based`|Options are `always_off`, `always_on` or `trace_based`.
||Metrics cardinality limit|`otel.experimental.metrics. cardinality.limit`|If set, configure cardinality limit; the maximum number of distinct points per metric.|cardinality-limit|2000|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a -1 from me.

If it's 'experimental' to OTel it should not be in the WF management API, except in it's own low-stability-variant of the management API.

* Additional configuration options should include:
|===
|Signal type|Name|OpenTelemetry Property|Description|Attribute Name|Default|Notes
|General|TLS server certificate|`otel.exporter.otlp. certificate`|The path to the server certificate on disk|TBD||_TLS props are pending a discussion with Elytron_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re 'pending discussion with Elytron' -- yes, for sure. A long term strategy for security configuration is a blocker for having this at community stability.

To me one of the value adds of having a subsytem is it does proper cross-subsystem integration for key concerns like managing TLS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on this implementation in consultation with Elytron at this very moment.

|General|TLS server certificate|`otel.exporter.otlp. certificate`|The path to the server certificate on disk|TBD||_TLS props are pending a discussion with Elytron_
||TLS client certificate|`otel.exporter.otlp.client. certificate`|The path to the client certificate on disk|TBD||
||Client key|`otel.exporter.otlp.client. key`|The path to the key on disk|TBD||
||Compression|`otel.exporter.otlp. compression`|The compression type to use on OTLP trace, metric, and log requests.|`compression`|null|Only `gzip` supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does no value mean no compression?

||TLS client certificate|`otel.exporter.otlp.client. certificate`|The path to the client certificate on disk|TBD||
||Client key|`otel.exporter.otlp.client. key`|The path to the key on disk|TBD||
||Compression|`otel.exporter.otlp. compression`|The compression type to use on OTLP trace, metric, and log requests.|`compression`|null|Only `gzip` supported
|Traces|Enabled|`otel.traces.exporter`|Are traces enabled?|`traces-enabled`|true|If true, the value is set to `otlp`. If not, the value is set to `none`.
Copy link
Contributor

@bstansberry bstansberry Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan for when we decide to support other exporters?

I can imagine one; I just want to make sure you have too, as you'll be living with this for many years. :-)

Same applies to Metrics|Enabled and Logs|Enabled

*** Signal-specific overrides need not be supported
* All new properties added will have, as defaults, values described in the relevant OpenTelemetry documentation (see the official docs https://opentelemetry.io/docs/languages/java/configuration/[here])
* Support should not depend on, e.g., MicroProfile Telemetry or MicroProfile Config
* Any configured properties should still be able to be overridden via the existing MicroProfile support (this is the current default behavior)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The textual description of any such attribute must explicitly declare that if MicroProfile Telemetry is used the config property 'foo' takes precedence.

I hope if MP Tel is not used that system properties are not preferred to what is set via the management API. If that's not the case let's have a zulip or dev list discussion.


=== Non-Requirements

* The OpenTelemetry documentation lists several signal-specific parameters to configure each signal explicitly. The following parameters have the same defaults as those already supported (i.e., for traces), so, in the name of simplicity, these will be configured implicitly by the system using a single configuration attribute:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on the 'non-requirement' nature of this item.


== Admin Clients

N/A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure it's applicable. Needs a statement, even if it's just that the CLI and HAL will just work with no changes on their side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stability-level/community "Community" stability level
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for more OpenTelemetry Config options
4 participants