Develop Sign, Unsign, Nan & Inf for FixedDecimal #5065

Closed
younies opened this issue Jun 15, 2024 · 50 comments
Assignees
Labels
2.0-breaking Changes that are breaking API changes C-numbers Component: Numbers, units, currencies S-medium Size: Less than a week (larger bug fix or enhancement) T-core Type: Required functionality

Comments

@younies
Member

younies commented Jun 15, 2024

In many applications of units (especially mixed units), there is a need to represent numbers as Inf, Neg, Pos, etc.

Examples:

  1. 3 feet, and 11 inches --> could not be 3 feet and -11 inches
  2. 5:03:04 --> could not be 5:-03:+04
  3. Inf L/100-km
  4. ... etc.

Therefore, we need a way to represent Inf, Neg, NaN, etc. in FixedDecimal.

Options for expressing mixed units:

  1. List of FixedUnsignedDecimal
    1. New type; FixedDecimal contains it as an inner field
    2. FixedDecimal gets a generic parameter
    3. Compositional fixed decimal types, like Signed<T>
  2. List of FixedDecimal, take absolute value of each before formatting
  3. List of FixedDecimal, show sign when formatting: 3 feet, -11 inches
@younies younies self-assigned this Jun 15, 2024
@younies
Member Author

younies commented Jun 15, 2024

  • @younies - Preventing surprising behavior is important. I don't like to mix semantics. If we use FixedDecimal, we have to document this and it can be surprising for the user.
  • @echeran - Mixed units clearly want to be able to use unsigned decimals.
  • @robertbastian - I think FixedDecimal should be unsigned, and we can have a SignedFixedDecimal. In most instances, it makes sense to be unsigned.
  • @Manishearth - We could have Signed<T>
  • @robertbastian - Signed formatting is "extra"; it is always a variant on a positive number format. We could even split the negative patterns into a separate marker that doesn't need to be loaded if you're only doing positive
  • @sffc - FixedDecimal still needs to support NaN and plus and minus infinity. That's an additional superset of the Unsigned fixed decimal. You need that for floating point and u128
  • @echeran - I think the naming and relation aspect makes sense. Whatever is the base is the simpler name. On the other hand, it seems strange that if you do arithmetic, you would not get a negative number.
  • @Manishearth - What does ECMA-402 do?
  • @Manishearth - ECMA-402 doesn't support unit formatting. The closest it gets is duration formatting. The proposal for duration formatting is at https://tc39.es/proposal-intl-duration-format. It throws an error if the numbers are mixed signs.
  • @younies - About NaN and Inf: in the unit converter, it's possible that we could produce these values.
  • @sffc - Arithmetic is out of scope of FixedDecimal and we should keep it that way
  • @echeran - Between signed and unsigned, if one of those things is a lot more common than the other, then that one should probably get the short name.
  • @Manishearth - In computers, unsigned tends to be more common.
  • @sffc - I can see arguments for various naming conventions. FixedDecimal + SignedFixedDecimal seems okay; also UnsignedFixedDecimal + SignedFixedDecimal. We can bikeshed later.
  • @sffc - What can we do to unblock Kartavya?
  • @robertbastian - Just use FixedDecimal for now, don't look at the sign, pass a sign separately
  • @younies - Agree
  • @sffc - How do we do this composition stuff over FFI?
  • @Manishearth - Probably RefCell. It's unsafe to return &mut of inner fields or to return a mix of & and &mut
  • @sffc - Could we use FFI structs? Where one of the fields is an opaque
  • @Manishearth - The structs can have references. We couldn't have APIs that like parse a string into a Signed<FixedDecimal>.
  • @sffc - Could we have FFI SignedFixedDecimal::into_parts, then you mutate, then you ::from_parts?
  • @Manishearth - If you do that, just do RefCell
  • @sffc - Going all the way over FFI, one extra branch for RefCell is fine.
pub struct Signed<T> {
    pub sign: Sign,
    pub value: T,
}

pub enum WithInfinity<T> {
    Infinity,
    Finite(T),
}

pub enum WithNaN<T> {
    NaN,
    N(T),
}

/// Maybe:
pub struct WithCompactExponent<T> {
    pub exponent: u8,
    pub significand: T,
}

pub struct WithScientificExponent<T> {
    pub exponent: i16,
    pub significand: T,
}

pub type SignedFixedDecimal = Signed<FixedDecimal>;

pub type FixedDecimalOrInfinity = WithInfinity<FixedDecimal>;

pub type SignedFixedDecimalOrInfinity = Signed<FixedDecimalOrInfinity>;

Signed { sign: Neg, value: WithInfinity::Infinity } // \neg\inf

pub type SignedFixedDecimalOrInfinityOrNan = WithNaN<SignedFixedDecimalOrInfinity>;

Over FFI we would have to
Concrete proposal:

  • FixedDecimal becomes unsigned
  • Add composition structs and enums as shown above
  • Exact names can be bikeshed later
  • Use RefCell for FFI

LGTM: @younies @echeran @sffc @robertbastian @Manishearth
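To make the composition idea concrete, here is a minimal sketch of how the wrapper types could nest and render. It is an illustration under assumptions, not the ICU4X implementation: `UnsignedDecimal` is a hypothetical stand-in that just holds a pre-rendered digit string, the `Sign` variants are illustrative, and the hardcoded "-", "+", "∞" and "NaN" strings stand in for whatever a real formatter would supply.

```rust
use std::fmt;

// Hypothetical sign enum for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sign {
    None,
    Negative,
    Positive,
}

// Hypothetical stand-in for the unsigned base type: a pre-rendered digit string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnsignedDecimal(pub String);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Signed<T> {
    pub sign: Sign,
    pub value: T,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WithInfinity<T> {
    Infinity,
    Finite(T),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WithNaN<T> {
    NaN,
    N(T),
}

impl fmt::Display for UnsignedDecimal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

// Each wrapper renders its own layer and delegates the rest inward.
impl<T: fmt::Display> fmt::Display for Signed<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.sign {
            Sign::None => {}
            Sign::Negative => f.write_str("-")?,
            Sign::Positive => f.write_str("+")?,
        }
        write!(f, "{}", self.value)
    }
}

impl<T: fmt::Display> fmt::Display for WithInfinity<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WithInfinity::Infinity => f.write_str("∞"),
            WithInfinity::Finite(v) => write!(f, "{}", v),
        }
    }
}

impl<T: fmt::Display> fmt::Display for WithNaN<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WithNaN::NaN => f.write_str("NaN"),
            WithNaN::N(v) => write!(f, "{}", v),
        }
    }
}
```

With these definitions, negative infinity is `Signed { sign: Sign::Negative, value: WithInfinity::Infinity }`, and a caller that never needs NaN simply never mentions `WithNaN`.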

@jedel1043
Contributor

Probably a duplicate of #862? Though this has a more fleshed-out design, so I'd say it supersedes it in a way.

@jedel1043
Contributor

jedel1043 commented Jun 28, 2024

Considering the design above, what would be the story around the rounding functions if FixedDecimal becomes unsigned? Do we split the rounding functions into unsigned and signed variants? Some of them are duplicates of each other for unsigned numbers; floor <-> truncate, ceil <-> expand, and all the half-variants of those.

Also, would we add rounding methods to FixedDecimalOrInfinity and SignedFixedDecimalOrInfinityOrNan? Otherwise, it would be really painful having to deconstruct FixedDecimalOrInfinity into its parts to get the sign and unsigned decimal, then construct a SignedFixedDecimal just for rounding purposes, then deconstruct again and reconstruct the SignedFixedDecimalOrInfinityOrNan.
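One possible way to soften the deconstruct/reconstruct problem is to give the wrappers `map`-style helpers so that a rounding function written for the inner type can be applied through the layers. The sketch below is hypothetical: `map_finite`, `round_with` and `floor_tenths` are made-up names, and the payload is a toy `u32` counting tenths rather than a real decimal type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sign {
    Negative,
    Positive,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Signed<T> {
    pub sign: Sign,
    pub value: T,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WithInfinity<T> {
    Infinity,
    Finite(T),
}

impl<T> WithInfinity<T> {
    /// Applies `f` to a finite payload; infinity passes through unchanged.
    pub fn map_finite<U>(self, f: impl FnOnce(T) -> U) -> WithInfinity<U> {
        match self {
            WithInfinity::Infinity => WithInfinity::Infinity,
            WithInfinity::Finite(v) => WithInfinity::Finite(f(v)),
        }
    }
}

impl<T> Signed<WithInfinity<T>> {
    /// Rounds the finite payload with a sign-aware function, keeping the wrappers intact.
    pub fn round_with(self, round: impl FnOnce(Sign, T) -> T) -> Self {
        let sign = self.sign;
        Signed {
            sign,
            value: self.value.map_finite(|v| round(sign, v)),
        }
    }
}

/// Toy "floor to a whole number" on a magnitude expressed in tenths.
/// Floor shrinks a positive magnitude and grows a negative one.
pub fn floor_tenths(sign: Sign, tenths: u32) -> u32 {
    match sign {
        Sign::Positive => tenths / 10 * 10,
        Sign::Negative => (tenths + 9) / 10 * 10,
    }
}
```

Under this shape, rounding -2.5 is `Signed { sign: Sign::Negative, value: WithInfinity::Finite(25) }.round_with(floor_tenths)`, and the same call on an infinite value is a no-op.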

@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Jul 23, 2024
@sffc sffc added this to icu4x 2.0 Jul 23, 2024
@sffc sffc moved this to Unclaimed for sprint in icu4x 2.0 Jul 23, 2024
@sffc sffc added T-core Type: Required functionality C-numbers Component: Numbers, units, currencies S-medium Size: Less than a week (larger bug fix or enhancement) labels Jul 23, 2024
@sffc
Member

sffc commented Sep 17, 2024

@younies Would be good to get this into 2.0, but we won't block on it. Please prioritize if you need it

@sffc
Member

sffc commented Oct 29, 2024

2024-10-29 discussion:

  // Option 1
  pub struct FixedInteger(SignedFixedDecimal);
  // this is just:
  pub struct FixedInteger(Signed<UnsignedFixedDecimal>);
  // so we should probably have the `Signed` on the outside?
pub struct Signed<T> {
    decimal: T,
    sign: Sign,
}

// Option 1:
pub struct FixedDecimal {
    integer: Integer,
    upper_magnitude: i16,
    lower_magnitude: i16,
}
pub struct Integer {
    digits: SmallVec<[u8; 8]>,
    // keep this an i16 with the current invariants (largest significant digit)
    magnitude: i16,
}

// Option 2:
pub struct FixedDecimal {
    integer: Integer,
    // number of digits to shift down
    shift: u16,
}
pub struct Integer {
    digits: SmallVec<[u8; 8]>,
    // change the definition of `magnitude` to be u16 of the lowest significant digit
    magnitude: u16,
    upper_magnitude: u16,
}

Example 1

123,000,000,000

Current i16 model (magnitude is most significant digit):
digits: [1, 2, 3]
magnitude: 11
upper_magnitude: 11

Proposed u16 model (magnitude is least significant digit):
digits: [1, 2, 3]
magnitude: 9
upper_magnitude: 0

Example 2

0012.3400

Current i16 model:
digits: [1, 2, 3, 4]
magnitude: 1
upper_magnitude: 3
lower_magnitude: -4

Proposed u16 model:
digits: [1, 2, 3, 4]
magnitude: 2
upper_magnitude: 6
shift: 4

Game Plan

Current PR: Signed

@sffc
Member

sffc commented Oct 29, 2024

Rounding modes:

UnsignedRoundingMode {
    Expand,
    Trunc,
    HalfExpand,
    HalfTrunc,
    HalfEven
}

SignedRoundingMode {
    Unsigned(UnsignedRoundingMode),
    Ceil,
    Floor,
    HalfCeil,
    HalfFloor
}

@sffc
Member

sffc commented Oct 30, 2024

Another composition idea: make the base be called Natural, a natural number including zero

Then we can eventually end up with:

pub struct Natural {
    digits: SmallVec<[u8; 8]>,
    // magnitude of the lowest significant digit
    magnitude: u16,
    // number of leading zeros (TODO: u8 or u16?)
    leading_zeros: u8,
}

pub struct Decimal {
    natural: Natural,
    // rightward shift of the integer
    shift: u16,
}

Just to note, the equations I think end up being:

  • Most significant displayed digit magnitude: magnitude + digits.len() + leading_zeros - shift - 1
  • Least significant displayed digit magnitude: -shift
  • Most significant nonzero digit magnitude: magnitude + digits.len() - shift - 1
  • Least significant nonzero digit magnitude: magnitude - shift
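The four equations can be checked against the two examples from the 2024-10-29 notes with a small sketch. Assumptions: `Vec<u8>` replaces `SmallVec` to keep the snippet dependency-free, the method names are made up, and `leading_zeros = 2` for 0012.3400 is my reading of the `Natural` model rather than something stated in the thread.

```rust
pub struct Natural {
    pub digits: Vec<u8>,
    // magnitude of the lowest significant digit, before the shift is applied
    pub magnitude: u16,
    // number of leading zeros to display
    pub leading_zeros: u8,
}

pub struct Decimal {
    pub natural: Natural,
    // rightward shift of the integer
    pub shift: u16,
}

impl Decimal {
    fn len(&self) -> i32 {
        self.natural.digits.len() as i32
    }

    /// magnitude + digits.len() + leading_zeros - shift - 1
    pub fn most_significant_displayed(&self) -> i32 {
        i32::from(self.natural.magnitude) + self.len()
            + i32::from(self.natural.leading_zeros)
            - i32::from(self.shift)
            - 1
    }

    /// -shift
    pub fn least_significant_displayed(&self) -> i32 {
        -i32::from(self.shift)
    }

    /// magnitude + digits.len() - shift - 1
    pub fn most_significant_nonzero(&self) -> i32 {
        i32::from(self.natural.magnitude) + self.len() - i32::from(self.shift) - 1
    }

    /// magnitude - shift
    pub fn least_significant_nonzero(&self) -> i32 {
        i32::from(self.natural.magnitude) - i32::from(self.shift)
    }
}
```

For 0012.3400 (digits [1, 2, 3, 4], magnitude 2, leading_zeros 2, shift 4) this gives 3, -4, 1 and -2, which matches the current i16 model's upper_magnitude 3, lower_magnitude -4 and magnitude 1.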

@jedel1043
Contributor

Rounding modes:

UnsignedRoundingMode {
    Expand,
    Trunc,
    HalfExpand,
    HalfTrunc,
    HalfEven
}

SignedRoundingMode {
    Unsigned(UnsignedRoundingMode),
    Ceil,
    Floor,
    HalfCeil,
    HalfFloor
}

I think it's fine to use the same rounding mode enum for both signed and unsigned modes, because applying floor or ceil on an unsigned number doesn't do anything unexpected or wrong; it's just a bit superfluous.

@sffc
Member

sffc commented Nov 6, 2024

Three reasons why I think we should split the rounding mode enum:

  1. Floor and ceiling are defined to work on real numbers (https://en.wikipedia.org/wiki/Floor_and_ceiling_functions), but an unsigned decimal is not a real number: it represents only half of real numbers, with a cliff on the lower end at 0.
  2. It could be a footgun for people to apply floor or ceiling to an unsigned decimal, especially if they have a sign that they just haven't applied to the number yet. I can see this happening in a builder-type pattern, where maybe you have a decimal, a sign, and a rounding mode coming from three different places. You want to apply the sign before the rounding mode.
  3. The signed rounding modes are directly derivable from the unsigned rounding modes. Their implementations are extremely thin: if the sign is positive, call expand() or halfExpand(), and otherwise call trunc() or halfTrunc(), for example. It is surprising if UnsignedFixedDecimal's floor() or ceil() function was not invoked by the wrapper.
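Point 3 can be sketched as a single function that resolves a signed mode plus a sign into an unsigned mode, after which only the unsigned implementation needs to exist. The enums mirror the ones proposed earlier in this thread; the `Sign` enum and the `resolve` function name are assumptions made for the illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sign {
    Negative,
    Positive,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnsignedRoundingMode {
    Expand,
    Trunc,
    HalfExpand,
    HalfTrunc,
    HalfEven,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignedRoundingMode {
    Unsigned(UnsignedRoundingMode),
    Ceil,
    Floor,
    HalfCeil,
    HalfFloor,
}

/// Maps a signed rounding mode to the unsigned mode to apply to the magnitude.
/// Ceil moves toward +infinity: it grows a positive magnitude and shrinks a negative one.
/// Floor is the mirror image.
pub fn resolve(mode: SignedRoundingMode, sign: Sign) -> UnsignedRoundingMode {
    match (mode, sign) {
        (SignedRoundingMode::Unsigned(u), _) => u,
        (SignedRoundingMode::Ceil, Sign::Positive)
        | (SignedRoundingMode::Floor, Sign::Negative) => UnsignedRoundingMode::Expand,
        (SignedRoundingMode::Ceil, Sign::Negative)
        | (SignedRoundingMode::Floor, Sign::Positive) => UnsignedRoundingMode::Trunc,
        (SignedRoundingMode::HalfCeil, Sign::Positive)
        | (SignedRoundingMode::HalfFloor, Sign::Negative) => UnsignedRoundingMode::HalfExpand,
        (SignedRoundingMode::HalfCeil, Sign::Negative)
        | (SignedRoundingMode::HalfFloor, Sign::Positive) => UnsignedRoundingMode::HalfTrunc,
    }
}
```

For example, ceil of -2.5 resolves to `Trunc` on the magnitude 2.5, giving -2, while ceil of +2.5 resolves to `Expand`, giving 3.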

@robertbastian
Member

an unsigned decimal is not a real number

That's just incorrect. Unsigned decimals are a subset of real numbers.

@sffc
Member

sffc commented Nov 7, 2024

What I meant was, the domain of real numbers is approximated by a signed decimal, but an unsigned decimal approximates only half of the domain.

@jedel1043
Contributor

Floor and ceiling are defined to work on real numbers

Yes, but my original argument is that any function defined in ℝ is also defined in ℝ⁺.

The signed rounding modes are directly derivable from the unsigned rounding modes. Their implementations are extremely thin: if the sign is positive, call expand() or halfExpand(), and otherwise call trunc() or halfTrunc(), for example. It is surprising if UnsignedFixedDecimal's floor() or ceil() function was not invoked by the wrapper.

Implementation details shouldn't really matter at all. If we cared about internal readability, we wouldn't have used the IncrementLike trait — which makes the code harder to understand — to improve binary sizes.

It could be a footgun for people to apply floor or ceiling to an unsigned decimal, especially if they have a sign that they just haven't applied to the number yet. I can see this happening in a builder-type pattern, where maybe you have a decimal, a sign, and a rounding mode coming from three different places. You want to apply the sign before the rounding mode.

This is fair, but I think the same footgun can occur in so many different ways (e.g. round first then call multiplied_pow10) that we should try to make the API easier to use instead of trying to protect the user from small footguns.

@younies
Member Author

younies commented Nov 13, 2024

For now, we have implemented two enums: UnsignedRoundingMode and SignedRoundingMode. Should I bring this up in the ICU4X meeting, or should we proceed with using two enums as previously discussed?

@sffc
Member

sffc commented Nov 14, 2024

Yes, but my original argument is that any function defined in ℝ is also defined in ℝ⁺.

I can see your perspective on that point

This is fair, but I think the same footgun can occur in so many different ways (e.g. round first then call multiplied_pow10) that we should try to make the API easier to use instead of trying to protect the user from small footguns.

Whether or not a function exists on a certain type is the easiest, most effective, and cleanest way to nudge developers to do the right thing. I don't see how missing a function makes the API easier or harder to use. If you try calling .ceil() on an UnsignedFixedDecimal and you get a "function not found" error, then you have an opportunity to learn about whether you should be using either .expand() or whether you should convert to a SignedFixedDecimal first. That seems all good and not bad.

Implementation details shouldn't really matter at all. If we cared about internal readability, we wouldn't have used the IncrementLike trait — which makes the code harder to understand — to improve binary sizes.

I wasn't making an argument about internal readability; I was making an argument about it being surprising that UnsignedFixedDecimal::ceil is a leaf function that is not used by SignedFixedDecimal::ceil.

@jedel1043
Contributor

Whether or not a function exists on a certain type is the easiest, most effective, and cleanest way to nudge developers to do the right thing.

Yes, I think we shouldn't have a ceil function for UnsignedFixedDecimal, nudging users to other functions if possible, but what I'm actually pushing for is just having a simple flat RoundingMode, because having wrapped enums is a bit clunky to use and a bit harder to abstract around than just having a plain enum.

@sffc
Member

sffc commented Nov 15, 2024

One more thing: ECMA-402 actually defines an Unsigned Rounding Mode enumeration

https://tc39.es/ecma402/#sec-getunsignedroundingmode

It uses different names:

  • Infinity
  • Zero
  • Half-Infinity
  • Half-Zero
  • Half-Even

So one option is we could have RoundingMode with the 9 familiar names and UnsignedRoundingMode with these 5 names derived from ECMA-402?

@sffc
Member

sffc commented Jan 11, 2025

What exactly are the use cases for NaN and Infinity?

I've long been of the belief that FixedDecimal was best just leaving them out and asking user code to handle these cases. Unfortunately this makes code more complicated and even experienced programmers don't really know what the right thing is to do, so all previous i18n libraries to my knowledge have handled NaN and Infinity internally.

What I'm sort of getting at is maybe we can design a thinner API with the primary goal of developer economics but otherwise not doing anything else special with these values.

@jedel1043
Contributor

If that's the idea, I would support having a separate formatter (or even just a databag to fetch the localized versions of those) for only the special cases of NaN and +-Infinity.

@Manishearth
Member

I've long been of the belief that FixedDecimal was best just leaving them out and asking user code to handle these cases. Unfortunately this makes code more complicated and even experienced programmers don't really know what the right thing is to do, so all previous i18n libraries to my knowledge have handled NaN and Infinity internally.

I agree with this. I think we could have separate formatters, but currently CLDR data for Infinity is just always ∞, and CLDR data for NaN is mostly "NaN", with some manual translations which many of us feel is actually counterproductive.

We already were discussing normalizing to the string "NaN". I think it seems fine for implementations to wrap around FixedDecimalFormatter and hardcode ∞ and NaN.

If we want we can provide an end-to-end FixedDecimalFormatter::format_f64_with_nan_and_infinity that returns an impl Writeable, perhaps a pair of these, one that takes in a SignedFixedDecimal and a NaNInfState enum.

@Manishearth
Member

So it seems like ICU4X-TC's general opinion around this, as well as that of CLDR's design-wg, is that we should hardcode "NaN" and ∞, and potentially provide specialized formatters for just those strings in the future.

Given that, I propose this:

  • We move this issue out of the 2.0 milestone: we're not proposing data changes
  • We have this issue track a 2.x API to make Infinity/NaN formatting easier

One small missing thing is -Infinity: for users to be able to format that they still need FixedDecimalFormatter data. We could provide a format_signed_string() method that lets you format Signed<&str> or something. This could be a 2.0 stretch part of this issue.

Thoughts?
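A rough sketch of what a `format_signed_string()`-style helper could look like follows. Everything here is hypothetical: the free function `format_signed_str`, its parameters, and the simplification that a sign is always a plain prefix. A real formatter would take the sign strings and the pattern from locale data rather than from arguments.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sign {
    None,
    Negative,
    Positive,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Signed<T> {
    pub sign: Sign,
    pub value: T,
}

/// Prefixes an already-localized string (such as "∞") with a sign string.
/// `minus_sign` and `plus_sign` stand in for locale data in this sketch.
pub fn format_signed_str(input: &Signed<&str>, minus_sign: &str, plus_sign: &str) -> String {
    let prefix = match input.sign {
        Sign::None => "",
        Sign::Negative => minus_sign,
        Sign::Positive => plus_sign,
    };
    format!("{}{}", prefix, input.value)
}
```

A caller handling -Infinity in user code would then pass `Signed { sign: Sign::Negative, value: "∞" }` and only rely on the formatter for the minus sign.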

@Manishearth Manishearth added the discuss Discuss at a future ICU4X-SC meeting label Jan 15, 2025
@sffc
Member

sffc commented Jan 16, 2025

WG discussion:

// This type is basically what ECMA-402 needs
let with_nan_with_inf = WithNan::<Signed<WithInfinity<UnsignedFixedDecimal>>>::try_from_f64(x);

// These are nicer types that don't propagate NaNs
let with_inf: Signed<WithInfinity<UnsignedFixedDecimal>> = with_nan_with_inf.non_nan()?; // NanError
let fd: Signed<UnsignedFixedDecimal> = with_inf.finite()?; // LimitError

// Unclear what the use case for this is
let with_nan: WithNan<Signed<UnsignedFixedDecimal>> = with_nan_with_inf.finite()?; // LimitError
  • @sffc Do we keep the try_from_f64 fns on SignedFixedDecimal with their current signatures?
  • @robertbastian No, try_from_f64 becomes infallible and returns WithNan<Signed<WithInfinity<UnsignedFixedDecimal>>>
  • @sffc It seems like since we will have the nicer parse API, we can continue to return a LimitError in the try_from_f64 functions on the other types.
  • @robertbastian - The nice parse API should be the most accessible API, but users won't generally reach for WithNan<Signed<WithInfinity<UnsignedFixedDecimal>>>. The type you're asking for is still in there, it's just wrapped in WithNan and WithInfinity instead of Result<_, LimitError> or Result<_, NanError>
  • @sffc I think it is bad form for try_from functions to return something that is not the type. We do this in a small number of compiled data constructors and I'd rather not do it in more places.
  • @robertbastian - Constructors will return Result<Arc<Box<_>>>, the type is still in there, it's just wrapped
  • @sffc The fact that you called try_new on SignedFixedDecimal means that you are already opting to get that type and treat the other cases as errors
  • @robertbastian - Disagree, you're calling it on SignedFixedDecimal because a type like WithNan<Signed<WithInfinity<UnsignedFixedDecimal>>> is scary and you scrolled right past it
  • @sffc The type should not be scary. I assume we will give all of these nice type aliases.
  • @sffc I want to arrive on a reasonable solution without the WithNaN/WithInfinity in 2.0, but which we can extend later in 2.x
  • @robertbastian I'm happy to make infinite the default, but NaN should not be, because it's basically null/None/Err(_) and should only appear as the result of a float parse

Possible type aliases:

  • SignedFixedDecimal = Signed<UnsignedFixedDecimal>
  • Decimal = Signed<WithInfinity<UnsignedFixedDecimal>>
  • DecimalWithNaN = WithNaN<Signed<WithInfinity<UnsignedFixedDecimal>>>

@robertbastian Counter proposal:

  • UnsignedFiniteFixedDecimal = Fixed
  • FiniteFixedDecimal = Signed<Fixed>
  • FixedDecimal = Signed<WithInfinity<Fixed>>
  • FixedDecimalWithNaN = WithNaN<Signed<WithInfinity<Fixed>>>

@Manishearth
Member

Was not supporting NaN discussed? I think supporting Infinity makes some sense: it's hard to format -Inf without it, and Infinity is somewhat likely to come up in some cases.

But NaN really just is an error and we should not be handling errors via formatting. And if we support Inf, userspace formatting for NaN (for 402) becomes relatively straightforward.

@Manishearth
Member

I want to arrive on a reasonable solution without the WithNaN/WithInfinity in 2.0, but which we can extend later in 2.x

This also works for me. I think what we have now is acceptable. The default format function shouldn't handle Inf IMO.

@sffc
Member

sffc commented Jan 17, 2025

I think what we have now is acceptable.

I mostly agree, but I think the names need to be bikeshed more.

@Manishearth Manishearth added the discuss-priority Discuss at the next ICU4X meeting label Jan 21, 2025
@Manishearth
Member

Marking as priority, let's discuss this to the point that it stops being a 2.0 blocker.

Since I can't make the call tomorrow, and also potentially not the WG meeting due to UTC, I'll state my principles here for the naming:

  • I have a slight preference for the signed one being the "default" name.
  • I do think Signed<UnsignedFixedDecimal> is kind of silly, though. But this is lower priority than my previous preference, so I think it's better than having a SignedFixedDecimal type.
  • I don't think NaN should be in a type with the "default" name.
  • I think Infinity could be in the type with the "default" name but I do not prefer it. Infinity is not a decimal type, it's a different thing.

This gives me the types Signed<T>, UnsignedFixedDecimal, and WithInfinity<T>, with FixedDecimal being Signed<UnsignedFixedDecimal>.

@sffc
Member

sffc commented Jan 21, 2025

@Manishearth did you have an opinion on my proposal to keep the crate named fixed_decimal but remove "Fixed" from type names?

@sffc
Member

sffc commented Jan 21, 2025

We're not including integers in this discussion, but we should.

I think there are only a few compositions that are actually useful:

  1. An unsigned fixed decimal
    • Use case: number of minutes in a duration
  2. A signed fixed decimal
    • Use case: formatting currency values
  3. A signed fixed decimal with infinity and NaN
    • Use case: formatting IEEE binary or decimal floating point values
  4. An unsigned fixed integer
    • Use case: number of months in a duration
  5. A signed fixed integer
    • Use case: formatting a BigInt

I did not include "infinity but not NaN" in the list above because I couldn't immediately identify a real use case. If you are dealing with non-finite values, you usually hit both Infinity (1/0) and NaN (0/0).

The Rust convention and the convention in most other programming languages is that the signed thing is the default and unsigned has an adjective, as much as I dislike negated adjectives.

So, the only real question is whether the finite or non-finite decimal should be the "default" one. The position I've held for years has been that fixed_decimal should specialize in finite decimal values, and I don't see a reason to change.

This would imply naming such as the following:

  1. UnsignedDecimal
  2. Decimal
  3. AugmentedDecimal / DecimalPlus / PotentiallyNonFiniteDecimal / ...
  4. UnsignedInteger
  5. Integer

As I did last week, I dropped "Fixed" from my names because the crate says it. However, if we think people are going to want to regularly import this into a project that has a Decimal type from somewhere else, we can keep Fixed.

@Manishearth
Member

Good question. I think calling it Decimal is fine, fixed_decimal::Decimal is unambiguous, Decimal on its own does not need to be unambiguous, and people can always rename to FixedDecimal if they have a clash (unlikely).

I think the suggested naming is fine, ignoring specific choices for the Infinite cases (which I think we should just bikeshed separately).

@sffc
Member

sffc commented Feb 5, 2025

Discussion: core decimal types

  • @sffc Did we want pub type UnsignedDecimal = WithDecimal<UnsignedInteger>? Decimal is an Integer with an extra field that specifies the location of the decimal point.
  • @robertbastian Yes, but it is extra work.
  • @sffc Is it breaking to change pub struct UnsignedDecimal to pub type UnsignedDecimal?
  • @robertbastian Maybe, not sure
  • @sffc If we keep fixed_decimal as a separate non-stable crate, this is fine, because the only thing we format is Decimal
  • @sffc Do we want the type wrap to be private? pub struct Decimal(Signed<UnsignedDecimal>)
  • @robertbastian That requires duplicating all the functions or derefing, and removing a deref step seems more semver breaking than the alternative

Proposal:

  1. Definitely in 2.0:
    • pub type Decimal = Signed<UnsignedDecimal>
    • pub struct UnsignedDecimal
  2. Additional integer renames (see point 3):
    • pub type Integer = Signed<UnsignedInteger>
    • pub struct UnsignedInteger
  3. If fixed_decimal remains a util crate, we can do the integer changes post-2.0, which may involve a range dependency from icu_decimal to fixed_decimal. If we merge it into icu_decimal, we put the integer types behind an experimental feature.
  4. If we can make pub type UnsignedDecimal in 2.x, great, if not, fine, maybe in 3.0

LGTM: @sffc @younies @robertbastian @echeran

Issue to handle this part:
#6144

@robertbastian
Member

Discussion on how to handle infinity formatting:

  • @younies In units, sometimes when conversion happens, there could be infinity or NaN. For example, liter-per-100-kilometer to miles-per-gallon: if you are driving downhill, you can get 0 liter-per-100-kilometer. In general this can happen whenever you have inverse units.
  • @Manishearth What we do during conversion is different than what we do in formatting. It's fine for conversion to return WithInfinity.
  • @younies You won't have infinity in currency formatting. Most formatters don't need infinity. And in units conversion, you won't end up with NaN.
  • @Manishearth A "half-proposal" is: when converting meters to feet, you shouldn't need to worry about infinity, but when converting to units where it matters, you need to explicitly handle it, which implies you may need two different conversion APIs. How hard would it be to have a separate trait to distinguish reciprocals?
  • @younies Yeah, it could work
  • @sffc I like the idea of handling Infinity and NaN as part of the type system, and I think it can be done.
  • @younies The Infinity data is very small.
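The inverse-unit case from the first bullet can be illustrated with plain `f64` arithmetic. This is a sketch, not the ICU4X units converter: the function name is made up, and it assumes US gallons (3.785411784 L) and international miles (1.609344 km), which makes miles-per-gallon equal to a constant divided by liters-per-100-km.

```rust
/// 100 km expressed in miles, times liters per US gallon:
/// mpg = (100 / 1.609344) * 3.785411784 / (L per 100 km)
pub const MPG_TIMES_L_PER_100KM: f64 = 100.0 * 3.785411784 / 1.609344;

/// Converts fuel consumption (L/100 km) to fuel economy (US mpg).
/// Because the units are reciprocal, an input of 0 yields +infinity.
pub fn l_per_100km_to_mpg(l_per_100km: f64) -> f64 {
    MPG_TIMES_L_PER_100KM / l_per_100km
}
```

With IEEE floats, `l_per_100km_to_mpg(0.0)` is `f64::INFINITY`, so a formatter that accepts only finite decimals would need the caller to handle that value before formatting.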

Architectural proposal (not bikeshedding):

For 2.0:

  • FixedDecimalFormatter does not format Infinity by default.
  • FixedDecimalFormatter does not format NaN and will likely never do so.
  • We consider Infinity to be a special case that we wish ICU4X users to think about and handle when they see them
  • We do not currently (2.0) need a WithInfinity type until Units wants one

LGTM: @sffc @Manishearth @younies @robertbastian @echeran

Rough design for units post 2.0 (not binding):

  • You have UnitsConverterFactory::converter() return an error if you attempt to create a converter between two units that are reciprocals AND the N type you use cannot handle infinities.
  • We can provide a separate InfinityFormatter if asked
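The first bullet of the rough design could be sketched with an associated constant on a number trait. All names here are hypothetical (`ConvertibleNumber`, `SUPPORTS_INFINITY`, `FiniteDecimal`, `ConverterError`, and the free function `converter`), and the reciprocal check is reduced to a boolean argument where a real factory would derive it from the two units.

```rust
use std::marker::PhantomData;

/// Hypothetical trait implemented by numeric types usable in conversions.
pub trait ConvertibleNumber {
    /// Whether the type can represent an infinite result.
    const SUPPORTS_INFINITY: bool;
}

impl ConvertibleNumber for f64 {
    const SUPPORTS_INFINITY: bool = true;
}

/// Placeholder for a finite-only decimal type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FiniteDecimal;

impl ConvertibleNumber for FiniteDecimal {
    const SUPPORTS_INFINITY: bool = false;
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConverterError {
    /// The units are reciprocals but the number type cannot hold infinity.
    ReciprocalNeedsInfinity,
}

#[derive(Debug, PartialEq)]
pub struct Converter<N> {
    pub reciprocal: bool,
    _marker: PhantomData<N>,
}

/// Creates a converter, rejecting reciprocal conversions for finite-only types.
pub fn converter<N: ConvertibleNumber>(is_reciprocal: bool) -> Result<Converter<N>, ConverterError> {
    if is_reciprocal && !N::SUPPORTS_INFINITY {
        return Err(ConverterError::ReciprocalNeedsInfinity);
    }
    Ok(Converter { reciprocal: is_reciprocal, _marker: PhantomData })
}
```

The design choice this illustrates is that the error surfaces when the converter is created, so a meters-to-feet caller never has to think about infinity at conversion time.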

Discussion for FixedDecimal vs UnsignedFixedDecimal vs SignedFixedDecimal

  • @sffc One term that exists elsewhere is "finite" to refer to an IEEE number that does not include infinity and NaN. Can we just use that terminology? FiniteDecimal and FiniteDecimalFormatter
  • @Manishearth These are not IEEE
  • @robertbastian Finite should not be the special case
  • @sffc We would end up with WithInfinity<FiniteDecimal> or similar
  • @robertbastian "fixed" decimals don't usually come with NaN and infinity, the WithInfinity and WithNan types are artifacts of float conversions.
  • @Manishearth "finite" is more clear if "fixed" is confusing. It doesn't answer our other questions.
  • @younies There are BigDecimal, etc., which include infinity and NaN.
  • @sffc I share the experience that people expect a type called "decimal" to handle NaN and Infinity.
  • @robertbastian - Java BigDecimal doesn't include infinity... if you divide by zero, you get an exception, all other calculations are exact. IEEE floats use infinities as propagating error values
  • @sffc What does "fixed decimal" mean to you?
  • @Manishearth I would assume it is what we have except that the magnitude is a constant rather than a variant.
  • @echeran For me, when you say "fixed decimal" I think of the ICU4C type that has that name.
    @Manishearth, if we are going to change the name, I think Finite is not the one
  • @sffc I think that "finite" is more clear on what this actually is, and it works better with the ecosystem where we have non-finite as part of the type system.
  • @Manishearth If we rename "fixed" to "finite", we should rename it everywhere. I don't like "fixed" in the crate name but not in the type name
  • @echeran +1
  • @robertbastian Can we distill the behavior of trailing zero handling into one word?
  • .. - it's essentially "fixed", just runtime fixed
  • @sffc agrees and withdraws the FiniteDecimal proposal
  • @younies We currently have
    1. FixedDecimal
    2. FiniteDecimal
    3. Decimal
  • @Manishearth One reason I like using Decimal is because it de-emphasizes the fixedness, but it's still okay...ish to have fixed_decimal::Decimal. It's not strongly deciding that "fixed" is incorrect.
  • @sffc Let's stick with fixed_decimal as the crate name please, and if that eliminates FiniteDecimal from contention, fine
  • @robertbastian Maybe it should be a type in icu_decimal since it is only used for formatting.
  • @sffc I think we need this as a holistic discussion.
  • @robertbastian If the type is not in the fixed_decimal crate, then are we still okay with having it not have "fixed" in the name?
  • @robertbastian I am, because if it is in icu_decimal, it strongly suggests that it is a formatting type.
  • @robertbastian To clarify, Decimal is a type alias to Signed<UnsignedDecimal>?
  • @sffc That was my thinking, yes. But we are currently deciding on the main exported names.

Proposal:

  • Names are Decimal and UnsignedDecimal
  • We can also have integer types named Integer and UnsignedInteger, which are like the decimal types except that the lower magnitude is always zero
  • Those names could be in either fixed_decimal or icu_decimal

LGTM: @sffc @Manishearth @robertbastian @echeran @younies

@Manishearth
Member

Rob to finish the renames

@sffc sffc removed discuss Discuss at a future ICU4X-SC meeting discuss-priority Discuss at the next ICU4X meeting labels Feb 10, 2025
@younies
Member Author

younies commented Feb 18, 2025

decimal.rs now contains UnsignedDecimal (see #6143).

Shall we rename the files too?

@robertbastian
Member

We can clean up file names later, that's not 2.0 blocking. I agree it would be nice to do that.

Manishearth pushed a commit that referenced this issue Feb 18, 2025
# Description: 

Renames the UnsignedFixedDecimal type to UnsignedDecimal across multiple
files in the fixed_decimal module. This includes updates to:
- Rust source files
- Documentation
- Test files
- Dart bindings
- Diplomat coverage allowlist

The rename maintains the existing functionality while providing a more
concise type name.

Related Issues: #5065,
#6144


@Manishearth
Member

We've landed both renames. This is done.

@github-project-automation github-project-automation bot moved this from Unclaimed for sprint to Done in icu4x 2.0 Feb 20, 2025
Projects
Status: Done

5 participants