Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bmerge errors on integer64 to double join #4167

Closed
MichaelChirico opened this issue Jan 9, 2020 · 6 comments · Fixed by #6626
Closed

bmerge errors on integer64 to double join #4167

MichaelChirico opened this issue Jan 9, 2020 · 6 comments · Fixed by #6626
Labels
bit64 consistency joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins

Comments

@MichaelChirico
Copy link
Member

Just came across this... haven't had time to minimize the example but here goes:

library(DBI)
library(RSQLite)
library(data.table)
# download and unzip file from here:
#   https://ankiweb.net/shared/info/867291675
card_db = dbConnect(SQLite(),  'collection.anki2')
cards = setDT(dbGetQuery(card_db, 'select * from cards'))
cards[id == 1419644220828]
# Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
#   Incompatible join types: x.id is type integer64 but i.id is type double and contains fractions
@jangorecki
Copy link
Member

jangorecki commented Jan 10, 2020

cards=data.table(id=bit64::as.integer64(1419644220828))
cards[id == 1419644220828]
# Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
#   Incompatible join types: x.id is type integer64 but i.id is type double and contains fractions

should we automatically coerce RHS? otherwise error seems fine, isn't it?

@MichaelChirico
Copy link
Member Author

Forgot to add Low, filed because:

and contains fractions

is not right

@jangorecki jangorecki added joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins bit64 labels Apr 2, 2020
@vspinu
Copy link

vspinu commented Apr 9, 2020

I think there is no reliable way to check if a double is an "integer" or not. So I think it's best in this line to just convert to integer64 and rely on bit64 to actually do the check.

Or at least add an option to disable the check. Or maybe just add a warning instead of an error.

@jangorecki jangorecki changed the title bmerge error finds double with fractions but its whole number bmerge errors on integer64 to double join May 24, 2020
@jangorecki
Copy link
Member

jangorecki commented May 24, 2020

I think we should promote integer64 to double type instead.
Logical hierarchy would be like this
integer < integer64 < numeric
The actual storage precision will not match to that hierarchy bc:
2^31 < 2^63 < 2^51
But this is not our problem, what we should care about is logical hierarchy.

We could check for numeric to be whole numbers, we have fast routine for that, but that will introduce inconsistencies in behavior that depends on the data, so I would not go that way.

But we use bmerge not only for joins, but as well during subset, where DT[int64col==1] probably should be coerced to int64, a not lhs to double.

@MichaelChirico
Copy link
Member Author

MichaelChirico commented Nov 27, 2024

Just found this bug again after I filed the duplicate #6625.

I think we should promote integer64 to double type instead.

I'm not sure I follow. I think the approach taken in #6626 is OK.

Fundamentally, there is no hierarchical ordering of integer64 and double. There are many integers that can be represented as integer64, but not as double, and vice versa. That contrasts to integer32 which is a strict subset of double.

For example, 36028797018963972 (2^55 + 4) is perfectly represented in integer64, but resolves as 36028797018963968 (2^55) in double, whereas 1180591620717411303424 (2^70) is perfectly represented in double, but is outside the integer64 limits. (see https://numeral-systems.com/ieee-754-converter/)

@MichaelChirico
Copy link
Member Author

Trying to write out a bit more carefully why I think the approach taken in #6626 is good.

  • If we want to attempt the join at all, we need a common storage type. Either integer64 or double should be chosen.
    • 128-bit storage might be better but is not really that widespread: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format "quadruple precision may be specified by the long double type, but this is not required by the language (which only requires long double to be at least as precise as double), nor is it common"
    • character is an available common super-type, but coercing to character to join is definitely slow. I'd rather error and ask the user to coerce on their own than pursue this.
  • We should not try to convert integer64➡️double because there's not easy way to ensure it's possible IINM. (double)i64 == i64 won't really work, IIUC it'd have to be (long double)i64 and be assured long double is actually quadruple-precision (which, see above). Other ways to check would require lots of handwritten arithmetic that will be hard to assure is platform- and implementation-robust.
  • By contrast, isRealReallyInt64() looks reasonable to me as a way to ensure converting double➡️integer64 WAI. To be assured a given double has representation as integer64, it's enough to know that (1) the value is within bit64::lim.integer64() range and (2) it has no fractional part.
  • The downside is there are some joins that might be possible which we don't perform, e.g. joining 2^100 to as.integer64(1L) in principal is the same as joining to as.integer64(2L) -- it's just like any other miss to be handled by whether we're doing INNER or OUTER join.
    • This is fine, the user should just do the coercion themselves if they want to proceed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bit64 consistency joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants