bmerge errors on integer64 to double join #4167

MichaelChirico · 2020-01-09T21:50:44Z

Just came across this... haven't had time to minimize the example but here goes:

library(DBI)
library(RSQLite)
library(data.table)
# download and unzip file from here:
#   https://ankiweb.net/shared/info/867291675
card_db = dbConnect(SQLite(),  'collection.anki2')
cards = setDT(dbGetQuery(card_db, 'select * from cards'))
cards[id == 1419644220828]
# Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
#   Incompatible join types: x.id is type integer64 but i.id is type double and contains fractions

The text was updated successfully, but these errors were encountered:

jangorecki · 2020-01-10T03:33:59Z

cards=data.table(id=bit64::as.integer64(1419644220828))
cards[id == 1419644220828]
# Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
#   Incompatible join types: x.id is type integer64 but i.id is type double and contains fractions

should we automatically coerce RHS? otherwise error seems fine, isn't it?

MichaelChirico · 2020-01-10T03:38:54Z

Forgot to add Low, filed because:

and contains fractions

is not right

vspinu · 2020-04-09T18:42:35Z

I think there is no reliable way to check if a double is an "integer" or not. So I think it's best in this line to just convert to integer64 and rely on bit64 to actually do the check.

Or at least add an option to disable the check. Or maybe just add a warning instead of an error.

jangorecki · 2020-05-24T11:15:26Z

I think we should promote integer64 to double type instead.
Logical hierarchy would be like this
integer < integer64 < numeric
The actual storage precision will not match to that hierarchy bc:
2^31 < 2^63 < 2^51
But this is not our problem, what we should care about is logical hierarchy.

We could check for numeric to be whole numbers, we have fast routine for that, but that will introduce inconsistencies in behavior that depends on the data, so I would not go that way.

But we use bmerge not only for joins, but as well during subset, where DT[int64col==1] probably should be coerced to int64, a not lhs to double.

MichaelChirico · 2024-11-27T06:16:38Z

Just found this bug again after I filed the duplicate #6625.

I think we should promote integer64 to double type instead.

I'm not sure I follow. I think the approach taken in #6626 is OK.

Fundamentally, there is no hierarchical ordering of integer64 and double. There are many integers that can be represented as integer64, but not as double, and vice versa. That contrasts to integer32 which is a strict subset of double.

For example, 36028797018963972 (2^55 + 4) is perfectly represented in integer64, but resolves as 36028797018963968 (2^55) in double, whereas 1180591620717411303424 (2^70) is perfectly represented in double, but is outside the integer64 limits. (see https://numeral-systems.com/ieee-754-converter/)

MichaelChirico · 2024-11-27T19:20:25Z

Trying to write out a bit more carefully why I think the approach taken in #6626 is good.

If we want to attempt the join at all, we need a common storage type. Either integer64 or double should be chosen.
- 128-bit storage might be better but is not really that widespread: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format "quadruple precision may be specified by the long double type, but this is not required by the language (which only requires long double to be at least as precise as double), nor is it common"
- character is an available common super-type, but coercing to character to join is definitely slow. I'd rather error and ask the user to coerce on their own than pursue this.
We should not try to convert integer64➡️double because there's not easy way to ensure it's possible IINM. (double)i64 == i64 won't really work, IIUC it'd have to be (long double)i64 and be assured long double is actually quadruple-precision (which, see above). Other ways to check would require lots of handwritten arithmetic that will be hard to assure is platform- and implementation-robust.
By contrast, isRealReallyInt64() looks reasonable to me as a way to ensure converting double➡️integer64 WAI. To be assured a given double has representation as integer64, it's enough to know that (1) the value is within bit64::lim.integer64() range and (2) it has no fractional part.
The downside is there are some joins that might be possible which we don't perform, e.g. joining 2^100 to as.integer64(1L) in principal is the same as joining to as.integer64(2L) -- it's just like any other miss to be handled by whether we're doing INNER or OUTER join.
- This is fine, the user should just do the coercion themselves if they want to proceed.

MichaelChirico added the Low label Jan 10, 2020

jangorecki added joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins bit64 labels Apr 2, 2020

jangorecki changed the title ~~bmerge error finds double with fractions but its whole number~~ bmerge errors on integer64 to double join May 24, 2020

MichaelChirico added the consistency label Oct 2, 2021

MichaelChirico mentioned this issue Nov 27, 2024

Misleading (and incorrect?) error about incompatible column types in integer64 join #6625

Closed

MichaelChirico removed the Low label Nov 27, 2024

MichaelChirico mentioned this issue Nov 27, 2024

Allow double-integer64 joins when double is in (integer32 , integer64] range #6626

Merged

MichaelChirico closed this as completed in #6626 Dec 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bmerge errors on integer64 to double join #4167

bmerge errors on integer64 to double join #4167

MichaelChirico commented Jan 9, 2020

jangorecki commented Jan 10, 2020 •

edited

Loading

MichaelChirico commented Jan 10, 2020

vspinu commented Apr 9, 2020

jangorecki commented May 24, 2020 •

edited

Loading

MichaelChirico commented Nov 27, 2024 •

edited

Loading

MichaelChirico commented Nov 27, 2024

bmerge errors on integer64 to double join #4167

bmerge errors on integer64 to double join #4167

Comments

MichaelChirico commented Jan 9, 2020

jangorecki commented Jan 10, 2020 • edited Loading

MichaelChirico commented Jan 10, 2020

vspinu commented Apr 9, 2020

jangorecki commented May 24, 2020 • edited Loading

MichaelChirico commented Nov 27, 2024 • edited Loading

MichaelChirico commented Nov 27, 2024

jangorecki commented Jan 10, 2020 •

edited

Loading

jangorecki commented May 24, 2020 •

edited

Loading

MichaelChirico commented Nov 27, 2024 •

edited

Loading