Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fread cleanup prints error message in webR #5969

Closed
maek-ies opened this issue Feb 29, 2024 · 5 comments
Closed

fread cleanup prints error message in webR #5969

maek-ies opened this issue Feb 29, 2024 · 5 comments

Comments

@maek-ies
Copy link

maek-ies commented Feb 29, 2024

data.table is working quite fine in webR, but when using fread an error occurs (System errno 8 unmapping file: Bad file descriptor). Interestingly, the file loads but it throws an error neverthless. Same error occurs also for .gz files. Perhaps, it's just the fact that this runs in browser using wasm is not recognized, resulting in an error.

install.packages("data.table")
library("data.table")

dt = data.table(1:10)
fwrite(dt,"dt.csv")
list.files()
dtcsv = fread("dt.csv", verbose = T)
print(dtcsv)

sessionInfo()

Resulting output:

`> install.packages("data.table")
Downloading webR package: data.table

library("data.table")
data.table 1.14.7 IN DEVELOPMENT built 2023-12-19 09:51:47 UTC using 1 threads (see ?getDTthreads). Latest news: r-datatable.com


This development version of data.table was built more than 4 weeks ago. Please update: data.table::update_dev_pkg()



This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.
This is Emscripten. This warning should not normally occur on Windows or Linux where OpenMP is turned on by data.table's configure script by passing -fopenmp to the compiler. If you see this warning on Windows or Linux, please file a GitHub issue.


dt = data.table(1:10)

fwrite(dt,"dt.csv")

list.files()
[1] "dt.csv"

dtcsv = fread("dt.csv", verbose = T)
This installation of data.table has not been compiled with OpenMP support.
omp_get_num_procs() 1
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 1
omp_get_max_threads() 1
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 1 threads with throttle==1024. See ?setDTthreads.
freadR.c has been passed a filename: dt.csv
[01] Check arguments
Using 1 threads (omp_get_max_threads()=1, nth=1)
NAstrings = [<>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
Opening file dt.csv
File opened, size = 24 bytes.
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
No sep and quote rule found a block of 2x2 or greater. Single column input.
Detected 1 columns on line 1. This line is either column names or first data row. Line starts as: <>
Quote rule picked = 0
fill=false and the most number of columns found is 1
[07] Detect column types, good nrow estimate and whether first row is column names
Number of sampling jump points = 1 because (23 bytes from row 1 to eof) / (2 * 23 jump0size) == 0
Type codes (jump 000) : 6 Quote rule 0
'header' determined to be true due to column 1 containing a string on row 1 and a lower type (int32) in the rest of the 10 sample rows
All rows were sampled since file is small so we know nrow=10 exactly
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 6
[10] Allocate memory for the datatable
Allocating 1 column slots (1 - 0 dropped) with 10 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=20
Read 10 rows x 1 columns from 24 bytes file in 00:00.004 wall clock time
[12] Finalizing the datatable
Type counts:
1 : int32 '6'
=============================
0.001s ( 25%) Memory map 0.000GB file
0.002s ( 50%) sep='' ncol=1 and header detection
0.000s ( 0%) Column type detection using 10 sample rows
0.001s ( 25%) Allocation of 10 rows x 1 cols (0.000GB) of which 10 (100%) rows used
0.000s ( 0%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 10 rows) using 1 threads

  • 0.000s ( 0%) Parse to row-major thread buffers (grown 0 times)
  • 0.000s ( 0%) Transpose
  • 0.000s ( 0%) Waiting
    0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
    0.004s Total
    System errno 8 unmapping file: Bad file descriptor

print(dtcsv)
V1

1: 1
2: 2
3: 3
4: 4
5: 5
6: 6
7: 7
8: 8
9: 9
10: 10

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: wasm32-unknown-emscripten (32-bit)
Running under: emscripten

Matrix products: default

locale:
[1] en_US.UTF-8 C en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 C

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.14.7

loaded via a namespace (and not attached):
[1] tools_4.3.2 webr_0.2.2.9000
error_fread
`

@ben-schwen ben-schwen changed the title fread error not working in webR fread cleanup prints error message in webR Feb 29, 2024
@MichaelChirico
Copy link
Member

I really don't know how to debug webR issues. My instinct is to file with them first, however... cc @georgestagg

@georgestagg
Copy link
Contributor

georgestagg commented Feb 29, 2024

I believe this is a manifestation of an Emscripten issue, emscripten-core/emscripten#20459, from the use of munmap() in fread.c.

data.table/src/fread.c

Lines 152 to 153 in 07fd933

if (munmap(mmp, fileSize))
DTPRINT(_("System errno %d unmapping file: %s\n"), errno, strerror(errno)); // # nocov

Until Emscripten's support for mmap() is improved, we might be able to patch data.table so that the file is not closed until after munmap() is run when running under Wasm. I could maintain such a change in our r-wasm fork, or upstream back here, depending on how the maintainers feel about that kind of patch. It would probably be handled by an #ifdef __EMSCRIPTEN__.

In the meantime, from my reading of the data.table source code, the routine attempts to carry on even after showing this error. So, as you have observed, at least in the short term things should continue to work under webR even though the error is emitted.

@MichaelChirico
Copy link
Member

Thanks for the quick reply!

I could maintain such a change in our r-wasm fork, or upstream back here, depending on how the maintainers feel about that kind of patch.

Thinking aloud here a bit.

I am fine with such a patch here (we'd have to assign any related issues to you I think).

If the issue might affect other installations of data.table (any OS/environment which has the same trouble with mmap()?), here is definitely the "right" place to maintain it. If it's a more general issue we might benefit from some ./config testing as well?

If it's truly isolated to Wasm environments, it may make more sense to maintain a patch in the fork, but OTOH we do maintain some #ifdef stuff related to Windows, so is that roughly the same?

@georgestagg
Copy link
Contributor

OTOH we do maintain some #ifdef stuff related to Windows, so is that roughly the same?

Yes, I would consider it roughly the same. Emscripten can be thought of as another OS, just extremely similar to Linux.

I think the best practice for true Unix portability would be to assume that mmap() is not always available, test for it in configure, and include some fallback path, but that's a much bigger change to the system. Also, AFAICT this particular issue is an Emscripten only problem that might even go away in the future.

I am fine with such a patch here (we'd have to assign any related issues to you I think).

OK. I'll create a short PR with the required #ifdef __EMSCRIPTEN__ changes for Webassembly, and we can go from there.

@georgestagg
Copy link
Contributor

Just a quick note to add that this now seems to work without error in the latest webR at https://webr.r-wasm.org/v0.3.1/

So, AFAICT this issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants