Hey Julia Community,
I am very new to Julia, but what I have seen so far is very good. In a project I read some CSV files with CSV.jl. Threads.nthreads() is set to 60.
Some CSV files could not be read, and the program exits with the following error message:
ERROR: TaskFailedException
nested task error: thread = 38 fatal error, encountered an invalidly quoted field while parsing around row = 127, col = 1: ""outcviusqgvvejbjwrbumoedfhtdyiorvqueekyfhwzegowxkzomzskinamwxiimajggitwcymyxnjtpuhtbwngpunlwelyfkpfo
vqosvsysvoqkxgzaepzvrbrbneqpidrcrhgsmglapotilebnkntoecqywbxwaiiticlzbpslhkkyjvujwddoduzmixjpznipcptb
", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
Stacktrace:
[1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
@ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:596
[2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
@ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:804
[3] parserow
@ ~/.julia/packages/CSV/XLcqT/src/file.jl:646 [inlined]
[4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
@ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:556
[5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
@ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:366
[6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
@ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:455
[2] macro expansion
@ ./task.jl:487 [inlined]
[3] CSV.File(ctx::CSV.Context, chunking::Bool)
@ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:240
[4] File
@ ~/.julia/packages/CSV/XLcqT/src/file.jl:227 [inlined]
[5] #File#32
@ ~/.julia/packages/CSV/XLcqT/src/file.jl:223 [inlined]
[6] #read#118
@ ~/.julia/packages/CSV/XLcqT/src/CSV.jl:117 [inlined]
[7] read
@ ~/.julia/packages/CSV/XLcqT/src/CSV.jl:113 [inlined]
[8] top-level scope
@ ./REPL[208]:3
Some type information was truncated. Use `show(err)` to see complete types.
The data I am using here is generated by the following code:
using Random
using CSV
factor = 100
open(joinpath(@__DIR__, "test.csv"), "w") do file
    write(file, "a;b;c;d\n")
    write(file, randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*"\n")
    for i in 1:1000
        write(file, "\""*randstring('a':'z', 1*factor)*"\n"*randstring('a':'z', 1*factor)*"\n"*"\n"*randstring('a':'z', 1*factor)*";"*randstring('a':'z', 1*factor)*"\""
            *";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*"\n")
    end
end
I tried to generate a CSV file that looks similar to the real-world data I am facing. The real data has many more columns, but that doesn't matter here. The problem seems to be caused by splitting the input file into several chunks and reading them in parallel: the first column holds quoted fields with embedded newlines, so a chunk boundary can presumably land inside a quoted field. Chunked parsing is a big advantage of this library and makes reading CSV files much faster. A simple workaround is to set ntasks to 1; the file can then be read and parsed as a DataFrame without problems.
On one run I got the following message instead:
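The workaround can be sketched on a tiny stand-in file. The file name and contents below are made up for illustration; only the keyword arguments match the ones used in this report. A file this small would most likely not be chunked anyway, so this only shows the call shape, not a reproduction of the crash.

```julia
using CSV, DataFrames

# Tiny illustrative file: the first data row has a quoted field that spans
# two lines and also contains the delimiter.
path = joinpath(mktempdir(), "multiline.csv")
write(path, "a;b\n\"line one\nline two;still a\";x\nplain;y\n")

# ntasks=1 asks CSV.jl to parse the file in a single task, so the input is
# never split into chunks at a position inside a quoted field.
df = CSV.read(path, DataFrame; delim=';', quotechar='"', escapechar='"', ntasks=1)
println(size(df))
```

The cost is that the file is no longer parsed in parallel, which matters for large inputs.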
┌ Error: Multithreaded parsing failed and fell back to single-threaded parsing. This can happen if the input contains multi-line fields; otherwise, please report this issue.
└ @ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:579
After seeing this message, I wanted to share my findings and ask: why isn't this fallback used every time?
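Until that is answered, a fallback can be approximated on the caller's side. The following is a minimal sketch, assuming the failure surfaces as an exception from CSV.read; `read_with_fallback` is a hypothetical helper, not part of CSV.jl.

```julia
using CSV, DataFrames

# Hypothetical helper (not a CSV.jl function): try the read with the given
# keyword arguments first, and retry with a single task if it throws.
function read_with_fallback(path; kwargs...)
    try
        return CSV.read(path, DataFrame; kwargs...)
    catch err
        @warn "Parsing failed; retrying with ntasks=1" exception = err
        # A keyword given after the splat overrides any ntasks in kwargs.
        return CSV.read(path, DataFrame; kwargs..., ntasks=1)
    end
end

# Usage on a small illustrative file.
demo_path = joinpath(mktempdir(), "demo.csv")
write(demo_path, "a;b\n1;2\n3;4\n")
demo_df = read_with_fallback(demo_path; delim=';', quotechar='"', escapechar='"')
```

This catches every error, including ones unrelated to chunking, so a second failure with ntasks=1 still propagates to the caller.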
My code to read the csv file:
using CSV, DataFrames
for i in 1:60
    println(i)
    CSV.read(joinpath(@__DIR__, "test.csv"), DataFrame; quotechar='"', escapechar='"', delim=';', ntasks=i)
end
I run this in a for loop to find the ntasks value that triggers the crash. Currently it is 8, but I would guess that depends on the input data.
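Because the loop stops at the first exception, it only reveals the smallest failing value. A variant that records every failing ntasks value could look like the sketch below; `failing_ntasks` is a hypothetical helper name, and the small file at the end only demonstrates the call.

```julia
using CSV, DataFrames

# Hypothetical helper: try every ntasks value in `range` and collect the
# ones for which CSV.read throws, instead of stopping at the first failure.
function failing_ntasks(path; range=1:60)
    failed = Int[]
    for n in range
        try
            CSV.read(path, DataFrame; quotechar='"', escapechar='"', delim=';', ntasks=n)
        catch
            push!(failed, n)
        end
    end
    return failed
end

# Usage on a small file without multi-line fields; point `path` at the
# generated test.csv to check the data from this report.
small_path = joinpath(mktempdir(), "small.csv")
write(small_path, "a;b\n1;2\n3;4\n")
small_failed = failing_ntasks(small_path; range=1:4)
println("ntasks values that failed: ", small_failed)
```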
I am currently using Julia version 1.10.7
with the CSV (v0.10.15) and DataFrames (v1.7.0) packages, which have the following UUIDs:
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"