read fallback to ntasks=1 not always working #1157

Crefok · 2025-03-10T11:49:13Z

Hey Julia Community,
I am very new to Julia, but what I have seen so far is very good. In a project I read some CSV files with CSV.jl. Threads.nthreads() is set to 60.
Some CSV files could not be read, and the program exits with the following error message:

ERROR: TaskFailedException

    nested task error: thread = 38 fatal error, encountered an invalidly quoted field while parsing around row = 127, col = 1: ""outcviusqgvvejbjwrbumoedfhtdyiorvqueekyfhwzegowxkzomzskinamwxiimajggitwcymyxnjtpuhtbwngpunlwelyfkpfo
    vqosvsysvoqkxgzaepzvrbrbneqpidrcrhgsmglapotilebnkntoecqywbxwaiiticlzbpslhkkyjvujwddoduzmixjpznipcptb
    
    ", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
    
    Stacktrace:
     [1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
       @ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:596
     [2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
       @ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:804
     [3] parserow
       @ ~/.julia/packages/CSV/XLcqT/src/file.jl:646 [inlined]
     [4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
       @ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:556
     [5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
       @ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:366
     [6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
       @ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:455
 [2] macro expansion
   @ ./task.jl:487 [inlined]
 [3] CSV.File(ctx::CSV.Context, chunking::Bool)
   @ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:240
 [4] File
   @ ~/.julia/packages/CSV/XLcqT/src/file.jl:227 [inlined]
 [5] #File#32
   @ ~/.julia/packages/CSV/XLcqT/src/file.jl:223 [inlined]
 [6] #read#118
   @ ~/.julia/packages/CSV/XLcqT/src/CSV.jl:117 [inlined]
 [7] read
   @ ~/.julia/packages/CSV/XLcqT/src/CSV.jl:113 [inlined]
 [8] top-level scope
   @ ./REPL[208]:3
Some type information was truncated. Use `show(err)` to see complete types.

The data I am using here is generated by the following code:

using Random
using CSV

factor = 100
open(joinpath(@__DIR__, "test.csv"), "w") do file
    write(file, "a;b;c;d\n")
    write(file, randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*"\n")
    for i in 1:1000
        write(file, "\""*randstring('a':'z', 1*factor)*"\n"*randstring('a':'z', 1*factor)*"\n"*"\n"*randstring('a':'z', 1*factor)*";"*randstring('a':'z', 1*factor)*"\""
        *";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*";"*randstring('a':'z', 6*factor)*"\n")
    end
end

I tried to generate a CSV file that looks similar to the real-world data I am facing. The real-world data has many more columns, but that doesn't matter here. The problem is caused by splitting the input file into several chunks and reading them in parallel; I would guess that a chunk boundary sometimes lands inside one of the multi-line quoted fields. Chunked parallel reading is a big advantage of this library and makes reading CSV files much faster. A simple workaround is to set ntasks=1, so that the file can be read and parsed into a DataFrame without problems.
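The workaround could be wrapped in a small helper that tries the multithreaded read first and retries with ntasks=1 only when a parsing task fails. This is just a sketch: `read_with_fallback` is a hypothetical name, not part of CSV.jl, and I am assuming the failure surfaces as a `TaskFailedException`, possibly wrapped in a `CompositeException`.

```julia
using CSV
using DataFrames

# Hypothetical helper (not part of CSV.jl): attempt the read with the given
# keyword arguments and retry single-threaded if a parsing task fails.
function read_with_fallback(path; kwargs...)
    try
        return CSV.read(path, DataFrame; kwargs...)
    catch err
        # Assumption: a failed parsing task shows up as one of these types.
        err isa Union{TaskFailedException, CompositeException} || rethrow()
        @warn "Multithreaded parsing failed; retrying with ntasks=1" path
        # The trailing ntasks=1 overrides any ntasks passed in kwargs.
        return CSV.read(path, DataFrame; kwargs..., ntasks=1)
    end
end
```

It would be called like `read_with_fallback(joinpath(@__DIR__, "test.csv"); quotechar='"', escapechar='"', delim=';')`. The cost is that a failing file gets parsed twice.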

On one execution I got the following message:

┌ Error: Multithreaded parsing failed and fell back to single-threaded parsing. This can happen if the input contains multi-line fields; otherwise, please report this issue.
└ @ CSV ~/.julia/packages/CSV/XLcqT/src/file.jl:579

After I saw this message, I wanted to share my findings and ask: why isn't this fallback used every time?

My code to read the csv file:

using CSV
using DataFrames

# Try every ntasks value to find the one that triggers the crash.
for i in 1:60
    println(i)
    CSV.read(joinpath(@__DIR__, "test.csv"), DataFrame; quotechar='"', escapechar='"', delim=';', ntasks=i)
end

I do this in a for loop to find the ntasks value that crashes. Currently it is 8, but I would guess that depends on the input data.

I am currently using Julia version 1.10.7 with the packages CSV (v0.10.15) and DataFrames (v1.7.0), which have the following UUIDs:
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
