Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreaded loops can return incorrect output in 1.8.5 #49707

Closed
nick-cao opened this issue May 9, 2023 · 2 comments
Closed

Multithreaded loops can return incorrect output in 1.8.5 #49707

nick-cao opened this issue May 9, 2023 · 2 comments

Comments

@nick-cao
Copy link

nick-cao commented May 9, 2023

I have written a function to implement an iteration algorithm. When using the Threads.@threads macro to multithread a loop, the algorithm returns incorrect output. The MWE is as follows. I see no race conditions in the threaded loop, and moving Threads.@threads to the inner loop does not fix the issue.

function foo2(; W_uT, W_eT, T=200, tolerance = 1e-5)
    α = 1/2; b = 0.7; B = 0.46; η = 0.5; κ = 1.0; λ̲_eu = 0.028; Q = 0.95
    x_grid = 0.8:0.1:2
    z_grid = 0.8:0.1:1.4
    N_x = length(x_grid); N_z = length(z_grid)
    
    f(x) = x^(1-α) / α^α * (1 - (1-α)*x )^α
    
    # initialize matrices for VFI
    W_u = zeros(N_z,T); W_e0 = zeros(N_x,N_z,T); W_e1 = zeros(N_x,N_z,T)
    W_u[:,T] = W_uT;    W_e0[:,:,T] = W_eT
    
    # ordering: 1=θ, 2=x̂, 3=λ_eu
    policy_u = zeros(N_z,T,2)
    policy_e = zeros(N_x,N_z,T,3)
    
    distance = Inf
    t = T
    
    while (t > 1 && distance > tolerance)
        t = t-1
        
        Threads.@threads for z_j in eachindex(z_grid)
            x̂=1
            
            # UNEMPLOYED
            policy_u[z_j,t,2] = x̂
            gain_u = max(f(x̂) - b + Q*(W_e0[3,z_j,t+1] - W_u[z_j,t+1]), 0)
            θ_u = ((1-η)*B/κ * gain_u)^(1/η)
            policy_u[z_j,t,1] = θ_u
            W_u[z_j,t] = -κ*θ_u + b + Q*W_u[z_j,t+1] + B*θ_u^(1-η) * gain_u
            
            # EMPLOYED
            for x_j in eachindex(x_grid)
                x = x_grid[x_j]
                policy_e[x_j,z_j,t,2] = x̂
                
                gain_e = max(f(x̂) - f(x) + Q*(W_e0[3,z_j,t+1] - W_e0[x_j,z_j,t+1]), 0)
                # gain_e = 0
                θ_e = ((1-η)*B/κ * gain_e)^(1/η)
                policy_e[x_j,z_j,t,1] = θ_e
                W_e1[x_j,z_j,t] = -κ*θ_e + f(x) + 
                    Q*W_e0[3,z_j,t+1] + B*θ_e^(1-η) * gain_e
                
                λ_eu = W_e1[x_j,z_j,t] ≥ W_u[z_j,t] ? λ̲_eu : 1
                policy_e[x_j,z_j,t,3] = λ_eu
                
                W_e0[x_j,z_j,t] = (1-λ_eu) * W_e1[x_j,z_j,t] + 
                    λ_eu * W_u[z_j,t]
            end
        end
        
        d_u = abs.(W_u[:,t] - W_u[:,t+1])
        d_e0 = abs.(W_e0[:,:,t] - W_e0[:,:,t+1])
        d_e1 = abs.(W_e1[:,:,t] - W_e1[:,:,t+1])
        distance = max(maximum(d_u), maximum(d_e0), maximum(d_e1))
        
        if (T-t) % 20 == 0
            println("At iteration " * string(T-t) * ", distance is " * string(distance))
        end
    end
    
    return W_u[:,t]
    
    # problem code block    
    θ_u  = policy_u[:,:,1]
    x̂    = policy_u[:,:,2]
    θ_e  = policy_e[:,:,:,1]
    λ_eu = policy_e[:,:,:,3]
end

test1 = foo2(W_uT=17*ones(7), W_eT=19*ones(13,7), T=201)

The expected output is as follows:

At iteration 20, distance is 0.002608784925385521
At iteration 40, distance is 0.0008213945195656436
At iteration 60, distance is 0.0002941337208604011
At iteration 80, distance is 0.00010544187161087848
At iteration 100, distance is 3.779942396064939e-5
At iteration 120, distance is 1.3550561359210178e-5
7-element Vector{Float64}:
 17.48593508593614
 17.48593508593614
 17.48593508593614
 17.48593508593614
 17.48593508593614
 17.48593508593614
 17.48593508593614

I obtain the expected output when I remove the Threads.@threads macro in front of the outer loop (line 23). More surprisingly, I also obtain the expected output when I comment out the final block of code inside the function foo2 (lines 66–69, marked as # problem code block), which shows up after the return statement and does not modify the returned object W_u! This is strange, and makes me more confident that there is a bug here, and not user error. Similarly, no error occurs if the code is run directly in the REPL, instead of inside the function foo2.

Additionally, the algorithm also converges if one replaces gain_e on line 39 with gain_e = 0, although this would implement a different algorithm.

The output returned by the above code on my machine (2021 M1 Pro MacBook Pro) varies across iterations. One representative run is as follows:

At iteration 20, distance is 0.3208000710708525
At iteration 40, distance is 0.28686406648645146
At iteration 60, distance is 0.19684900643048664
At iteration 80, distance is 0.2881569510685438
At iteration 100, distance is 0.21668109218493115
At iteration 120, distance is 0.38061458718242136
At iteration 140, distance is 0.27014045956968147
At iteration 160, distance is 0.336477405879652
At iteration 180, distance is 0.20106967087047067
At iteration 200, distance is 0.3086270766122645
7-element Vector{Float64}:
 17.48320356738303
 17.33112421684518
 17.428034702494518
 17.43550357581563
 17.350460971065
 17.47269116279718
 17.360508547114623

The error seems to be that the algorithm starts to jump around at certain points, when multithreading is enabled. The following plot plots W_u[1,:].

multithreaded

With the @threads macro removed, W_u[1,:] converges smoothly to the solution, whereupon the algorithm terminates. The following plot plots W_u[1,80:end].

singlethreaded

My versioninfo() is as follows:

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 6 on 6 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 6
@KristofferC
Copy link
Member

Dup of #14948?

@nick-cao
Copy link
Author

nick-cao commented May 9, 2023

I just declared θ_u, , θ_e, and λ_eu to be local inside the loop, and it fixes the issue. I'm inclined to say that it is a duplicate of #14948. Thanks so much! Should I close the issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants