mod_ping hanging at cancel_timer #4358

Open
mzealey opened this issue Mar 10, 2025 · 0 comments

mzealey commented Mar 10, 2025

Environment

  • ejabberd version: Version based off aa369de with some custom patches, but including 6c691a7
  • Erlang version: Erlang (SMP,ASYNC_THREADS) (BEAM) emulator version 14.1
  • OS: Docker (FROM erlang:26.1.1.0)
  • Installed from: source + some custom patches

Configuration (only if needed): grep -Ev '^$|^\s*#' ejabberd.yml

        mod_ping:
            send_pings: true
            ping_interval: 200s
            ping_ack_timeout: 30s
            timeout_action: kill

Errors from error.log/crash.log

No errors

Bug description

We periodically see mod_ping hanging on our servers:

> p1_prof:q(2).
** pid: pid(0,197026501,26)
** message_queue_len: 38668113
** status: running
** memory: 15917799680
** reductions: 442638291450
** current_function: {misc,cancel_timer,1}
** registered_name: 'mod_ping_binu.m.in-app.io'

** pid: pid(0,180143752,15)
** message_queue_len: 422
** status: waiting
** memory: 2582968
** reductions: 760042
** current_function: {ejabberd_http_ws,route_text,2}
** registered_name: []
** $initial_call: {ejabberd_http_ws,init,1}
** $ancestors: [<0.223530354.15>,ejabberd_http_sup,ejabberd_sup,<0.126.0>]

After a couple of days the message queue grows too long and the process is restarted.

Digging into this a bit more I see:

> recon:info({0,197026501,26}).
[{meta,[{registered_name,'mod_ping_binu.m.in-app.io'},
        {dictionary,[{'$initial_call',{mod_ping,init,1}},
                     {'$ancestors',[ejabberd_gen_mod_sup,ejabberd_sup,
                                    <0.126.0>]}]},
        {group_leader,<0.125.0>},
        {status,running}]},
 {signals,[{links,[<0.392.0>]},
           {monitors,[]},
           {monitored_by,[]},
           {trap_exit,true}]},
 {location,[{initial_call,{proc_lib,init_p,5}},
            {current_stacktrace,[{misc,cancel_timer,1,
                                       [{file,"/opt/ejabberd/src/misc.erl"},{line,455}]},
                                 {mod_ping,add_timer,3,
                                           [{file,"/opt/ejabberd/src/mod_ping.erl"},{line,271}]},
                                 {mod_ping,handle_info,2,
                                           [{file,"/opt/ejabberd/src/mod_ping.erl"},{line,171}]},
                                 {gen_server,try_handle_info,3,
                                             [{file,"gen_server.erl"},{line,1077}]},
                                 {gen_server,handle_msg,6,
                                             [{file,"gen_server.erl"},{line,1165}]},
                                 {proc_lib,init_p_do_apply,3,
                                           [{file,"proc_lib.erl"},{line,241}]}]}]},
 {memory_used,[{memory,16138271976},
               {message_queue_len,39238980},
               {heap_size,1020295867},
               {total_heap_size,1585655104},
               {garbage_collection,[{max_heap_size,#{error_logger => true,include_shared_binaries => false,
                                                     kill => true,size => 0}},
                                    {min_bin_vheap_size,46422},
                                    {min_heap_size,233},
                                    {fullsweep_after,10},
                                    {minor_gcs,0}]}]},
 {work,[{reductions,445708587969}]}]

The process is stuck in misc:cancel_timer/1 (called from mod_ping:add_timer/3), which implies that erlang:cancel_timer is hanging for some reason.
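
One possible reading of the trace, offered as an assumption rather than a confirmed diagnosis: the process status is `running`, not `waiting`, so the time may be going into work done after the cancel, not into the BIF itself. A common cancel-then-flush helper looks like the sketch below. The module and function names are hypothetical and this is not copied from `misc.erl`; whether line 455 corresponds to such a receive has not been checked.

```erlang
-module(cancel_flush_sketch).
-export([cancel_and_flush/1]).

%% Illustrative cancel-then-flush helper. This is an assumed pattern,
%% not code taken from ejabberd's misc.erl.
-spec cancel_and_flush(reference() | undefined) -> ok.
cancel_and_flush(TRef) when is_reference(TRef) ->
    case erlang:cancel_timer(TRef) of
        false ->
            %% The timer already fired (or is unknown), so a timeout
            %% message may be sitting in the mailbox. Dropping it needs
            %% a selective receive, which may have to scan the mailbox
            %% for TRef; with no match it scans all of it.
            receive
                {timeout, TRef, _} -> ok
            after 0 ->
                ok
            end;
        _RemainingMs ->
            %% The timer was still pending and is now cancelled.
            ok
    end;
cancel_and_flush(_) ->
    ok.
```

If the helper has this shape, each cancel of an already-fired timer could cost time proportional to the queue length, which with tens of millions of queued messages would look like a hang and would let the queue grow further.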

I see that erlang:cancel_timer has an option to cancel asynchronously. Without knowing what the issue is (presumably not erlang/otp#5359, although it does sound like what we're seeing), is it possible/sensible to change misc:cancel_timer, or at least the version used by mod_ping, so that the cancellation happens asynchronously? That way, even if one or two cancels hang for a long time for some reason, it doesn't break the main mod_ping process.
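
A minimal sketch of what such an asynchronous variant might look like, using the `{async, true}` and `{info, false}` options of `erlang:cancel_timer/2`. The module and function names are hypothetical; nothing here is an existing ejabberd function, and it has not been tried against mod_ping.

```erlang
-module(async_cancel_sketch).
-export([cancel_timer_async/1]).

%% Hypothetical helper, not part of ejabberd's misc module.
-spec cancel_timer_async(reference() | undefined) -> ok.
cancel_timer_async(TRef) when is_reference(TRef) ->
    %% {async, true}: the cancel request is handed to the timer service
    %% and the call returns ok without waiting for the result.
    %% {info, false}: no {cancel_timer, TRef, Result} message is sent
    %% back to the caller.
    ok = erlang:cancel_timer(TRef, [{async, true}, {info, false}]);
cancel_timer_async(_) ->
    %% Nothing to cancel (for example, undefined).
    ok.
```

This variant does not flush a timeout message that was already delivered, so the caller would have to ignore any `{timeout, TRef, _}` whose reference no longer matches the one it has stored. Whether mod_ping already tolerates that is not something this sketch establishes.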
