
Nest running into infinite loop (?) #403

Closed · felix-schneider opened this issue Jun 20, 2016 · 11 comments · Fixed by #1481

Labels: I: No breaking change · S: Normal · T: Bug · ZC: Model · ZP: PR Created

Comments

@felix-schneider

I'm encountering an error when trying to run my network. NEST runs into an infinite loop, using 100% CPU on one core while making no progress. I'm using PyNN, but debugging showed that the loop occurs in NEST. I have encountered this behaviour multiple times; it is difficult to predict and highly dependent on the neuron parameters. In a similar example, the error did not occur immediately, but only 70 out of 100 seconds into the simulation.
I apologise for the large example; it is the smallest one that reliably produces the error. For context: I am trying to reproduce the behaviour of the NEF (Neural Engineering Framework) with NEST.
I am using PyNN 0.8.1 from git and NEST 2.10.0 from the website, though I have also encountered a similar error with PyNN 0.8.0 and NEST 2.8.0, as well as PyNN 0.7.5 and NEST 2.2.2.

What I find particularly confusing is that I have a different script that builds the exact same network, with the same components and the same parameters, yet does not trigger the error. I have attached it as well. Invoke it with python save_ensemble.py <random-seed>. The seed I used was 11245.

# (not so) mwe.py
# produces infinite loop during the first simulation run
import numpy as np
import pyNN.nest as sim

sim.setup(timestep=1.0, min_delay=1.0)

n_neurons = 40
n_steps = 100

sim_time = 1.0

T_REF = 0.001
T_RC = 0.02
V_THRESH = 20.0
V_REST = -70.0
TAU_SYN = 0.001

RATE_LOW = 25
RATE_HIGH = 75

ratio = 1.0

func = lambda x: x * x

x_values = np.linspace(-1, 1, n_steps)
func_values = func(x_values)

def calc_tuning_curves(encoders, gain, bias):
    x_samples = np.linspace(-1, 1, n_steps)
    tuning_curves = np.zeros((n_steps, n_neurons))
    for i in range(n_steps):
        J = gain * np.dot(x_samples[i], encoders) + bias
        j = J - V_THRESH
        tuning_curves[i, j > 0] = 1.0 / (T_REF + T_RC * np.log1p(V_THRESH / j[j > 0]))
    return tuning_curves.astype(np.uint16)

def calc_gain_bias(intercepts, max_rates):
    x = V_THRESH / (1 - np.exp((T_REF - (1.0 / max_rates)) / T_RC))
    gain = (V_THRESH - x) / (intercepts - 1)
    bias = V_THRESH - gain * intercepts
    return gain, bias

def generate_ensemble(seed):
    np.random.seed(seed)
    encoders = np.random.choice([-1,1], n_neurons)
    intercepts = np.random.uniform(-1, 1, n_neurons)
    max_rates = np.random.uniform(RATE_LOW, RATE_HIGH, n_neurons)
    gain, bias = calc_gain_bias(intercepts, max_rates)
    tuning_curves = calc_tuning_curves(encoders, gain, bias)
    return encoders, intercepts, max_rates, tuning_curves

encoders_A, intercepts_A, max_rates_A, tuning_curves_A = generate_ensemble(11245)

gain, bias = calc_gain_bias(intercepts_A, max_rates_A)

decoders_A = np.linalg.lstsq(tuning_curves_A, x_values)[0]

cellparams = {
    'tau_refrac':T_REF * 1000.0,
    'cm':T_RC / gain,
    'v_rest':V_REST,
    'tau_syn_E':TAU_SYN * 1000.0,
    'tau_syn_I':TAU_SYN * 1000.0,
    'tau_m':T_RC * 1000.0,
    'i_offset':bias / gain,
    'v_thresh':V_REST + V_THRESH,
    'v_reset':V_REST
}

A = sim.Population(n_neurons, sim.IF_curr_exp(**cellparams))
A.initialize(v=V_REST)
A.record('spikes')

input_source = sim.Population(2, sim.SpikeSourcePoisson())
weights = np.array([[1.0/ratio] * n_neurons * encoders_A] * 2)
weights[1] *= -1
sim.Projection(input_source, A, sim.AllToAllConnector(),
    sim.StaticSynapse(weight=weights))

decoders = np.linalg.lstsq(tuning_curves_A, func_values)[0]

spikes_A = np.zeros((n_steps, n_neurons))

print "Starting simulation..."

for i in range(n_steps):
    rates = [0.0, 0.0]
    if x_values[i] < 0:
        rates[1] = -x_values[i] / TAU_SYN * ratio
    else:
        rates[0] = x_values[i] / TAU_SYN * ratio
    input_source.set(rate=rates)

    sim.run(sim_time * 1000.0)
    data = A.get_data('spikes', clear=True).segments[0]
    spikes_A[i] = [len(train) / sim_time for train in data.spiketrains]

    print "Step {} complete.".format(i+1)

sim.end()

A_values = np.dot(spikes_A, decoders_A)

import matplotlib.pyplot as plt
plt.figure()
plt.plot(x_values, A_values, label="A")
plt.plot(x_values, x_values, 'k-')
plt.plot(x_values, func_values, 'k-')
plt.legend()
plt.show()
# save_ensemble.py
# does not produce infinite loop error when invoked with seed 11245
import numpy as np
import pyNN.nest as sim
import sys

seed = int(sys.argv[1])

sim.setup(timestep=1.0, min_delay=1.0)
np.random.seed(seed)

n_neurons = 40
n_steps = 100

sim_time = 1.0

T_REF = 0.002
T_RC = 0.02
V_THRESH = 20.0
V_REST = -70.0
TAU_SYN = 0.001

RATE_LOW = 25
RATE_HIGH = 75

ratio = 1.0

x_values = np.linspace(-1, 1, n_steps)

encoders = np.random.choice([-1,1], n_neurons)
intercepts = np.random.uniform(-1, 1, n_neurons)
max_rates = np.random.uniform(RATE_LOW, RATE_HIGH, n_neurons)# * 1000.0/TAU_SYN

x = V_THRESH / (1 - np.exp((T_REF - (1.0 / max_rates)) / T_RC))
gain = (V_THRESH - x) / (intercepts - 1)
bias = V_THRESH - gain * intercepts

cellparams = {
    'tau_refrac':T_REF * 1000.0,
    'cm':T_RC * 1000.0 / gain,
    'v_rest':V_REST,
    'tau_syn_E':TAU_SYN * 1000.0,
    'tau_syn_I':TAU_SYN * 1000.0,
    'tau_m':T_RC * 1000.0,
    'i_offset':bias / gain,
    'v_thresh':V_REST + V_THRESH,
    'v_reset':V_REST
}

pop = sim.Population(n_neurons, sim.IF_curr_exp(**cellparams))
pop.initialize(v=V_REST)
pop.record('spikes')

input_source = sim.Population(2, sim.SpikeSourcePoisson())
weights = np.array([[1.0/ratio] * n_neurons * encoders] * 2)
weights[1] *= -1
sim.Projection(input_source, pop, sim.AllToAllConnector(),
    sim.StaticSynapse(weight=weights))

spikes = np.zeros((n_steps, n_neurons))

print "Network created, simulating..."

for i in range(n_steps):
    rates = [0.0, 0.0]
    if x_values[i] < 0:
        rates[1] = -x_values[i] / TAU_SYN * ratio
    else:
        rates[0] = x_values[i] / TAU_SYN * ratio
    input_source.set(rate=rates)

    sim.run(sim_time * 1000.0)
    data = pop.get_data('spikes', clear=True).segments[0]
    spikes[i] = [len(train) / sim_time for train in data.spiketrains]

sim.end()

print "Simulated. Saving..."

with open('ensemble_{}.npz'.format(seed), 'wb') as dump:
    np.savez(dump, encoders=encoders, intercepts=intercepts,
        max_rates=max_rates, spikes=spikes)

Again, I apologise for posting so much code; it was really quite tricky to reproduce this error.

@heplesser (Contributor)

@Scaatis I can reproduce your infinite loop. One difference I noticed between your two scripts is

cellparams = {
    ... 
    'cm': T_RC / gain,

vs

cellparams = {
    ... 
    'cm': T_RC * 1000.0 / gain,

The first leads to cm values around 10^-3 and to an infinite loop, while the second does not; multiplying by 1000.0 in the first script as well makes it work. When configuring the neuron in NEST, PyNN internally multiplies cm by a factor of 1000 (PyNN specifies cm in nF, NEST expects C_m in pF), so in NEST we get

  • C_m == 1.083 pF in the first case (infinite loop)
  • C_m == 1083 pF in the second case (code works)
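
As a back-of-envelope check (a sketch; the gain value below is illustrative, chosen to match the reported capacitances):

# PyNN specifies cm in nF, NEST expects C_m in pF, hence the internal factor of 1000.
T_RC = 0.02
gain = 18.46   # illustrative per-neuron gain; seed 11245 yields values of this order
print(1000.0 * T_RC / gain)            # ~1.083 pF -> first script: infinite loop
print(1000.0 * T_RC * 1000.0 / gain)   # ~1083 pF  -> second script: works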

The default value of C_m for the iaf_psc_exp_ps model is 250 pF; in the first case we thus have a very small membrane capacitance. If we instead simulate with on-grid spikes via

sim.setup(timestep=1.0, min_delay=1.0, spike_precision='on_grid')

NEST uses iaf_psc_exp instead of the precise-timing version. Simulations in the first case then go through and reveal firing rates of 500 spikes/second, i.e., the neuron fires immediately at the end of each refractory period.

I think that this leads to a deadlock in the precise case. One will need to look at the details of iaf_psc_exp_ps::update() to see precisely where NEST gets stuck.

@heplesser added the T: Bug label on Jun 20, 2016
@krishnanj (Contributor)

@Scaatis @heplesser: I can reproduce the error and I am looking at the update function now. Will keep you posted!

@krishnanj (Contributor)

iaf_psc_exp_ps::update() calls the emit_spike_ function whenever the voltage is above threshold. emit_spike_ calculates the spike time by the bisection method in iaf_psc_exp_ps::bisectioning_(), and the infinite loop appears to happen in the following while condition (for some values of the voltage before propagation, V_.y2_before_):

while ( fabs( P_.U_th_ - y2_root ) > 1e-14 )
{
  if ( y2_root > P_.U_th_ )
    root -= dt / div;
  else
    root += dt / div;
  ....
  ....
}

If we were not to take the absolute value here:

while ( ( P_.U_th_ - y2_root ) > 1e-14 )
{
  if ( y2_root > P_.U_th_ )
    root -= dt / div;
  else
    root += dt / div;

  div *= 2.0;
  ....
  ....
}

the simulation runs through in both cases. I haven't systematically tried to understand why, though; I will update this thread again.

@heplesser (Contributor)

@krishnanj Thanks for your detective work! Dropping the absolute value makes the code pass only because the while loop then terminates as soon as the left-hand side of the comparison becomes negative, i.e., as soon as y2_root overshoots the threshold by any amount. So the fabs() needs to stay.
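
To make that concrete, a two-line sketch with made-up values:

# Without fabs(), the loop exits the first time y2_root overshoots the threshold,
# silently accepting a root whose error can be arbitrarily large.
U_th, y2_root = 20.0, 20.5            # illustrative overshoot of 0.5 mV
print((U_th - y2_root) > 1e-14)       # False -> modified loop terminates anyway
print(abs(U_th - y2_root) > 1e-14)    # True  -> original criterion keeps iterating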

I assume that we get stuck in an infinite loop as div grows and dt / div becomes so small that root does not change any more. If we add dt / div > 0 as an additional condition, we avoid the infinite loop.
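
The floating-point effect behind this, as a minimal sketch:

# Once dt/div falls below half an ULP of root, the addition rounds back to root:
# the loop keeps running, yet root (and hence y2_root) never changes again.
root = 1.0
tiny = 2.0 ** -60          # positive, but smaller than ulp(1.0)/2 == 2**-53
print(root + tiny == root) # True: no progress despite tiny > 0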

The question is what to do precisely in this case: throw an exception, or just accept the root found. To decide this, we need to know more about when the problem occurs. Could you analyze this either with a debugger or by adding some output statements to the code? To avoid too much output, it might suffice for a start to print values after the loop only if not ( dt / div > 0 ).

@krishnanj (Contributor)

@heplesser The bisection method takes the voltage value before propagation (which is below threshold) and the step dt, and calculates the spike time iteratively. Generally the voltage values y2_before (= y2_root) are positive and typically close to threshold (since the neuron is going to spike within a few ms). In this case, it seems that in one of the iterations the voltage takes on a wildly negative value, and I don't understand why.

Just to be sure, I tried to replicate this in Python:

import numpy as np

# Constants assumed to match the model parameters above:
# threshold 20 mV (relative to rest), tau_m = 20 ms, tau_syn_ex = tau_syn_in = 1 ms.
V_th = 20.0
tau_m = 20.0
tau_ex = 1.0
tau_in = 1.0

# State captured from the stuck simulation:
# [y2 (V before propagation), dt, y0, y1_ex, y1_in, c_m, i_e]
sim_params = [-93.1618, 1, 0, 0, -1473.68, 2.22734, 1882.04]

def check(y2, dt, y0, y1_ex, y1_in, c_m, i_e):
    # Python transcription of iaf_psc_exp_ps::bisectioning_(); with the state
    # above it never satisfies the 1e-14 criterion and loops forever.
    root = 0.0
    y2_root = y2
    div = 2.0
    while abs(V_th - y2_root) > 1e-14:
        if y2_root > V_th:
            root -= dt / div
        else:
            root += dt / div
        div *= 2.0
        expm1_tau_ex = np.exp(-root / tau_ex) - 1
        expm1_tau_in = np.exp(-root / tau_in) - 1
        expm1_tau_m = np.exp(-root / tau_m) - 1
        P20 = -tau_m / c_m * expm1_tau_m
        P21_ex = -tau_m * tau_ex / (tau_m - tau_ex) / c_m * (expm1_tau_ex - expm1_tau_m)
        P21_in = -tau_m * tau_in / (tau_m - tau_in) / c_m * (expm1_tau_in - expm1_tau_m)
        y2_root = P20 * (i_e + y0) + P21_ex * y1_ex + P21_in * y1_in + expm1_tau_m * y2 + y2
    return root

root = check(*sim_params)
print(root)

and it gets stuck; I suppose y2_root stays below the threshold forever.
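
The stall timeline can be sketched independently of the neuron model (a hedged illustration, not NEST code): root stops changing once dt / div drops below one ULP of root, and dt / div itself only becomes exactly zero once div overflows:

# Count the iterations until dt/div underflows to exactly 0.0 (div overflows to inf);
# root has already frozen long before that, around iteration ~53 for root of order 1.
dt, div, it = 1.0, 2.0, 0
while dt / div > 0.0:
    div *= 2.0
    it += 1
print(it)  # 1023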

@heplesser (Contributor)

@krishnanj Thanks for the further analysis! I will have a look at it shortly.

@heplesser (Contributor)

@krishnanj I ran your script with a bit of debugging output, and it indeed shows that at some point div becomes so large that dt/div becomes zero. The iteration is then stuck. To avoid this, we should change the loop as follows:

  double_t t_step = dt / 2.0;

  while ( t_step > 0.0 and fabs( P_.U_th_ - y2_root ) > 1e-14 )
  {
    if ( y2_root > P_.U_th_ )
      root -= t_step;
    else
      root += t_step;

    t_step /= 2.0;

The issue then is how to handle cases in which we leave the loop without having converged. I suggest raising an exception:

if ( fabs( P_.U_th_ - y2_root ) > 1e-14 )
{
  throw NumericalInstability("iaf_psc_exp_ps::bisectioning_() did not converge.");
}
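
Putting both pieces together, here is a hedged Python sketch of the repaired control flow (the function name and the propagate callback are illustrative, not the NEST API):

def bisect_spike_time(propagate, y2_start, dt, u_th, tol=1e-14):
    # propagate(t) returns the membrane potential t ms after the last grid point.
    # Halving an explicit step guarantees termination; convergence is checked after.
    root, y2_root, t_step = 0.0, y2_start, dt / 2.0
    while t_step > 0.0 and abs(u_th - y2_root) > tol:
        if y2_root > u_th:
            root -= t_step
        else:
            root += t_step
        t_step /= 2.0
        y2_root = propagate(root)
    if abs(u_th - y2_root) > tol:
        raise ArithmeticError("bisectioning did not converge")  # NumericalInstability in NEST
    return root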

The other question is why this occurs. If y2_before_ is about -90 mV, then the distance from the pre-threshold-crossing membrane potential to threshold is probably simply so large that one cannot hit the threshold to within 1e-14 due to unavoidable numerical limitations. This is the mathematical explanation.
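
A rough bound supports this (the i_e value is taken from the captured state above):

# Each summand of the state update carries a relative rounding error of ~2**-52;
# with terms of magnitude ~1.9e3, the absolute error alone is ~40x the tolerance.
eps = 2.0 ** -52       # double-precision machine epsilon
print(1882.04 * eps)   # ~4.2e-13, far above the 1e-14 convergence criterion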

The question now is how the membrane potential could be raised from -90 mV to a superthreshold value (so by about 100 mV) by a single input spike. This seems patently unphysiological. I find the second voltage trace you posted strange: it contains only zeros and occasionally large negative values. Could you explore this a bit more and record the input that goes into the neuron? For this you probably need to use a parrot_neuron_ps as intermediary, since you cannot record spikes directly from a generator.

@krishnanj (Contributor)

@heplesser Thank you! I will look into this and update this thread again.

@krishnanj (Contributor)

@heplesser May I ask if you can point me to a reference on how to create parrot neurons in PyNN? I couldn't find it in the documentation.

@heplesser (Contributor)

@krishnanj Unfortunately, I don't know. Could you ask on a PyNN or NEST user mailing list? I think it would be best to create a reproducer for this bug using PyNEST or, even better, SLI, so that we can later integrate it as a regression test into our testsuite.

@krishnanj (Contributor)

@heplesser Here is a replica of the same script in PyNEST that generates the same error:

import numpy as np
import nest
nest.set_verbosity("M_WARNING")
nest.SetKernelStatus({"overwrite_files": True})


nrns = 40
nsteps = 100
simtime = 1.0
tref = 0.001
tm = 0.02
vt = 20.0
vrest = -70.0
tsyn = 0.001
rate_low = 25
rate_high = 75
ratio = 1.0

func = lambda x: x * x

x_values = np.linspace(-1, 1, nsteps)
func_values = func(x_values)

def calc_tuning_curves(encoders, gain, bias):
    x_samples = np.linspace(-1, 1, nsteps)
    tuning_curves = np.zeros((nsteps, nrns))
    for i in range(nsteps):
        J = gain * np.dot(x_samples[i], encoders) + bias
        j = J - vt
        tuning_curves[i, j > 0] = 1.0 / (tref + tm * np.log1p(vt / j[j > 0]))
    return tuning_curves.astype(np.uint16)

def calc_gain_bias(intercepts, max_rates):
    x = vt / (1 - np.exp((tref - (1.0 / max_rates)) / tm))
    gain = (vt - x) / (intercepts - 1)
    bias = vt - gain * intercepts
    return gain, bias

def generate_ensemble(seed):
    np.random.seed(seed)
    encoders = np.random.choice([-1,1], nrns)
    intercepts = np.random.uniform(-1, 1, nrns)
    max_rates = np.random.uniform(rate_low, rate_high, nrns)
    gain, bias = calc_gain_bias(intercepts, max_rates)
    tuning_curves = calc_tuning_curves(encoders, gain, bias)
    return encoders, intercepts, max_rates, tuning_curves

encoders_A, intercepts_A, max_rates_A, tuning_curves_A = generate_ensemble(11245)

gain, bias = calc_gain_bias(intercepts_A, max_rates_A)

decoders_A = np.linalg.lstsq(tuning_curves_A, x_values)[0]

p = {
    't_ref':tref * 1000.0,
    'V_min':vrest,
    'tau_syn_ex':tsyn * 1000.0,
    'tau_syn_in':tsyn * 1000.0,
    'tau_m':tm * 1000.0,
    'V_th':vrest + vt,
    'V_reset':vrest
}

model = 'iaf_psc_exp_ps'
A = nest.Create(model, n=nrns, params=p)

for i in A:
    # note: C_m = tm / gain reproduces the tiny capacitance that triggers the hang
    nest.SetStatus([i], {'C_m': tm / gain[i - 1], 'I_e': bias[i - 1] / gain[i - 1]})

sd = nest.Create('spike_detector', 1)
nest.SetStatus(sd, {"to_file":True})
input_source = nest.Create("poisson_generator_ps", 2) 
parrot = nest.Create('parrot_neuron_ps', 2)
nest.Connect(input_source, parrot, {'rule':'one_to_one'})
nest.Connect(parrot, A, syn_spec={ "delay":5.0})
nest.Connect(parrot, sd, syn_spec={ "delay":1.0})
nest.SetStatus(sd, {'to_screen':True})

decoders = np.linalg.lstsq(tuning_curves_A, func_values)[0]

spikes_A = np.zeros((nsteps, nrns))

print "Starting simulation..."

for i in range(nsteps):
    rates = [0.0, 0.0]
    if x_values[i] < 0:
        rates[1] = -x_values[i] / tsyn * ratio
    else:
        rates[0] = x_values[i] / tsyn * ratio
    nest.SetStatus(input_source, [{"rate": rates[0]}, {"rate": rates[1]}])

A_values = np.dot(spikes_A, decoders_A)


nest.Simulate(simtime * 1000.0)
ev = nest.GetStatus(sd)[0]['events']['times']
print(ev)

Output:
44  2.27171 
44  4.45477 
44  5.78683 
44  5.95528 

Irrespective of how the weights are set, it appears that the arrival of the first event causes the infinite loop. Initialising the seed to some other value, in the program or at run time, only delays the occurrence; the simulation still hits unnatural voltage values at some point.

@heplesser added labels ZC: Model, I: No breaking change, ZP: In progess, S: Normal on Nov 17, 2016
@heplesser self-assigned this on Apr 3, 2017
@heplesser assigned clinssen and unassigned heplesser on Sep 13, 2018
@heplesser added label ZP: PR Created and removed ZP: In progess on Mar 25, 2020