Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't force global binding resolution in compiled code #11456

Merged
merged 1 commit into from
Jun 12, 2015

Conversation

JeffBezanson
Copy link
Member

Fall back to run time lookup when compiled code needs to access a global that cannot be found yet.
Fixes #7864.

This has performance implications; the run time lookup is very slow. However this probably doesn't happen very often. We only hit this case if you try to run (and therefore compile) code that uses a global that is not available yet, for example:

julia> function f()
         g()
       end
f (generic function with 1 method)

julia> f()
ERROR: UndefVarError: g not defined
 in f at none:2

julia> g() = 0
g (generic function with 1 method)

julia> f()
0

However, g could have been provided by a package, and in that case we used to do something totally wrong. With this PR it works better, but f() will be really slow. I suspect this code pattern mostly happens at the prompt, but I'm not sure. I should load a bunch of packages and instrument how many times this happens. If necessary we can fix the performance by caching the result of jl_get_binding for each such global variable use.

JeffBezanson added a commit that referenced this pull request May 27, 2015
@JeffBezanson
Copy link
Member Author

As I thought, this happens extremely rarely. There were a couple cases very early in bootstrap, and some cases where globals use delayed initialization (i.e. some function contains global x =, and that function runs fairly late). A tiny number of instances occur in code with top level expressions that use isdefined; there we clearly need to generate code for a possibly-undefined variable.

@yuyichao
Copy link
Contributor

What does this mean for statically compiling a package with const global initialized in __init__ ?

@JeffBezanson
Copy link
Member Author

That's a very good point. Currently, __init__ runs right after loading a module, so I believe it will run before compiling. However, it seems like that would cause an error when loading the compiled code, due to reassigning a const. In Base, globals assigned from __init__ are not const.

Non-const globals assigned in __init__ are indeed penalized by this PR. Basically, you have to declare global x at top level as early as possible, to declare your intent to define the variable in the current module (as opposed to importing it from elsewhere).

As soon as we compile __init__, we see the assignments to the globals and can resolve them. The problem is that we might have already compiled other code. We could perhaps try to compile __init__ before everything else; not sure how well that would work.

@yuyichao
Copy link
Contributor

(Actually I think I finally understand what this PR is about. I didn't know there's distinction between global variable and imported values.)

Currently, __init__ runs right after loading a module, so I believe it will run before compiling. However, it seems like that would cause an error when loading the compiled code, due to reassigning a const. In Base, globals assigned from __init__ are not const.

I'm more wondering about sth like ccall only accepting const global dlopen'd handle. And in any case won't const global experience a even bigger performance penalty and there's no way to fix it (since you cannot declare it before defining it....)

As soon as we compile __init__, we see the assignments to the globals and can resolve them.

This is usually true. However, I guess unless you can teach the compilation to understand eval, it will fail to recognize a lot of global definitions. (I don't know how often it is used but I've seen it in Nettle.jl and I would guess it is probably pretty common for package dealing with library that has a dynamic interface)

The problem is that we might have already compiled other code. We could perhaps try to compile __init__ before everything else; not sure how well that would work.

Although not backed by many real world examples, I would still like #11098 to be implemented. IMHO, that would make identifying __init__ harder.

@JeffBezanson
Copy link
Member Author

There is not really a distinction between global variables and imported variables; it's just a matter of knowing where a value is stored.

@JeffBezanson
Copy link
Member Author

If code uses lots of evals to generate its code based on run time info, you can't really statically compile it. What would that mean?

@yuyichao
Copy link
Contributor

Actually one thing I somehow forgot to ask just now. Would it be possible to fix it "from the other direction", i.e. by making using A.b the same with assigning to a global constant b.

@JeffBezanson
Copy link
Member Author

I'm not sure what you mean. If code begins with using A.b, then we know exactly where b comes from and there's no problem.

@yuyichao
Copy link
Contributor

If code uses lots of evals to generate its code based on run time info, you can't really statically compile it. What would that mean?

I'm thinking of sth like this.

f() = g()

__init__() = @eval g() = 1

Of course it would be more complicated in real package but my point is that a global value used by other functions might be assigned with eval in __init__

@yuyichao
Copy link
Contributor

I'm not sure what you mean. If code begins with using A.b, then we know exactly where b comes from and there's no problem.

I was commenting on

f() = g
try
   f()
end
g = 1
f()

works but

f() = g
try
   f()
end
using A.g
f()

doesn't.

Is it possible to fix it by making using A.g and g = 1 have the same effect on global binding?

@JeffBezanson
Copy link
Member Author

Code like that cannot be statically compiled in an efficient way. However, if you don't care about static compilation, there's no problem. As long as you don't try to call f() before __init__ runs, everything is fine. See my example at the top. Problems only happen when you try to actually run code before a module is fully set up.

@yuyichao
Copy link
Contributor

Code like that cannot be statically compiled in an efficient way.

I see.

I guess this would be what the static compilation needs to care about but it feels like such code can be slower if forced to be static compiled.

@JeffBezanson
Copy link
Member Author

Is it possible to fix it by making using A.g and g = 1 have the same effect on global binding?

No, because with g = 1 we are creating a new variable, so we know where it's stored since we allocated it. With using A.g, module A has already allocated space for g, and when f() is compiled we have no idea where that is. Previously, the compilation for f() assumed that g would be a global variable in the current module. There are two ways to solve this:

  1. Add an indirection. That's basically what this PR does, except there's a function call in between that can be removed with a bit more work.
  2. If A.g is a constant, then yes, we could replace using A.g with global const g = A.g. However that only works for constants.

@yuyichao
Copy link
Contributor

To be honest, this is the first time I noticed that julia's import/using is different from python's with respect to binding....

In [1]: import imp, sys

In [2]: A = imp.new_module('A')

In [3]: exec("a = 1", A.__dict__)

In [4]: A.a
Out[4]: 1

In [5]: sys.modules["A"] = A

In [6]: from A import a

In [7]: a
Out[7]: 1

In [8]: exec("a = 2", A.__dict__)

In [9]: A.a
Out[9]: 2

In [10]: a
Out[10]: 1
julia> module A
       a = 1
       end

julia> A.a
1

julia> using A.a

julia> a
1

julia> A.eval(:(a = 2))
2

julia> A.a
2

julia> a
2

I'm not sure which one is better but I am a little surprised.... Is there any code/feature that relies on this behavior?

@JeffBezanson
Copy link
Member Author

Oh, that is interesting. Our behavior is strictly more expressive. This is how bindings are supposed to behave; for example think of symbols in shared libraries, where one library owns a location and many other libraries can reference it. If you want a copy, you can always easily get one with global a = A.a.

@yuyichao
Copy link
Contributor

If you want a copy, you can always easily get one with global a = A.a.

OTOH, binding behavior can always be achieved with using A.a instead of a. I think I would agree the julia behavior is slightly more expressive/compact since if a is used multiple times, a would be shorter than A.a.

I brought up the question about what is relying on this behavior because python seems to be fine with this behavior and it can possibly also solve the issue addressed by this PR without performance penalty.

@JeffBezanson
Copy link
Member Author

Indeed, non-const globals are themselves pretty rare. One example is Base.STDOUT; that's a non-const global that other code would want to import. However I think it is only assigned once at startup. It's quite possible all globals should be const, and mutable globals could use Ref cells. That would also give you the effect of declaring types. But I'm still not sure how to make write-once variables in __init__ fully const.

@JeffBezanson JeffBezanson force-pushed the jb/lazybindings branch 2 times, most recently from 90c7425 to 3029070 Compare June 9, 2015 16:21
@StefanKarpinski
Copy link
Member

For what it's worth, I find the Python behavior less surprising. I would expect importing a name to be a shorthand for assigning the name in the current module rather than what we're currently doing, which is making the name a shorthand for the fully qualified version. Of course, our using construct can't actually be implemented by doing assignment at using time because of this:

module Foo
    export bar
    bar = 1
end

module Baz
    using Foo
    get_bar() = bar
    function set_bar(val)
        global bar = val
    end
end

Depending on whether Baz.set_bar is called or not, Baz.bar can either mean Foo.bar or be an entirely unrelated variable. This is convenient for making large namespaces easily available without any fuss, but I'm not sure how I feel about this – it seems so very dynamic.

@yuyichao
Copy link
Contributor

Won't that give a override imported name warning if you call set_bar?

@yuyichao
Copy link
Contributor

OK the warning only shows up when get_bar() is called first.

IMHO, this is actually an argument for the python behavior since it is a little surprising... If we make using always creating global assignement than set_bar will always do what you expect.

@StefanKarpinski
Copy link
Member

Two different Julia sessions:

julia> module Foo
           export bar
           bar = 1
       end

julia> module Baz
           using Foo
           get_bar() = bar
           function set_bar(val)
               global bar = val
           end
       end

julia> Baz.get_bar()
1

In this session Baz.bar means Foo.bar. New session:

julia> module Foo
           export bar
           bar = 1
       end

julia> module Baz
           using Foo
           get_bar() = bar
           function set_bar(val)
               global bar = val
           end
       end

julia> Baz.set_bar(:qux)
:qux

julia> Baz.get_bar()
:qux

In this session Foo.bar and Baz.bar are unrelated globals.

@yuyichao
Copy link
Contributor

In this session Baz.bar means Foo.bar. New session:

In this session Foo.bar and Baz.bar are unrelated globals.

And this is the "surprise" I'm talking about.

@StefanKarpinski
Copy link
Member

Right, but in that case everyone who has complained that exports from Base "steal" names will actually be correct: any name exported by any module that you use will be unavailable, and in particular, since non-bare modules have an implict using Base, every name exported from Base will be unavailable basically everywhere. That's not good.

@yuyichao
Copy link
Contributor

Ah. Right. I guess python doesn't have this issue because everything are wrapped in modules. (which, IMO, is not a bad thing to do and, if I remember correctly, has been brought up before.)

@StefanKarpinski
Copy link
Member

A compromise approach would be that Foo.bar and Baz.bar are always unrelated variables, but using Foo causes global Baz.bar = Foo.bar to happen as soon as Foo.bar is used if it hasn't otherwise been assigned. It would change the behavior to this:

julia> module A
            a = 1
       end

julia> A.a
1

julia> using A.a

julia> a # causes `global a = A.a` to happen right now
1

julia> A.eval(:(a = 2))
2

julia> A.a
2

julia> a # `Main.a` is not the same as `A.a` they just referred to the same value
1

@JeffBezanson, would that make this issue any easier?

@yuyichao
Copy link
Contributor

I like this schematics although it's beyond my current knowledge to tell how easy it is to get everything right...

@StefanKarpinski
Copy link
Member

Talked with @JeffBezanson on the phone about this. He convinced me that the way that using and import do binding resolution is useful and should stay as is. One example is things like STDIN, STDOUT and STDERR – Base exports these and they get used by other modules, but Base can change them and have that be reflected everywhere. This is quite useful.

However, this example is still disturbing. We decided that instead of resolving the binding of bar when set_foo is called it should be resolve as soon as set_foo is defined. Currently calling get_foo and then calling set_foo causes a warning; instead, you would get a warning from calling get_fooand then defining set_foo. In other words, the first of the above example sessions would have this behavior instead:

julia> module Foo
           export bar
           bar = 1
       end

julia> module Baz
           using Foo
           get_bar() = bar
           function set_bar(val)
               global bar = val
           end
       end

julia> Baz.get_bar()
ERROR: UndefVarError: bar not defined
 in get_bar at none:3

In particular, the behavior of Baz doesn't depend on whether Foo exports something named bar or not – the fact that Baz has code that assigns to bar causes bar to be resolved as a new global rather than an imported binding from Foo.

@yuyichao
Copy link
Contributor

I guess that still doesn't help with using eval to define globals in functions at runtime. e.g.

function __init__()
    @eval get_some_const() = $(ccall(:get_some_runtime_const, Cint, ()))
end

But I've kind of convinced myself that those can be workaround (e.g. by defining a const global instead).

JeffBezanson added a commit that referenced this pull request Jun 12, 2015
don't force global binding resolution in compiled code
@JeffBezanson JeffBezanson merged commit c224c6b into master Jun 12, 2015
@tkelman tkelman deleted the jb/lazybindings branch June 13, 2015 00:20
@crayxt
Copy link
Contributor

crayxt commented Jun 13, 2015

Building against LLVM-SVN

    CC src/codegen.dbg.obj
codegen.cpp: In function ‘llvm::Value* global_binding_pointer(jl_module_t*, jl_sym_t*, jl_binding_t**, bool, jl_codectx_t*)’:
codegen.cpp:2748:35: error: ‘class llvm::IRBuilder<>’ has no member named ‘CreateCall2’
             Value *bval = builder.CreateCall2(prepare_call(jlgetbindingorerror_func),
                                   ^
make[1]: *** [codegen.dbg.obj] Error 1
make: *** [julia-src-debug] Error 2

@yuyichao
Copy link
Contributor

yuyichao commented Jul 4, 2015

@StefanKarpinski @JeffBezanson Sorry to bring this up again but this just come into my mind.

So the argument of keeping the current binding behavior is that we can override sth like STDOUT and make it take effect everywhere. I expect this behavior is not used in many places so how bad it would be to make these object a wrapper / ref of the real one and just mutate them?

i.e. instead of writing STDOUT = <new stream>, do STDOUT[] = <new stream> instead.

@JeffBezanson
Copy link
Member Author

To implement #964 we will probably do something like that. A global will always be bound to a specific object. A mutable global will be bound to a Ref{T}, and we could even consider hiding the explicit loads and mutations (so you don't need []).

@yuyichao
Copy link
Contributor

yuyichao commented Jul 4, 2015

Yeah, @vtjnash just reminds me (#12010 (comment)) that you've already said this although apparently I didn't think about it this way because of the context....

fcard pushed a commit to fcard/julia that referenced this pull request Jul 8, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Attempting to use an undefined top-level identifier should not create a new binding
4 participants