Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tfjs.node.decodeImage run error in workthread #7064

Open
mccoysc opened this issue Nov 20, 2022 · 23 comments
Open

tfjs.node.decodeImage run error in workthread #7064

mccoysc opened this issue Nov 20, 2022 · 23 comments
Assignees

Comments

@mccoysc
Copy link

mccoysc commented Nov 20, 2022

System information
any system,any node version 18.×

Describe the current behavior
in nodejs workThread:

ctx={tfjsnode:function(){return require("@tensorflow/tfjs-node")}};

code="imgData=fs.readFileSync("img.jpg");imgData=Uint8Array.from(imgData);tfjsnode().node.decodeImage(imgData)";

vm.runInNewContext(ctx,code);

the code would throw an error: decodeImage can only accept typedArray and so on.

i traced it into code and found:
isTypedArray(values) would always return false.
but really,values is a Uint8Array.

i don't know why.

Describe the expected behavior

Standalone code to reproduce the issue

Other info / logs

@mccoysc mccoysc added the type:bug Something isn't working label Nov 20, 2022
@mccoysc
Copy link
Author

mccoysc commented Nov 20, 2022

image

@mccoysc
Copy link
Author

mccoysc commented Nov 20, 2022

i am sure that the imgData is "Uint8Array".
and the code can be run good in non-workThread and non-contextfied node env.

@rthadur
Copy link
Contributor

rthadur commented Nov 22, 2022

You may need to useimgData=Uint8Array(imgData)instead of imgData=Uint8Array.from(imgData) , can you please check here . Thank you!

@rthadur rthadur self-assigned this Nov 22, 2022
@mccoysc
Copy link
Author

mccoysc commented Nov 22, 2022

You may need to useimgData=Uint8Array(imgData)instead of imgData=Uint8Array.from(imgData) , can you please check here . Thank you!

no,i don't only use from,i also use new Uint8Array

@mccoysc
Copy link
Author

mccoysc commented Nov 22, 2022

maybe the offical should re-write the function code for "isTypedArray",the orignal code use "instnaceof" to determine if the content is an instance of "typedarray".
but "instanceof" only checks if the object's constructor function is equal.

in fact,"tensorflow" has no need to be sure that the constructor functions are equal.
@rthadur

@pyu10055 pyu10055 assigned mattsoulanille and unassigned pyu10055 Nov 22, 2022
@mattsoulanille
Copy link
Member

I can reproduce this issue. Here's the code I'm using:

const tfjs = require('@tensorflow/tfjs');
const tfjsnode = require('@tensorflow/tfjs-node');
const fs = require('fs');
const vm = require('node:vm');


console.log('Running in main context');
let imgData = fs.readFileSync('img.jpg');
imgData = Uint8Array.from(imgData);
console.log(tfjsnode.node.decodeImage(imgData));

const ctx = {
  tfjsnode: function() {
    return require('@tensorflow/tfjs-node');
  },
  fs: function() {
    return require('fs');
  },
  log: console.log,
};

code = `
log('Running in a new context');
let imgData = fs().readFileSync('img.jpg');
imgData = Uint8Array.from(imgData);
const decoded = tfjsnode().node.decodeImage(imgData);
log(decoded);
`;

vm.createContext(ctx);
vm.runInContext(code, ctx);

I'm not yet sure why this isn't working, but if you just need to load an image and run decodeImage on it, passing it directly instead of converting it to a Uint8Array seems to work in node 18:

const vm = require('node:vm');

const ctx = {
  tfjsnode: function() {
    return require('@tensorflow/tfjs-node');
  },
  fs: function() {
    return require('fs');
  },
  log: console.log,
};

code = `
log('Running in a new context');
let imgData = fs().readFileSync('img.jpg');
// imgData = Uint8Array.from(imgData); // <-- This seems to break it.
const decoded = tfjsnode().node.decodeImage(imgData);
log(decoded);
`;

vm.createContext(ctx);
vm.runInContext(code, ctx);

@mattsoulanille
Copy link
Member

maybe the offical should re-write the function code for "isTypedArray",the orignal code use "instnaceof" to determine if the content is an instance of "typedarray". but "instanceof" only checks if the object's constructor function is equal.

in fact,"tensorflow" has no need to be sure that the constructor functions are equal. @rthadur

You're right, this seems to be the source of the issue. Thanks for the pointer!

Here's some test code that demonstrates it:

const vm = require('node:vm');

const ctx = {
  tfjsnode: function() {
    return require('@tensorflow/tfjs-node');
  },
  fs: function() {
    return require('fs');
  },
  log: console.log,
};

code = `
log('Running in a new context');
let imgData = fs().readFileSync('img.jpg');
imgData = Uint8Array.from(imgData);
log('imgData instanceof Uint8Array:', imgData instanceof Uint8Array);
const tfn = tfjsnode();
log('tfn.util.isTypedArray(imgData):', tfn.util.isTypedArray(imgData));
`;

vm.runInNewContext(code, ctx);

This prints the following:

imgData instanceof Uint8Array: true
tfn.util.isTypedArray(imgData): false

This is unexpected because isTypedArray returns true if its argument is an instance of Uint8Array. Maybe there's something I don't understand about vm.runInNewContext that's making it check the Uint8Array of the image from the new context against the Uint8Array from the main context, and that's why it's failing?

@mattsoulanille
Copy link
Member

My guess is that this code:

const ctx = {
  tfjsnode: function() {
    return require('@tensorflow/tfjs-node');
  },
}

... is causing tfjs-node to be loaded in the main context while it should actually be loaded in the new context. That's why isTypedArray is returning false (since it's prototype is from the main context but it's checking an object from the new context). If we use the Uint8Array constructor from the main context as well, then it works:

const fs = require('fs');
const vm = require('node:vm');

const ctx = {
  tfjsnode: function() {
    return require('@tensorflow/tfjs-node');
  },
  fs: function() {
    return require('fs');
  },
  log: console.log,
  makeUint8Array: function(data) {
    return new Uint8Array(data);
  },
};

code = `
log('Running in a new context');
let imgData = fs().readFileSync('img.jpg');
imgData = makeUint8Array(imgData);
log('imgData instanceof Uint8Array:', imgData instanceof Uint8Array);
const tfn = tfjsnode();
log('tfn.util.isTypedArray(imgData):', tfn.util.isTypedArray(imgData));
const decoded = tfn.node.decodeImage(imgData);
log(decoded);
`;

vm.runInNewContext(code, ctx);

Is there a way to load tfjs-node from the new context instead of the main context? I think that would solve this issue better than the above approach. Otherwise, I don't have a good solution to this. TFJS uses TypedArrays extensively to represent Tensor values, and we need to check the type of a typed array when creating a tensor.

Maybe your problem can be solved with WebWorkers in node and Comlink instead of vm.runInNewContext.

@mattsoulanille mattsoulanille closed this as not planned Won't fix, can't repro, duplicate, stale Nov 22, 2022
@google-ml-butler
Copy link

Are you satisfied with the resolution of your issue?
Yes
No

@mccoysc
Copy link
Author

mccoysc commented Nov 22, 2022

My guess is that this code:

const ctx = {

  tfjsnode: function() {

    return require('@tensorflow/tfjs-node');

  },

}

... is causing tfjs-node to be loaded in the main context while it should actually be loaded in the new context. That's why isTypedArray is returning false (since it's prototype is from the main context but it's checking an object from the new context). If we use the Uint8Array constructor from the main context as well, then it works:

const fs = require('fs');

const vm = require('node:vm');



const ctx = {

  tfjsnode: function() {

    return require('@tensorflow/tfjs-node');

  },

  fs: function() {

    return require('fs');

  },

  log: console.log,

  makeUint8Array: function(data) {

    return new Uint8Array(data);

  },

};



code = `

log('Running in a new context');

let imgData = fs().readFileSync('img.jpg');

imgData = makeUint8Array(imgData);

log('imgData instanceof Uint8Array:', imgData instanceof Uint8Array);

const tfn = tfjsnode();

log('tfn.util.isTypedArray(imgData):', tfn.util.isTypedArray(imgData));

const decoded = tfn.node.decodeImage(imgData);

log(decoded);

`;



vm.runInNewContext(code, ctx);

Is there a way to load tfjs-node from the new context instead of the main context? I think that would solve this issue better than the above approach. Otherwise, I don't have a good solution to this. TFJS uses TypedArrays extensively to represent Tensor values, and we need to check the type of a typed array when creating a tensor.

Maybe your problem can be solved with WebWorkers in node and Comlink instead of vm.runInNewContext.

as you said,tfjs only use "Uint8Array" to "represent" tensor data,no need to care about if it's really a "Uint8Array".

so, "instanceof" is no need.

@mattsoulanille
Copy link
Member

Whether this is possible or not, I don't think it does what you want. When you run tfjsnode() in the new context to get access to tfjs-node, node still loads tfjs-node in the main context. Even though you use vm.runInNewContext, this tfjsnode() code is not isolated because require loads it in the main context. To get proper isolation, you may need to reimplement require in the new context (maybe by reading and eval-ing the file or something equivalent), but I don't recommend this.

Can you share the problem you're solving with vm.runInNewContext? What is the use-case for doing this? Thanks!

@mccoysc
Copy link
Author

mccoysc commented Nov 24, 2022

Whether this is possible or not, I don't think it does what you want. When you run tfjsnode() in the new context to get access to tfjs-node, node still loads tfjs-node in the main context. Even though you use vm.runInNewContext, this tfjsnode() code is not isolated because require loads it in the main context. To get proper isolation, you may need to reimplement require in the new context (maybe by reading and eval-ing the file or something equivalent), but I don't recommend this.

Can you share the problem you're solving with vm.runInNewContext? What is the use-case for doing this? Thanks!

no.
default,no "require" can be called in new context in nodejs.
if you want,the only way to "require" is give a "require" function in context params.

so,as my code,it's the only way,you have no way to "require" in new context.

@mccoysc
Copy link
Author

mccoysc commented Nov 24, 2022 via email

@mattsoulanille
Copy link
Member

I will be out of office Thursday and Friday, but I'm going to reopen this so I can take another look at it next week.

@RaisinTen
Copy link

the code running in new context is uploaded by user and must be isolated by new context.

Is this for security? The Node.js docs for VM - https://nodejs.org/api/vm.html#vm-executing-javascript already says:

The node:vm module is not a security mechanism. Do not use it to run untrusted code.

@mccoysc
Copy link
Author

mccoysc commented Nov 25, 2022

the code running in new context is uploaded by user and must be isolated by new context.

Is this for security? The Node.js docs for VM - https://nodejs.org/api/vm.html#vm-executing-javascript already says:

The node:vm module is not a security mechanism. Do not use it to run untrusted code.

no,the "user code" are all audited by some trusted org.
i isolate some runtime vars by vm context.

@RaisinTen
Copy link

What do you think about copying everything except for your context specific runtime vars for each vm script?

I modified you example and now it returns true in both cases FWIW:

const vm = require('node:vm');

const ctx = vm.createContext(Object.defineProperties(
  {
    ...global,
    tfjsnode: function() {
      return require('@tensorflow/tfjs-node');
    },
    log: console.log,
  },
  Object.getOwnPropertyDescriptors(global),
));

code = `
log('Running in a new context');
const imgData = Uint8Array.from([1,2]);
log('imgData instanceof Uint8Array:', imgData instanceof Uint8Array);
const tfn = tfjsnode();
log('tfn.util.isTypedArray(imgData):', tfn.util.isTypedArray(imgData));
`;

vm.runInNewContext(code, ctx);

@mccoysc
Copy link
Author

mccoysc commented Nov 25, 2022

What do you think about copying everything except for your context specific runtime vars for each vm script?

I modified you example and now it returns true in both cases FWIW:

not good,i need isolate all vars to make user code run in new env every time.

@RaisinTen
Copy link

RaisinTen commented Nov 25, 2022

not good,i need isolate all vars to make user code run in new env every time.

I mean all the variables are isolated in the current approach, that's why Uint8Array in the main context is different from Uint8Array in the vm script's context. It looks like you want the variables on different contexts to be the same and different at the same time?

If you want everything to be isolated, it doesn't make sense to share the contents of a module you required in the main context with the vm script's context.

If you are okay with using the variables of the main context inside the vm script's context, you should use vm.runInThisContext, where everything would run in the same context.

These can't be mixed together.

You might find the discussion in nodejs/help#761 (comment) useful.

@mccoysc
Copy link
Author

mccoysc commented Nov 25, 2022

yes,i want isolate all vars in new context,not in "this context".
yes,better,if i can require module in new context, but can't.

@RaisinTen
Copy link

yes,i want isolate all vars in new context,not in "this context".
yes,better,if i can require module in new context, but can't.

Yea, I don't see a way of achieving what you want without implementing your own require() in the vm script's context without using anything from the main context.

@mccoysc
Copy link
Author

mccoysc commented Nov 25, 2022 via email

@RaisinTen
Copy link

RaisinTen commented Nov 25, 2022

There is an open issue in nodejs/node#31852 that pretty much asks for something like that.

I think the easiest way to unblock yourself would be to pass Uint8Array (and any other symbols that you have use cases for) to the vm script's context. I know that it's a hack but it would eventually go away once that issue gets fixed and looking at some of the recent comments it looks like we are getting close.

mattsoulanille added a commit to mattsoulanille/tfjs that referenced this issue Dec 15, 2022
isTypedArray is implemented with `instanceof`, which does not work in jest (jestjs/jest#11864). Instead, use node's builtin `util.types.isUint8Array`, `util.types.isFloat32Array`, etc to perform this check.

Fixes tensorflow#7175.
This may also address tensorflow#7064, but it does not fix the root cause.
mattsoulanille added a commit that referenced this issue Dec 16, 2022
isTypedArray is implemented with `instanceof`, which does not work in jest (jestjs/jest#11864). Instead, use node's builtin `util.types.isUint8Array`, `util.types.isFloat32Array`, etc to perform this check.

Fixes #7175.
This may also address #7064, but it does not fix the root cause.
Linchenn pushed a commit to Linchenn/tfjs that referenced this issue Jan 9, 2023
)

isTypedArray is implemented with `instanceof`, which does not work in jest (jestjs/jest#11864). Instead, use node's builtin `util.types.isUint8Array`, `util.types.isFloat32Array`, etc to perform this check.

Fixes tensorflow#7175.
This may also address tensorflow#7064, but it does not fix the root cause.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants