Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Luca/fix raw predict issue 635 #637

Merged
merged 8 commits into from
Feb 24, 2020

Conversation

lucagiovagnoli
Copy link
Member

Code change
This is a one line change that fixes issue #635 where rawPredict should be (-m, m).
(also found another small bug where tree_limit wasn't being serialized in the store)

Added tests
I noticed there were no tests for mleap-xgboost-runtime so I added a bunch of them using the same "agaricus" dataset from dmlc-xgboost:

  1. Booster and mleap XGBoostClassifier results match (tests for (-m, m))
  2. XGBoostClassification works before and after serialization.
  3. XGBoostRuntime schema assertions based on the NodeShape

@talalryz @voganrc @ancasarb

@lucagiovagnoli lucagiovagnoli force-pushed the luca/fix-rawPredict-issue-635 branch from e8bfc8b to 1a444f0 Compare February 8, 2020 03:01
@@ -114,7 +115,7 @@ object Dependencies {

val tensorflow = l ++= tensorflowDeps ++ Seq(Test.scalaTest)

val xgboostRuntime = l ++= Seq(xgboostDep) ++ Seq(Test.scalaTest)
val xgboostRuntime = l ++= Seq(xgboostDep) ++ Test.spark ++ Seq(Test.scalaTest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure Provided.spark isn't enough? it seems like that for the other parity tests?

Copy link
Member Author

@lucagiovagnoli lucagiovagnoli Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I actually meant to get your input on these dependency changes.
Both Provided.spark or Test.spark achieve the same thing here because I am using spark things only in tests.

I think that putting those dependencies with a test scope makes the build faster, since the dependencies are only brought in when running tests, not when building. I read online:

  • test:

Dependencies with maven dependency scope test are not needed to build and run the project. They are needed to compile and run the unit tests.

  • provided:

Maven dependency scope provided is used during build and test the project.

so I assume a test scope is slightly lighter than provided? Again, the xgboost-runtime code has no dependencies on spark stuff, but its tests do.

Another option would be to avoid spark things in tests but I found spark has the best libsvm reader that lets me do
val dataFrame = spark.read.format("libsvm").load(this.getClass.getClassLoader.getResource(filePath).getFile)

and I can use the Dataframe -> to mleapFrame utils.

Let me know what you think!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thank you!

dependencies = Seq(runtime)
dependencies = Seq(
runtime,
sparkTestkit % "test")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do we use sparkTestkit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't bring that in, I cannot use:
import org.apache.spark.sql.mleap.TypeConverters

Do you think it is ok to use that in tests? (%"test" should not be included in the build)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that comes from mleap-spark-base, but spark-testkit should be ok too.

Copy link
Member

@ancasarb ancasarb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall, just had a couple of questions.

Copy link
Contributor

@talalryz talalryz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.
Perhaps we should also test for XGBoostMultinomialClassificationModel

@lucagiovagnoli
Copy link
Member Author

Thanks all for reviewing! I added a multinomial classifier test and fixed all copy-paste leftovers. Let me know what you think about using spark % test for these tests.

Copy link
Contributor

@talalryz talalryz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but it looks like the travis build failed.

@lucagiovagnoli
Copy link
Member Author

it seems that my new tests passed successfully (https://travis-ci.org/combust/mleap/jobs/649255264#L9688) but the python ones failed. I'll try rerun these

@lucagiovagnoli lucagiovagnoli force-pushed the luca/fix-rawPredict-issue-635 branch 8 times, most recently from e16f09d to 2312be4 Compare February 13, 2020 01:20
@lucagiovagnoli
Copy link
Member Author

lucagiovagnoli commented Feb 13, 2020

I think this broke after the recent update of virtualenv to 20.0.0
See pypa/virtualenv#1551

EDIT: I fixed this in #641 so I expect these test to fail until we push that branch out

@lucagiovagnoli lucagiovagnoli force-pushed the luca/fix-rawPredict-issue-635 branch 2 times, most recently from da6d907 to 06c738a Compare February 13, 2020 01:31
@lucagiovagnoli lucagiovagnoli force-pushed the luca/fix-rawPredict-issue-635 branch from 06c738a to e7877bd Compare February 14, 2020 07:22
Copy link
Contributor

@talalryz talalryz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small comments, but this looks good to me!

Thank you for adding the multinomial test as well!

@lucagiovagnoli lucagiovagnoli force-pushed the luca/fix-rawPredict-issue-635 branch from ae22247 to 47053e0 Compare February 15, 2020 03:29
Copy link
Contributor

@voganrc voganrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lucagiovagnoli lucagiovagnoli force-pushed the luca/fix-rawPredict-issue-635 branch from 7dcd9bd to 11848c3 Compare February 22, 2020 03:55
Copy link
Member

@ancasarb ancasarb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants