-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Luca/fix raw predict issue 635 #637
Luca/fix raw predict issue 635 #637
Conversation
df15e97
to
e8bfc8b
Compare
…ime xgboost Regression
e8bfc8b
to
1a444f0
Compare
...ntime/src/test/scala/ml/combust/mleap/xgboost/runtime/testing/BundleSerializationUtils.scala
Outdated
Show resolved
Hide resolved
@@ -114,7 +115,7 @@ object Dependencies { | |||
|
|||
val tensorflow = l ++= tensorflowDeps ++ Seq(Test.scalaTest) | |||
|
|||
val xgboostRuntime = l ++= Seq(xgboostDep) ++ Seq(Test.scalaTest) | |||
val xgboostRuntime = l ++= Seq(xgboostDep) ++ Test.spark ++ Seq(Test.scalaTest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure Provided.spark isn't enough? it seems like that for the other parity tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I actually meant to get your input on these dependency changes.
Both Provided.spark
or Test.spark
achieve the same thing here because I am using spark things only in tests.
I think that putting those dependencies with a test
scope makes the build faster, since the dependencies are only brought in when running tests, not when building. I read online:
- test:
Dependencies with maven dependency scope test are not needed to build and run the project. They are needed to compile and run the unit tests.
- provided:
Maven dependency scope provided is used during build and test the project.
so I assume a test scope is slightly lighter than provided? Again, the xgboost-runtime code has no dependencies on spark stuff, but its tests do.
Another option would be to avoid spark things in tests but I found spark has the best libsvm reader that lets me do
val dataFrame = spark.read.format("libsvm").load(this.getClass.getClassLoader.getResource(filePath).getFile)
and I can use the Dataframe -> to mleapFrame utils.
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thank you!
dependencies = Seq(runtime) | ||
dependencies = Seq( | ||
runtime, | ||
sparkTestkit % "test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we use sparkTestkit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I don't bring that in, I cannot use:
import org.apache.spark.sql.mleap.TypeConverters
Do you think it is ok to use that in tests? (%"test" should not be included in the build)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that comes from mleap-spark-base, but spark-testkit should be ok too.
...ost-runtime/src/test/scala/ml/combust/mleap/xgboost/runtime/testing/CachedDatasetUtils.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good overall, just had a couple of questions.
...ime/src/main/scala/ml/combust/mleap/xgboost/runtime/bundle/ops/XGBoostClassificationOp.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
Perhaps we should also test for XGBoostMultinomialClassificationModel
...p-xgboost-runtime/src/test/scala/ml/combust/mleap/xgboost/runtime/testing/BoosterUtils.scala
Outdated
Show resolved
Hide resolved
Thanks all for reviewing! I added a multinomial classifier test and fixed all copy-paste leftovers. Let me know what you think about using spark % test for these tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but it looks like the travis build failed.
it seems that my new tests passed successfully (https://travis-ci.org/combust/mleap/jobs/649255264#L9688) but the python ones failed. I'll try rerun these |
e16f09d
to
2312be4
Compare
I think this broke after the recent update of virtualenv to 20.0.0 EDIT: I fixed this in #641 so I expect these test to fail until we push that branch out |
da6d907
to
06c738a
Compare
...e/src/test/scala/ml/combust/mleap/xgboost/runtime/XGBoostClassificationModelParitySpec.scala
Outdated
Show resolved
Hide resolved
...e/src/test/scala/ml/combust/mleap/xgboost/runtime/XGBoostClassificationModelParitySpec.scala
Outdated
Show resolved
Hide resolved
...e/src/test/scala/ml/combust/mleap/xgboost/runtime/XGBoostClassificationModelParitySpec.scala
Outdated
Show resolved
Hide resolved
...e/src/test/scala/ml/combust/mleap/xgboost/runtime/XGBoostClassificationModelParitySpec.scala
Outdated
Show resolved
Hide resolved
...me/src/test/scala/ml/combust/mleap/xgboost/runtime/testing/FloatingPointApproximations.scala
Show resolved
Hide resolved
06c738a
to
e7877bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small comments, but this looks good to me!
Thank you for adding the multinomial test as well!
...e/src/test/scala/ml/combust/mleap/xgboost/runtime/XGBoostClassificationModelParitySpec.scala
Outdated
Show resolved
Hide resolved
...ost-runtime/src/test/scala/ml/combust/mleap/xgboost/runtime/testing/CachedDatasetUtils.scala
Outdated
Show resolved
Hide resolved
ae22247
to
47053e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
7dcd9bd
to
11848c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you!
Code change
This is a one line change that fixes issue #635 where rawPredict should be
(-m, m)
.(also found another small bug where tree_limit wasn't being serialized in the
store
)Added tests
I noticed there were no tests for mleap-xgboost-runtime so I added a bunch of them using the same "agaricus" dataset from
dmlc-xgboost
:(-m, m)
)@talalryz @voganrc @ancasarb