
display spark datasets and dataframes with table widget #7040

Closed
scottdraves opened this issue Mar 25, 2018 · 5 comments
Comments


scottdraves commented Mar 25, 2018

See @jpallas's comments in #6993.

Indeed. The widget should show just what Spark normally shows: ASCII-formatted tables (sometimes very wide).

Is there a way to configure the show method to use our repr? Maybe via https://github.com/jupyter/jvm-repr? Or should we add a new method or a subclass?

After the tables support streaming (#7006), we can connect them directly to that.
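For reference, Spark's show() prints a plain-text grid. The tiny stand-in below (not Spark's implementation; the name and alignment rules are my own) mimics that layout, just to make concrete what the table widget would replace:

```scala
// Illustration only: mimics the ASCII-grid layout that Dataset.show() prints.
// Spark's real formatter differs in details (alignment, truncation to 20 chars).
object AsciiTableSketch {
  def format(columns: Seq[String], rows: Seq[Seq[Any]]): String = {
    val cells = rows.map(_.map(_.toString))
    // each column is as wide as its widest entry, header included
    val widths = columns.indices.map { i =>
      (columns(i).length +: cells.map(_(i).length)).max
    }
    val sep = widths.map("-" * _).mkString("+", "+", "+")
    def line(vs: Seq[String]) =
      vs.zip(widths).map { case (v, w) => v.padTo(w, ' ') }.mkString("|", "|", "|")
    (Seq(sep, line(columns), sep) ++ cells.map(line) :+ sep).mkString("\n")
  }
}
```

For example, `AsciiTableSketch.format(Seq("first", "second"), Seq(Seq("a", 1)))` yields a `+-----+------+`-bordered grid like show()'s output.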

@scottdraves changed the title from “display spark datasets with table widget” to “display spark datasets and dataframes with table widget” on Mar 25, 2018

@scottdraves (Author)

Related: #7041


jpallas commented Mar 25, 2018

show is defined directly on Dataset and has no return value, so I don't think there's any way to get its output into a table short of scraping the printed text (or doing unspeakable things with cglib).


scottdraves commented Mar 26, 2018

Yeah, it should be a display handler installed via jvm-repr.
Maybe start with 1000 rows.
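A minimal sketch of the registration pattern this suggests. To keep it self-contained, Displayer and Displayers below are stand-ins with the same shape as jvm-repr's jupyter.Displayer / jupyter.Displayers (register a handler per class, return a MIME-type-to-representation map), and FakeDataset stands in for org.apache.spark.sql.Dataset; a real handler would emit whatever bundle the table widget consumes:

```scala
import scala.collection.mutable

// Stand-in for jvm-repr's jupyter.Displayer: object -> MIME bundle.
trait Displayer[T] {
  def display(obj: T): Map[String, String]
}

// Stand-in for jvm-repr's jupyter.Displayers registry.
object Displayers {
  private val registry = mutable.Map.empty[Class[_], Displayer[_]]

  def register[T](cls: Class[T], d: Displayer[T]): Unit = registry(cls) = d

  def display(obj: Any): Map[String, String] =
    registry.collectFirst {
      case (cls, d) if cls.isInstance(obj) =>
        d.asInstanceOf[Displayer[Any]].display(obj)
    }.getOrElse(Map("text/plain" -> obj.toString))
}

// Stand-in for a Spark Dataset: columns and rows already on the driver.
final case class FakeDataset(columns: Seq[String], rows: Seq[Seq[Any]])

object DatasetDisplayer {
  val MaxRows = 1000 // "maybe start with 1000 rows"

  def register(): Unit =
    Displayers.register(classOf[FakeDataset], new Displayer[FakeDataset] {
      def display(ds: FakeDataset): Map[String, String] = {
        val shown = ds.rows.take(MaxRows)
        // Placeholder representation; a real handler would target the widget.
        Map("text/plain" -> s"${ds.columns.mkString(", ")} (${shown.size} rows)")
      }
    })
}
```

After DatasetDisplayer.register(), Displayers.display(someDataset) routes through the registered handler instead of the default toString.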


scottdraves commented Mar 28, 2018

@jpallas, on the other issue you said:

> (I've prototyped this).

Care to share? Jarek could pick up your work if you think it's on the right track.


jpallas commented Mar 28, 2018

What I did is just a few lines:

```scala
implicit class DatasetOps(ds: org.apache.spark.sql.Dataset[_]) {
  def display(rows: Int = 20): Unit = {
    // I do not understand why this import is necessary
    import com.twosigma.beakerx.scala.table.TableDisplay

    val columns = ds.columns          // column names, in order
    val rowVals = ds.toDF.take(rows)  // collect at most `rows` rows to the driver
    // build one Map(column name -> cell value) per row for the table widget
    val t = new TableDisplay(rowVals map (row => (columns zip row.toSeq).toMap))
    t.display()
  }
}
```

(There is something strange going on with imports and visibility in the Scala kernel, maybe some odd interaction with the way the interpreter wraps things.)

Example with a Dataset[Fields], where case class Fields(first: String, second: String, third: Int):
[screenshot: table widget rendering the Dataset, 2018-03-28]
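To make the per-row conversion in the snippet above concrete: it zips the column names with the row's values. A plain Seq[Any] stands in for org.apache.spark.sql.Row here, and rowToMap is a hypothetical helper name, using the Fields example:

```scala
// Plain-Scala illustration of the (columns zip row.toSeq).toMap step above;
// a Seq[Any] stands in for org.apache.spark.sql.Row.
case class Fields(first: String, second: String, third: Int)

def rowToMap(columns: Array[String], rowValues: Seq[Any]): Map[String, Any] =
  (columns zip rowValues).toMap // column name -> cell value

val asMap = rowToMap(Array("first", "second", "third"),
                     Fields("a", "b", 3).productIterator.toSeq)
// asMap == Map("first" -> "a", "second" -> "b", "third" -> 3)
```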
