classification1 #3

Closed
GloriaWYY opened this issue May 11, 2022 · 8 comments · Fixed by #78

@GloriaWYY
Contributor

GloriaWYY commented May 11, 2022

PR #7

@GloriaWYY GloriaWYY self-assigned this May 11, 2022
@GloriaWYY GloriaWYY moved this to Todo in ocese-summer May 11, 2022
@GloriaWYY GloriaWYY moved this from Todo to In Progress in ocese-summer May 11, 2022
@GloriaWYY
Contributor Author

GloriaWYY commented May 17, 2022

General progress:

  • All R code has been converted into Python, but the code is not yet DRY; I still need to write helper functions to make it more concise. The book builds successfully locally, and the latest changes have been pushed to the classification1 branch.

  • Still working on editing the text descriptions, e.g. where the text describes an R function or package, changing it to the Python counterpart.

Questions:

  • [GOOD FOR NOW] When I have alt.data_transformers.enable('data_server') and alt.renderers.enable('mimetype'), the book does not show code outputs. I have tried setting execute_notebooks: "off" in _config.yml, listing - file: classification1.ipynb in _toc.yml, and running the ipynb before building the book. Since this chapter does not involve a large dataset, I just removed those two lines and kept _config.yml and _toc.yml as they were initially, and the book builds successfully.

  • \index appears quite often (e.g. \index{predictive question}), I suppose this is for R markdown cross-referencing? Should I remove those since they do not work in Jupyter Book?

  • I see in-text citations such as [@knnfix; @knncover], and these match the entries in references.bib, so should I figure out a way to convert them to Jupyter Book style references?

  • [SOLVED] In the R textbook, many code cells are completely hidden. I tried this in MyST markdown, but so far the best I can do is to use the {toggle} directive, and the code is still visible after clicking 'Click to show'.

  • [SOLVED] In R markdown, we can do this (new_point is a variable containing a character vector). Is there a MyST markdown or Jupyter Book way to do this?

standardized perimeter r new_point[1] and concavity of r new_point[2]

  • In R markdown, we can reference a figure output from code through \@ref, e.g. \@ref(fig:05-scatter). But I could not figure out a way to mimic this in MyST markdown. The best I could think of so far is to first save the altair plot as a png, then show the figure through a markdown cell:
:height: 500px
:name: fig:05-scatter

Scatter plot of concavity versus perimeter colored by diagnosis label.

and reference the figure as {numref}fig:05-scatter. The screenshot below shows the book when I do this. The problem is that the resolution of the png is not great when I save the altair plot, even with scale_factor=10.0.

image
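
On the citation question above, one possible approach is a small script that rewrites Pandoc-style groups such as [@knnfix; @knncover] into the {cite} role that Jupyter Book provides through sphinxcontrib-bibtex. This is only a minimal sketch: the helper name pandoc_to_myst_cite is hypothetical, and it assumes the keys already match references.bib and that the text contains only plain bracketed groups (no page numbers or prefixes).

```python
import re

# Matches a bracketed Pandoc citation group such as [@knnfix; @knncover].
PANDOC_CITE = re.compile(r"\[(@[\w:.-]+(?:\s*;\s*@[\w:.-]+)*)\]")


def pandoc_to_myst_cite(text):
    """Rewrite [@a; @b] as {cite}`a,b` (hypothetical helper, not from the thread)."""

    def _convert(match):
        # Split the group on ';' and drop the leading '@' from each key.
        keys = [key.strip().lstrip("@") for key in match.group(1).split(";")]
        return "{cite}`" + ",".join(keys) + "`"

    return PANDOC_CITE.sub(_convert, text)


example = "K-nearest neighbors [@knnfix; @knncover] is a simple classifier."
print(pandoc_to_myst_cite(example))
```

For the converted citations to render, the book would also need a {bibliography} directive on some page and references.bib listed in _config.yml.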

@joelostblom
Contributor

Great progress! Answers to some of your questions:

In the R textbook, many code cells are completely hidden.

You can remove code input cells.
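
A minimal sketch of doing this in bulk: Jupyter Book removes a cell's input when the cell carries the remove-input tag, so a short script can add that tag to selected code cells of a notebook. The function name tag_cells_remove_input and the demo notebook below are hypothetical; only the tag name comes from Jupyter Book.

```python
import json


def tag_cells_remove_input(notebook, cell_indices, tag="remove-input"):
    """Add `tag` to the chosen code cells of a parsed .ipynb dict (in place)."""
    for index in cell_indices:
        cell = notebook["cells"][index]
        if cell.get("cell_type") != "code":
            continue  # input-hiding tags only make sense on code cells
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
    return notebook


# Tiny in-memory notebook standing in for a real .ipynb file.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "outputs": [],
            "execution_count": None,
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

tag_cells_remove_input(demo_notebook, cell_indices=[0, 1])
print(json.dumps(demo_notebook["cells"][1]["metadata"]))
```

For a real file, load it with json.load, call the function, and write it back with json.dump. If the code should stay available behind a toggle, hide-input is the tag to use instead.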

In R markdown, we can do this (new_point is a variable containing a character vector).

In JupyterBook you can glue variables into the text

In R markdown, we can reference a figure output from code

You can reference figures in Jupyter Book as well

The problem is that the resolution of the png is not great when I save the altair plot even when I have scale_factor=10.0

scale_factor is currently broken. However, I don't think that should be an issue for us, since I would vote that we use svg for figures in general, as these are resolution independent. If you need to reference a code chunk that created a figure, then maybe you can add a label like for an equation? (Not sure if this works.)

@GloriaWYY
Contributor Author

@joelostblom Thanks for your answers, they are quite helpful and clear up most of the questions I had! As for referencing the code chunk, I am still not able to do that with a label similar to an equation. But it should be fine, since I can always save the plot and reference it through a markdown cell.

@joelostblom
Contributor

joelostblom commented May 17, 2022

@GloriaWYY
Contributor Author

I tried 2 approaches:

  • The first one references the code cell that outputs an altair plot in the same way as a general image output from code, but it neither attaches a caption nor generates a proper numbered reference that I could use in the text.

image

image

Saving the plot as png or svg seems to work, but without generating a proper numbered reference (and it defeats the purpose of referencing a code chunk).
image
image

image

image

@joelostblom
Contributor

joelostblom commented May 18, 2022

Oh cool! I thought that would not work from reading the comments, great find @GloriaWYY !

I just made PR #5 with an approach that does not require gluing, but I needed to modify the CSS and add an empty image in the figure directive to carry the caption. I think either of these solutions will work, although it might be annoying to have to assign each plot to a variable name, include the additional gluing cell, and set its metadata, in addition to having the figure caption cell for each plot. On the other hand, the gluing approach does not require changing any CSS or having empty images, so that could be an advantage. I am leaning towards the solution we merged in, but not by much. cc @phaustin

@GloriaWYY
Contributor Author

GloriaWYY commented May 21, 2022

Some Questions

  1. [SOLVED] The Plotly 3d scatter plot does not have a legend. I was trying to use plotly.graph_objs to make the 3d plot (to replicate Figure 5.8 in the R textbook), but I cannot figure out why it does not show the color legend (i.e. diagnosis: Malignant, Benign, Unknown).

  2. [SOLVED] altair plot: how do I customize the symbol size? I want the red diamond to be larger than the circles. I tried adjusting alt.Size, but it made no obvious difference.

image

  3. What specifically should we do with \index? Right now I have preserved every \index (either in the markdown text when there is a one-to-one match in Python, or as a comment in the code cell), but I am wondering if I can do more. I saw that the pypkg textbook uses cookiecutter\\index{cookiecutter}, but when I tried the same thing with predictions\\index{predictive question}, it gives:

image

  4. How should we open a PR that allows an easy preview of the Jupyter Book? Should we change build_book.yml in the PR?
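
One way to apply the "keep it as a comment" option to the markdown text in bulk is a small regular-expression pass. This is only a sketch under assumptions: the helper names are hypothetical, the HTML-comment form is an arbitrary choice rather than anything agreed on here, and it assumes index terms contain no nested braces.

```python
import re

# Matches R Markdown / LaTeX index commands such as \index{predictive question}.
INDEX_CMD = re.compile(r"\\index\{([^{}]*)\}")


def list_index_terms(text):
    """Return every index term found in the text, in order of appearance."""
    return INDEX_CMD.findall(text)


def index_to_comment(text):
    """Replace each \\index{term} with an HTML comment that keeps the term."""
    return INDEX_CMD.sub(lambda m: "<!-- index: " + m.group(1) + " -->", text)


sample = r"We make predictions\index{predictive question} from labelled data."
print(list_index_terms(sample))
print(index_to_comment(sample))
```

Keeping the term inside the comment means the entries can still be collected later with a search if a proper index is added to the book.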

@joelostblom
Contributor

joelostblom commented May 21, 2022

  1. If you use plotly express to create the 3d scatter plot, it will automatically have a legend: https://plotly.com/python/3d-scatter-plots/

  2. If you just want to differentiate by size, you can set size='column_name' as an encoding. If you want control over which size corresponds to each category and, as in your case, reduce it from three levels (benign, malignant, unknown) to two levels (small, big), the quickest way is to use a condition:

    size=alt.condition('datum.Diagnosis == "Unknown"', alt.value(100), alt.value(30))

    You could also use a new chart to plot only the big points in a layer on top with a bigger marker size, or create a new column that holds the values for the marker size:

    df['marker_size'] = 10
    df.loc[df['Diagnosis']=='Unknown', 'marker_size'] = 40
    ...
    size=alt.Size('marker_size:Q', legend=None)

  3. Not sure, but I think we said in the meeting to just keep it for now or convert it to a comment.

  4. You can't do anything. We need to set up a Netlify account, and then it will be automatic. I tried but got this error message: You do not have permission to modify this app on UBC-DSCI. Please contact an Organization Owner. So someone with more permissions will need to do it.
