Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance hit with svgstring() #58

Closed
timelyportfolio opened this issue Feb 9, 2016 · 10 comments
Closed

performance hit with svgstring() #58

timelyportfolio opened this issue Feb 9, 2016 · 10 comments

Comments

@timelyportfolio
Copy link
Contributor

I decide to test for speed differences between svglite and svgstring. I was surprised that svglite actually is significantly quicker on my machine.

> system.time({
+   fl <- tempfile(fileext=".svg")
+   svglite( file = fl )
+   plot(runif(10000),1:10000)
+   dev.off()
+ })
   user  system elapsed 
   0.13    0.06    0.20 
> 
> unlink(fl)
> 
> system.time({
+   svgstring()
+   plot(runif(10000),1:10000)
+   dev.off()
+ })
   user  system elapsed 
  75.99    2.25   79.60 

I'm wondering now if we should revert inlineSVG back to svglite.

Changing to runif(1000) and using microbenchmark, I get the following.

image

library(microbenchmark)
library(svglite)

svg_lite_fun <- function(){
  fl <- tempfile(fileext=".svg")
  svglite( file = fl )
  plot(runif(1000),1:1000)
  dev.off()
  unlink(fl)
}

svg_string_fun <- function(){
  svgstring()
  plot(runif(1000),1:1000)
  dev.off()
}

mb <- microbenchmark( svg_lite_fun(), svg_string_fun(), times = 10)
boxplot(mb)
@hadley
Copy link
Member

hadley commented Feb 9, 2016

I'm really not surprised that the performance isn't as good.

@timelyportfolio
Copy link
Contributor Author

Is it bad enough to revert back to svglite in inlineSVG? It is far worse than I expected, but it is always nice not to create a file.

@yixuan
Copy link
Contributor

yixuan commented Feb 14, 2016

Probably it's because insertion in string stream requires constructing new strings repeatedly?

@hadley
Copy link
Member

hadley commented Aug 15, 2016

@yixuan it seems to be copying the string from C++ to R that's slow. If I rewrite flush() to be:

  void flush() {
    stream_.flush();
    std::string x = stream_.str();
    # env_["svg_string"] = x;
  }

I get

Unit: milliseconds
             expr  min   lq  mean median    uq  max neval cld
   svg_lite_fun() 9.52 9.88 11.07  10.14 13.19 14.0    10   b
 svg_string_fun() 6.71 6.80  8.33   8.08  9.86 10.3    10  a 

If I uncomment the last line I get:

Unit: milliseconds
             expr    min    lq  mean median    uq max neval cld
   svg_lite_fun()   9.63  11.2  12.4   11.5  11.8  22    10  a 
 svg_string_fun() 330.97 336.7 371.3  349.1 431.0 453    10   b

@hadley
Copy link
Member

hadley commented Aug 15, 2016

If I construct the STRSXP directly by hand with Rf_mkCharLenCE(&x[0], x.length(), CE_UTF8), I get the same performance. It's probably related to R's global string pool - hashing that giant string takes some time.

@yixuan
Copy link
Contributor

yixuan commented Aug 15, 2016

This a good catch. Thanks Hadley.

One possible way to solve this is to cache the SVG string in C++ rather than in R, and we only do the copy when the string is requested in R. Some pseudo code may look like

svgstring <- function(width = 10, height = 8, bg = "white",
                      pointsize = 12, standalone = TRUE) {

  env <- new.env(parent = emptyenv())
  svgstring_(env, width = width, height = height, bg = bg,
    pointsize = pointsize, standalone = standalone)

  function() {
    .Call("get_string_from_cpp", env$svg_string_ptr)
  }
}

where env$svg_string_ptr saves the address of the cached C++ string. And in C++

class SvgStreamString : public SvgStream {
  std::stringstream stream_;
  Rcpp::Environment env_;
  std::string cached_string;

public:
  SvgStreamString(Rcpp::Environment env): env_(env) {
    stream_ << std::fixed << std::setprecision(2);
    env_["svg_string_ptr"] = somehow_return_the_pointer(&cached_string);
  }
  void flush() {
    stream_.flush();
    cached_string = stream_.str() + "</svg>";
  }
};

This way requires to take care of some other stuffs of course, for example returning proper string when device is closed.

@hadley
Copy link
Member

hadley commented Aug 15, 2016

I agree that that's a better interface - although it would be even better to return an Xptr from svgstring_, and then wrap with an accessor function. Then flush() wouldn't have to do anything - you'd call a special method from the accessor.

I didn't think it would solve the performance issue, but I guess flush() gets called multiple times so it might be a lot better.

@yixuan
Copy link
Contributor

yixuan commented Aug 15, 2016

Oh yeah, you are right. flush() can be empty and we direct return string from stream_.

@hadley
Copy link
Member

hadley commented Aug 16, 2016

Do you want to have a go at a PR? Do you get what I mean about the external pointer?

@yixuan
Copy link
Contributor

yixuan commented Aug 16, 2016

Yeah I think so. I can have a try maybe later this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants