Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with encoding (latin1 vs. UTF-8) when running devtools:test() #1306

Closed
hansharhoff opened this issue Aug 25, 2016 · 4 comments
Closed
Labels
bug an unexpected problem or unintended behavior

Comments

@hansharhoff
Copy link
Contributor

TL;DR

UTF-8 encoded files with non-ASCII strings are not correctly represented when running devtools::test()
on Windows. There should be an option to get the encoding from the Description file or from e.g. devtools::test(encoding = "UTF-8")

Description of issue

I am writing a package where strings containing non-ASCII characters need to be sent to a SQL server through RODBC.

By saving a query string with non-ASCII characters and running this in RStudio, R-terminal or using source from RStudio I am able to get the correct result.

However if I use devtools:test() my code fails since it seems that it does not source the functions using the correct encoding.

I have made a minimal example test functions (not using the SQL server). Here is the relevant code:

library(testthat)
context("Test encoding")

test_that("encoding works", {
  query <- "Test encoding: ó æ ø å  END TEST"
  print(query)
})

Running this code in RStudio (using Ctrl-R) returns:

[1] "Test encoding: ó æ ø å END TEST"
Which is the correct and expected result.

Pressing the source button in RStudio runs:
source('C:/Users/hahad/workspaces/bugreports/tests/testthat/test_encoding.R', encoding = 'UTF-8', echo=TRUE)
and returns the correct, expected result:

> source('C:/Users/hahad/workspaces/bugreports/tests/testthat/test_encoding.R', echo=TRUE)

> library(testthat)

> context("Test encoding")

> test_that("encoding works", {
+   query <- "Test encoding: ó æ ø å  END TEST"
+   print(query)
+ })
[1] "Test encoding: ó æ ø å  END TEST"

From this I conclude that the files are saved as UTF-8 and should therefore be sourced as UTF-8 (also confirmed using external program and by inspecting the RStudio settings).

If I save this script in a testthat folder in an otherwise empty package and run

devtools::test()

I get the following result:

devtools::test()
Loading bugreports
Testing bugreports
Test encoding: [1] "Test encoding: ó æ ø å  END TEST"


DONE =========================================================================================

This means that devtools is sourcing the files as latin1 instead of the UTF-8 files that they are. This happens irrespective of the Encoding: UTF-8 statement in the description file.

I have attempted to change to a UTF-8 encoding for my locale using

> Sys.setlocale("LC_ALL", 'UTF-8')
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "UTF-8") :
  OS reports request to set locale to "UTF-8" cannot be honored

Summary

When using devtools::test() the encoding is not properly handled (at least on Windows). It is not possible to change to a UTF-8 encoding for the locale on Windows (as far as I can tell), even if this is the suggested encoding to use.

Proposed solutions

  1. Make devtools::test() respect the Encoding in the DESCRIPTION file.
  2. Make optional argument to devtools:test() for encoding (as there is for source-function.
  3. I should develop package entirely in latin1
  4. Find a way to use UTF-8 compatible locale in Windows.

I would prefer solution 1
:-)

@hansharhoff
Copy link
Contributor Author

I traced the calling functions from

devtools::test()

to

testthat::test_dir

to

testthat::test_files

to

testthat::test_file

and finally to
>testthat::source_file

This uses readLines to get the lines from the test-R-file. I can confirm that using:

readLines(pathToFile, encoding = "UTF-8")

reads the file correctly.
But

readLines(pathToFile)

does not.

So propagating the encoding to this function would solve the problem.

@jimhester
Copy link
Member

Looks like testthat::test_file() and testthat::source_file() need to gain an encoding option (defaulting to "unknown") and devtools::test() needs to call testthat::test_dir(..., encoding = pkg$encoding %||% "unknown").

@hansharhoff You seem to have a good grip on the problem and solution, do you want to try submitting pull requests to devtools and testthat with the above fixes? See https://github.com/hadley/devtools/blob/master/CONTRIBUTING.md#pull-requests for information on contributing to devtools.

hansharhoff pushed a commit to hansharhoff/testthat that referenced this issue Aug 29, 2016
hansharhoff pushed a commit to hansharhoff/devtools that referenced this issue Aug 29, 2016
This was referenced Aug 30, 2016
@hansharhoff
Copy link
Contributor Author

I have made two pull requests in regards to this issue. They should fix the issue without disturbing any functionality. I also to add a test to test the filter parameter of test(), but I did not manage to make a consistent test since the indirection of pointing at another set of tests confused me.

I also did not add a test of the encoding issue, in part because it only occurs on some platforms.

@hadley hadley added the bug an unexpected problem or unintended behavior label Nov 3, 2016
@hadley hadley closed this as completed Nov 4, 2016
hansharhoff pushed a commit to hansharhoff/testthat that referenced this issue Nov 24, 2016
…ompatability on Windows (see e.g. r-lib/devtools#1306)

Add NEWS.md line about the change.
hadley pushed a commit to r-lib/testthat that referenced this issue Dec 15, 2016
* Add encoding parameter to source_file and test_file to ensure UTF-8 compatability on Windows (see e.g. r-lib/devtools#1306)

* Add encoding parameter to source_file and test_file to ensure UTF-8 compatability on Windows (see e.g. r-lib/devtools#1306)
Add NEWS.md line about the change.

* Simplify documentation of encoding parameter.
Remove ... from call/definition of test_files
@lock
Copy link

lock bot commented Sep 18, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants