Adds attempt decoding functions #436
Conversation
We noticed that there's a difference between Scala 2 and Scala 3. One of the tests now fails unexpectedly.
In Scala 3, decoding the CSV into the
In Scala 2, decoding the CSV into the
Any idea if this is intentional?
Both Scala versions are meant to behave consistently. Thank you for your PR! I don't yet have the time to look at it closely (I should find some in the next few days though), but I will approve the workflow runs if you push more changes, so you get CI feedback until we can properly review it.
Is your use case to parse CSV files that have rows with a wrong number of columns in between, or just erroneous data (as in: no number where expected and the like)?
I'm not entirely convinced yet that these combinators are justified on the high level API, as they are easy to do using the low level API and seem to be rather non-standard use cases. But happy to learn more!
As for the implicits, that's fine. Maintaining bincompat is sometimes slightly ugly and at least, it stays internal to the library and we can clean up in fs2-data 2.0 one day.
Test verbosity would be OK, but let's put them where they belong (the csv module), maybe in a new file. I'll add some inline comments for the tests as well.
People sometimes create CSV files by manually inputting data into Excel or whatever, and those files become invalid in the weirdest ways: strings in an expected Int field, commas within an unquoted String, and who knows what else. In that situation, it's much more convenient to return a full list of errors for a file so the user can fix them all at once, rather than returning only the first error and requiring the user to retry over and over until they have fixed all of them. The use case is similar to how you could use Cats Validated to parse several sources of data in parallel and return a list of all errors instead of throwing whatever happens to be the first exception. As for your other comments, we'll take a look after the weekend.
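The accumulation described above can be sketched without any fs2-data types. Note that `Person`, `decodeRow`, and `decodeAll` are made-up names for illustration, not part of any library; the point is simply to collect every row-level failure instead of stopping at the first one:

```scala
// Illustrative sketch only (plain Scala 2.13+/3 stdlib, no fs2-data types):
// decode each row to an Either and accumulate all errors with partitionMap.
object AccumulateErrors {
  final case class Person(name: String, age: Int)

  // Decode one raw row; Left carries a human-readable error for that row.
  def decodeRow(line: Int, fields: List[String]): Either[String, Person] =
    fields match {
      case name :: age :: Nil =>
        age.toIntOption
          .toRight(s"row $line: '$age' is not a valid Int")
          .map(Person(name, _))
      case other =>
        Left(s"row $line: expected 2 columns, got ${other.size}")
    }

  // Returns (all errors, all successfully decoded rows) in one pass.
  def decodeAll(rows: List[List[String]]): (List[String], List[Person]) =
    rows.zipWithIndex.partitionMap { case (fields, idx) =>
      decodeRow(idx + 1, fields)
    }
}
```

With input containing a bad number and a short row, `decodeAll` yields both error messages alongside the rows that did decode, which is the Validated-style behaviour described above.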
It could be a matter of perspective. When using a streaming library, especially in a functional programming language like Scala, I expect errors to be first-class citizens. So it surprised me that the default behaviour is to kill the stream whenever an error is encountered. That's not something I would have expected from a resiliency perspective. In my opinion, returning errors as values is not a novelty feature someone should have to implement via the low-level API; I would have expected it to be available in the high-level API, though, as I mentioned initially, it could all be a matter of perspective. It took quite a while to figure out that aggregating errors is not achievable via the high-level API and that one should use the low-level API instead. It would be great to have the documentation reflect this, ideally with a concrete example of how to achieve it.
We have to differentiate the errors here though:
The first group is very easy to handle today, just decode as

PS: Sorry for the accidental close/re-open.
I agree there are two different kinds of errors. But when a row's column count doesn't match the header's, this could be a single error for that row, because someone accidentally deleted a comma. It would make sense to me to handle those as single-row errors and accumulate them.
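To make the two kinds of errors concrete, one could model them with a small ADT and let callers pick a policy per kind. This is only a sketch of the idea, with made-up names, not fs2-data's actual error hierarchy:

```scala
// Sketch (not fs2-data's actual types): model the two error kinds the
// discussion distinguishes, so a caller can choose a policy per kind.
object ErrorKinds {
  sealed trait DecodeError { def row: Int }
  // structural: the row's column count doesn't match the header's
  final case class RowSizeError(row: Int, expected: Int, actual: Int) extends DecodeError
  // content: a field failed to decode (e.g. no number where one was expected)
  final case class FieldError(row: Int, field: String, message: String) extends DecodeError

  // Lenient policy: treat a wrong column count as a single-row error too
  // (someone deleted a comma) and keep accumulating.
  def isFatal(e: DecodeError): Boolean = false

  // Stricter policy: a structural error aborts the whole file, since it can
  // be a sign that the entire file is corrupted.
  def isFatalStrict(e: DecodeError): Boolean = e match {
    case _: RowSizeError => true
    case _: FieldError   => false
  }
}
```

The difference between the two positions in this thread then boils down to which policy is the default for `RowSizeError`.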
No worries about the close/re-open! If you're dealing with manually created CSV files, then a single row that cannot be decoded is to be expected. That should, however, not affect the entire file from a resiliency point of view. The only way I could figure out so far is to not use
Suppose we can get the headers per row by having an
How would you create a resilient stream that does not fail, even if the structure is incorrect (e.g. a column is missing for a row), using the low-level API?
I discussed this with @satabin and we agreed that because a varying column count is not generally safe to parse per row (it can well be a sign of the whole file being corrupted), we would like to namespace the new pipes into a
So if you could move the new pipes into such an object (similar to the
I've also added a bit of documentation, FYI.
Thank you for the docs! I left some comments that still need to be resolved, but we're getting closer to merging 👍
@ybasket I've requested a review from you, just to be sure that we're not waiting on each other. If there's still something to be done in this PR, please let me know! 👍
👍 I was on vacation these last days, so sorry for not replying earlier! Code looks good, I'm just wondering why there are still changes in the
Also saw during review that we have a specific exception type
@ybasket No problem! Hope you had some nice days off. I've removed the tests in the generic module.
What do you mean here? All the
I mean the specialised error type here:
I'll do the follow-up (it was apparently my oversight long ago), no worries. It's also compatible with your changes. And I'll also clean up the small remaining changes in the
You're welcome! Thank you in advance for approving and merging the PR! 😄
@GerretS and I worked on a potential improvement regarding errors as values. We noticed that when parsing CSV files, the default behaviour is to fail the stream whenever an error occurs. For our use case, we would like to aggregate these errors at the end, so that we have all potential errors at the same time.
There are a few open questions that we would like your input on.
Tests have been added to the csv-generic project
As we wanted to test the decoding of a case class, we added the tests for all our attempt* functions to the csv-generic project. Is that OK, or would you like us to change that? We didn't want to manually create our own decoder within csv/shared.
.headersAttempt (bincompat)
The bincompat version of headersAttempt forces us to pass the implicits along explicitly, instead of them being passed implicitly. Is that OK for now? The only solution we saw is to remove the bincompat version, but we have no idea what the side effects would be.
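For readers unfamiliar with the pattern, the "slightly ugly" bincompat situation looks roughly like this. The names (`RowDecoder`, `decodeRows`) are made up for illustration and are not fs2-data's actual code: the old overload must keep its exact signature so previously compiled code still links, and it forwards its implicits by hand to the new variant.

```scala
// Hedged sketch of the binary-compatibility pattern under discussion.
object BincompatSketch {
  trait RowDecoder[A] { def decode(fields: List[String]): Either[String, A] }

  // Old public signature, kept unchanged for binary compatibility:
  // callers compiled against it still link at runtime.
  def decodeRows[A](rows: List[List[String]])(implicit da: RowDecoder[A]): List[Either[String, A]] =
    decodeRows(rows, da) // pass the implicit along explicitly (the ugly part)

  // New variant taking the decoder explicitly; this is where the real
  // implementation lives going forward.
  def decodeRows[A](rows: List[List[String]], da: RowDecoder[A]): List[Either[String, A]] =
    rows.map(da.decode)
}
```

Removing the old overload would be source-compatible for most callers but would break binary compatibility, which is presumably why it stays until a 2.0 release.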
Verbose tests
The tests are now a bit verbose, as there's quite a bit of duplication. However, the tests are rather trivial, and we wonder whether you are OK with the setup or whether we should improve it.
We're looking forward to hearing from you!