Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make delimiter optional #153

Merged
merged 7 commits into from
Aug 5, 2022
Merged

Conversation

jfyu
Copy link
Contributor

@jfyu jfyu commented Jun 23, 2022

This PR aims to close #39 .

Changes

  • allowed delim or delimiter options in unload_and_copy and load_and_copy to be None, in which case we will ignore that parameter
  • adjusted tests to that case
  • if PARQUET in the options, we remove the default copy options because they don't work. I cannot however test other file types (mostly because i have no time to write these files)...

open comments

At the moment, if you do unload_options = ['PARQUET'] and forget to unset delim from the default, you will get an error. This is intentional, I think we don't need to hold the user's hands too much, they can examine the error message and figure out that they need to adjust the parameter of delim. However it's possible to set it default to None so the reverse happens (if someone was using a different delimiter they can run into an error and figure out that they need to change that).

Tests

I tested on our own dataset, and it works fine (I hid the output on the locopy because it contains AWS creds)
image
image

@CLAassistant
Copy link

CLAassistant commented Jun 23, 2022

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@fdosani fdosani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfyu This is awesome. Thanks so much for the contribution! Just some comments and suggestions, happy to discuss further.

@@ -589,6 +589,22 @@ def test_redshiftcopy(mock_session, credentials, dbapi):
(),
)
)
# no delim
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can add some integration tests too? I actually don't have access to a redshift cluster atm. Are you able to run those test on your instance?

@jfyu
Copy link
Contributor Author

jfyu commented Jun 26, 2022

@fdosani Hey thanks so much for the comments! I still haven't written a unit test yet for I think unload. Things just picked up from work so I'll keep this in draft for a bit and will hopefully come back to it soon.

RE: integration test -- currently running my "in-prod" integration test 😂 , I can do it but I can't create S3 buckets so most likely will....screenshot a integration test pass but not modify the code directly

@jfyu
Copy link
Contributor Author

jfyu commented Jul 5, 2022

I have a question @fdosani I actually unified delim and delimiter (it was strange to me that they are called differently in unload and load), but I am not sure if we want this to break backward compatibility.

Also I think I addressed everything except tests! will look into it

@fdosani
Copy link
Member

fdosani commented Jul 5, 2022

I have a question @fdosani I actually unified delim and delimiter (it was strange to me that they are called differently in unload and load), but I am not sure if we want this to break backward compatibility.

Also I think I addressed everything except tests! will look into it

I think that’s fine. I’m ok with aligning things and we can just a major bump and call that out

@jfyu jfyu marked this pull request as ready for review July 14, 2022 13:52
@jfyu jfyu requested review from elzzhu and ak-gupta as code owners July 14, 2022 13:52
@fdosani
Copy link
Member

fdosani commented Jul 15, 2022

@jfyu Is this ready for review? Just thought I'd check in.

@jfyu
Copy link
Contributor Author

jfyu commented Jul 15, 2022

@fdosani yes! Sorry. I was in the middle of writing a message then got pulled away to work lol. I was going to say, it looks like it's fairly complicated for me to change the integration test or even run it because I don't think I can create s3 bucket and am fairly worried about touching other people's stuff in the same bucket. You might have to trust me on the integration 😅

I just re-ran with the existing code and they worked.
image

image

@fdosani fdosani force-pushed the support-other-files branch from 23307e7 to 0156f6e Compare July 18, 2022 17:09
@jfyu
Copy link
Contributor Author

jfyu commented Jul 26, 2022

(sorry doing a merge because I"m trying to use that branch and need higher versions of boto3)

@fdosani
Copy link
Member

fdosani commented Jul 26, 2022 via email

Copy link
Member

@fdosani fdosani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fdosani fdosani merged commit f8b78c5 into capitalone:develop Aug 5, 2022
@fdosani fdosani mentioned this pull request Aug 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for non-Delimited file types in Redshift
3 participants