Extra configuration for HttpDataSink #1480

Closed
1 of 2 tasks
lucian-torje-siemens opened this issue Jun 14, 2022 · 0 comments · Fixed by #1510

lucian-torje-siemens commented Jun 14, 2022

Feature Request

HttpDataSink currently supports only HTTP POST, and it changes the configured endpoint by appending the part name.

To make it compatible with AWS S3 upload presigned URLs it needs extra adjustments, which are the subject of this feature request.

Which Areas Would Be Affected?

Dataplane HTTP extension

Why Is the Feature Desired?

We need this feature for uploading small files (< 50 MB) to the cloud.
The flow is:

  • create an upload presigned URL
  • use the http data source dataplane to get the data
  • use the http data sink dataplane to push the data to the created presigned URL
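
The flow above can be sketched outside of EDC as follows. This is only an illustration in Python, not the dataplane implementation: the function name `transfer`, the injectable `urlopen` parameter and the size constant are assumptions made for the example, and the presigned URL is assumed to have been created beforehand with the cloud provider's tooling.

```python
import urllib.request

# Assumed limit taken from the "< 50 MB" bound in this request.
MAX_SINGLE_UPLOAD_BYTES = 50 * 1024 * 1024


def transfer(source_url, presigned_url, urlopen=urllib.request.urlopen):
    """Copy one small payload from an HTTP source to a presigned upload URL."""
    # Pull the data from the HTTP data source.
    with urlopen(urllib.request.Request(source_url, method="GET")) as response:
        payload = response.read()

    if len(payload) >= MAX_SINGLE_UPLOAD_BYTES:
        raise ValueError("payload too large for a single presigned upload")

    # Push the bytes to the presigned URL exactly as given, using HTTP PUT.
    upload = urllib.request.Request(presigned_url, data=payload, method="PUT")
    with urlopen(upload) as response:
        return response.status
```

The `urlopen` parameter exists only so the sketch can be exercised without network access.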

Solution Proposal

To use it for uploading data to an upload presigned URL, it needs to be changed to support HTTP PUT and to leave the endpoint unmodified (unfortunately, a single presigned URL is used for all of the content).
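
A small illustration of why the endpoint must stay untouched. The helper `append_part_name` is a hypothetical stand-in for the current behavior (the exact way the sink joins the part name is an assumption here), and the URL is a made-up placeholder. Because a presigned URL carries its signature in the query string, anything appended to the end of the URL lands in the signature parameter rather than in the path:

```python
from urllib.parse import parse_qs, urlsplit


def append_part_name(endpoint, part_name):
    # Assumed stand-in for the current behavior: part name joined with "/".
    return endpoint + "/" + part_name


presigned = "https://bucket.example/test.dat?se=2022-06-20&sig=qwerty"
modified = append_part_name(presigned, "part-1")

# The path is unchanged, but the signature value is now corrupted.
original_sig = parse_qs(urlsplit(presigned).query)["sig"][0]
modified_sig = parse_qs(urlsplit(modified).query)["sig"][0]
modified_path = urlsplit(modified).path
```

Here `original_sig` and `modified_sig` differ, so a storage service that verifies the signature would be expected to reject the modified request.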

We propose adding two new asset properties and one new config property:

  • httpVerb - possible values:
    - POST (default) - use HTTP POST to send data
    - PUT - use HTTP PUT to send data
  • additionalHeaders - makes it possible to add extra headers such as Azure's x-ms-blob-type: BlockBlob
    - specified not as a String but as a JSON key-value object, for example:
    "dataDestination": {
      "properties": {
        "endpoint": "https://qwerty.blob.core.windows.net/test/test.dat?sv=2021-06-08&ss=qwerty&srt=qwerty&sp=qaz&se=2022-06-20T17:19:22Z&st=2022-06-20T09:19:22Z&spr=https&sig=qwerty",
        "name": "",
        "type": "HttpData",
        "additionalHeaders": {
          "Content-Type": "application/octet-stream",
          "x-ms-blob-type": "BlockBlob"
        }
      }
    }

and a new config property:

  • edc.dataplane.http.sink.partition.size - sets the partition size (5 by default, the value that is currently hardcoded) - see HttpDataSinkFactory
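
As a rough sketch of how these settings could be read and validated, written in Python purely for illustration: `SinkSettings`, `read_sink_settings` and `partition` are hypothetical names, not EDC classes, and accepting the verb case-insensitively is an assumption rather than part of the proposal.

```python
from dataclasses import dataclass, field

PARTITION_SIZE_KEY = "edc.dataplane.http.sink.partition.size"
DEFAULT_PARTITION_SIZE = 5  # the currently hardcoded value, per the proposal
ALLOWED_VERBS = ("POST", "PUT")


@dataclass(frozen=True)
class SinkSettings:
    endpoint: str
    http_verb: str = "POST"
    additional_headers: dict = field(default_factory=dict)
    partition_size: int = DEFAULT_PARTITION_SIZE


def read_sink_settings(asset_properties, config):
    """Build sink settings from asset properties and connector config."""
    verb = asset_properties.get("httpVerb", "POST").upper()
    if verb not in ALLOWED_VERBS:
        raise ValueError(f"unsupported httpVerb: {verb}")
    size = int(config.get(PARTITION_SIZE_KEY, DEFAULT_PARTITION_SIZE))
    if size < 1:
        raise ValueError("partition size must be at least 1")
    return SinkSettings(
        endpoint=asset_properties["endpoint"],  # used as-is, never rewritten
        http_verb=verb,
        additional_headers=dict(asset_properties.get("additionalHeaders", {})),
        partition_size=size,
    )


def partition(parts, size):
    """Split a list of parts into consecutive groups of at most `size`."""
    return [parts[i:i + size] for i in range(0, len(parts), size)]
```

Keeping `additionalHeaders` as a key-value object means the headers can be copied onto the outgoing request directly, without parsing a delimited string.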

Limitations

To enable multipart cloud data upload (for big files > 50 MB), a different solution should be used instead: either the AWS/Azure extension (which requires cloud credentials) or a solution similar to the one described in the AWS doc and Boto example.

see CloudHttpDataSink

When to use

This solution is usable in case of:

Type of Issue

new feature

Checklist

  • assigned appropriate label?
  • Do NOT select a milestone or an assignee!
lucian-torje-siemens added a commit to mindsphere/DataSpaceConnector-Fork that referenced this issue Jun 21, 2022
lucian-torje-siemens added a commit to mindsphere/DataSpaceConnector-Fork that referenced this issue Jun 22, 2022
lucian-torje-siemens added a commit to mindsphere/DataSpaceConnector-Fork that referenced this issue Jun 27, 2022
* Reduced the number of URL methods from two to one

* Changed from where the path is read and temporarily disabled status check

* Builds in docker

* feat: Generate presigned url only on request - wip

* update postman

* feat: Generate presigned url only on request -- adding copyright

* Filesystem provisioning sample - creates file before sending

* Added collection for testing url generation

* Fixed problem with presigned url generation

Fixed problem with presigned url generation

Remove

* wip on datalake integration

* feat: Generate presigned url before copying file from http and save it to filesystem

* Work on introduction and goals

* Work on fixing merge from main

* Work on fixing merge from main

* Work on fixing merge from main

* Cleanup code

* Cleanup code

* Integrate in existing http to http example

* Cleanup code

* Add main for catena

* Wip on arch documentation

* Wip on integration

* Generate presign url works

* Fix documentation

* Cleanup code

* Cleanup code

* Modify CHANGELOG.md

* Update docs

* Update docs

* New extension for cloudhttpdata

* New extension for cloudhttpdata

* Fix failed to start

* Apply review

* Feature: lazily generate mindsphere datalake presign url
 - cleanup & review

* Feature: lazily generate mindsphere datalake presign url
 - fix failed build

* Feature: lazily generate mindsphere datalake presign url
 - cleanup

* Feature: lazily generate mindsphere datalake presign url - rename and use additional headers

* Feature: lazily generate mindsphere datalake presign url - remove cloud data source

* Feature: lazily generate mindsphere datalake presign url - remove cloud data source

* Feature: Extra configuration for HttpDataSink eclipse-edc#1480

* Use http data

* fix checkstyle

* fix conversion

Co-authored-by: Alex Ghiran <[email protected]>
bscholtes1A pushed a commit that referenced this issue Jun 27, 2022
* feat: Extra configuration for HttpDataSink #1480

* cleanup

* update header

* add more tests

* remove usePartName

* apply review

* apply review

* apply review

* apply review

* apply review - remove method from sink and use header: for additional headers

* fix missing header

* cleanup

* fix compilation

* remove test for PUT

* apply review - use assertj and move content type to its own field

* fix style

* fix style

* fix failing test

* apply review - use putall
3 participants