feature: configurable s3-compliant protocols #170

Open
RussTorres opened this issue Dec 17, 2018 · 3 comments
Labels
feature A new capability is born.

Comments

@RussTorres

Currently, support for S3-compliant APIs is defined by adding options to the connectionpools module. It is straightforward to modify this module to allow various S3-compliant storage systems (such as "matrix" and Wasabi), but it seems tedious to maintain personal forks for different endpoints -- at the Allen Institute we have either used or proposed using s3/cloudvolume with CEPH, Isilon, igneous, and Wasabi. An easier configuration mechanism that could take a custom endpoint and e.g. boto profile information without modifying the codebase would be helpful.
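
For comparison, plain boto3 already allows roughly this (endpoint URL and profile name below are just placeholders for whatever the target service uses):

# sketch: point an s3 resource at a non-AWS endpoint and pick
# credentials via a named profile from ~/.aws/config / ~/.aws/credentials
import boto3

session = boto3.session.Session(profile_name="wasabi")
s3 = session.resource("s3", endpoint_url="https://s3.wasabisys.com")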

@william-silversmith william-silversmith added the feature A new capability is born. label Dec 17, 2018
@william-silversmith
Contributor

This is a good idea, Russel! There are a few ways we could implement it. One might be to try detecting boto or AWS configuration files; another might be to pass arguments to CloudVolume's constructor, though that's a bit awkward since they would be arguments that apply to only one protocol. Have you used other means (i.e. not CloudVolume) of accessing these services before? How did you do it?
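
For the config-file route, the detection step could be as small as something like this (just a sketch of one possible approach, not actual CloudVolume code):

# sketch: discover available profiles from the standard AWS config location
# (hypothetical helper; "available_aws_profiles" is not an existing function)
import os
import configparser

def available_aws_profiles(path="~/.aws/config"):
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser(path))
    # sections look like "default" or "profile wasabi"
    return [s.replace("profile ", "", 1) for s in cfg.sections()]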

@RussTorres
Author

In a few scripts I've had to target AWS, Wasabi, and igneous interchangeably. I've done this using boto3 with the following ~/.aws/config:

[default]
region = us-west-2
[profile wasabi]
region = us-east-1
[profile igneous]
region = iggy-1

with a ~/.aws/credentials like this:

[default]
aws_access_key_id = MYAWSKEY
aws_secret_access_key = MYAWSSECRET
[wasabi]
aws_access_key_id = MYWASABIKEY
aws_secret_access_key = MYWASABISECRET
[igneous]
aws_access_key_id = MYIGNEOUSKEY
aws_secret_access_key = MYIGNEOUSSECRET

The scripts in question have been built using argschema, with a simple parser that attaches to a designated S3 resource. A lot of this is argschema/marshmallow boilerplate, but S3ResourceModule.__init__() shows how to use the session and resource schemas defined above it.

import os
try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

import argschema
import boto3

from botocore.utils import fix_s3_host

example_input = {
    "resource": {
        "endpoint_url": "https://s3.wasabisys.com",
        "session": {
            "profile_name": "wasabi"
        }
    },
    "output_path": "s3://mybucket/",
    "input_fn": "example_txtfile.txt"
}

class S3Session(argschema.schemas.DefaultSchema):
    """schema defining an s3 session"""
    aws_access_key_id = argschema.fields.String(required=False)
    aws_secret_access_key = argschema.fields.String(required=False)
    aws_session_token = argschema.fields.String(required=False)
    region_name = argschema.fields.String(required=False)
    profile_name = argschema.fields.String(required=False)
    # botocore_session = BotoSession(required=False)


class S3SessionSchema(argschema.schemas.DefaultSchema):
    session = argschema.fields.Nested(S3Session, required=False)


class S3Resource(argschema.schemas.DefaultSchema):
    endpoint_url = argschema.fields.String(required=False)
    aws_access_key_id = argschema.fields.String(required=False)
    aws_secret_access_key = argschema.fields.String(required=False)
    aws_session_token = argschema.fields.String(required=False)
    region_name = argschema.fields.String(required=False)
    session = argschema.fields.Nested(S3Session, required=False)


class S3ResourceSchema(argschema.ArgSchema):
    resource = argschema.fields.Nested(S3Resource, required=False)


class S3ResourceModule(argschema.ArgSchemaParser):
    default_schema = S3ResourceSchema

    def __init__(self, *args, **kwargs):
        super(S3ResourceModule, self).__init__(*args, **kwargs)
        resource_args = self.args.get('resource', {})
        self.session = boto3.session.Session(
            **resource_args.get('session', {}))
        self.resource = self.session.resource(
            's3', **{k: v for k, v in resource_args.items()
                     if k != 'session'})
        # keep any custom endpoint_url intact; fix_s3_host would otherwise
        # rewrite the host for AWS virtual-hosted-style addressing
        self.resource.meta.client.meta.events.unregister(
            "before-sign.s3", fix_s3_host)


class MyS3ResourceSchema(S3ResourceSchema):
    output_path = argschema.fields.Str(required=True)
    input_fn = argschema.fields.InputFile(required=True)

class MyS3ResourceModule(S3ResourceModule):
    """example module to write a file to an output path"""
    default_schema = MyS3ResourceSchema

    def run(self):
        parsed_dest = urlparse(self.args["output_path"])
        if parsed_dest.scheme != "s3":
            raise ValueError("destination should be an s3 bucket")
        dest_bucket = parsed_dest.netloc
        dest_path = parsed_dest.path.lstrip('/')
        # build the object key from the destination prefix and the input filename
        dest_key = "/".join(filter(None, (
            dest_path.rstrip('/'),
            os.path.basename(self.args['input_fn']))))
        self.resource.meta.client.upload_file(
            self.args['input_fn'], dest_bucket, dest_key)
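
Invoked with the example_input defined at the top, the module runs like this (argschema also accepts the same values as command-line arguments or via --input_json):

if __name__ == "__main__":
    # run against the example_input dict defined above
    mod = MyS3ResourceModule(input_data=example_input)
    mod.run()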

@william-silversmith
Contributor

Starting to make progress on this in the new CloudFiles repo: https://github.com/seung-lab/cloud-files
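
Usage there will presumably look something like the following (a rough sketch, not verified against the repo; how endpoints and credentials get configured is exactly what this issue tracks):

# sketch of the kind of interface CloudFiles exposes
from cloudfiles import CloudFiles

cf = CloudFiles("s3://mybucket/prefix")
cf.put("example_txtfile.txt", b"hello")
print(cf.get("example_txtfile.txt"))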
