feature: configurable s3-compliant protocols #170
This is a good idea Russel! There are a few ways we could implement it. One way might be to try detecting boto or aws configuration files; another might be to pass arguments to CloudVolume's constructor, though that is a bit awkward since they would be arguments that apply to only one protocol. Have you used other means (i.e., not CloudVolume) of accessing these services before? How did you do it?
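For concreteness, a minimal sketch of what the constructor-argument approach could look like. None of this exists in CloudVolume today; the endpoint_url and profile_name keyword arguments are assumptions used purely for illustration.

# Hypothetical sketch only: these keyword arguments are NOT part of the current
# CloudVolume constructor. They illustrate how an s3-compliant endpoint and a
# credentials profile might be supplied for a single protocol.
from cloudvolume import CloudVolume

vol = CloudVolume(
    "s3://mybucket/mydataset/mylayer",
    endpoint_url="https://s3.wasabisys.com",  # assumed parameter
    profile_name="wasabi",                    # assumed parameter
)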
In a few scripts I've had to target aws, wasabi, and igneous interchangeably. I've done this using boto3, with a custom endpoint_url and session profile supplied at runtime. The scripts in question have been built using argschema and a simple argschema parser built to attach to a designated s3 resource. A lot of this is argschema/marshmallow boilerplate, but the relevant part is the session and resource construction in S3ResourceModule.__init__ below.
import os
try:
    from urllib.parse import urlparse
except ImportError:  # python 2 fallback
    from urlparse import urlparse

import argschema
import boto3
from botocore.utils import fix_s3_host

example_input = {
    "resource": {
        "endpoint_url": "https://s3.wasabisys.com",
        "session": {
            "profile_name": "wasabi"
        }
    },
    "output_path": "s3://mybucket/",
    "input_fn": "example_txtfile.txt"
}


class S3Session(argschema.schemas.DefaultSchema):
    """schema defining an s3 session"""
    aws_access_key_id = argschema.fields.String(required=False)
    aws_secret_access_key = argschema.fields.String(required=False)
    aws_session_token = argschema.fields.String(required=False)
    region_name = argschema.fields.String(required=False)
    profile_name = argschema.fields.String(required=False)
    # botocore_session = BotoSession(required=False)


class S3SessionSchema(argschema.schemas.DefaultSchema):
    session = argschema.fields.Nested(S3Session, required=False)


class S3Resource(argschema.schemas.DefaultSchema):
    endpoint_url = argschema.fields.String(required=False)
    aws_access_key_id = argschema.fields.String(required=False)
    aws_secret_access_key = argschema.fields.String(required=False)
    aws_session_token = argschema.fields.String(required=False)
    region_name = argschema.fields.String(required=False)
    session = argschema.fields.Nested(S3Session, required=False)


class S3ResourceSchema(argschema.ArgSchema):
    resource = argschema.fields.Nested(S3Resource, required=False)


class S3ResourceModule(argschema.ArgSchemaParser):
    default_schema = S3ResourceSchema

    def __init__(self, *args, **kwargs):
        super(S3ResourceModule, self).__init__(*args, **kwargs)
        # build a boto3 session from the session-level options (e.g. profile_name)
        self.session = boto3.session.Session(
            **self.args['resource']['session'])
        # build the s3 resource from the remaining options (e.g. endpoint_url)
        self.resource = self.session.resource(
            's3', **{k: v for k, v in self.args['resource'].items()
                     if k != 'session'})
        # keep botocore from rewriting the custom endpoint to the default AWS host
        self.resource.meta.client.meta.events.unregister(
            "before-sign.s3", fix_s3_host)


class MyS3ResourceSchema(S3ResourceSchema):
    output_path = argschema.fields.Str(required=True)
    input_fn = argschema.fields.InputFile(required=True)


class MyS3ResourceModule(S3ResourceModule):
    """example module to write a file to an output path"""
    default_schema = MyS3ResourceSchema

    def run(self):
        parsed_dest = urlparse(self.args["output_path"])
        if parsed_dest.scheme != "s3":
            raise ValueError("destination should be s3 bucket")
        dest_bucket = parsed_dest.netloc
        dest_path = parsed_dest.path.lstrip('/')
        # build the destination key and upload the input file
        dest_key = "{p}/{fn}".format(
            p=dest_path,
            fn=os.path.basename(self.args['input_fn'])).lstrip('/')
        self.resource.meta.client.upload_file(
            self.args['input_fn'], dest_bucket, dest_key)
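As a rough usage sketch (assuming a "wasabi" profile is configured locally and the target bucket exists), a module like this can be driven with the example_input dict through argschema's input_data argument:

# usage sketch: run the example module against example_input above
if __name__ == "__main__":
    mod = MyS3ResourceModule(input_data=example_input, args=[])
    mod.run()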
Starting to make progress on this in the new CloudFiles repo: https://github.com/seung-lab/cloud-files
Currently, support for s3-compliant APIs is defined by adding options to the connectionpools module. It is straightforward to modify this module to allow various s3-compliant storage systems (such as "matrix" and wasabi), but it seems tedious to maintain personal forks for different endpoints; at the Allen Institute we have used or proposed using s3/cloudvolume with CEPH, Isilon, igneous, and wasabi. An easier configuration that can use a custom endpoint and, e.g., boto profile information without modifying the codebase would be helpful.
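Purely as an illustration of the kind of configuration meant here (nothing below exists in CloudVolume; the mapping structure and its keys are made up), a user-editable table mapping a protocol prefix to an s3-compliant endpoint and boto profile might look like:

# Hypothetical sketch, not an existing CloudVolume feature: a per-protocol
# mapping so paths like "wasabi://mybucket/dataset/layer" could resolve to a
# custom endpoint and boto profile without forking the connectionpools module.
S3_COMPLIANT_ENDPOINTS = {
    "wasabi": {
        "endpoint_url": "https://s3.wasabisys.com",
        "profile_name": "wasabi",
    },
    "ceph": {
        "endpoint_url": "https://ceph.example.org",  # placeholder endpoint
        "profile_name": "ceph",
    },
}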