
Implement OneHotEncoder with SPU #725

Open
Yeekin-GYJ opened this issue Jun 11, 2024 · 9 comments
Labels: good first issue (Good for newcomers), OSCP (SecretFlow Open Source Contribution Plan)

Comments


Yeekin-GYJ commented Jun 11, 2024

This issue is a Phase 4 task of the SecretFlow Open Source Contribution Plan (SF OSCP). Community developers are welcome to join us in building it!

Task introduction

  • Task name: Implement OneHotEncoder with SPU
  • Technical area: SPU/SML
  • Difficulty: Warm-up 🌟
  • Expected duration: 2 weeks
  • Technical reviewer: @deadlywing

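Detailed requirements

  • Security: no reveal
  • Functionality: supporting the numeric version of one-hot encoding is sufficient
  • Convergence: include experimental data produced by the simulator and demonstrate convergence/correctness
  • Code style: Python code must be formatted with black + isort (the CI pipeline includes a code-style check)
  • Submission: link this issue and submit the code to https://github.com/secretflow/spu/tree/main/sml (discuss the exact directory with the reviewer)
  • Special notes: if a feature has special restrictions, for example it needs FM128 or more fxp (fixed-point) precision, state this clearly in the comments and documentation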
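As a rough illustration of what a "numeric one-hot with no reveal" could look like, here is a minimal sketch in `jax.numpy`. It assumes the category values are supplied up front, so the output shape is static and no secret value has to be revealed to build it. The function name `one_hot_numeric` and its signature are hypothetical and are not taken from the sml code base.

```python
import jax.numpy as jnp


def one_hot_numeric(x, categories):
    """One-hot encode a 1-D numeric column against a fixed category list.

    Assumption: `categories` is known in advance (hypothetical design
    choice), so the output shape does not depend on the data.
    """
    x = jnp.asarray(x)
    categories = jnp.asarray(categories)
    # Broadcast (n_samples, 1) against (1, n_categories): each row gets a 1
    # in the column whose category equals the sample value, 0 elsewhere.
    return (x[:, None] == categories[None, :]).astype(jnp.float32)


encoded = one_hot_numeric(jnp.array([2, 0, 1, 2]), jnp.array([0, 1, 2]))
```

The whole computation is an element-wise comparison with no data-dependent branching; a value that matches no category simply yields an all-zero row.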

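Required skills

  • Familiarity with classic machine learning algorithms
  • Familiarity with JAX or NumPy, and the ability to implement algorithms with NumPy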

Instructions

Contribution guide

@Yeekin-GYJ Yeekin-GYJ added this to OSCP Jun 11, 2024
@Yeekin-GYJ Yeekin-GYJ moved this to Needs Triage in OSCP Jun 11, 2024

Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.

@Candicepan Candicepan added good first issue Good for newcomers OSCP SecretFlow Open Source Contribution Plan and removed no-issue-activity labels Jul 16, 2024
@Candicepan Candicepan removed this from OSCP Feb 28, 2025
@Candicepan Candicepan moved this to Needs Triage in OSCP Phase4 Season of Dev Feb 28, 2025
@MiKKiYang

38832234 Give it to me

@Candicepan Candicepan moved this from Needs Triage to In Progress in OSCP Phase4 Season of Dev Mar 3, 2025
@Candicepan

38832234 Give it to me

Hello! Congratulations on successfully claiming this task, and thank you for your support of the OSCP! Please complete your contribution within two weeks; otherwise the task will be released. If you have any questions, please let us know. 😄


MiKKiYang commented Mar 10, 2025

@Candicepan @Yeekin-GYJ @deadlywing

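Hello, I have finished my implementation. The submission instructions say: "Submission: link this issue and submit the code to https://github.com/secretflow/spu/tree/main/sml (discuss the exact directory with the reviewer)".

However, my implementation depends on the following libraries: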
from typing import Any, Dict, List

import jax.numpy as jnp
import numpy as np
import pandas as pd
from secretflow.data import Partition
from secretflow.data.horizontal import HDataFrame
from secretflow.device import SPU, SPUObject
from secretflow.preprocessing.base import _PreprocessBase

The main issue is that it has several secretflow dependencies.

Where exactly should I submit it?

@MiKKiYang

@deadlywing I think I misunderstood the task: I implemented it as an engineering component rather than as an algorithm.


Candicepan commented Mar 10, 2025


Specific submission requirements for this task:

@MiKKiYang

@deadlywing

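Understood. I had misread the task earlier: I implemented a horizontal one-hot operator inside secretflow as a component.

What the task actually asks for is this: implement one-hot encoding for the horizontal scenario in a jittable way with jax.numpy, and then use the simulator to run that operator successfully on SPU. Since any operator that runs inside SPU is provably secure, that should be sufficient. Is my understanding correct?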

@deadlywing

@MiKKiYang
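Your understanding is basically correct. Tasks in SPU focus on pure algorithm implementation: you build the algorithm's execution flow from Python-level operators, and SPU takes care of decomposing it into low-level operators and running them under MPC. It should feel like a privacy-preserving version of sklearn, so we would like the API to stay as close to sklearn as possible. For example, one of the earlier implementations: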

class Binarizer:
    """Binarize data (set feature values to 0 or 1) according to a threshold.

    Parameters
    ----------
    threshold : float, default=0.0
        Feature values below or equal to this are replaced by 0, above it by 1.
    """

    def __init__(self, *, threshold=0.0):
        self.threshold = threshold

    def transform(self, X):
        """Binarize each element of X.

        Parameters
        ----------
        X : {array-like} of shape (n_samples, n_features)
            The data to binarize, element by element.

        Returns
        -------
        ndarray of shape (n_samples, n_features)
            Transformed array.
        """
        return binarize(X, threshold=self.threshold)

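You only need to focus on the logic itself. Concepts from SF such as distributed execution and partitions are things you do not need to be aware of at all.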
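Following the same sklearn-like shape as the Binarizer above, a one-hot encoder could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual sml implementation: the class body, the `categories` constructor argument (one list of numeric values per feature, given up front), and the absence of a `fit` step are all hypothetical choices made so that output shapes stay static.

```python
import jax.numpy as jnp


class OneHotEncoder:
    """Hypothetical sklearn-style sketch; not the actual sml implementation."""

    def __init__(self, *, categories):
        # Assumption: one list of numeric category values per feature,
        # supplied by the caller so that no data-dependent shape is needed.
        self.categories = [jnp.asarray(c) for c in categories]

    def transform(self, X):
        """Encode X of shape (n_samples, n_features) column by column."""
        X = jnp.asarray(X)
        blocks = [
            # (n_samples, 1) == (1, n_categories) -> (n_samples, n_categories)
            (X[:, j, None] == cats[None, :]).astype(jnp.float32)
            for j, cats in enumerate(self.categories)
        ]
        # Concatenate the per-feature indicator blocks side by side.
        return jnp.concatenate(blocks, axis=1)


enc = OneHotEncoder(categories=[[0, 1], [10, 20, 30]])
out = enc.transform(jnp.array([[0, 30], [1, 10]]))
```

The class contains only the encoding logic and knows nothing about devices, partitions, or distribution.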

@deadlywing

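Anyway, when implementing the algorithm, just assume you are writing a centralized algorithm. Only in the tests do you need to call the APIs that SPU provides to encrypt the data and wrap the function to be run. For details, see the existing code under the tests/ and emulations/ directories.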
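The "write it as a centralized algorithm, then check it in a test" workflow can be sketched as follows. This example deliberately stops short of the SPU step: `jax.jit` stands in for the point where a real test would hand the function to the SPU simulator, whose exact API is not shown in this thread. The function name `one_hot` and the test data are hypothetical.

```python
import jax
import jax.numpy as jnp
import numpy as np


def one_hot(x, categories):
    # Centralized, plaintext-style logic with no data-dependent branching.
    return (x[:, None] == categories[None, :]).astype(jnp.float32)


x = np.array([3, 1, 2, 1])
categories = np.array([1, 2, 3])  # sorted, and every value of x is present

# Independent plain-NumPy reference: look up each value's column index.
expected = np.zeros((x.size, categories.size), dtype=np.float32)
expected[np.arange(x.size), np.searchsorted(categories, x)] = 1.0

# Stand-in for the secure run: trace and compile the same function with jit.
actual = np.asarray(jax.jit(one_hot)(jnp.asarray(x), jnp.asarray(categories)))
matches = bool(np.array_equal(actual, expected))
```

In an actual sml test, the jitted call would be replaced by the SPU-provided wrapper used in the existing tests/ and emulations/ code, and the comparison against the plaintext reference would stay the same.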

Projects
Status: In Progress