liziniu

Pinned

  1. ReMax (Public)

    Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)

    Python · 175 stars · 13 forks

  2. policy_optimization (Public)

    Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)

    Python · 27 stars · 5 forks

  3. HyperDQN (Public)

    Code for ICLR 2022 Paper (HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning)

    Python · 12 stars · 1 fork

  4. ISWBC (Public)

    Code for NeurIPS 2023 Paper (Imitation Learning from Imperfection: Theoretical Justifications and Algorithms)

    Python · 7 stars

  5. GEM (Public)

    Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)

    Python · 13 stars

  6. cold_start_rl (Public)

    Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?

    Python · 13 stars