
[bnb] Introducing BitsAndBytesConfig #21579

Merged (25 commits) on Feb 17, 2023

Conversation

younesbelkada
Contributor

@younesbelkada younesbelkada commented Feb 11, 2023

What does this PR do?

This PR introduces a new feature, BitsAndBytesConfig, enabling users to work more flexibly with the transformers + bitsandbytes API.
It is also a first step in a larger refactor we are planning, in order to possibly support more quantization features.

This PR also addresses: #20281 (comment)

With this PR it will also be possible to cover advanced use cases for bnb models, such as offloading parameters across CPU and GPU to run part of the model in int8 on GPU and the other part on CPU (but in fp32).

Draft for now, will update the docs and fix the CI tests after a first pass 💪

One comment from my side: we should keep the load_in_8bit argument as it is for now (I think this argument is quite powerful) and progressively replace it entirely with the config in the future, if more quantization methods get supported in bitsandbytes.

cc @sgugger

- add v1
- add tests
- more user-friendly API
- add docs
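As a rough illustration of the user-facing shape this PR aims for, here is a self-contained stand-in for the config object. Field names mirror the PR discussion (`load_in_8bit`, `load_in_8bit_skip_modules`) and the bitsandbytes integration (`llm_int8_threshold`); they are assumptions for illustration, not the final transformers signature.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional

# Illustrative stand-in for the config object this PR introduces;
# not the actual transformers class.
@dataclass
class BitsAndBytesConfigSketch:
    load_in_8bit: bool = False
    llm_int8_threshold: float = 6.0
    load_in_8bit_skip_modules: Optional[List[str]] = None

# One config object gathers all bnb options in one place...
config = BitsAndBytesConfigSketch(load_in_8bit=True,
                                  load_in_8bit_skip_modules=["lm_head"])
# ...and would then be handed to from_pretrained, e.g.:
# model = AutoModelForCausalLM.from_pretrained(ckpt, quantization_config=config)
```

The point of the refactor is that every new quantization knob becomes a field on this object instead of a new loose keyword argument on from_pretrained.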
@alexconstant9108

@younesbelkada great job again. :)
I really like the refactored API, which supports passing a whole config object containing all relevant BnB options in one place.
I am curious: are the computations on the parameters offloaded (in fp32) to CPU and disk actually executed on the CPU, or are they transferred to one of the GPUs on every forward pass?

Collaborator

@sgugger sgugger left a comment


Thanks for working on this! I have thoughts on the API, left a couple of comments!

@@ -0,0 +1,19 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Collaborator


This should be a quantization page, not just bitsandbytes. We may support more configs in the future (for instance coming from optimum).

specific language governing permissions and limitations under the License.
-->

# `bitsandbytes` Integration
Collaborator


This should thus be one section, but not the whole title.


TODO: write documentation here

## BitsandbytesConfig
Collaborator


Camel-cased name please (BitsAndBytesConfig)

load_in_8bit_skip_modules (`List[str]`, *optional*):
An explicit list of the modules that we do not want to convert to 8-bit. This is useful for models such
as Jukebox, which has several heads in different places and not necessarily at the last position.
bitsandbytes_config (`Dict`, *optional*):
Collaborator


This should be quantization_config here. And actually, what about load_in_8bit supporting both True and a config?

Comment on lines 2101 to 2104
if load_in_8bit and bitsandbytes_config is not None:
raise ValueError(
"You can't pass both `load_in_8bit=True` and `bitsandbytes_config` as they are mutually exclusive."
)
Collaborator


Yeah this makes me think supporting both bool and config to load_in_8bit is a better API.
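The "bool or config" idea discussed here could look like the following self-contained sketch; the helper name and the stand-in config class are hypothetical, not the actual transformers code.

```python
# Illustrative stand-in config, not the actual transformers class.
class BitsAndBytesConfigSketch:
    def __init__(self, load_in_8bit=False, llm_int8_threshold=6.0):
        self.load_in_8bit = load_in_8bit
        self.llm_int8_threshold = llm_int8_threshold

def resolve_quantization_config(load_in_8bit=False, quantization_config=None):
    """Return a single config object, however the user asked for 8-bit loading."""
    if isinstance(load_in_8bit, BitsAndBytesConfigSketch):
        return load_in_8bit                         # full config passed directly
    if quantization_config is not None:
        if load_in_8bit:
            raise ValueError(
                "You can't pass both `load_in_8bit=True` and `quantization_config`."
            )
        return quantization_config                  # explicit config wins
    if load_in_8bit:
        return BitsAndBytesConfigSketch(load_in_8bit=True)  # most basic config
    return None                                     # no quantization requested
```

This keeps the simple `load_in_8bit=True` path working while still allowing a full config for advanced use cases.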

Comment on lines 2116 to 2119
logger.warning(
"The `load_in_8bit` argument will be deprecated in the future (v5). Please use "
"`BitsandbytesConfig` instead."
)
Collaborator


Ah, so that's your plan. I actually like having a simple load_in_8bit=True that loads the most basic config.

@younesbelkada younesbelkada changed the title [bnb] Introducing BitsandbytesConfig [bnb] Introducing BitsAndBytesConfig Feb 14, 2023
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Feb 14, 2023

The documentation is not available anymore as the PR was closed or merged.

@@ -4081,6 +4085,9 @@
logging,
)

# bitsandbytes config
from .utils.quantization_config import BitsAndBytesConfig
Contributor Author


Is this the right approach? If I protect this import with is_bitsandbytes_available, it seems the docs won't build.

Collaborator


Yes, especially since all bnb imports are protected, so you can import this directly here.

@younesbelkada younesbelkada marked this pull request as ready for review February 14, 2023 16:51
@younesbelkada
Contributor Author

This PR is now ready for review! Would love to have a round of review @sgugger

Collaborator

@sgugger sgugger left a comment


Thanks for iterating. I was thinking that maybe the **kwargs in from_pretrained could be used to update any attribute of the config when you instantiate it (as for regular model config arguments); this way load_in_8bit could become an attribute of this config and there would be no need to deprecate it.
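The kwargs-update idea above could be sketched like this; the helper and the stand-in config class are hypothetical, not the actual transformers implementation.

```python
# Illustrative stand-in config, not the actual transformers class.
class QuantConfigSketch:
    def __init__(self):
        self.load_in_8bit = False
        self.llm_int8_threshold = 6.0

def update_config_from_kwargs(config, kwargs):
    """Pop any kwarg naming an existing config attribute and set it on the config."""
    for key in list(kwargs):
        if hasattr(config, key):
            setattr(config, key, kwargs.pop(key))  # consume config kwargs
    return config, kwargs                          # leftovers flow to the model

config, rest = update_config_from_kwargs(
    QuantConfigSketch(), {"load_in_8bit": True, "device_map": "auto"}
)
```

With this pattern, load_in_8bit stays a plain keyword argument from the user's point of view while becoming an attribute of the config internally.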


@younesbelkada
Contributor Author

Thanks for the extensive review! I should have addressed the comments now 💪

Collaborator

@sgugger sgugger left a comment


One last comment and we should be good to go!

Comment on lines 2119 to 2121
quantization_config_kwargs = {
k: v for k, v in kwargs.items() if k in inspect.signature(BitsAndBytesConfig).parameters
}
Collaborator


It would be easier to have BitsAndBytesConfig expose a from_dict method with return_unused_kwargs, like PretrainedConfig. We can then do

quantization_config, kwargs = BitsAndBytesConfig.from_dict(kwargs, return_unused_kwargs=True)

Contributor Author

@younesbelkada younesbelkada Feb 16, 2023


Agreed!
I think we still need to add:

quantization_config_kwargs = {
    k: v for k, v in kwargs.items() if k in inspect.signature(BitsAndBytesConfig).parameters
}

to check whether a user has passed quantization_config together with some other associated kwargs inside from_pretrained. But this check is only done in the `elif quantization_config is not None:` branch.
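The from_dict(..., return_unused_kwargs=True) pattern discussed above, mirroring PretrainedConfig, could be sketched as follows; class and parameter names here are illustrative stand-ins, not the final implementation.

```python
import inspect

# Illustrative stand-in config, not the actual transformers class.
class BitsAndBytesConfigSketch:
    def __init__(self, load_in_8bit=False, llm_int8_threshold=6.0):
        self.load_in_8bit = load_in_8bit
        self.llm_int8_threshold = llm_int8_threshold

    @classmethod
    def from_dict(cls, config_dict, return_unused_kwargs=False):
        # Split the incoming dict into kwargs the constructor accepts and the rest.
        accepted = inspect.signature(cls.__init__).parameters
        used = {k: v for k, v in config_dict.items() if k in accepted}
        unused = {k: v for k, v in config_dict.items() if k not in accepted}
        config = cls(**used)
        return (config, unused) if return_unused_kwargs else config

quantization_config, kwargs = BitsAndBytesConfigSketch.from_dict(
    {"load_in_8bit": True, "device_map": "auto"}, return_unused_kwargs=True
)
```

The unused kwargs (here `device_map`) are returned to the caller so from_pretrained can keep processing them as usual.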

Collaborator

@sgugger sgugger left a comment


Thanks for your work on this!

Next step would be to save this in the model config and be able to re-load a checkpoint pushed to the Hub quantized!
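The next step described above (persisting the quantization config with the model and rebuilding it at load time) could be sketched roughly as follows; the class and the config.json layout are assumptions for illustration only.

```python
import json

# Illustrative stand-in config, not the actual transformers class.
class BitsAndBytesConfigSketch:
    def __init__(self, load_in_8bit=False, llm_int8_threshold=6.0):
        self.load_in_8bit = load_in_8bit
        self.llm_int8_threshold = llm_int8_threshold

    def to_dict(self):
        return vars(self).copy()

# Hypothetical shape of what save_pretrained could add to config.json:
saved = json.dumps({
    "model_type": "opt",
    "quantization_config": BitsAndBytesConfigSketch(load_in_8bit=True).to_dict(),
})
# At load time, the config would be rebuilt from the stored dict:
reloaded = BitsAndBytesConfigSketch(**json.loads(saved)["quantization_config"])
```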

@younesbelkada
Contributor Author

Thanks! Exactly! Looking forward to it!
