vllm.model_executor.models.transformers.utils ¶
Transformers backend utilities.
Style module-attribute ¶
```python
Style = Literal[
    "colwise",
    "colwise_rep",
    "rowwise",
    "rowwise_rep",
    "replicate",
]
```
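Because `Style` is a plain `Literal`, it can be used to type-check tensor parallel plan entries. A minimal sketch (the `pick_style` helper is hypothetical, not part of vLLM):

```python
from vllm.model_executor.models.transformers.utils import Style

def pick_style(tp_plan: dict[str, Style], name: str) -> Style:
    # Layers not mentioned in the plan fall back to being replicated.
    return tp_plan.get(name, "replicate")

style = pick_style({"model.layers.0.mlp.down_proj": "rowwise"}, "lm_head")
```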
can_enable_torch_compile ¶
```python
can_enable_torch_compile(vllm_config: VllmConfig) -> bool
```
Callable to be passed to `@support_torch_compile`'s `enable_if` argument.

Defaults to `True` but is disabled in the following situations:
- The model uses dynamic rope scaling.
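A minimal usage sketch, assuming the predicate is applied to a model class (the class name here is illustrative):

```python
import torch.nn as nn

from vllm.compilation.decorators import support_torch_compile
from vllm.model_executor.models.transformers.utils import (
    can_enable_torch_compile,
)

# torch.compile support is enabled only when the predicate returns True
# for the current VllmConfig (e.g. skipped for dynamic rope scaling).
@support_torch_compile(enable_if=can_enable_torch_compile)
class MyTransformersModel(nn.Module):
    ...
```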
get_feature_request_tip ¶
init_on_device_without_buffers ¶
```python
init_on_device_without_buffers(device: device)
```
A context manager under which models are initialized with all parameters on the specified device. Buffers, however, are not initialized on the specified device.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | `torch.device` | Device to initialize all parameters on. | required |
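A minimal sketch of the resulting behavior, using a module that owns both parameters and buffers:

```python
import torch
import torch.nn as nn

from vllm.model_executor.models.transformers.utils import (
    init_on_device_without_buffers,
)

with init_on_device_without_buffers(torch.device("meta")):
    layer = nn.BatchNorm1d(8)

print(layer.weight.device)        # meta: parameters land on the target device
print(layer.running_mean.device)  # cpu: buffers are left untouched
```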
log_replacement ¶
replace_linear_class ¶
```python
replace_linear_class(
    linear: Linear,
    style: Style = "replicate",
    quant_config: QuantizationConfig | None = None,
    *,
    prefix: str = "",
) -> ColumnParallelLinear | RowParallelLinear | ReplicatedLinear
```
Replace `nn.Linear` with one of vLLM's tensor parallel linear classes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
linear | `Linear` | The `nn.Linear` to replace. | required |
style | `Style` | Tensor parallel style of the new linear, e.g. "colwise". | `'replicate'` |
quant_config | `QuantizationConfig \| None` | Quantization config for the new linear. | `None` |
Returns: The new linear.
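A minimal sketch, assuming the tensor parallel group has already been initialized (the prefix is illustrative):

```python
import torch.nn as nn

from vllm.model_executor.models.transformers.utils import replace_linear_class

old = nn.Linear(1024, 4096, bias=False)

# "colwise" shards the output dimension across tensor parallel ranks,
# producing a ColumnParallelLinear in place of the nn.Linear.
new = replace_linear_class(
    old,
    style="colwise",
    quant_config=None,
    prefix="model.layers.0.mlp.gate_proj",
)
```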
replace_rms_norm_class ¶
Replace a Transformers RMSNorm with vLLM's RMSNorm.
This method assumes:

- Weight is stored as `weight`.
- Epsilon is stored as `eps` or `variance_epsilon`.
- `with_scale` indicates whether the layer has a weight (Gemma3n only).
- `var_hidden_size` is only ever used for the Intern vision encoder in vLLM, and Transformers doesn't appear to have the same concept.
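A minimal sketch of the attribute probing these assumptions imply (not the actual vLLM implementation; `probe_rms_norm` is hypothetical):

```python
import torch.nn as nn

def probe_rms_norm(hf_rms_norm: nn.Module) -> tuple[float, bool]:
    # Epsilon may be stored as `eps` or `variance_epsilon`.
    eps = getattr(hf_rms_norm, "eps", None)
    if eps is None:
        eps = hf_rms_norm.variance_epsilon
    # `with_scale` (Gemma3n) overrides the usual "has a weight" check.
    has_weight = getattr(
        hf_rms_norm, "with_scale", hasattr(hf_rms_norm, "weight")
    )
    return eps, has_weight
```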