vllm.compilation.matcher_utils
QUANT_OPS module-attribute
QUANT_OPS: dict[QuantKey, OpOverload] = {
    kFp8StaticTensorSym: default,
    kFp8DynamicTensorSym: default,
    kFp8DynamicTokenSym: default,
}
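As a rough illustration of the lookup pattern that a `dict[QuantKey, OpOverload]` suggests, the sketch below keys a table of callables by a hashable descriptor. Everything in it is hypothetical: `SketchQuantKey`, its fields, the three stand-in functions, and `SKETCH_QUANT_OPS` are invented for the example and do not reflect vLLM's actual `QuantKey` layout or the operators stored in `QUANT_OPS`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)  # frozen + eq gives a hashable key, usable in a dict
class SketchQuantKey:
    """Hypothetical quantization descriptor; NOT vLLM's QuantKey."""
    dtype: str
    static: bool      # True: scale is supplied; False: scale is computed
    per_token: bool   # True: one scale per token; False: one per tensor


# Stand-ins for operator overloads (assumption: real entries are torch ops).
def static_per_tensor() -> str:
    return "static per-tensor"


def dynamic_per_tensor() -> str:
    return "dynamic per-tensor"


def dynamic_per_token() -> str:
    return "dynamic per-token"


SKETCH_QUANT_OPS: dict[SketchQuantKey, Callable[[], str]] = {
    SketchQuantKey("fp8", static=True, per_token=False): static_per_tensor,
    SketchQuantKey("fp8", static=False, per_token=False): dynamic_per_tensor,
    SketchQuantKey("fp8", static=False, per_token=True): dynamic_per_token,
}

# Selecting an op is a plain dictionary lookup on an equal key.
op = SKETCH_QUANT_OPS[SketchQuantKey("fp8", static=False, per_token=True)]
result = op()
```

Using a frozen dataclass as the key means two descriptors with equal fields select the same entry, so callers can rebuild a key instead of holding a reference to the original.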
MatcherCustomOp
Bases: ABC
Source code in vllm/compilation/matcher_utils.py
__call__
__init__
__init__(enabled: bool)
empty
empty_f32
forward_custom abstractmethod
forward_native abstractmethod
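The member list above (an `enabled` flag, two abstract forward methods, and `__call__`) is consistent with a two-path dispatch pattern. The following is a minimal sketch of that pattern under stated assumptions, not the module's implementation: `SketchMatcherOp` and `SketchDouble` are hypothetical names, the toy subclass works on plain numbers rather than tensors, and the choice to bind the path once in `__init__` is an assumption.

```python
from abc import ABC, abstractmethod


class SketchMatcherOp(ABC):
    """Hypothetical stand-in for a matcher op; NOT vLLM's MatcherCustomOp."""

    def __init__(self, enabled: bool):
        self.enabled = enabled
        # Assumption: the path is selected once, at construction time.
        self._forward = self.forward_custom if enabled else self.forward_native

    @abstractmethod
    def forward_custom(self, *args, **kwargs):
        """Path that would call a custom (fused) kernel."""

    @abstractmethod
    def forward_native(self, *args, **kwargs):
        """Path that would use plain framework operations."""

    def __call__(self, *args, **kwargs):
        # Delegate to whichever path was selected in __init__.
        return self._forward(*args, **kwargs)


class SketchDouble(SketchMatcherOp):
    """Toy subclass: both paths double the input and tag which path ran."""

    def forward_custom(self, x):
        return ("custom", x * 2)

    def forward_native(self, x):
        return ("native", x * 2)


custom_result = SketchDouble(enabled=True)(3)
native_result = SketchDouble(enabled=False)(3)
```

Both paths must compute the same value for the pattern to be useful. Only the operations used to get there differ, and that difference is what lets a pattern matcher target whichever form appears in the traced graph.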
MatcherFusedAddRMSNorm
Bases: MatcherCustomOp
MatcherQuantFP8
Bases: MatcherCustomOp
__init__
forward_custom
forward_native
inputs
make_scale
make_scale(input: Tensor)
MatcherRMSNorm
Bases: MatcherCustomOp
MatcherSiluAndMul
Bases: MatcherCustomOp