vllm.v1.attention.ops.chunked_prefill_paged_decode

has_native_kv_cache_layout

has_native_kv_cache_layout(
    key_cache: Tensor, value_cache: Tensor
) -> bool

Return whether KV cache blocks can use the native ROCm pairing.

The native reshape_and_cache writer assumes packed blocks. If the cache update needs reshape_and_cache_flash for a stride-padded hybrid layout, the decode path should use the matching Triton kernel too.

Source code in vllm/v1/attention/ops/chunked_prefill_paged_decode.py
def has_native_kv_cache_layout(
    key_cache: torch.Tensor,
    value_cache: torch.Tensor,
) -> bool:
    """Return whether KV cache blocks can use the native ROCm pairing.

    The native reshape_and_cache writer assumes packed blocks. If cache update
    needs reshape_and_cache_flash for a stride-padded hybrid layout, decode
    should use the matching Triton path too.
    """
    return (
        key_cache.stride(0) == key_cache.shape[1:].numel()
        and value_cache.stride(0) == value_cache.shape[1:].numel()
    )
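The check above reduces to comparing each cache's outermost stride with the element count of a single block. A minimal sketch of that condition, using plain shape/stride tuples in element units instead of torch tensors (the concrete shapes below are illustrative, not taken from vLLM):

```python
from math import prod


def is_packed(shape: tuple[int, ...], stride0: int) -> bool:
    # Packed blocks: the outermost stride equals the number of elements
    # in one block, i.e. prod(shape[1:]). A larger stride means each
    # block row carries padding past its payload.
    return stride0 == prod(shape[1:])


# Illustrative cache shape: (num_blocks, block_size, num_heads, head_size).
shape = (128, 16, 8, 64)

# Contiguous cache: block stride equals block_size * num_heads * head_size.
packed = is_packed(shape, 16 * 8 * 64)

# Stride-padded hybrid layout: the outer stride exceeds the packed count,
# so the native writer's packed-block assumption no longer holds.
padded = is_packed(shape, 16 * 8 * 64 + 256)
```

Here `packed` is True and `padded` is False; in the real function the same comparison is made via `tensor.stride(0)` and `tensor.shape[1:].numel()` for both the key and value caches.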