SVE2 API Proposal: UQSHRNB And UQSHRNT Intrinsics
Background and Motivation
This API proposal addresses the need for unsigned versions of the ShiftRightArithmeticNarrowingSaturateEven (SQSHRNB) and ShiftRightArithmeticNarrowingSaturateOdd (SQSHRNT) operations within the System.Runtime.Intrinsics.Arm namespace. This proposal is based on a comment in this issue and the subsequent discussion. The original issue, dotnet/runtime#94015, highlighted the necessity for a complete set of SVE2 intrinsics to fully utilize the Scalable Vector Extension 2 (SVE2) capabilities in .NET. SVE2 intrinsics are crucial for vectorized operations, allowing developers to write high-performance code that can adapt to different vector lengths at runtime. This adaptability is a key feature of SVE2, enabling code to scale efficiently across various hardware platforms.
The initial approval for shift right arithmetic narrowing saturate operations included both signed and unsigned versions, as shown below:
/// T: [sbyte, short], [short, int], [int, long], [byte, ushort], [ushort, uint], [uint, ulong]
public static unsafe Vector<T> ShiftRightArithmeticNarrowingSaturateEven(Vector<T2> value, [ConstantExpected] byte count); // SQSHRNB or UQSHRNB
/// T: [sbyte, short], [short, int], [int, long], [byte, ushort], [ushort, uint], [uint, ulong]
public static unsafe Vector<T> ShiftRightArithmeticNarrowingSaturateOdd(Vector<T> even, Vector<T2> value, [ConstantExpected] byte count); // SQSHRNT or UQSHRNT
However, the actual implementation only covered the signed versions (SQSHRNB and SQSHRNT) for the following data types:
/// T: [sbyte, short], [short, int], [int, long]
public static unsafe Vector<T> ShiftRightArithmeticNarrowingSaturateEven(Vector<T2> value, [ConstantExpected] byte count); // SQSHRNB
/// T: [sbyte, short], [short, int], [int, long]
public static unsafe Vector<T> ShiftRightArithmeticNarrowingSaturateOdd(Vector<T> even, Vector<T2> value, [ConstantExpected] byte count); // SQSHRNT
This proposal aims to address this gap by introducing dedicated APIs for the unsigned versions of these operations, specifically UQSHRNB and UQSHRNT. The distinction between signed and unsigned operations is essential because they behave differently when dealing with negative numbers. Arithmetic right shifts on signed integers preserve the sign bit, effectively dividing the number by a power of 2 while maintaining its sign. In contrast, logical right shifts, which are used for unsigned integers, fill the vacated bits with zeros. Therefore, providing separate APIs for unsigned operations ensures correct and predictable behavior for unsigned data types. The inclusion of these unsigned versions will provide a more complete and consistent API surface for SVE2 intrinsics, allowing developers to leverage the full potential of the architecture when working with unsigned integer data.
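The difference between the two shift kinds can be seen on scalars. The following is a minimal sketch, not part of the proposal: the class and method names are hypothetical, and it only relies on the C# `>>` operator, which shifts arithmetically on signed operands and logically on unsigned ones.

```csharp
using System;

// Hypothetical helper names, used only for illustration.
public static class ShiftDemo
{
    // Arithmetic shift: the sign bit is replicated into the vacated high bits.
    // ArithmeticShiftRight(-256, 4) returns -16 (0xFF00 becomes 0xFFF0).
    public static short ArithmeticShiftRight(short value, int count) =>
        (short)(value >> count);

    // Logical shift: the vacated high bits are filled with zeros.
    // LogicalShiftRight(0xFF00, 4) returns 4080 (0xFF00 becomes 0x0FF0).
    public static ushort LogicalShiftRight(ushort value, int count) =>
        (ushort)(value >> count);
}
```

The same 16-bit pattern, 0xFF00, produces -16 when treated as signed and 4080 when treated as unsigned, which is why the unsigned element types cannot simply reuse the arithmetic-shift API.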
API Proposal
This section outlines the proposed APIs for the unsigned operations: shift right logical narrowing saturate even (UQSHRNB) and shift right logical narrowing saturate odd (UQSHRNT). These APIs will be added to the System.Runtime.Intrinsics.Arm namespace alongside the existing SVE2 intrinsics. They follow the naming conventions and parameter structure of the existing signed operations (SQSHRNB and SQSHRNT), so developers already familiar with those intrinsics can apply the same patterns. Vector<T> and Vector<T2> are the standard .NET vector types, so the new intrinsics compose with existing vectorized code. The [ConstantExpected] attribute on the count parameter signals that the shift count is expected to be a constant; the underlying instructions encode the shift amount as an immediate, so a constant count lets the JIT emit the instruction directly. With dedicated unsigned APIs, developers can perform shift and saturation operations on unsigned integer vectors correctly, which matters in areas such as image processing, signal processing, and cryptography.
namespace System.Runtime.Intrinsics.Arm
{
// T: [byte, ushort], [ushort, uint], [uint, ulong]
public static unsafe Vector<T> ShiftRightLogicalNarrowingSaturateEven(Vector<T2> value, [ConstantExpected] byte count); // UQSHRNB
// T: [byte, ushort], [ushort, uint], [uint, ulong]
public static unsafe Vector<T> ShiftRightLogicalNarrowingSaturateOdd(Vector<T> even, Vector<T2> value, [ConstantExpected] byte count); // UQSHRNT
}
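To illustrate how [ConstantExpected] is applied to a parameter, here is a minimal sketch on a scalar helper. The helper itself is hypothetical and is not the proposed intrinsic. The Min/Max range of 1 to 8 is an assumption based on the Arm description of the shift immediate for 8-bit results; the proposal above does not state a range.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Hypothetical scalar stand-in, used only to show the attribute in context.
public static class NarrowingShiftHelpers
{
    public static byte ShiftRightLogicalNarrowSaturate(
        ushort value,
        // Assumed range: 1..8 for a 16-bit to 8-bit narrowing shift.
        [ConstantExpected(Min = (byte)1, Max = (byte)8)] byte count)
    {
        if (count < 1 || count > 8)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Zero-filling shift on the wide value, then clamp to the byte range.
        uint shifted = (uint)value >> count;
        return (byte)Math.Min(shifted, 255u);
    }
}
```

Calling the helper with a literal such as `ShiftRightLogicalNarrowSaturate(x, 4)` matches the intended usage. Passing a variable still compiles, but the .NET analyzers can report it because the parameter is marked as expecting a constant.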
Detailed API Breakdown
- ShiftRightLogicalNarrowingSaturateEven(Vector<T2> value, [ConstantExpected] byte count)
  - This method performs a shift right logical narrowing saturate even operation. It takes a vector of type T2 (value) and a shift count (count) as input. Each element in value is shifted right by count bits with zero fill, and the result is narrowed to type T, which is half the width of T2. Narrowing saturates rather than truncates: if the shifted value exceeds the maximum value representable by type T, it is clamped to that maximum. Because the inputs are unsigned, a shifted value can never fall below zero, so only the upper bound applies. The narrowed results are written to the even-numbered elements of the result, and the odd-numbered elements are set to zero. This intrinsic is particularly useful for packing data into smaller vector elements without overflow.
  - The supported types for T are byte, ushort, and uint. The corresponding types for T2 are ushort, uint, and ulong, respectively, so T is always exactly half the width of T2 and the narrowing operation is well-defined.
  - The [ConstantExpected] attribute on the count parameter indicates that the shift count is expected to be a constant value. UQSHRNB encodes the shift amount as an immediate, so a constant count lets the JIT compiler emit the instruction directly.
- ShiftRightLogicalNarrowingSaturateOdd(Vector<T> even, Vector<T2> value, [ConstantExpected] byte count)
  - This method performs a shift right logical narrowing saturate odd operation. It takes two input vectors, even of type T and value of type T2, along with a shift count (count). It is designed to work in conjunction with the ShiftRightLogicalNarrowingSaturateEven intrinsic so that two wide vectors can be packed into one narrow vector. The even vector provides the even-numbered elements of the result unchanged, while the value vector is shifted and narrowed to produce the odd-numbered elements.
  - As in the even version, each element in value is shifted right by count bits, narrowed to type T, and saturated. The results are written to the odd-numbered elements, interleaved with the elements taken from the even vector to produce the final output. This interleaving is the key aspect of the "odd" operation, and it suits data that is naturally arranged in an interleaved manner.
  - The supported types for T and T2 are the same as in the even version: T can be byte, ushort, or uint, and T2 can be ushort, uint, or ulong, respectively. Developers can therefore apply the same type mappings across both even and odd operations.
  - The [ConstantExpected] attribute on the count parameter again signals that the shift count is expected to be a constant, for the same reason as in the even version: UQSHRNT also encodes the shift amount as an immediate.
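The lane behavior described above can be sketched as a scalar reference model. This is an illustrative sketch only: it is not the proposed implementation, the class and method names are hypothetical, arrays stand in for Vector<T>, and only the ushort-to-byte pairing is modeled. The even/odd lane placement follows the Arm descriptions of UQSHRNB and UQSHRNT.

```csharp
using System;

// Hypothetical reference model; arrays stand in for Vector<ushort> and Vector<byte>.
public static class UqshrnReferenceModel
{
    // Shift one wide element right with zero fill, then clamp to the byte range.
    private static byte NarrowSaturate(ushort wide, int count) =>
        (byte)Math.Min((uint)wide >> count, 255u);

    // Models UQSHRNB: results land in even-numbered lanes, odd-numbered lanes are zero.
    public static byte[] NarrowEven(ushort[] value, int count)
    {
        var result = new byte[value.Length * 2]; // new arrays are zero-initialized
        for (int i = 0; i < value.Length; i++)
            result[2 * i] = NarrowSaturate(value[i], count);
        return result;
    }

    // Models UQSHRNT: results land in odd-numbered lanes, even-numbered lanes come from `even`.
    public static byte[] NarrowOdd(byte[] even, ushort[] value, int count)
    {
        if (even.Length != value.Length * 2)
            throw new ArgumentException("even must have twice as many elements as value.");

        var result = (byte[])even.Clone();
        for (int i = 0; i < value.Length; i++)
            result[2 * i + 1] = NarrowSaturate(value[i], count);
        return result;
    }
}
```

For example, with a shift count of 4, `NarrowEven(new ushort[] { 0x0123, 0xFFFF }, 4)` yields `{ 18, 0, 255, 0 }`. Passing that result as `even` to `NarrowOdd` together with `new ushort[] { 0x0040, 0x1000 }` yields `{ 18, 4, 255, 255 }`. Note that the two inputs end up interleaved in the output rather than concatenated.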
Benefits of the Proposed API
- Completeness: By adding the unsigned versions of the shift right narrowing saturate operations, the SVE2 intrinsic API becomes more complete, providing developers with the tools they need to handle a wider range of data types and operations. This completeness is essential for ensuring that .NET developers can fully leverage the capabilities of the SVE2 architecture.
- Consistency: The proposed APIs are consistent with the existing signed versions in terms of naming, parameters, and behavior. This consistency makes the APIs easier to learn and use, reducing the cognitive load on developers and improving code readability. Consistency also helps to prevent errors, as developers can apply their existing knowledge of the signed operations to the unsigned ones.
- Performance: These intrinsics are designed to map directly to hardware instructions, providing excellent performance for vectorized shift and saturation operations. The [ConstantExpected] attribute further enhances performance by allowing the JIT compiler to generate optimized code for constant shift counts. This performance is critical for applications that require high throughput, such as image and signal processing.
- Correctness: Providing separate APIs for signed and unsigned operations ensures that the correct shift behavior is applied for each data type. This is crucial for avoiding unexpected results and ensuring the accuracy of computations. Using logical shifts for unsigned integers and arithmetic shifts for signed integers is a fundamental aspect of integer arithmetic, and these APIs enforce this distinction.
In conclusion, the addition of these UQSHRNB and UQSHRNT intrinsics to the System.Runtime.Intrinsics.Arm namespace is a significant step towards providing a comprehensive and performant API for SVE2 programming in .NET. These intrinsics will empower developers to write efficient and accurate code for a variety of applications, taking full advantage of the capabilities of the SVE2 architecture.
For further reading on SIMD and vectorization, you might find the resources at Intel Intrinsics Guide helpful. This guide provides detailed information on various SIMD instructions and their usage, which can enhance your understanding of the concepts discussed in this proposal.