rocBLAS 2.42.0 for ROCm 5.0.0

@lawruble13 released this 09 Feb 20:30
60c5f03

Added

  • Added rocblas_get_version_string_size convenience function
  • Added rocblas_xtrmm_outofplace, an out-of-place version of rocblas_xtrmm
  • Added hpl and trig initialization for gemm_ex to rocblas-bench
  • Added a source-code gemm implementation. It can be used as an alternative to Tensile for debugging and development
  • Added option ROCM_MATHLIBS_API_USE_HIP_COMPLEX to opt in to using hipFloatComplex and hipDoubleComplex
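
To illustrate what "out-of-place" means for trmm, here is a minimal host-side sketch. It assumes the usual convention that the in-place routine overwrites B with alpha * A * B, while the out-of-place variant writes the product to a separate matrix C and leaves B untouched. The helper name trmm_outofplace_ref and its parameter list are hypothetical and are not the rocBLAS API; only the left-side, non-transposed, column-major case is covered.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference helper (not part of rocBLAS).
// Computes C = alpha * A * B, where A is an m x m triangular matrix and
// B, C are m x n. All matrices are column-major. B is never modified.
void trmm_outofplace_ref(bool upper, bool unit_diag, int m, int n, float alpha,
                         const float* A, int lda,
                         const float* B, int ldb,
                         float* C, int ldc)
{
    assert(lda >= m && ldb >= m && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            // Only the referenced triangle of A is read:
            // upper -> columns k >= i, lower -> columns k <= i.
            int k_begin = upper ? i : 0;
            int k_end   = upper ? m - 1 : i;
            float sum = 0.0f;
            for (int k = k_begin; k <= k_end; ++k) {
                // With a unit diagonal, the stored diagonal is ignored.
                float a = (k == i && unit_diag) ? 1.0f : A[i + k * lda];
                sum += a * B[k + j * ldb];
            }
            C[i + j * ldc] = alpha * sum;
        }
    }
}
```

Because the result goes to C, the input B can be reused afterwards, which is the practical difference from the in-place form.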

Optimizations

  • Improved performance of non-batched and batched single-precision GER for sizes m > 1024. Performance improved by 5-10%, measured on an MI100 (gfx908) GPU.
  • Improved performance of non-batched and batched HER for all sizes and data types. Performance improved by 2-17%, measured on an MI100 (gfx908) GPU.

Changed

  • Instantiated templated rocBLAS functions to reduce size of librocblas.so
  • Removed static library dependency on msgpack
  • Removed boost dependencies for clients

Fixed

  • Fixed the install script option to build only rocBLAS clients with a pre-built rocBLAS library
  • Correctly set output of nrm2_batched_ex and nrm2_strided_batched_ex when given bad input
  • Fixed dgmm with side == rocblas_side_left and a negative incx
  • Fixed out-of-bounds read for small trsm
  • Fixed numerical checking for tbmv_strided_batched
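
For context on the dgmm fix, the following host-side sketch shows how a negative incx is typically interpreted. It assumes the standard BLAS convention that a negative increment walks the vector backwards from the far end of its buffer, and that the left-side operation is C = diag(x) * A. The helper name dgmm_left_ref is hypothetical and is not the rocBLAS API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of rocBLAS).
// Computes C = diag(x) * A for column-major m x n matrices A and C,
// where x has m logical elements spaced by incx (incx may be negative).
void dgmm_left_ref(int m, int n,
                   const float* A, int lda,
                   const float* x, int incx,
                   float* C, int ldc)
{
    assert(lda >= m && ldc >= m && incx != 0);
    // Assumed BLAS convention: with a negative increment, logical
    // element 0 is stored at the far end of the buffer.
    std::ptrdiff_t start =
        incx < 0 ? -static_cast<std::ptrdiff_t>(incx) * (m - 1) : 0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            std::ptrdiff_t xi = start + static_cast<std::ptrdiff_t>(i) * incx;
            C[i + static_cast<std::ptrdiff_t>(j) * ldc] =
                x[xi] * A[i + static_cast<std::ptrdiff_t>(j) * lda];
        }
    }
}
```

With x stored as {10, 20} and incx = -1, row 0 is scaled by 20 and row 1 by 10, the reverse of the incx = 1 case.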