[RISCV] Combine shuffle of shuffles to a single shuffle (#178095)
Combining them into a single shuffle doesn't lose any information, and the backend can apply shuffle-specific optimizations more effectively to a single shuffle.
Addresses #176218.
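As a rough illustration of why no information is lost, composing two single-source shuffle masks yields one mask that selects the same lanes. This is a simplified sketch: `composeShuffleMasks` and `applyShuffle` are hypothetical helpers, not LLVM APIs. It assumes both shuffles read from one source vector and uses -1 for undefined lanes, as LLVM shuffle masks do. The actual RISC-V combine may handle two-source shuffles and other cases differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fold shuffle(shuffle(X, inner), outer) into a single
// shuffle(X, composed). Assumes both shuffles read one source vector X and,
// as in LLVM shuffle masks, -1 marks an undefined lane.
std::vector<int> composeShuffleMasks(const std::vector<int>& inner,
                                     const std::vector<int>& outer) {
  std::vector<int> composed;
  composed.reserve(outer.size());
  for (int idx : outer) {
    if (idx < 0) {
      composed.push_back(-1); // an undefined lane stays undefined
      continue;
    }
    assert(static_cast<std::size_t>(idx) < inner.size());
    // Look through the inner shuffle to the lane of X it reads
    // (which may itself be -1).
    composed.push_back(inner[static_cast<std::size_t>(idx)]);
  }
  return composed;
}

// Apply a single-source mask; undefined lanes are filled with 0 for testing.
std::vector<int> applyShuffle(const std::vector<int>& src,
                              const std::vector<int>& mask) {
  std::vector<int> out;
  out.reserve(mask.size());
  for (int idx : mask)
    out.push_back(idx < 0 ? 0 : src[static_cast<std::size_t>(idx)]);
  return out;
}
```

For example, reversing `{10, 20, 30, 40}` with `{3, 2, 1, 0}` and then applying `{0, 0, 2, -1}` selects the same lanes as applying the composed mask `{3, 3, 1, -1}` once.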
---------
Co-authored-by: Luke Lau <luke_lau at igalia.com>
[clang][modules] Add single-module-parse-mode callback (#179714)
This PR adds a new preprocessor callback that's invoked whenever the
single-module-parse-mode skips over a module import. This will be used
later on by the dependency scanner.
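The general shape of such a hook can be sketched in plain C++: a callback base class with a no-op default, and a consumer that records what was skipped. Every name here (`SkippedImportCallbacks`, `moduleImportSkipped`, `SkippedImportCollector`, `parseImportsInSingleModuleMode`) is hypothetical; the callback added by this PR is part of clang's preprocessor callback interface, and its real name and parameters are not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical callback interface (not clang's actual PPCallbacks API).
class SkippedImportCallbacks {
public:
  virtual ~SkippedImportCallbacks() = default;
  // Invoked when single-module-parse-mode skips over a module import.
  // The default does nothing, so clients that don't care are unaffected.
  virtual void moduleImportSkipped(const std::string& /*moduleName*/) {}
};

// A consumer, such as a dependency scanner, that records skipped imports
// so those modules can be handled later.
class SkippedImportCollector : public SkippedImportCallbacks {
public:
  void moduleImportSkipped(const std::string& moduleName) override {
    skipped.push_back(moduleName);
  }
  std::vector<std::string> skipped;
};

// Hypothetical stand-in for the parser: instead of loading each imported
// module, it skips the import and notifies the registered callbacks.
void parseImportsInSingleModuleMode(const std::vector<std::string>& imports,
                                    SkippedImportCallbacks& callbacks) {
  for (const std::string& name : imports) {
    assert(!name.empty() && "an import must name a module");
    callbacks.moduleImportSkipped(name);
  }
}
```

Keeping the default implementation empty means only clients that opt in, such as a scanner, pay any attention to skipped imports.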
[lld][WebAssembly] Add new __rodata_start/__rodata_end symbols (#172102)
This is similar to etext/_etext in the ELF linker. It's useful in
emscripten to know where the RO data ends and the data begins (even
though the Wasm format itself has no concept of RO data).
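A minimal sketch of how a runtime might use the new symbols, assuming they can be declared from C or C++ as byte arrays and that `__rodata_end` marks one past the last read-only byte (like `_etext`). The helper `isInReadOnlyData` is hypothetical; it takes the bounds as parameters so the example compiles outside a wasm-ld link.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: in a module linked by wasm-ld, the linker-defined symbols
// could be declared with C linkage like this:
//   extern "C" char __rodata_start[], __rodata_end[];
// They are passed in as parameters here so the sketch builds on any target.

// Returns true if p lies in the half-open range [start, end).
bool isInReadOnlyData(const void* p, const char* start, const char* end) {
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  auto lo = reinterpret_cast<std::uintptr_t>(start);
  auto hi = reinterpret_cast<std::uintptr_t>(end);
  assert(lo <= hi);
  return addr >= lo && addr < hi;
}
```

Under the same declaration assumption, an Emscripten build might call `isInReadOnlyData(p, __rodata_start, __rodata_end)`.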
See
https://github.com/emscripten-core/emscripten/discussions/25939#discussioncomment-15243731
[SPIRV] Implement lowering for HLSL Texture2D sampling intrinsics (#179312)
This patch implements the SPIR-V lowering for the following HLSL
intrinsics:
- SampleBias
- SampleGrad
- SampleLevel
- SampleCmp
- SampleCmpLevelZero
It defines the required LLVM intrinsics in 'IntrinsicsDirectX.td' and
'IntrinsicsSPIRV.td'.
It updates 'SPIRVInstructionSelector.cpp' to handle the new intrinsics
and generates the correct 'OpImageSample*' instructions with the
required operands (Bias, Grad, Lod, ConstOffset, MinLod, etc.).
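The mapping from each HLSL method to a SPIR-V sampling instruction and its Image Operands mask can be summarized as a small table. This sketch assumes the lowering follows the conventional DXC mapping; the bit values are the Image Operands bits from the SPIR-V specification, but `lowerSample`, `HlslSample` and `SpirvSample` are hypothetical names, not the code in 'SPIRVInstructionSelector.cpp'.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Image Operands bits as defined by the SPIR-V specification.
enum ImageOperandBits : std::uint32_t {
  kBias = 0x1,
  kLod = 0x2,
  kGrad = 0x4,
  kConstOffset = 0x8,
  kMinLod = 0x80,
};

// Hypothetical names for the HLSL sampling methods covered by the patch.
enum class HlslSample { SampleBias, SampleGrad, SampleLevel, SampleCmp, SampleCmpLevelZero };

struct SpirvSample {
  std::string opcode;     // SPIR-V instruction name
  std::uint32_t operands; // Image Operands mask
};

// Assumed mapping (conventional DXC lowering); not the actual selector code.
SpirvSample lowerSample(HlslSample kind, bool hasConstOffset, bool hasMinLod) {
  SpirvSample s{"", 0};
  bool implicitLod = false;
  switch (kind) {
  case HlslSample::SampleBias:
    s = {"OpImageSampleImplicitLod", kBias};
    implicitLod = true;
    break;
  case HlslSample::SampleGrad:
    s = {"OpImageSampleExplicitLod", kGrad};
    break;
  case HlslSample::SampleLevel:
    s = {"OpImageSampleExplicitLod", kLod};
    break;
  case HlslSample::SampleCmp:
    s = {"OpImageSampleDrefImplicitLod", 0};
    implicitLod = true;
    break;
  case HlslSample::SampleCmpLevelZero:
    // "Level zero" becomes an explicit Lod operand whose value is 0.0.
    s = {"OpImageSampleDrefExplicitLod", kLod};
    break;
  }
  if (hasConstOffset)
    s.operands |= kConstOffset;
  if (hasMinLod) {
    // SPIR-V only allows MinLod with implicit-LOD instructions or with Grad.
    assert(implicitLod || (s.operands & kGrad));
    s.operands |= kMinLod;
  }
  return s;
}
```

For instance, `SampleGrad` with a constant offset would map to `OpImageSampleExplicitLod` with the mask `Grad | ConstOffset` (0xC) under these assumptions.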
[3 lines not shown]
[Github] Add runs-on to release-tasks.yml
This was failing validation against main and sending everyone emails.
Try adding the fix that was suggested in the workflow run.
[CI] Add compiler-rt to LLDB runtime dependencies (#180590)
Some LLDB tests will only run if compiler-rt is built. This includes at
least two tsan tests that passed in a PR (#179115) but then failed on
other PRs that included compiler-rt in the build.
tests/make: demonstrate use-after-free in the :@ modifier
This test is not run regularly because the output varies depending on
the memory allocator and its configuration. It can be run manually with:
make -r -f unit-tests/varmod-loop-delete.mk use-after-free
[flang][OpenMP] Improve locality check when determining DSA (#180583)
Follow-up to https://github.com/llvm/llvm-project/pull/178739.
The locality check assumed that immediately after the initial symbol
resolution (i.e. prior to the OpenMP code in resolve-directives.cpp),
the scope that owns a given symbol is the scope which owns the symbol's
storage. It turns out that this isn't necessarily true, as illustrated by
the included testcase, which looks roughly like this:
```
program main
integer :: j ! host j (storage-owning)
contains
subroutine f
!$omp parallel ! scope that owns j, but j is host-associated
do j = ...
end do
!$omp end parallel
end
[17 lines not shown]
[SPGO] Use std::hash instead of MD5 to avoid run time regression in llvm-profgen (#180581)
https://github.com/llvm/llvm-project/pull/66164 changed the hashing in
`SampleContextFrame` from `std::hash` to `MD5` in a very hot function
(ContextTrieNode::getOrCreateChildContext()) in llvm-profgen. This
causes a more than 2x run time regression when running llvm-profgen with
the csspgo preinliner enabled, since MD5 computation costs about three
times as much as the Murmur hash used by the std library. A comparison
of llvm-profgen run times follows:
```
$ time llvm-profgen -binary $BINARY --perfscript $SAMPLES --populate-profile-symbol-list --show-density --output=XXX
# MD5 hash
real 105m31.644s
user 104m51.334s
sys 0m35.033s
# std::hash
[7 lines not shown]
[mlir][amdgpu] Update TDM ops to use the new barrier type, improve docs (#180572)
Now that we have an AMDGPU dialect type for the in-LDS barriers that the
tensor data mover can automatically visit, update the definition of the
tensor descriptor operations to use said types and document the behavior
of the barrier.