[ConstantFolding] Allow truncation when folding wasm.dot
Change this to use getSigned() to match the signedness of the calculation.
However, we still need to allow truncation because the addition
result may overflow, and the operation is specified to truncate
in that case.
Fixes https://github.com/llvm/llvm-project/issues/175159.
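
To illustrate why truncation has to be allowed, here is a minimal standalone sketch. It assumes that wasm.dot has the semantics of WebAssembly's `i32x4.dot_i16x8_s` (signed 16-bit lanes multiplied pairwise, adjacent products added into a 32-bit lane). The helper name `dotI16x8` is hypothetical and this is not the ConstantFolding code itself.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a signed i16x8 dot product producing i32x4.
// Not LLVM's implementation; it only demonstrates the wrap-around case.
std::array<int32_t, 4> dotI16x8(const std::array<int16_t, 8> &a,
                                const std::array<int16_t, 8> &b) {
  std::array<int32_t, 4> out{};
  for (int i = 0; i < 4; ++i) {
    // Sign-extend each lane and multiply in 64 bits so nothing overflows here.
    int64_t lo = int64_t(a[2 * i]) * int64_t(b[2 * i]);
    int64_t hi = int64_t(a[2 * i + 1]) * int64_t(b[2 * i + 1]);
    int64_t sum = lo + hi; // may be 2^31, which does not fit in int32_t
    // Keep only the low 32 bits, then reinterpret them as signed.
    // (The final conversion is modular in C++20 and wraps on mainstream
    // compilers before that.)
    out[i] = static_cast<int32_t>(static_cast<uint32_t>(sum));
  }
  return out;
}
```

With every lane of both inputs set to -32768, each product is 2^30 and each sum is 2^31, one past the largest signed 32-bit value, so the truncated lane reads as INT32_MIN. That is the case where a signed constant has to be created with truncation permitted.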
[OpenMP] Remove testing LTO variant on CPU targets (#175187)
Summary:
This is only really meaningful for the NVPTX target. Not all build
environments support host LTO, and these tests are redundant, so just
clean this up and make the test run faster.
[AMDGPU] Handle `s_setreg_imm32_b32` targeting `MODE` register
On certain hardware, this instruction clobbers VGPR MSB `bits[12:19]`, so we need to restore the current mode.
[AMDGPU] Add liverange split instructions into BB Prolog (#117544)
The COPY inserted for a live range split during the sgpr-regalloc
pipeline currently breaks the BB prolog during the subsequent
vgpr-regalloc phase while spilling and/or splitting the vector
live ranges. This patch fixes it by correctly including the
LR split instructions inserted during the sgpr-regalloc and
wwm-regalloc pipelines in the BB prolog.
[CodeGen] Introduce MI flag for Live Range split instructions (#117543)
For some targets, it is necessary to identify the COPY instruction
that corresponds to the RA-inserted live range split. Add the new
flag `MachineInstr::LRSplit` to serve this purpose.
[LLD][COFF] Prefetch inputs early-on to improve link times (#169224)
This PR reduces outliers in terms of runtime performance, by asking the
OS to prefetch memory-mapped input files in advance, as early as
possible. I have implemented the Linux aspect; however, I have only
tested this on Windows 11 version 24H2, with an active security stack
enabled. The machine is an AMD Threadripper PRO 3975WX 32c/64t with 128
GB of RAM and a Samsung 990 PRO SSD.
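
As a rough illustration of the idea on POSIX systems, the sketch below maps a file read-only and then hints the kernel with `madvise(MADV_WILLNEED)`. This is an assumption about one way to request prefetching, not the code in this PR; `MappedFile`, `mapAndPrefetch` and `unmapFile` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical holder for a read-only mapping.
struct MappedFile {
  void *data = nullptr;
  std::size_t size = 0;
};

// Map `path` read-only and ask the kernel to start reading it ahead of use.
// Returns false if the file cannot be opened, is empty, or cannot be mapped.
bool mapAndPrefetch(const char *path, MappedFile &out) {
  assert(path && "path must not be null");
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return false;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return false;
  }
  std::size_t size = static_cast<std::size_t>(st.st_size);
  void *p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); // the mapping stays valid after the descriptor is closed
  if (p == MAP_FAILED)
    return false;
  // Advisory only: a failure here does not affect correctness, so ignore it.
  (void)madvise(p, size, MADV_WILLNEED);
  out.data = p;
  out.size = size;
  return true;
}

// Release a mapping created by mapAndPrefetch.
void unmapFile(MappedFile &f) {
  if (f.data)
    munmap(f.data, f.size);
  f = MappedFile{};
}
```

The hint is issued right after mapping, so the OS can overlap disk reads with the rest of the link instead of faulting pages in on first touch.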
I used an Unreal Engine-based game to profile the link times. Here's
a quick summary of the input data:
```
Summary
--------------------------------------------------------------------------------
4,169 Input OBJ files (expanded from all cmd-line inputs)
26,325,429,114 Size of all consumed OBJ files (non-lazy), in bytes
9 PDB type server dependencies
0 Precomp OBJ dependencies
350,516,212 Input debug type records
```
[52 lines not shown]
[clang][bytecode] Fix APValues for arrays in dynamic allocations (#175176)
getType() returns just int for those instead of an array type, so the
previous condition left the array index out of the APValue's
LValuePath.
[clang][bytecode] Fix initializing array elems from string (#175170)
In the `= {"foo"}` case, we don't have an array filler we can use and we
need to explicitly zero the remaining elements.
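
The language-level behavior the evaluator must reproduce can be sketched as below. This is a standalone illustration rather than a test from the patch; `Buf` and `tailIsZero` are hypothetical names.

```cpp
#include <cstddef>

// Only "foo" and its terminator are spelled out by the initializer;
// the remaining elements of the array must still read as zero.
constexpr char Buf[8] = {"foo"};

// Returns true if every element from index `from` to the end is '\0'.
template <std::size_t N>
constexpr bool tailIsZero(const char (&arr)[N], std::size_t from) {
  for (std::size_t i = from; i < N; ++i)
    if (arr[i] != '\0')
      return false;
  return true;
}

static_assert(Buf[0] == 'f' && Buf[1] == 'o' && Buf[2] == 'o',
              "string contents are copied");
static_assert(tailIsZero(Buf, 3), "terminator and trailing elements are zero");
```

Both assertions are evaluated at compile time, so any constant evaluator that leaves the trailing elements uninitialized would reject the second one.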