[clang][LoongArch] Check target features in CheckLoongArchBuiltinFunctionCall (#191811)
Add a target feature check to `CheckLoongArchBuiltinFunctionCall`, so
that using builtin LSX intrinsics in global variable initializers while
passing `-mno-lsx` to clang produces an error instead of triggering an
ICE.
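A minimal standalone sketch of the idea (not clang's Sema code): each builtin maps to the target feature it requires, and the call is rejected with a diagnostic when that feature is disabled. The prefix-based mapping and the helpers `requiredFeature` and `checkBuiltinFeature` are hypothetical. Clang records required features per builtin rather than deriving them from the name.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical helper: derive the feature a LoongArch builtin needs from
// its name prefix (a simplification for illustration only).
std::optional<std::string> requiredFeature(const std::string &Builtin) {
  if (Builtin.rfind("__builtin_lasx_", 0) == 0)
    return "lasx";
  if (Builtin.rfind("__builtin_lsx_", 0) == 0)
    return "lsx";
  return std::nullopt;
}

// Returns a diagnostic if the builtin's feature is not enabled, or an
// empty optional if the call is allowed.
std::optional<std::string>
checkBuiltinFeature(const std::string &Builtin,
                    const std::set<std::string> &EnabledFeatures) {
  std::optional<std::string> Feature = requiredFeature(Builtin);
  if (Feature && EnabledFeatures.count(*Feature) == 0)
    return "builtin needs target feature " + *Feature;
  return std::nullopt;
}
```

With `-mno-lsx` the enabled set lacks `lsx`, so the check reports an error rather than letting the call reach code generation.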
Minimal Example:
```
// clang-20 --march=loongarch64 -mno-lsx -S -o - "x.cc"
__attribute__((__vector_size__(16))) long foo = __builtin_lsx_vinsgr2vr_w(foo, 0, 0);
```
The compiler then reports:
```
x.cc:1:49: error: builtin needs target feature lsx
    1 | __attribute__((__vector_size__(16))) long foo = __builtin_lsx_vinsgr2vr_w(foo, 0, 0);
      |                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[2 lines not shown]
[Driver] Gnu: Move -s/-t/-u emission to match GCC order (#192883)
GCC places -s, -t, and -u sym in one contiguous group just before
Scrt1.o / crt1.o, with -L paths after the CRT files.
Match that ordering by dropping the early `push_back("-s")` and rolling
-s, -t, and -u into one addAllArgs call placed immediately after -o
output. This keeps -Wl,... after -s/-t/-u so that user overrides like
-Wl,--strip-debug still take precedence. Update linux-ld-args.c to
match the new order.
-T remains at the end so that earlier -L paths take precedence.
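To make the ordering concrete, here is a toy model (not the driver's code) that assembles a link line in the order described: `-o output`, then the grouped -s/-t/-u flags, then the CRT start file, then the -L paths, with -T last. The function `buildLinkLine`, its parameters, and the exact placement of -Wl values right after the -L paths are illustrative assumptions. Only the relative order of those groups comes from the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the GNU link-line ordering. File names and parameter
// shapes are hypothetical; only the group order follows the commit text.
std::vector<std::string>
buildLinkLine(const std::string &Output,
              const std::vector<std::string> &STUFlags, // -s, -t, -u sym
              const std::string &StartFile,             // e.g. Scrt1.o
              const std::vector<std::string> &LibPaths, // -L directories
              const std::vector<std::string> &WlArgs,   // values from -Wl,...
              const std::string &LinkerScript) {        // -T script or ""
  std::vector<std::string> Cmd = {"-o", Output};
  // -s/-t/-u form one contiguous group immediately after "-o output".
  Cmd.insert(Cmd.end(), STUFlags.begin(), STUFlags.end());
  // The CRT start file follows that group...
  Cmd.push_back(StartFile);
  // ...and the -L paths come after the CRT files.
  for (const std::string &P : LibPaths)
    Cmd.push_back("-L" + P);
  // User -Wl values come after -s, so e.g. --strip-debug can override it.
  Cmd.insert(Cmd.end(), WlArgs.begin(), WlArgs.end());
  // -T stays at the end.
  if (!LinkerScript.empty()) {
    Cmd.push_back("-T");
    Cmd.push_back(LinkerScript);
  }
  return Cmd;
}
```

Because later linker options win, keeping -Wl values after the -s/-t/-u group is what preserves user overrides.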
[MLIR][Vector] Fix `scf.for` block-argument yields in warp distribution (#192247)
Teach WarpOpScfForOp to remap yielded `scf.for` body block arguments
through `argMapping` before creating the replacement `gpu.yield`.
Handle yielded loop-carried values and other `scf.for` body block
arguments when moving the loop body into the new inner warp op, instead
of reusing the pre-merge values.
Add a regression test for yielding a loop-carried block argument during
warp distribution.
Fix #186573
[JumpTableToSwitch] Fix wrong function used for GUID computation (#192877)
The FuncToGuid lambda's fallback path (when target functions lack !guid
metadata) was using 'F' (the caller) instead of 'Fct' (the callee) in
getIRPGOFuncName, causing all GUID lookups to resolve to the caller's
GUID.
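The shape of the bug can be reproduced in a standalone sketch. `Function`, `guidFromName`, and `calleeGuids` are hypothetical stand-ins, not LLVM's types or its GUID hash. The point is only that the fallback must use the callee `Fct`. Using the caller `F`, which is also in scope, gives every target the same GUID.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a function that may carry !guid metadata.
struct Function {
  std::string Name;
  std::optional<uint64_t> GuidMetadata;
};

// Illustrative GUID: a hash of the name (LLVM uses its own scheme).
uint64_t guidFromName(const std::string &Name) {
  return std::hash<std::string>{}(Name);
}

// Look up GUIDs for the jump-table targets of caller F.
std::vector<uint64_t>
calleeGuids([[maybe_unused]] const Function &F,
            const std::vector<const Function *> &Targets) {
  auto FuncToGuid = [&](const Function &Fct) -> uint64_t {
    if (Fct.GuidMetadata)
      return *Fct.GuidMetadata;
    // Fallback: hash the callee's name. Writing F.Name here (the bug)
    // would make every target resolve to the caller's GUID.
    return guidFromName(Fct.Name);
  };
  std::vector<uint64_t> Guids;
  for (const Function *T : Targets)
    Guids.push_back(FuncToGuid(*T));
  return Guids;
}
```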
[llvm-profgen] Add branch/target validation (#188620)
Add extra branch source and target validation checks for LBR samples.
These checks flag branch source samples that do not correspond to a
call/branch/ret instruction in the binary, and branch target samples
that match neither a resolved immediate (Imm) target address nor, for
indirect calls, a function start address.
Example output:
```
# X86
warning: 0.01%(27/376876) of sampled target addresses do not match the binary.
# AArch64
warning: 0.01%(63782/795824826) of branch samples do not match the binary.
warning: 0.01%(70468/795824826) of branch targets do not match the binary.
```
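As a rough illustration of this kind of check (not llvm-profgen's implementation), the sketch below validates LBR records against a hypothetical summary of the disassembled binary and prints a warning in the style of the example output. `LBREntry`, `BinaryInfo`, `validateBranches`, and `formatWarning` are invented names. Skipping the target check for indirect transfers other than calls, such as returns, is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct LBREntry {
  uint64_t Source;
  uint64_t Target;
};

// Hypothetical summary a disassembly pass could provide.
struct BinaryInfo {
  std::unordered_set<uint64_t> BranchAddrs;            // call/branch/ret
  std::unordered_map<uint64_t, uint64_t> ImmTargets;   // direct targets
  std::unordered_set<uint64_t> IndirectCalls;          // indirect call sites
  std::unordered_set<uint64_t> FunctionStarts;
};

struct ValidationResult {
  uint64_t Total = 0, BadSources = 0, BadTargets = 0;
};

ValidationResult validateBranches(const BinaryInfo &BI,
                                  const std::vector<LBREntry> &Samples) {
  ValidationResult R;
  for (const LBREntry &E : Samples) {
    ++R.Total;
    if (!BI.BranchAddrs.count(E.Source)) {
      ++R.BadSources; // source is not a branch instruction
      continue;       // cannot judge the target without a known source
    }
    auto It = BI.ImmTargets.find(E.Source);
    if (It != BI.ImmTargets.end()) {
      if (It->second != E.Target)
        ++R.BadTargets; // direct branch landed somewhere unexpected
    } else if (BI.IndirectCalls.count(E.Source)) {
      if (!BI.FunctionStarts.count(E.Target))
        ++R.BadTargets; // indirect call must land on a function start
    }
    // Other indirect transfers (e.g. returns) are not target-checked here.
  }
  return R;
}

// Format a warning like "warning: 0.01%(27/376876) of ... do not match".
std::string formatWarning(uint64_t Bad, uint64_t Total, const char *What) {
  char Buf[160];
  std::snprintf(Buf, sizeof(Buf),
                "warning: %.2f%%(%llu/%llu) of %s do not match the binary.",
                Total ? 100.0 * Bad / Total : 0.0,
                (unsigned long long)Bad, (unsigned long long)Total, What);
  return Buf;
}
```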
Run time overhead:
```
Before:
[8 lines not shown]
[AMDGPU] Generate waterfall for calls with SGPR(inreg) argument (#146997)
Fixes https://github.com/llvm/llvm-project/issues/140780
Generate a waterfall loop for a call whose SGPR (inreg) argument comes
from a divergent source (e.g. a VGPR).
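A waterfall loop turns a possibly divergent value into a sequence of uniform ones. It reads the value from the first active lane (readfirstlane), runs the operation for every lane holding that same value, disables those lanes, and repeats until no lanes remain. The scalar simulation below models only that control flow. `waterfallCall` and its callback are hypothetical, and real codegen manipulates the EXEC mask and SGPR/VGPR registers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Simulates calling Callee(arg) for every lane of a wave when the callee
// requires a uniform (SGPR/inreg) argument but each lane may hold a
// different value. Iterations receives the number of loop trips.
std::vector<int> waterfallCall(const std::vector<int> &LaneArgs,
                               const std::function<int(int)> &Callee,
                               int *Iterations) {
  std::vector<int> Results(LaneArgs.size());
  std::vector<bool> Active(LaneArgs.size(), true); // models EXEC
  *Iterations = 0;
  for (;;) {
    // readfirstlane: take the value from the first still-active lane.
    std::size_t First = 0;
    while (First < Active.size() && !Active[First])
      ++First;
    if (First == Active.size())
      break; // every lane has been handled
    int Uniform = LaneArgs[First];
    ++*Iterations;
    int R = Callee(Uniform); // the call sees a uniform argument
    // Lanes holding the same value take this result and drop out.
    for (std::size_t I = 0; I < LaneArgs.size(); ++I)
      if (Active[I] && LaneArgs[I] == Uniform) {
        Results[I] = R;
        Active[I] = false;
      }
  }
  return Results;
}
```

The loop runs once per distinct argument value, so a value that is already uniform costs a single iteration.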
[clang-tidy][NFC] Add test case confirming #190944 is fixed (#192707)
Closes #190944.
This issue is already fixed, and this change just adds a test case to
confirm that.
[LV] Add additional test cases with predicated inductions. (NFC) (#192875)
Extend test coverage with predicated IVs both with and without
additional predicates from LoopAccessInfo.
[SelectOptimize] Emit Fatal Error instead of Asserting on null PSI (#192871)
SelectOptimize expects ProfileSummaryInfo (PSI) to be available, which
is normally the case if the pipeline is set up correctly to require
ProfileSummaryInfo at the beginning. However, if someone sets up the
pipeline incorrectly, we do not want to assert; instead, we report a
fatal usage error.
Fixes #192759.
[BitcodeReader] Simplify CST_CODE_DATA constant parsing (NFC) (#190846)
Clean up boilerplate in the CST_CODE_DATA case of the bitcode parser.
This is mostly an aesthetic improvement, but it can also significantly
reduce stack usage on some compilers (such as MSVC), where each
SmallVector was previously allocated its own stack space.
[test] Improve driver option coverage (#192861)
Split the Linux GNU-linker assertions into a dedicated file and extend
them with previously untested passthrough flags handled by
gnutools::Linker::ConstructJob: -s, -u sym, -rdynamic (-export-dynamic),
and -Wl,-z keyword pairs.
LoongArch on Linux previously had no test for -X / --no-relax, which are
emitted by Gnu.cpp for LoongArch and RISC-V.
[LPM][LegacyPM] Reenable LCSSA Verification
This was disabled about a decade ago due to issues with LoopSink.
LoopSink has since had its LegacyPM version removed and is now a
function pass, since it does not need much loop infrastructure. So we
can try enabling this again to prevent backsliding on important cases
while we work on switching to the NewPM, which does enforce these
invariants.
Eventually we will want to add assertions here for LoopStrengthReduce,
but given it does not correctly preserve LCSSA, postpone that for now.
Reviewers: arsenm, Meinersbur, nikic, fhahn
Pull Request: https://github.com/llvm/llvm-project/pull/191667
[CIR] Fix __builtin_clz/__builtin_ctz poison_zero to respect target
CIR was hardcoding poisonZero=true for all clz/ctz builtins, ignoring
the target's isCLZForZeroUndef(). This made clz/ctz of zero produce
poison even on targets like AArch64, where it is well-defined.
Also add support for the __builtin_c[lt]zg fallback (2-arg) variants
using compare+select, and add NYI stubs for the elementwise variants.
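The semantics involved can be shown in plain C++. This is an illustration, not CIR output. A count that is well-defined at zero returns the bit width, as on targets like AArch64. The two-argument `__builtin_clzg(x, fallback)` / `__builtin_ctzg(x, fallback)` forms return `fallback` when `x == 0`, which maps onto a compare plus a select. The names `clz32`, `ctz32`, `clzgSketch`, and `ctzgSketch` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Leading-zero count that is well-defined at zero (returns 32).
int clz32(uint32_t X) {
  int N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Trailing-zero count that is well-defined at zero (returns 32).
int ctz32(uint32_t X) {
  int N = 0;
  for (uint32_t Mask = 1u; Mask != 0 && (X & Mask) == 0; Mask <<= 1)
    ++N;
  return N;
}

// Two-argument forms: compare against zero, then select the fallback.
int clzgSketch(uint32_t X, int Fallback) {
  int Count = clz32(X);             // ignored when X == 0
  bool IsZero = (X == 0);           // compare
  return IsZero ? Fallback : Count; // select
}

int ctzgSketch(uint32_t X, int Fallback) {
  int Count = ctz32(X);
  bool IsZero = (X == 0);
  return IsZero ? Fallback : Count;
}
```

Because the select discards the count whenever the input is zero, the fallback variants are defined for every input regardless of how the target treats clz/ctz of zero.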
[ConstantFold] Support byte values in `bitcast` constant folding (#188030)
Add support for constant folding `bitcast` instructions involving
`ConstantByte` values. This patch handles bitcasts from byte types to
integer, FP, and other byte types, and vice versa.
`poison` source bytes are preserved rather than letting the generic
integer fold refine them to `undef` or zero. This matters because some
threading optimizations simply compare the results of constant folding
(e.g., https://github.com/llvm/llvm-project/pull/114280).
Folds are skipped for byte `bitcast`s whose element counts don't divide
evenly and whose source contains `poison` values. Some of these casts
could be folded, but that is left for a future PR.
[NFC][Loop] Remove unused verbose loop debug output
This has been defined out using the preprocessor for ~14 years. Given it
doesn't look like it has gotten any use since then, just remove it to
clean the code up a bit.
Reviewers: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/192830
[lldb] Rally around triple rather than arch in the API tests (#192818)
This PR removes as many uses of arch as possible in favor of using the
triple directly. Most of the changes are in the builder, which no longer
passes ARCH to Make, and, of course, in Makefile.rules.
This significantly simplifies the remote Darwin test suite, as it
previously had to try and piece together the triple from the platform
and the arch. As an added benefit, we now go through the same code path
for host and remote test runs.
I have tested this on Darwin and Linux and made the changes with the
remote test suites in mind, but it's possible I missed something not
caught by my local testing.
[LSR] Autogenerate some tests
pr25541.ll - Was a regression test for a crash. Make a note and
autogenerate the tests.
bin_power.ll - Was essentially doing what UTC would already do, minus
one test whose assembly output is somewhat large. That output shouldn't
be an issue with an autoupdate script and, in my opinion, isn't big
enough to justify excluding it.
This makes updating some tests easier for planned changes to LCSSA
preservation.
Reviewers: fhahn, nikic
Pull Request: https://github.com/llvm/llvm-project/pull/191664