[APFloat] Add exp function for APFloat::IEEEsingle using expf implementation from LLVM libc. (#143959)
Discourse RFC:
https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450
- The implementation in LLVM libc is header-only.
- expf implementation in LLVM libc is correctly rounded for all rounding
modes.
- LLVM libc implementation will round to the floating point
environment's rounding mode.
- No CMake build dependency between LLVM and LLVM libc; it only requires
the LLVM libc sources to be present in the llvm-project/libc folder.
[HLSL][Matrix] Add Matrix Bool and represent them as i32 elements (#171051)
fixes #171049
fixes #171050
- Allow Bools for matrix type when in HLSL mode
- use ConvertTypeForMem to figure out the bool size
- Add Bool matrix types to hlsl_basic_types.h
---------
Co-authored-by: Helena Kotas <hekotas at microsoft.com>
[SDAG] Fix incorrect usage of VECREDUCE_ADD (#171459)
The mask needs to be extended to `i32` before reducing, or the reduction
can be incorrectly optimized to a VECREDUCE_XOR.
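To see why the extension matters: addition in a 1-bit type wraps modulo 2, so an add-reduction over `i1` lanes behaves like an XOR-reduction and yields only the parity of the mask. The following is a minimal standalone sketch in plain C++, not SelectionDAG code; `Mask8`, `reduceAddI1`, and `reduceAddI32` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask8 = std::array<bool, 8>;

// Add-reduce in 1-bit arithmetic: every add wraps modulo 2, i.e. acts as XOR,
// so the result is only the parity of the number of active lanes.
inline uint32_t reduceAddI1(const Mask8 &Mask) {
  bool Acc = false;
  for (bool Lane : Mask)
    Acc = (Acc != Lane);
  return Acc ? 1u : 0u;
}

// Zero-extend each lane to 32 bits first, then add-reduce: this counts the
// active lanes.
inline uint32_t reduceAddI32(const Mask8 &Mask) {
  uint32_t Acc = 0;
  for (bool Lane : Mask)
    Acc += Lane ? 1u : 0u;
  return Acc;
}
```

With three active lanes, the 1-bit reduction produces 1 while the extended reduction produces 3.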
[LoopPeel] Peel last iteration to enable natural-sized load widening
In loops that contain multiple consecutive small loads (e.g., three i8
loads covering 3 bytes), peeling the last iteration makes it safe to read
beyond the accessed region, enabling a wider load (e.g., i32) for all other
N-1 iterations.
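A source-level analogue of the idea, as a sketch only: `sumBytewise` and `sumWidened` are hypothetical names, and the 4-byte `memcpy` stands in for the widened `i32` load. Each record is 3 bytes. In every iteration except the last, the fourth byte read belongs to the next record, so the wide read stays inside the buffer. Only the peeled final iteration falls back to narrow loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Baseline: three 1-byte loads per 3-byte record.
inline uint32_t sumBytewise(const uint8_t *P, std::size_t N) {
  uint32_t Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += uint32_t{P[3 * I]} + uint32_t{P[3 * I + 1]} + uint32_t{P[3 * I + 2]};
  return Sum;
}

// Peeled variant: the first N-1 iterations read 4 bytes at once. The extra
// byte is the first byte of the next record, so the read stays in bounds.
inline uint32_t sumWidened(const uint8_t *P, std::size_t N) {
  if (N == 0)
    return 0;
  uint32_t Sum = 0;
  for (std::size_t I = 0; I + 1 < N; ++I) {
    uint8_t Wide[4];
    std::memcpy(Wide, P + 3 * I, sizeof(Wide)); // stands in for the i32 load
    Sum += uint32_t{Wide[0]} + uint32_t{Wide[1]} + uint32_t{Wide[2]};
  }
  // Peeled last iteration: narrow loads only, never touching byte 3*N.
  const uint8_t *Last = P + 3 * (N - 1);
  Sum += uint32_t{Last[0]} + uint32_t{Last[1]} + uint32_t{Last[2]};
  return Sum;
}
```

Both functions return the same sum for a buffer of exactly `3 * N` bytes; the widened version never reads past the end of that buffer.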
This optimization targets patterns like:
```
%a = load i8, ptr %p
%b = load i8, ptr %p+1
%c = load i8, ptr %p+2
...
%p.next = getelementptr i8, ptr %p, 3
```
Which can be transformed to:
```
%wide = load i32, ptr %p ; Read 4 bytes
[9 lines not shown]
[llvm][RISCV] Support mulh for P extension codegen (#171581)
For mulh patterns whose operands are both signed or both unsigned, the
combination is performed automatically. However, for mulh with one signed
and one unsigned operand, we need to combine them manually, using the same
approach as for PASUB*.
Note: This is the first patch for mulh and only handles basic high-part
multiplication. Follow-up patches will handle the rest of the mulh-related
instructions.
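For reference, the three flavours of high-part multiplication differ only in how the operands are extended before the multiply. The sketch below models them for a single 32-bit element in plain C++. The helper names are hypothetical, and the code does not model the packed P-extension instructions themselves. It assumes that right-shifting a negative value is an arithmetic shift, which C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Signed x signed: both operands are sign-extended to 64 bits.
inline int32_t mulh(int32_t A, int32_t B) {
  return static_cast<int32_t>(
      (static_cast<int64_t>(A) * static_cast<int64_t>(B)) >> 32);
}

// Unsigned x unsigned: both operands are zero-extended to 64 bits.
inline uint32_t mulhu(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(A) * static_cast<uint64_t>(B)) >> 32);
}

// Signed x unsigned: A is sign-extended, B is zero-extended. This is the
// mixed case that has to be matched explicitly.
inline int32_t mulhsu(int32_t A, uint32_t B) {
  return static_cast<int32_t>(
      (static_cast<int64_t>(A) * static_cast<int64_t>(B)) >> 32);
}
```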
[SandboxIRTest] Use larger integer type
Use i32 instead of i1 so that the value fits. Possibly there was
some confusion with the condition argument of the select here.
[WPD] Avoid implicit truncation when creating full set
Use the bit mask for the type instead of `~0`, so that we don't
rely on implicit truncation of the top bits.
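As a rough illustration of the difference (plain C++, not the LLVM API; `maskForWidth` is a hypothetical helper), building the all-ones value from the type's width keeps the upper bits clear, whereas `~0` sets every bit of the host integer and depends on a later truncation.

```cpp
#include <cassert>
#include <cstdint>

// All-ones value for an integer type of the given bit width (1..64).
// Bits above the width stay zero, so no truncation is needed afterwards.
inline uint64_t maskForWidth(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported width");
  return Bits == 64 ? ~uint64_t{0} : (uint64_t{1} << Bits) - 1;
}
```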
[ExpandFp] Use getSignMask() (NFC)
This was using getSigned() with an unsigned (not sign extended)
argument. Using plain get() would be correct here. We can go
one step further and use getSignMask() to avoid the issue entirely.
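A small sketch of why the distinction matters, using hypothetical stand-alone helpers rather than the LLVM API: the sign mask of an N-bit type, held in an unsigned 64-bit variable, is a positive number that lies outside the N-bit signed range, so handing it to an interface that expects a signed value is a mismatch. Computing the mask from the width sidesteps the question.

```cpp
#include <cassert>
#include <cstdint>

// Sign mask of an N-bit integer type: only the top bit of the type is set.
inline uint64_t signMask(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported width");
  return uint64_t{1} << (Bits - 1);
}

// True if V is representable as an N-bit two's-complement signed integer.
inline bool fitsSigned(int64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "unsupported width");
  if (Bits == 64)
    return true;
  const int64_t Limit = int64_t{1} << (Bits - 1);
  return V >= -Limit && V < Limit;
}
```

For a 32-bit type, `signMask(32)` is `0x80000000`, which `fitsSigned` rejects when it is read as a plain (non-sign-extended) 64-bit value.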
[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965)
This PR resolves [#170867](https://github.com/llvm/llvm-project/issues/170867).
The existing code recomputes the cost of creating a shuffle instruction even
for repeated intrinsic operand pairs. This inflates newCost, so the
transform decides not to fold.
This change addresses the issue. When calculating newCost, we skip the cost
calculation for an operand pair that was already considered. When creating
the transformed code, we reuse the already-created shuffle instruction for
a repeated operand pair.
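The effect on the cost estimate can be sketched in isolation. The code below is not the VectorCombine implementation: `OperandPair`, `naiveCost`, and `dedupedCost` are hypothetical names, operands are represented by integer IDs, and every shuffle is assumed to have the same fixed cost. It shows how counting each distinct operand pair once lowers the estimate when pairs repeat.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using OperandPair = std::pair<int, int>;

// Charges one shuffle per operand pair, including repeats.
inline unsigned naiveCost(const std::vector<OperandPair> &Pairs,
                          unsigned ShuffleCost) {
  return static_cast<unsigned>(Pairs.size()) * ShuffleCost;
}

// Charges one shuffle per distinct operand pair; repeated pairs are assumed
// to reuse the shuffle that was already created.
inline unsigned dedupedCost(const std::vector<OperandPair> &Pairs,
                            unsigned ShuffleCost) {
  std::set<OperandPair> Seen;
  unsigned Cost = 0;
  for (const OperandPair &P : Pairs)
    if (Seen.insert(P).second) // true only the first time this pair is seen
      Cost += ShuffleCost;
  return Cost;
}
```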
[NFC][SPIRV] Re-work extension parsing (#171826)
This changes the extension parsing mechanism underpinning `--spirv-ext`
to be more explicit about what it is doing and not rely on a sort. More
specifically, we partition extensions into enabled (prefixed with `+`)
and others, and then individually handle the resulting ranges.
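The partitioning step might look roughly like the following. This is a sketch with a hypothetical `partitionExtensions` helper over plain strings, not the code in the SPIR-V backend; how the entries without a `+` prefix are interpreted is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Moves all "+name" entries to the front, preserving the relative order
// within each group, and returns the index of the first remaining entry.
// The two resulting ranges can then be handled separately.
inline std::size_t partitionExtensions(std::vector<std::string> &Exts) {
  auto Mid = std::stable_partition(
      Exts.begin(), Exts.end(),
      [](const std::string &E) { return !E.empty() && E.front() == '+'; });
  return static_cast<std::size_t>(Mid - Exts.begin());
}
```

Using a partition instead of a sort makes the intent explicit: only membership in the enabled group matters, not any ordering among the names.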
[OpenMP] Fix libarcher tests on Ubuntu 22.04 (#170671)
When llvm-symbolizer is not found on PATH, TSan uses the system's addr2line
instead. On Ubuntu 22.04, addr2line can't handle DWARF v5, which results
in failures in some libarcher tests.
This PR adds the directory of the just-built LLVM binaries to PATH to
make llvm-symbolizer available to TSan.
The changes were tested on an AArch64 machine, on which
task-taskgroup-unrelated.c was flaky. Moving the test code to a separate
function, executed 10 times, solved the issue.
Fixes #170138
[OpenMP][CIR] Add basic infrastructure for CIR lowering (#171902)
This patch adds the basic infrastructure for lowering an OpenMP
directive, which should enable someone to take over the OpenMP lowering
in the future. It adds the lowering entry points to CIR in the same way
as OpenACC.
Note that this does nothing with any of the directives, which will
happen in a followup patch. No infrastructure for clauses is added
either, but that will come in a followup patch as well.
[lldb][test] Xfail 3 backtrace related tests on Windows on Arm (#172300)
Since we updated our buildbot setup, these have been failing. Ignore
them until we have time to find the real problem, which is something to
do with failing to backtrace, or missing debug info when we do.
[DAG] foldAddToAvg - optimize nested m_Reassociatable matchers (#171681)
The use of nested m_Reassociatable matchers by #169644 can result in
high compile times as the inner m_Reassociatable call is being repeated
a lot while the outer call is trying to match. Place the inner
m_ReassociatableAnd at the beginning of the pattern so it is not
repeatedly matched in recursion.
[llvm-symbolizer] Recognize and symbolize archive members (#150401)
This PR adds support for selecting specific archive members in
llvm-symbolizer using the `archive.a(member.o)` syntax, with
architecture-aware member selection.
**Key features:**
1. **Archive member selection syntax**: Specify archive members using
`archive.a(member.o)` format
2. **Architecture selection via `--default-arch` flag**: Select the
appropriate member when multiple members have the same name but
different architectures
3. **Architecture selection via `:arch` suffix**: Alternative syntax
`archive.a(member.o):arch` for specifying architecture
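To make the syntax concrete, here is a minimal sketch of splitting such a specifier into its parts. `ArchiveSpec` and `parseArchiveSpec` are hypothetical names, and the actual llvm-symbolizer parsing may differ in details such as error handling or file names that themselves contain parentheses.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct ArchiveSpec {
  std::string Archive;
  std::string Member;
  std::string Arch; // empty when no ":arch" suffix is given
};

// Parses "archive.a(member.o)" or "archive.a(member.o):arch".
// Returns std::nullopt if the input does not have that shape.
inline std::optional<ArchiveSpec> parseArchiveSpec(std::string_view Input) {
  const std::size_t Close = Input.rfind(')');
  if (Close == std::string_view::npos)
    return std::nullopt;
  const std::size_t Open = Input.rfind('(', Close);
  if (Open == std::string_view::npos || Open == 0 || Open + 1 == Close)
    return std::nullopt; // missing archive name or empty member name

  ArchiveSpec Spec;
  const std::string_view Tail = Input.substr(Close + 1);
  if (!Tail.empty()) {
    if (Tail.front() != ':' || Tail.size() == 1)
      return std::nullopt; // trailing text must be a non-empty ":arch"
    Spec.Arch = std::string(Tail.substr(1));
  }
  Spec.Archive = std::string(Input.substr(0, Open));
  Spec.Member = std::string(Input.substr(Open + 1, Close - Open - 1));
  return Spec;
}
```

An input without parentheses is rejected by this sketch, so a caller could fall back to treating it as an ordinary object file path.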
This functionality is primarily designed for AIX big archives, which can
contain multiple members with the same name but different architectures
(32-bit and 64-bit). However, the implementation works with all archive
formats (GNU, BSD, Darwin, big archive) and handles same-named members
[4 lines not shown]
[flang][OpenMP] Generalize checks of loop construct structure (#170735)
For an OpenMP loop construct, count how many loops will effectively be
contained in its associated block. For constructs that are loop-nest
associated this number should be 1. Report cases where this number is
different.
Take into account that the block associated with a loop construct can
contain compiler directives.
[DebugInfo][DWARF] Use DW_AT_call_target_clobbered for exprs with volatile regs (#172167)
Without this patch, DW_AT_call_target is used for all indirect call address
location expressions. The DWARF spec says:
> For indirect calls or jumps where the address is not computable without use
> of registers or memory locations that might be clobbered by the call the
> DW_AT_call_target_clobbered attribute is used instead of the
> DW_AT_call_target attribute.

This patch implements that behaviour.
[GlobalISel](NFC) Refactor construction of LLTs in `LegalizerHelper` (#170664)
I spotted a number of places where we're duplicating logic provided by
the `LLT` class inline in `LegalizerHelper`. This PR tidies up these
spots.