LLVM/project 4190d57libc/src/__support/math expf_static_rounding.h, llvm CMakeLists.txt

[APFloat] Add exp function for APFloat::IEEEsingle using expf implementation from LLVM libc. (#143959)

Discourse RFC:
https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450

- The implementation in LLVM libc is header-only.
- expf implementation in LLVM libc is correctly rounded for all rounding
modes.
- LLVM libc implementation will round to the floating point
environment's rounding mode.
- There is no CMake build dependency between LLVM and LLVM libc; it only
requires the LLVM libc source to be present in the llvm-project/libc folder.
DeltaFile
+74-0llvm/unittests/ADT/APFloatTest.cpp
+43-0libc/src/__support/math/expf_static_rounding.h
+26-0llvm/lib/Support/APFloat.cpp
+6-0llvm/lib/Support/CMakeLists.txt
+6-0llvm/include/llvm/ADT/APFloat.h
+5-0llvm/CMakeLists.txt
+160-02 files not shown
+162-18 files

LLVM/project 491e001clang/include/clang/AST TypeBase.h, clang/lib/CodeGen CGExpr.cpp

[HLSL][Matrix] Add Matrix Bool and represent them as i32 elements (#171051)

fixes #171049
fixes #171050

- Allow Bools for matrix type when in HLSL mode
- Use ConvertTypeForMem to determine the bool size
- Add Bool matrix types to hlsl_basic_types.h

---------

Co-authored-by: Helena Kotas <hekotas at microsoft.com>
DeltaFile
+151-0clang/test/CodeGenHLSL/BoolMatrix.hlsl
+33-0clang/test/CodeGenHLSL/basic_types.hlsl
+25-4clang/include/clang/AST/TypeBase.h
+17-0clang/lib/Headers/hlsl/hlsl_basic_types.h
+7-3clang/lib/Sema/SemaChecking.cpp
+6-1clang/lib/CodeGen/CGExpr.cpp
+239-83 files not shown
+245-119 files

LLVM/project 1847a4ellvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/AArch64 sve-vector-compress.ll

[SDAG] Fix incorrect usage of VECREDUCE_ADD (#171459)

The mask needs to be extended to `i32` before reducing, or the reduction
can be incorrectly optimized to a VECREDUCE_XOR.
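
The reason the extension matters can be sketched in plain C++ (this is not SelectionDAG code, and the function names are hypothetical): addition on 1-bit lanes wraps modulo 2, which is the same as XOR, so an add-reduction over an unextended mask yields only the parity of the active lanes rather than their count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Add-reduction carried out in a 1-bit type: every partial sum wraps
// modulo 2, so the result is the parity of the mask (an XOR-reduction).
inline uint32_t reduceAddAsI1(const std::vector<bool> &mask) {
  uint32_t acc = 0;
  for (bool bit : mask)
    acc = (acc + (bit ? 1u : 0u)) & 1u; // truncate to 1 bit after each add
  return acc;
}

// Add-reduction after zero-extending each lane to 32 bits: the result is
// the number of active lanes.
inline uint32_t reduceAddAsI32(const std::vector<bool> &mask) {
  uint32_t acc = 0;
  for (bool bit : mask)
    acc += bit ? 1u : 0u;
  return acc;
}
```

For a mask with two active lanes, the 1-bit reduction gives 0 while the 32-bit reduction gives 2.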
DeltaFile
+771-910llvm/test/CodeGen/X86/vector-compress.ll
+13-12llvm/test/CodeGen/AArch64/sve-vector-compress.ll
+6-2llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+790-9243 files

LLVM/project 77c407blibc/include/llvm-libc-macros netinet-in-macros.h, libc/test/include netinet_in_test.cpp

[libc] Add `IN6_IS_ADDR_LOOPBACK`
DeltaFile
+9-0libc/include/llvm-libc-macros/netinet-in-macros.h
+5-0libc/test/include/netinet_in_test.cpp
+14-02 files

LLVM/project b596878clang/lib/Sema SemaRISCV.cpp

[llvm][RISCV] Add frm range check for xsfvfnrclipxfqf (#172135)

DeltaFile
+10-0clang/lib/Sema/SemaRISCV.cpp
+10-01 files

LLVM/project db2ebdalibc/include/llvm-libc-macros netinet-in-macros.h, libc/test/include netinet_in_test.cpp

[libc] Add `IN6_IS_ADDR_UNSPECIFIED`
DeltaFile
+7-0libc/test/include/netinet_in_test.cpp
+6-0libc/include/llvm-libc-macros/netinet-in-macros.h
+13-02 files

LLVM/project c5920cbllvm/include/llvm/Transforms/Utils LoopPeel.h UnrollLoop.h, llvm/lib/Transforms/Scalar LoopUnrollPass.cpp

[LoopPeel] Peel last iteration to enable natural-sized load widening

In loops that contain multiple consecutive small loads (e.g., three i8
loads covering 3 bytes), peeling the last iteration makes it safe to read
beyond the accessed region, enabling a wider load (e.g., i32) for the
other N-1 iterations.
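
As a rough source-level illustration of the idea (a sketch only, not the pass's output; the function names are hypothetical and a little-endian host is assumed), the peeled loop can use one 4-byte load per 3-byte record for every iteration except the last:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Baseline: three separate byte loads per 3-byte record.
inline uint32_t sumRecordsNarrow(const uint8_t *p, std::size_t n) {
  uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i, p += 3)
    sum += p[0] + p[1] + p[2];
  return sum;
}

// Peeled form: the first n-1 records use one 4-byte load each. The extra
// byte read is the first byte of the next record, so it stays inside the
// buffer. The last record is peeled off and keeps the byte loads, because a
// 4-byte load there would read past the end. Assumes a little-endian host.
inline uint32_t sumRecordsPeeled(const uint8_t *p, std::size_t n) {
  uint32_t sum = 0;
  if (n == 0)
    return sum;
  for (std::size_t i = 0; i + 1 < n; ++i, p += 3) {
    uint32_t wide;
    std::memcpy(&wide, p, sizeof(wide)); // alignment-safe 4-byte load
    sum += (wide & 0xFF) + ((wide >> 8) & 0xFF) + ((wide >> 16) & 0xFF);
  }
  sum += p[0] + p[1] + p[2]; // peeled last iteration
  return sum;
}
```

Both functions return the same sum for any buffer holding exactly `3 * n` bytes.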

This optimization targets patterns like:
```
  %a = load i8, ptr %p
  %b = load i8, ptr %p+1
  %c = load i8, ptr %p+2
  ...
  %p.next = getelementptr i8, ptr %p, 3
```

Which can be transformed to:
```
  %wide = load i32, ptr %p  ; Read 4 bytes

    [9 lines not shown]
DeltaFile
+617-0llvm/test/Transforms/LoopUnroll/peel-last-iteration-load-widening.ll
+233-1llvm/lib/Transforms/Utils/LoopPeel.cpp
+104-0llvm/test/Transforms/LoopUnroll/peel-last-iteration-load-widening-be.ll
+15-3llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+12-1llvm/include/llvm/Transforms/Utils/LoopPeel.h
+2-1llvm/include/llvm/Transforms/Utils/UnrollLoop.h
+983-66 files

LLVM/project ef927aellvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVInstrInfoP.td, llvm/test/CodeGen/RISCV rvp-ext-rv64.ll rvp-ext-rv32.ll

[llvm][RISCV] Support mulh for P extension codegen (#171581)

For mulh patterns whose operands are both signed or both unsigned, the
combine happens automatically. For mulh with one signed and one unsigned
operand, however, we need to combine them manually, using the same
approach as for PASUB*.
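
For readers unfamiliar with the three flavours of high-part multiplication, here is a scalar 32-bit sketch in C++. It only illustrates the arithmetic; it is not the P extension lowering, the function names are hypothetical, and it assumes `>>` on a negative `int64_t` is an arithmetic shift (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// signed x signed: sign-extend both operands, keep the high 32 bits.
inline int32_t mulhSS(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return static_cast<int32_t>(wide >> 32);
}

// unsigned x unsigned: zero-extend both operands, keep the high 32 bits.
inline uint32_t mulhUU(uint32_t a, uint32_t b) {
  uint64_t wide = static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
  return static_cast<uint32_t>(wide >> 32);
}

// signed x unsigned: sign-extend a, zero-extend b. The product always fits
// in int64_t because |a| <= 2^31 and b < 2^32.
inline int32_t mulhSU(int32_t a, uint32_t b) {
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return static_cast<int32_t>(wide >> 32);
}
```

The mixed case differs from the other two only in how each operand is extended before the wide multiply.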

Note: This is the first patch for mulh and only handles the basic
high-part multiplication; follow-up patches will handle the rest of the
mulh-related instructions.
DeltaFile
+155-0llvm/test/CodeGen/RISCV/rvp-ext-rv64.ll
+78-0llvm/test/CodeGen/RISCV/rvp-ext-rv32.ll
+47-30llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+17-6llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+297-364 files

LLVM/project 197dea8libc/include/llvm-libc-macros netinet-in-macros.h, libc/test/include netinet_in_test.cpp

[libc] Add `IN6_IS_ADDR_UNSPECIFIED`
DeltaFile
+6-0libc/include/llvm-libc-macros/netinet-in-macros.h
+6-0libc/test/include/netinet_in_test.cpp
+12-02 files

LLVM/project 1154ed8llvm/unittests/SandboxIR SandboxIRTest.cpp

[SandboxIRTest] Use larger integer type

Use i32 instead of i1 so that the value fits. Possibly there was
some confusion with the condition argument of the select here.
DeltaFile
+1-1llvm/unittests/SandboxIR/SandboxIRTest.cpp
+1-11 files

LLVM/project 8975eb3llvm/lib/FuzzMutate OpDescriptor.cpp

[FuzzerMutate] Allow implicit truncation

If the fixed value 42 does not fit the integer type, truncate it.
DeltaFile
+2-1llvm/lib/FuzzMutate/OpDescriptor.cpp
+2-11 files

LLVM/project 42a47bfllvm/lib/Transforms/IPO WholeProgramDevirt.cpp

[WPD] Avoid implicit truncation when creating full set

Use the bit mask for the type instead of `~0`, so that we don't
rely on implicit truncation of the top bits.
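
The difference between the two approaches can be sketched in plain C++ (hypothetical helper names, not the APInt API): building the all-ones value from the type's bit width produces a value that already fits, whereas `~0` on a wider host integer has set bits above the type's width that must be dropped.

```cpp
#include <cassert>
#include <cstdint>

// All-ones value for an integer type of the given width (1..64 bits),
// built from the width rather than by truncating ~0.
inline uint64_t allOnesMask(unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  return bits == 64 ? ~uint64_t{0} : (uint64_t{1} << bits) - 1;
}

// True if value is representable as an unsigned integer of the given width.
inline bool fitsUnsigned(uint64_t value, unsigned bits) {
  return (value & ~allOnesMask(bits)) == 0;
}
```

Here `allOnesMask(8)` fits an 8-bit type, while `~uint64_t{0}` does not until its top 56 bits are truncated.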
DeltaFile
+5-3llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+5-31 files

LLVM/project 015ab4ellvm/lib/Transforms/Scalar Reassociate.cpp

[Reassociate] Allow implicit truncation when converting adds to mul

It's okay if the number of adds overflows. Explicitly allow implicit
truncation.
DeltaFile
+5-2llvm/lib/Transforms/Scalar/Reassociate.cpp
+5-21 files

LLVM/project 818c913llvm/lib/Transforms/Utils SimplifyCFG.cpp

[SimplifyCFG] Use getSigned() for signed value

Base is a signed quantity derived via getSExtValue(), so we should
use getSigned().
DeltaFile
+1-1llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+1-11 files

LLVM/project 3f82a8allvm/lib/CodeGen ExpandFp.cpp

[ExpandFp] Use getSignMask() (NFC)

This was using getSigned() with an unsigned (not sign extended)
argument. Using plain get() would be correct here. We can go
one step further and use getSignMask() to avoid the issue entirely.
DeltaFile
+1-1llvm/lib/CodeGen/ExpandFp.cpp
+1-11 files

LLVM/project 0b2fe07llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/X86 shuffle-of-intrinsics.ll

[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965)

This PR resolves [#170867](https://github.com/llvm/llvm-project/issues/170867)

The existing code recomputes the cost of creating a shuffle instruction even
for repeated intrinsic operand pairs. This inflates newCost, so the fold is
rejected.

This PR addresses the issue. When calculating newCost, we skip the cost
calculation for an operand pair that has already been considered. When
creating the transformed code, we reuse the shuffle instruction already
created for a repeated operand pair.
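
The cost-accounting idea can be sketched in plain C++. This is a simplified model, not the VectorCombine code: operands are represented by hypothetical integer IDs and every shuffle is assumed to have the same fixed cost.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using OperandPair = std::pair<int, int>; // hypothetical operand IDs

// Old behaviour in this model: every pair is charged, including repeats.
inline int costWithoutDedup(const std::vector<OperandPair> &pairs,
                            int shuffleCost) {
  return static_cast<int>(pairs.size()) * shuffleCost;
}

// New behaviour in this model: each distinct pair is charged once, since
// the shuffle created for it can be reused for later occurrences.
inline int costWithDedup(const std::vector<OperandPair> &pairs,
                         int shuffleCost) {
  std::set<OperandPair> seen;
  int cost = 0;
  for (const OperandPair &p : pairs)
    if (seen.insert(p).second) // true only the first time p is inserted
      cost += shuffleCost;
  return cost;
}
```

With pairs `{1,2}, {1,2}, {3,4}` and a shuffle cost of 2, the first function reports 6 and the second reports 4.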
DeltaFile
+28-0llvm/test/Transforms/VectorCombine/X86/shuffle-of-intrinsics.ll
+17-0llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+45-02 files

LLVM/project 0fff58allvm/lib/Target/SPIRV SPIRVCommandLine.cpp

[NFC][SPIRV] Re-work extension parsing (#171826)

This changes the extension parsing mechanism underpinning `--spirv-ext`
to be more explicit about what it is doing and not rely on a sort. More
specifically, we partition extensions into enabled (prefixed with `+`)
and others, and then individually handle the resulting ranges.
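
A minimal sketch of the partitioning step, assuming the extension tokens are plain strings; the helper name and the example extension names are hypothetical, and the use of `std::stable_partition` is an illustrative choice rather than a claim about what the patch uses:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Moves tokens prefixed with '+' to the front (keeping relative order) and
// returns the index of the first token that is not prefixed with '+'. The
// two resulting ranges can then be handled separately.
inline std::size_t partitionEnabled(std::vector<std::string> &tokens) {
  auto mid = std::stable_partition(
      tokens.begin(), tokens.end(),
      [](const std::string &t) { return !t.empty() && t.front() == '+'; });
  return static_cast<std::size_t>(mid - tokens.begin());
}
```

After the call, `[0, result)` holds the enabled extensions and `[result, size)` holds everything else, with no dependence on lexical sort order.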
DeltaFile
+36-39llvm/lib/Target/SPIRV/SPIRVCommandLine.cpp
+36-391 files

LLVM/project de776fbopenmp/tools/archer/tests lit.cfg, openmp/tools/archer/tests/races task-taskgroup-unrelated.c

[OpenMP] Fix libarcher tests on Ubuntu 22.04 (#170671)

When llvm-symbolizer is not found on PATH, TSan uses the system's addr2line
instead. On Ubuntu 22.04 addr2line can't handle DWARF v5, which results
in failures in some libarcher tests.

This PR adds the directory of the just built LLVM binaries to PATH, to
make llvm-symbolizer available to TSan.

The changes were tested on an AArch64 machine, on which
task-taskgroup-unrelated.c was flaky. Moving the test code to a separate
function, executed 10 times, solved the issue.

Fixes #170138
DeltaFile
+17-3openmp/tools/archer/tests/lit.cfg
+13-1openmp/tools/archer/tests/races/task-taskgroup-unrelated.c
+30-42 files

LLVM/project df14096llvm/lib/Target/AMDGPU VOP3PInstructions.td

[NFC][AMDGPU] Refactor the multiclass for WMMA_F8F6F4 instructions (#172245)

DeltaFile
+34-13llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+34-131 files

LLVM/project 2e2e48fclang/lib/CIR/CodeGen CIRGenStmtOpenMP.cpp CIRGenStmt.cpp

[OpenMP][CIR] Add basic infrastructure for CIR lowering (#171902)

This patch adds the basic infrastructure for lowering an OpenMP
directive, which should enable someone to take over the OpenMP lowering
in the future. It adds the lowering entry points to CIR in the same way
as OpenACC.

Note that this does nothing with any of the directives, which will
happen in a followup patch. No infrastructure for clauses is added
either, but that will come in a followup patch as well.
DeltaFile
+460-0clang/lib/CIR/CodeGen/CIRGenStmtOpenMP.cpp
+181-76clang/lib/CIR/CodeGen/CIRGenStmt.cpp
+133-0clang/lib/CIR/CodeGen/CIRGenFunction.h
+129-0clang/lib/CIR/CodeGen/CIRGenDeclOpenMP.cpp
+21-7clang/lib/CIR/CodeGen/CIRGenDecl.cpp
+27-0clang/lib/CIR/CodeGen/CIRGenModule.cpp
+951-835 files not shown
+984-8311 files

LLVM/project f5a198blldb/test/API/functionalities/bt-interrupt TestInterruptBacktrace.py, lldb/test/API/python_api/hello_world TestHelloWorld.py

[lldb][test] Xfail 3 backtrace related tests on Windows on Arm (#172300)

Since we updated our buildbot setup, these have been failing. Ignore
them until we have time to find the real problem, which is something to
do with failing to backtrace, or missing debug info when we do.
DeltaFile
+2-0lldb/test/API/python_api/hello_world/TestHelloWorld.py
+1-0lldb/test/API/functionalities/bt-interrupt/TestInterruptBacktrace.py
+3-02 files

LLVM/project a86613aclang/test/SemaOpenCL amdgpu-ds-atomic-fadd.cl

[NFC][Clang] Add OpenCL Sema test for __builtin_amdgcn_ds_atomic_fadd_f32/f64
DeltaFile
+29-0clang/test/SemaOpenCL/amdgpu-ds-atomic-fadd.cl
+29-01 files

LLVM/project a68fde5llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/X86 avgflooru-scalar.ll avgfloors-scalar.ll

[DAG] foldAddToAvg - optimize nested m_Reassociatable matchers (#171681)

The use of nested m_Reassociatable matchers by #169644 can result in
high compile times as the inner m_Reassociatable call is being repeated
a lot while the outer call is trying to match. Place the inner
m_ReassociatableAnd at the beginning of the pattern so it is not
repeatedly matched in recursion.
DeltaFile
+6-6llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+5-5llvm/test/CodeGen/X86/avgflooru-scalar.ll
+5-5llvm/test/CodeGen/X86/avgfloors-scalar.ll
+16-163 files

LLVM/project 56d661allvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fsqrt.f32.ll

AMDGPU: Teach lowering that exp and log intrinsics cannot return denormals
DeltaFile
+103-0llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll
+5-0llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+3-0llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+111-03 files

LLVM/project a93214cclang/cmake/caches Release.cmake

Build and ship OpenMP with LLVM releases (#160581)

Fixes #135021

Suggested-by: Kawashima Takahiro <t-kawashima at fujitsu.com>
DeltaFile
+5-1clang/cmake/caches/Release.cmake
+5-11 files

LLVM/project 7fefee3llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 fixed-vector-deinterleave.ll fixed-vector-interleave.ll

[LLVM][CodeGen][AArch64] Add NEON lowering for vector.(de)interleave intrinsics. (#169700)

DeltaFile
+257-13llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
+242-1llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll
+12-8llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+511-223 files

LLVM/project ee5b9cdllvm/docs/CommandGuide llvm-symbolizer.rst, llvm/include/llvm/DebugInfo/Symbolize Symbolize.h

[llvm-symbolizer] Recognize and symbolize archive members (#150401)

This PR adds support for selecting specific archive members in
llvm-symbolizer using the `archive.a(member.o)` syntax, with
architecture-aware member selection.

  **Key features:**
1. **Archive member selection syntax**: Specify archive members using
`archive.a(member.o)` format
2. **Architecture selection via `--default-arch` flag**: Select the
appropriate member when multiple members have the same name but
different architectures
3. **Architecture selection via `:arch` suffix**: Alternative syntax
`archive.a(member.o):arch` for specifying architecture
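
To make the syntax concrete, here is a small standalone C++ sketch that splits such a string into its parts. The struct and function are hypothetical and do not reflect how Symbolize.cpp parses its input; the sketch also ignores edge cases such as parentheses inside file names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct ArchiveMemberSpec { // hypothetical holder for the parsed pieces
  std::string archive;
  std::string member;
  std::string arch; // empty when no ":arch" suffix is given
};

// Parses "archive(member)" or "archive(member):arch". Returns false and
// leaves spec untouched if the input does not have that shape.
inline bool parseArchiveMember(const std::string &input,
                               ArchiveMemberSpec &spec) {
  const std::size_t open = input.find('(');
  if (open == std::string::npos || open == 0)
    return false; // no '(' or empty archive name
  const std::size_t close = input.find(')', open + 1);
  if (close == std::string::npos || close == open + 1)
    return false; // no ')' or empty member name
  std::string arch;
  if (close + 1 < input.size()) {
    // Anything after ')' must be ":arch" with a non-empty arch.
    if (input[close + 1] != ':' || close + 2 >= input.size())
      return false;
    arch = input.substr(close + 2);
  }
  spec.archive = input.substr(0, open);
  spec.member = input.substr(open + 1, close - open - 1);
  spec.arch = arch;
  return true;
}
```

When the suffix is absent, a caller would fall back to whatever `--default-arch` supplies.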

This functionality is primarily designed for AIX big archives, which can
contain multiple members with the same name but different architectures
(32-bit and 64-bit). However, the implementation works with all archive
formats (GNU, BSD, Darwin, big archive) and handles same-named members

    [4 lines not shown]
DeltaFile
+148-40llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+129-0llvm/test/tools/llvm-symbolizer/archive-member-big-archive.test
+126-0llvm/test/tools/llvm-symbolizer/archive-member-gnu.test
+42-9llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
+14-4llvm/docs/CommandGuide/llvm-symbolizer.rst
+1-2llvm/tools/llvm-symbolizer/Opts.td
+460-556 files

LLVM/project 6e01ea4flang/lib/Semantics check-omp-loop.cpp, flang/test/Parser/OpenMP tile-fail.f90

[flang][OpenMP] Generalize checks of loop construct structure (#170735)

For an OpenMP loop construct, count how many loops will effectively be
contained in its associated block. For constructs that are loop-nest
associated this number should be 1. Report cases where this number is
different.

Take into account that the block associated with a loop construct can
contain compiler directives.
DeltaFile
+121-82flang/lib/Semantics/check-omp-loop.cpp
+15-1flang/test/Semantics/OpenMP/loop-transformation-clauses01.f90
+5-5flang/test/Semantics/OpenMP/do21.f90
+4-4flang/test/Parser/OpenMP/tile-fail.f90
+4-4flang/test/Semantics/OpenMP/loop-transformation-construct02.f90
+3-3flang/test/Semantics/OpenMP/loop-association.f90
+152-993 files not shown
+158-1069 files

LLVM/project 7927040llvm/lib/CodeGen/AsmPrinter DwarfCompileUnit.cpp, llvm/test/DebugInfo/X86 dwarf-call-target-clobbered.mir dwarf-call-target-mem-loc.mir

[DebugInfo][DWARF] Use DW_AT_call_target_clobbered for exprs with volatile regs (#172167)

Without this patch DW_AT_call_target is used for all indirect call address
location expressions. The DWARF spec says:

    For indirect calls or jumps where the address is not computable without use
    of registers or memory locations that might be clobbered by the call the
    DW_AT_call_target_clobbered attribute is used instead of the
    DW_AT_call_target attribute.

This patch implements that behaviour.
DeltaFile
+96-0llvm/test/DebugInfo/X86/dwarf-call-target-clobbered.mir
+12-5llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+3-2llvm/test/DebugInfo/X86/dwarf-call-target-mem-loc.mir
+2-2llvm/test/DebugInfo/X86/dwarf-callsite-related-attrs-indirect.ll
+113-94 files

LLVM/project 2f9bf3fllvm/include/llvm/CodeGenTypes LowLevelType.h, llvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp

[GlobalISel](NFC) Refactor construction of LLTs in `LegalizerHelper` (#170664)

I spotted a number of places where we're duplicating logic provided by
the `LLT` class inline in `LegalizerHelper`. This PR tidies up these
spots.
DeltaFile
+22-27llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+15-1llvm/include/llvm/CodeGenTypes/LowLevelType.h
+37-282 files