[CodeGen] Treat hasOrderedMemoryRef as implying arbitrary loads or stores (#182000)
This prevents MachineSink from sinking loads past fences (or any other instruction marked as hasSideEffects).
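A minimal sketch of the rule, using a hypothetical `MIModel` struct rather than LLVM's real `MachineInstr` API. An instruction with an ordered memory reference (for example, a fence or anything with unmodeled side effects) is treated as if it could store to arbitrary memory, so a load cannot be sunk past it:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of an instruction's memory properties;
// the real MachineInstr interface is much richer.
struct MIModel {
  bool mayLoad = false;
  bool mayStore = false;
  bool hasOrderedMemoryRef = false; // e.g. fences or side-effecting instrs
};

// Conservatively treat an ordered memory reference as an arbitrary store.
// Plain loads (mayLoad only) do not block sinking another load.
bool clobbersMemory(const MIModel &MI) {
  return MI.mayStore || MI.hasOrderedMemoryRef;
}

// Returns true if a load may be sunk past every instruction in Between.
bool canSinkLoadPast(const std::vector<MIModel> &Between) {
  for (const MIModel &MI : Between)
    if (clobbersMemory(MI))
      return false;
  return true;
}
```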
Fixes: #181708
[flang][acc] Allow orphaned acc cache directive (#184448)
While the spec allows the cache directive at the top of a loop body, the
directive has also been utilized at the top of an acc routine. This PR
removes the semantic check that rejects the cache directive outside of a
loop, allowing orphaned `!$acc cache` similar to CIR.
The OpenACC.md deviation document is updated to note this extension.
[DebugInfo] Emit DW_AT_const_value for constexpr array static members (#182442)
For `constexpr` array static data members, Clang does not emit a
`DW_AT_const_value` attribute in DWARF, while GCC does. This patch fixes
this by handling array `APValue`s in Clang and lowering them in the
backend through `ConstantDataSequential`.
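An illustrative input (not taken from the patch's tests): a `constexpr` array static data member of the kind this change targets. Compiling it with `clang++ -g -c` and inspecting the object with `llvm-dwarfdump` should show whether the member carries `DW_AT_const_value`:

```cpp
#include <cassert>

// constexpr array static data member. In C++17 it is implicitly inline,
// so the odr-use in sumValues() needs no out-of-class definition.
struct Table {
  static constexpr int Values[4] = {1, 2, 4, 8};
};

int sumValues() {
  int Sum = 0;
  for (int V : Table::Values)
    Sum += V;
  return Sum;
}
```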
Fixes #165220
[clang-doc] Add a Mustache Markdown generator (#177221)
Adds a Markdown generator that uses Mustache templates. This patch adds
the templates themselves and implements changes to the JSONGenerator to
allow for the creation of specific files needed by the MD tests like
`all-files.json`.
This backend should be considered experimental. It satisfies all the
same tests that the current MD backend is tested against, but those
don't seem to provide full coverage for all functionality inside that
backend. It also doesn't output everything provided by JSON. It doesn't
use the MD unittests because the Mustache templates must currently be
written to files.
[AIX] Sort relocations in XCOFF object writer. (#180807)
Some relocations (like R_REF) are emitted at offset 0 within the CSECT. If other relocations have already been emitted, the relocations are no longer in increasing offset order, and the linker reports an error. Sort the relocations before emitting them to fix the problem.
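A sketch of the fix idea, assuming a hypothetical `Reloc` record (the XCOFF writer uses its own types). Relocations are sorted by offset before emission; `std::stable_sort` is a design choice here so that entries sharing an offset keep their original emission order:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical relocation record; Tag stands in for the relocation type.
struct Reloc {
  uint32_t Offset; // Offset of the fixup within the section.
  uint8_t Tag;     // Arbitrary tag used here to observe ordering.
};

// Sort relocations into increasing offset order before emitting them.
void sortRelocations(std::vector<Reloc> &Relocs) {
  std::stable_sort(Relocs.begin(), Relocs.end(),
                   [](const Reloc &A, const Reloc &B) {
                     return A.Offset < B.Offset;
                   });
}
```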
[lld][Hexagon] Fix findMaskR8 missing duplex support (#183936)
findMaskR8() lacked an isDuplex() check, unlike findMaskR6(),
findMaskR11(), and findMaskR16() which all handle duplex instructions.
When the assembler generates R_HEX_8_X on a duplex SA1_addi instruction
(e.g. `{ r0 = add(r0, ##target); memw(r1+#0) = r2 }`), the wrong mask
0x00001fe0 placed relocation bits at [12:5] instead of [25:20],
corrupting the low sub-instruction (e.g. memw became memb).
Add the isDuplex() check returning 0x03f00000, and add a comprehensive
test covering all duplex instruction x relocation type combinations
across findMaskR6, findMaskR8, findMaskR11, and findMaskR16.
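To see why the mask matters, here is a standalone sketch of how a relocation mask is applied: the low bits of the value are scattered, from least to most significant, into the set bit positions of the mask. It mirrors the approach of lld's Hexagon backend but is written independently. The low sub-instruction of a duplex occupies bits [12:0], so the old mask 0x00001fe0 writes into it, while 0x03f00000 leaves it untouched:

```cpp
#include <cassert>
#include <cstdint>

// Deposit the low bits of Data into the set bits of Mask, in order.
uint32_t applyMask(uint32_t Mask, uint32_t Data) {
  uint32_t Result = 0;
  unsigned Off = 0; // Next bit of Data to place.
  for (unsigned Bit = 0; Bit != 32; ++Bit) {
    if ((Mask >> Bit) & 1u) {
      Result |= ((Data >> Off) & 1u) << Bit;
      ++Off;
    }
  }
  return Result;
}
```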
[AArch64] Fold zero-high vector inserts in MI peephole optimisation
Summary
This patch follows on from #178227.
The previous ISel fold lowers the 64-bit case to:
    fmov d0, x0
    fmov d0, d0
which is not ideal, since a single `fmov d0, x0` would suffice.
The redundant copy comes from INSERT_SUBREG/INSvi64lane.
This peephole detects `<2 x i64>` vectors whose upper lane is zero and
whose low lane is produced by FMOVXDr/FMOVDr, and removes the redundant
copy.
Also updates existing tests and adds MIR tests.
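A toy model of the peephole's reasoning, using a hypothetical instruction struct (opcode names echo the message, but this is not LLVM's MachineInstr API). On AArch64, writing a D register zeroes the upper 64 bits of the corresponding vector register, so inserting a zero high lane into a value produced by FMOVXDr or FMOVDr is just a copy that can be dropped, with its uses renamed:

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical opcodes; InsertZeroHigh stands for "insert 0 into lane 1".
enum class Opc { FMOVXDr, FMOVDr, InsertZeroHigh, Other };

struct Inst {
  Opc Op;
  int Def; // Virtual register defined.
  int Use; // Virtual register read (-1 if none).
};

std::vector<Inst> foldZeroHighInserts(const std::vector<Inst> &In) {
  std::unordered_map<int, Opc> DefOp; // Opcode that defined each vreg.
  std::unordered_map<int, int> Rename; // Folded vreg -> replacement vreg.
  std::vector<Inst> Out;
  for (Inst I : In) {
    if (auto It = Rename.find(I.Use); It != Rename.end())
      I.Use = It->second;
    if (I.Op == Opc::InsertZeroHigh) {
      auto D = DefOp.find(I.Use);
      // The high 64 bits are already zero: the insert is a redundant copy.
      if (D != DefOp.end() &&
          (D->second == Opc::FMOVXDr || D->second == Opc::FMOVDr)) {
        Rename[I.Def] = I.Use;
        continue;
      }
    }
    DefOp[I.Def] = I.Op;
    Out.push_back(I);
  }
  return Out;
}
```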
[AArch64] Add lowering for misc NEON intrinsics (#183050)
This patch adds custom lowering for the following NEON intrinsics to
enable better codegen for convert and load/store operations:
- suqadd
- usqadd
- abs
- sqabs
- sqneg
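For reference, the per-lane semantics of these operations on 8-bit lanes, written as plain scalar C++ to document behaviour (this is not how the backend lowers them). `suqadd` adds an unsigned value to a signed accumulator with signed saturation, `usqadd` the reverse with unsigned saturation, `abs` wraps on the minimum value, and `sqabs`/`sqneg` saturate:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

static int8_t satS8(int V) { return static_cast<int8_t>(std::clamp(V, -128, 127)); }
static uint8_t satU8(int V) { return static_cast<uint8_t>(std::clamp(V, 0, 255)); }

// Signed accumulator += unsigned operand, signed saturation.
int8_t suqadd8(int8_t Acc, uint8_t X) { return satS8(Acc + X); }
// Unsigned accumulator += signed operand, unsigned saturation.
uint8_t usqadd8(uint8_t Acc, int8_t X) { return satU8(Acc + X); }
// Plain abs wraps: abs(-128) stays -128.
int8_t abs8(int8_t X) { return static_cast<int8_t>(X < 0 ? -X : X); }
// Saturating abs and negate: -128 maps to 127.
int8_t sqabs8(int8_t X) { return satS8(X < 0 ? -X : X); }
int8_t sqneg8(int8_t X) { return satS8(-X); }
```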
[mlir][xegpu] Add support for accessing the default order of a layout. (#184451)
Currently, `getOrder` returns null if the user does not provide an
`order` in xegpu layout. This behavior is undesirable when coupled with
utility functions that work on top of layouts (like `isTransposeOf`).
This PR introduces a `getEffectiveOrder` that always returns the actual
order, including the default order implied when the user omits it.
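A minimal sketch of the idea with a hypothetical `LayoutModel` struct in place of the real MLIR attribute. It assumes, for illustration only, that the implied default order is `[rank-1, ..., 1, 0]`; the real default is whatever the xegpu dialect defines. With an effective order, a layout that omits `order` compares equal to one that spells out the default:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical stand-in for an xegpu layout attribute.
struct LayoutModel {
  int64_t Rank;
  std::optional<std::vector<int64_t>> Order; // Absent when omitted.
};

// ASSUMPTION: the default order is [Rank-1, ..., 1, 0].
std::vector<int64_t> getEffectiveOrder(const LayoutModel &L) {
  if (L.Order)
    return *L.Order;
  std::vector<int64_t> Default(L.Rank);
  // Fill from the back: last element 0, first element Rank-1.
  std::iota(Default.rbegin(), Default.rend(), 0);
  return Default;
}
```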
[SPIRV] Fix return value of runOnModule for SPIRVPrepareFunctions (#184636)
We need to return `true` if the module is changed by the pass, and we
forgot to do that in this new case.
This fixes a buildbot
[failure](https://lab.llvm.org/buildbot/#/builders/187/builds/17475).
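A legacy `ModulePass::runOnModule` must return `true` whenever it modifies the module. A common way to get this right is to fold every step's result into one flag; forgetting to do so for a newly added step is the bug class described above. The module type and step functions below are hypothetical:

```cpp
#include <cassert>

// Hypothetical toy module with two kinds of pending work.
struct ToyModule {
  int NumIntrinsicCalls = 0;
  int NumUnsupportedOps = 0;
};

// Each step reports whether it changed the module.
bool lowerIntrinsicCalls(ToyModule &M) {
  bool Changed = M.NumIntrinsicCalls > 0;
  M.NumIntrinsicCalls = 0;
  return Changed;
}

bool legalizeOps(ToyModule &M) {
  bool Changed = M.NumUnsupportedOps > 0;
  M.NumUnsupportedOps = 0;
  return Changed;
}

bool runOnModule(ToyModule &M) {
  bool Changed = false;
  Changed |= lowerIntrinsicCalls(M);
  Changed |= legalizeOps(M); // Every step must contribute to the result.
  return Changed;
}
```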
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[clang][Modules] Fixing unexpected warnings triggered by a PCH and a module with config macros (#177078)
When a PCH is compiled with macro definitions on the command line, such
as `-DCONFIG1`, an unexpected warning can occur if the macro definitions
happen to belong to an imported module's config macros. The warning may
look like the following:
```
definition of configuration macro 'CONFIG1' has no effect on the import of 'Mod1'; pass '-DCONFIG1=...' on the command line to configure the module
```
while `-DCONFIG1` is clearly on the command line when `clang` compiles
the source that uses the PCH and the module.
The reason this can happen is a combination of two things:
1. The logic that checks for config macros is not aware of any command
line macros passed through the PCH
([here](https://github.com/llvm/llvm-project/blob/7976ac990000a58a7474269a3ca95e16aed8c35b/clang/lib/Frontend/CompilerInstance.cpp#L1562)).
2. `clang` _replaces_ the predefined macros on the command line with the
predefined macros from the PCH, which does not include any builtins
[7 lines not shown]
[bazel] Fix more parse_headers cases in lldb (#184534)
CI doesn't have a toolchain that supports this, so we don't validate it
there, but locally this fixes some issues if you're using a toolchain
that does. Because lldb has many circular dependencies, we mostly just
have to disable parse_headers for the header-only targets we create, to
avoid those cycles.
[OpenMP][AIX] Add libpthreads for -fopenmp (#184629)
The compiler uses TLS for OpenMP thread‑private data, which results in
references to symbols such as `__tls_get_addr` in `libpthreads`.
Therefore, this PR adds `libpthreads` to the link command when
`-fopenmp` is specified.
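An illustrative example of the kind of code affected: a `threadprivate` variable gives each OpenMP thread its own copy, which the compiler implements with TLS. On AIX, linking such code with `-fopenmp` then needs `__tls_get_addr` from `libpthreads`. Without `-fopenmp`, the pragma is ignored and the variable is an ordinary static:

```cpp
#include <cassert>

// With -fopenmp, each OpenMP thread gets a private copy of Counter.
static int Counter = 0;
#pragma omp threadprivate(Counter)

// Increments this thread's copy and returns the new value.
int bumpCounter() { return ++Counter; }
```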
[OpenACC] Replace terminators with scf.yield in wrapMultiBlockRegionWithSCFExecuteRegion (#184458)
When wrapping a multi-block region in `scf.execute_region`, replace
`func::ReturnOp` (if flag `convertFuncReturn` is set) and `acc::YieldOp`
in all the blocks with `scf.yield` so the region has a valid SCF
terminator.
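A toy model of the terminator rewrite, where each block is represented only by its terminator's op name (the names mirror MLIR ops, but this is not the MLIR C++ API). `acc.yield` always becomes `scf.yield`, `func.return` does only when `convertFuncReturn` is set, and inter-block branches such as `cf.br` are left alone:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical block model: just the terminator op name.
struct ToyBlock {
  std::string Terminator;
};

void replaceTerminatorsWithYield(std::vector<ToyBlock> &Blocks,
                                 bool ConvertFuncReturn) {
  for (ToyBlock &B : Blocks) {
    if (B.Terminator == "acc.yield" ||
        (ConvertFuncReturn && B.Terminator == "func.return"))
      B.Terminator = "scf.yield";
  }
}
```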