[Clang] add support for C23 'H', 'D', and 'DD' length modifiers (#201098)
This patch adds `-Wformat` support for the C23 `H`, `D`, and `DD` length
modifiers in `printf`/`scanf` format strings. #116962
[ObjectYAML] Avoid comparison of compressed data (#202413)
The result of zlib compression isn't consistent across versions.
Downstream this test was failing due to our version giving slightly
different results. This version passes both upstream and downstream.
Assisted-by: Automated tooling, human reviewed.
[flang] Inline DerivedTypeSpec::GetScope to fix shared-lib link
FortranEvaluate referenced DerivedTypeSpec::GetScope(), defined out-of-line
in FortranSemantics, producing an undefined reference in libFortranEvaluate.so
under BUILD_SHARED_LIBS=ON. Make GetScope() inline in symbol.h so no
cross-library symbol is needed.
This is the fix missing from the original PR (#192651), which was reverted
in #202408.
[ubsan] Add [undefined] section to ignorelist (#202380)
Files passed via `-fsanitize-blacklist` apply to every enabled
sanitizer.
So if UBSan is combined with ASan, the suppressions as written also apply
to ASan, which was clearly not the intention.
Update clang/unittests/ScalableStaticAnalysisFramework/Analyses/PointerFlow/PointerFlowTest.cpp
Co-authored-by: Balázs Benics <benicsbalazs at gmail.com>
[InstCombine] Fix incorrect is_zero_poison when folding select+ctlz to cttz (#202388)
foldSelectCtlzToCttz folds
%lz = call i32 @llvm.ctlz.i32(i32 (x & -x), i1 is_zero_poison)
%r = select (icmp eq x, 0), i32 32, i32 (xor %lz, 31)
into
%r = call i32 @llvm.cttz.i32(i32 x, i1 is_zero_poison)
The original select's result is defined when x is zero, even if
is_zero_poison is true. Therefore the new cttz call must pass false for
the second parameter; it cannot reuse is_zero_poison.
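The equivalence behind this fold can be checked with a small C++ model. This is only a sketch: `ctlz32`, `cttz32`, `selectCtlzPattern`, and `foldedCttz` are hypothetical helper names, not LLVM APIs, and both helpers return 32 for a zero input, which models `is_zero_poison = false`.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros; defined as 32 for zero (models is_zero_poison = false).
inline uint32_t ctlz32(uint32_t v) {
  uint32_t n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Count trailing zeros; defined as 32 for zero (models is_zero_poison = false).
inline uint32_t cttz32(uint32_t v) {
  uint32_t n = 0;
  for (uint32_t bit = 1u; bit != 0 && (v & bit) == 0; bit <<= 1)
    ++n;
  return n;
}

// Source pattern: select (x == 0), 32, (ctlz(x & -x) ^ 31).
inline uint32_t selectCtlzPattern(uint32_t x) {
  uint32_t lowestSetBit = x & (0u - x); // isolates the lowest set bit of x
  return x == 0 ? 32u : (ctlz32(lowestSetBit) ^ 31u);
}

// Folded form: cttz(x) with a defined result at zero.
inline uint32_t foldedCttz(uint32_t x) { return cttz32(x); }

// The two forms agree for every input, including zero.
inline void checkFoldOn(uint32_t x) {
  assert(selectCtlzPattern(x) == foldedCttz(x));
}
```

The source pattern yields 32 at `x == 0` because of the select, so the replacement is only equivalent if the cttz call is also defined at zero.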
[InstCombine] Fix invalid IR when folding frexp(frexp(x)) with mismatched exponent types (#202419)
Instcombine folds the idempotent frexp pattern
%inner = call { double, i64 } @llvm.frexp.f64.i64(double %x)
%f = extractvalue { double, i64 } %inner, 0
%outer = call { double, i32 } @llvm.frexp.f64.i32(double %f)
to `{ %f, 0 }`, because the exponent of the fraction returned by the first
frexp call is known to be 0. It did this by reusing the inner frexp's result
struct and overwriting field 1 with zero.
But you can see in this example that reusing the inner frexp's
result struct is invalid, because that call returns { double, i64 },
whereas the second call returns { double, i32 }.
Fix this by building the new struct instead of modifying the old one.
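As a rough C++ analogy (not the InstCombine code), the result struct can be modelled as a template over the exponent type. `FrexpResult`, `frexpAs`, and `foldOuterFrexp` are hypothetical names introduced for illustration; the point is that the folded value is built fresh with the outer call's exponent type.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models the { double, iN } result struct; ExpT stands in for the exponent type.
template <typename ExpT> struct FrexpResult {
  double fraction;
  ExpT exponent;
};

// frexp returning its exponent in the requested integer type.
template <typename ExpT> FrexpResult<ExpT> frexpAs(double x) {
  int e = 0;
  double f = std::frexp(x, &e);
  return {f, static_cast<ExpT>(e)};
}

// Fold of frexp(frexp(x).fraction): build a new result with the OUTER
// exponent type instead of reusing the inner struct, whose type may differ.
template <typename OuterExpT, typename InnerExpT>
FrexpResult<OuterExpT> foldOuterFrexp(const FrexpResult<InnerExpT> &inner) {
  return {inner.fraction, OuterExpT{0}};
}

// For finite x, the folded result matches a real second frexp call.
inline void checkFold(double x) {
  FrexpResult<int64_t> inner = frexpAs<int64_t>(x);
  FrexpResult<int32_t> folded = foldOuterFrexp<int32_t>(inner);
  FrexpResult<int32_t> direct = frexpAs<int32_t>(inner.fraction);
  assert(folded.fraction == direct.fraction);
  assert(folded.exponent == direct.exponent);
}
```

For example, `frexpAs<int64_t>(48.0)` gives a fraction of 0.75 and an exponent of 6, and applying frexp to 0.75 again gives 0.75 with exponent 0.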
[ARM] Reject invalid BF encoding when target is next instruction (#201533)
When the BF instruction targets the immediately following label, the
encoded branch offset becomes zero, causing LLVM to emit invalid machine
code.
Add validation in the fixup_bf_branch path to reject this case and emit
an error instead.
Add MC regression test to cover new validation.
Assisted by ChatGPT. Human-verified, debugged, tested, and validated by
the author.
[flang][acc] Fix separate compilation for module !$acc declare create on allocatables. (#202409)
With separate compilation, a module defining `!$acc declare create` on
an allocatable and a using file that allocates it did not get
declare-action lowering in the using translation unit (TU):
`ACCDeclareActionConversion` could not resolve the post-alloc recipe
(defined only in the module .o), so no `fir.call` was emitted.
Add `acc.declare_action` for allocatable/pointer symbols under !$acc
declare.
* In the defining TU: Export module-global post-alloc/post-dealloc
recipes as linkable definitions and mark them with acc.declare_action at
creation.
* In the using TU: When declaring a USE-associated module global, emit
private external recipe stubs so the declare-action conversion pass can
insert fir.calls that link to the module definition.
[MLIR][XeGPU] Update Wg dpas_mx integration test. (#201680)
Make problem size smaller and add K loop.
Add host code to call gpu kernel.
Add test input and reference output.
Add comparison code to check output against reference output.
[mlir][acc] Format consistency for reduction accumulate (#202414)
Avoid use of parentheses so that format for
`acc.reduction_accumulate` is consistent with rest of acc reduction
operations.
[MLIR][XeGPU] Support transposed load_nd of sub-32-bit elements (#201636)
The 2D block load transpose feature is only available for 32-bit
elements. When a transposed load_nd is requested for a sub-32-bit
element type, the XeGPU-to-XeVM lowering now emulates it by
reinterpreting the tile as 32-bit elements: the element size is promoted
to 32 bits, the tile width is scaled down by (32 / elemBitSize), and the
column offset (offsetW) is right-shifted by log2(32 / elemBitSize) to
account for the wider element.
Add a conversion test (loadstore_nd_transpose.mlir) covering the f16
transposed load path.
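The parameter rewrite described above can be sketched in C++ as follows. This is an illustration of the arithmetic only, not the XeGPU-to-XeVM lowering: `TransposedLoadParams`, `log2Exact`, and `promoteTo32Bit` are hypothetical names, and the sketch assumes a non-negative offset and a tile width divisible by the scaling factor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the values the commit message says are adjusted.
struct TransposedLoadParams {
  unsigned elemBitSize; // element size in bits
  unsigned tileWidth;   // tile width in elements
  int64_t offsetW;      // column offset in elements
};

// log2 of a power of two.
inline unsigned log2Exact(unsigned v) {
  unsigned n = 0;
  while (v > 1) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Reinterpret a sub-32-bit tile as 32-bit elements.
inline TransposedLoadParams promoteTo32Bit(TransposedLoadParams p) {
  assert(p.elemBitSize < 32 && 32 % p.elemBitSize == 0);
  const unsigned factor = 32 / p.elemBitSize; // narrow elements per 32-bit element
  assert(p.tileWidth % factor == 0);          // assumption of this sketch
  assert(p.offsetW >= 0);                     // assumption of this sketch
  return {32u, p.tileWidth / factor, p.offsetW >> log2Exact(factor)};
}
```

With f16 elements the factor is 2, so a tile width of 16 becomes 8 and a column offset of 8 becomes 4.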
[Frontend][Offloading] Restore silent ignore for non-existing input files in nvlink (#202352) (#202403)
Partially revert commit
https://github.com/llvm/llvm-project/commit/a0ccab35110951afc9adc5d7dc733ba8c58cf3f9
to restore the original behavior of silently skipping non-existent
positional input files in resolveArchiveMembers(), while preserving strict
validation in clang-sycl-linker.
Background:
The original commit added error reporting for non-existent input files in
the shared resolveArchiveMembers() function to catch genuine user errors.
However, this broke clang-nvlink-wrapper when unrecognized options were
misparsed as input files (e.g., "relro" from "-z relro" before the -z
option was properly
[19 lines not shown]
[ASan] add pattern 'cmp BYTE PTR [rdx], XX' to win instruction decoder (#202407)
**Context:** The ASan instruction decoder in `interception_win.cpp` has
a manual case-based list for each instruction pattern we expect to see
in function prologues.
Today, we have an instruction decoder for `cmp BYTE PTR [rcx], XX`, but
we do not have the equivalent for `cmp BYTE PTR [rdx], XX`. In recent
builds of Windows, the latter is now seen in `ucrtbase!strstr`.
**This PR** adds the missing case.
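For reference, the two patterns differ only in the ModRM byte: `cmp r/m8, imm8` is encoded as opcode `80 /7 ib`, so `[rcx]` gives `80 39 XX` and `[rdx]` gives `80 3A XX`. The sketch below shows a case-based length check in that spirit; `cmpByteImm8Length` is a hypothetical helper, not the actual function in `interception_win.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the length in bytes of
// `cmp BYTE PTR [rcx|rdx], imm8` at `code`, or 0 if the bytes do not match.
inline size_t cmpByteImm8Length(const uint8_t *code) {
  assert(code != nullptr);
  if (code[0] != 0x80) // opcode 80 /7 ib : cmp r/m8, imm8
    return 0;
  switch (code[1]) {
  case 0x39: // ModRM 00 111 001 : cmp BYTE PTR [rcx], XX
  case 0x3A: // ModRM 00 111 010 : cmp BYTE PTR [rdx], XX
    return 3; // opcode + ModRM + imm8
  default:
    return 0;
  }
}
```

A prologue that begins with `80 3A 00` is then recognized as a 3-byte instruction, the same length as the `[rcx]` form.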
[lldb][NativePDB] Handle invalid type references gracefully (#202371)
Incrementally linked PDBs can contain semantically incorrect references
to types from the symbol streams and the IPI stream.
I can't reproduce it reliably, but as mentioned in #200452, at some
point the references become incorrect.
We should not crash if we receive such PDBs as input. Here I noticed two
issues:
1. `CVTagRecord` requires the passed type to be a type record
(union/struct/class). We should check that this is the case with
`IsTagRecord`.
2. After casting the return of `GetOrCreateClangType`, check that it's
the expected type (not null).
I added a test for both cases.
[InstCombine] Drop non-power-of-two alignment assumptions (#202396)
These assumptions aren't actually used anywhere, so we might as well
drop them. We might want to consider emitting an assumption that the value
is zero at some point, but we don't have the bundles for that currently
and it seems rather low priority.
[AArch64][MacroFusion] Fuse only tied AES pairs post-RA (#201610)
This patch adds an ad-hoc check to macro fusion to only fuse AES pairs
that are tied post-RA as a guardrail.
Currently, ISel captures every RAW dependent AESE/D+AES[I]MC pair (by
data-dependence DAG), and applies a constraint that the pair must write
to the same dest, i.e. the second instruction is tied (something that
cannot be expressed in SSA IR). So this is effectively NFC from that
perspective, as AES is not really being lowered through other paths.
Here we add an appropriate check to macro fusion, if registers are
physical, to avoid pre-RA regression (maintaining the current status
where pre-RA fusion hides theoretically better schedules even if the pair
is not tied). Otherwise the tests in
llvm/test/CodeGen/AArch64/misched-fusion-aes.ll may not catch an ISel
change that would happen to pass, satisfying the register allocation
being filechecked.
If it appears in the future that a subtarget can fuse untied pairs, we
[5 lines not shown]
[lldb] Fixup address in JSONUtils::ValuePointsToCode (#201951)
On platforms with metadata in pointers (e.g. arm64e), the address must
be fixed up before requesting a load address.
This fixes TestDAP_evaluate.py for arm64e.
Revert "[docs] update CI to use latest release of doxygen" (#202412)
Reverts llvm/llvm-project#191501 as cmake CI is having an issue with it:
```
-- Doxygen enabled (1.14.0).
CMake Error at /work/as-worker-4/publish-doxygen-docs/llvm-project/cmake/Modules/HandleDoxygen.cmake:29 (add_custom_target):
add_custom_target cannot create target "doxygen" because another target
with the same name already exists. The existing target is a custom target
created in source directory
"/work/as-worker-4/publish-doxygen-docs/llvm-project/llvm". See
documentation for policy CMP0002 for more details.
Call Stack (most recent call first):
docs/CMakeLists.txt:59 (include)
```
[libc][math] Improve the performance of sin/cos for small inputs |x| < 2^-4. (#201748)
- Use a degree-9 polynomial for the fast path of sin(x), generated by Sollya,
with errors bounded by `|x| * 2^-68 + 2 * ulp(x^3 / 6)`.
- Use a degree-8 polynomial for the fast path of cos(x), generated by Sollya,
with errors bounded by `2^-69 + ulp(x^2 / 2)`.
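The shape of such a fast path can be sketched with plain Taylor coefficients. This is not the libc implementation: the coefficients here are Taylor terms rather than the Sollya-generated ones, the evaluation uses ordinary `double` arithmetic, and `sinSmallTaylor` and `cosSmallTaylor` are hypothetical names. The error bounds quoted above therefore do not apply to this sketch.

```cpp
#include <cassert>
#include <cmath>

// Degree-9 odd polynomial for sin(x), |x| < 2^-4, using Taylor coefficients
// (a stand-in for the Sollya-generated ones).
inline double sinSmallTaylor(double x) {
  assert(std::fabs(x) < 0.0625);
  const double x2 = x * x;
  // Horner evaluation in x^2 of -1/3! + x^2/5! - x^4/7! + x^6/9!.
  double p = 1.0 / 362880.0;
  p = p * x2 - 1.0 / 5040.0;
  p = p * x2 + 1.0 / 120.0;
  p = p * x2 - 1.0 / 6.0;
  return x + x * x2 * p; // x + x^3 * p
}

// Degree-8 even polynomial for cos(x), |x| < 2^-4, using Taylor coefficients.
inline double cosSmallTaylor(double x) {
  assert(std::fabs(x) < 0.0625);
  const double x2 = x * x;
  // Horner evaluation in x^2 of -1/2! + x^2/4! - x^4/6! + x^6/8!.
  double p = 1.0 / 40320.0;
  p = p * x2 - 1.0 / 720.0;
  p = p * x2 + 1.0 / 24.0;
  p = p * x2 - 0.5;
  return 1.0 + x2 * p;
}
```

On this range the first omitted Taylor terms, `x^11 / 11!` and `x^10 / 10!`, are far below `double` precision, so the sketch's error is dominated by rounding rather than truncation.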
[AArch64][TTI] Allow mixed-extension partial reductions with +dotprod (#199762)
With the backend now lowering SUMLA via two udot products on targets
that have +dotprod (#199761), lower the cost on targets without +i8mm.
PR: https://github.com/llvm/llvm-project/pull/199762
[Flang][OpenMP] Support iterator modifiers in map and motion clauses
Support iterated array elements and array sections in map and motion clauses for
target data, target enter data, target exit data, and target update constructs.
Preserve mapper resolution for iterated entries, including explicit mappers,
user-defined default mappers, declare mapper entries, and implicit default
mappers.
This PR is stacked on top of #197047 and #197752.
This patch is part of the feature work for #188061.
Assisted by Copilot.