[CIR] Allow _setjmp and _setjmpex to fall through to library calls (#193021)
This change allows calls to _setjmp and _setjmpex to fall through the
builtin handling and be emitted as library calls when we are not
targeting OSMSVCRT. It also adds the code to set "returns_twice" on
functions matching an explicit list, as they are in classic codegen.
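As a reminder of why "returns_twice" matters, here is a minimal sketch in standard C++. It uses `setjmp`/`longjmp` from `<csetjmp>` rather than the MSVC-specific `_setjmp`/`_setjmpex`, and the helper name `countSetjmpReturns` is made up for illustration; it is not part of the change.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical helper: counts how many times a single setjmp call returns.
// Only trivially destructible locals are live across the jump, which keeps
// the longjmp well defined in C++.
int countSetjmpReturns() {
  std::jmp_buf env;
  // volatile, so the value written between setjmp and longjmp is
  // guaranteed to be visible after the second return.
  volatile int returns = 0;

  if (setjmp(env) == 0) {
    // First return: setjmp was called directly.
    returns = returns + 1;
    std::longjmp(env, 1); // re-enters the setjmp call above
  } else {
    // Second return: control came back through longjmp.
    returns = returns + 1;
  }
  return returns;
}
```

Because the same call site is left twice, an optimizer that is unaware of this could keep `returns` in a register and lose the first increment, which is the kind of assumption the attribute is meant to rule out.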
[dsymutil] Add --embed-resource to copy files into dSYM bundles. (#190663)
Add a new --embed-resource flag that copies files or directories into
the dSYM bundle's Contents/Resources/ directory during generation.
Projects often need to embed files such as LLDB Python scripts into dSYM
bundles. This is usually done with a script that runs after dSYM generation,
which may race with the stripping and code signing steps.
rdar://50633614
[llvm-nm] Drop STT_FILE/STT_SECTION from --special-syms (#192129)
The filter for SF_FormatSpecific symbols exempted all such symbols
for architectures having mapping symbols. This caused STT_FILE and
STT_SECTION symbols to appear with --special-syms on these targets
but not on x86_64. Narrow the exemption to only STT_NOTYPE symbols,
which are the actual mapping symbols ($d, $x, etc.).
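The narrowed filter can be sketched with a toy model. Everything below (`SymType`, `ToySymbol`, `shouldPrint`) is a hypothetical stand-in and not llvm-nm's actual code; it only models the `--special-syms` path described above and ignores other options.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for the ELF symbol types named in the commit message.
enum class SymType { NoType, File, Section, Func };

struct ToySymbol {
  std::string name;
  SymType type;
  bool formatSpecific; // models the SF_FormatSpecific flag
};

// Returns true if the symbol should be printed.
// Assumption: format-specific symbols are hidden unless the exemption applies.
bool shouldPrint(const ToySymbol &sym, bool specialSyms,
                 bool archHasMappingSymbols) {
  if (!sym.formatSpecific)
    return true; // ordinary symbols are not affected by this filter
  // Narrowed exemption: only STT_NOTYPE-like symbols (the mapping symbols
  // such as $d or $x) pass, not STT_FILE or STT_SECTION.
  return specialSyms && archHasMappingSymbols && sym.type == SymType::NoType;
}
```

With the old behavior the last condition would not have looked at `sym.type`, so file and section symbols slipped through on targets with mapping symbols.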
[HLSL] Add codegen for accessing resource members of a struct (#187127)
During codegen, any expression that accesses a resource or resource array member of a global struct instance must be replaced by an access to the corresponding implicit global resource variable.
When codegen encounters a `MemberExpr` of a resource type, it traverses the AST to locate the parent struct declaration, building the expected global resource variable name along the way. If the parent declaration
is a non-static global struct instance, codegen searches its `HLSLAssociatedResourceDeclAttr` attributes to locate the matching global resource variable and then generates IR code to access the resource global in place of the member access.
Fixes #182989
Don't pass RecipeBuilder
The legacy path calls `setRecipe` on all processed recipes but only queries
`getRecipe` for memory operations. The scalarization does not touch those,
because it happens after all memory recipes have been processed.
[VPlan] Scalarize to first-lane-only directly on VPlan
This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.
I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.
As such, I decided to implement something much simpler that is enough
for #182595. However, we perform this transformation before delegating
to the old CM-based decision, so it **is** effective immediately and
takes precedence even for consecutive loads/stores.
Depends on https://github.com/llvm/llvm-project/pull/182592 but is stacked on
top of https://github.com/llvm/llvm-project/pull/182594 to enable linear
stacking for https://github.com/llvm/llvm-project/pull/182595.
[clang] Disable some module tests on AIX (#193008)
PR https://github.com/llvm/llvm-project/pull/190062 makes two module
tests fail on AIX. Disable them on that platform until we get to the
bottom of it.
[mlir][MemRef][GPU] Migrate GPU dialect ops to IndexedAccessOpInterface (#190380)
This commit migrates the handling of GPU dialect ops in
fold-memref-alias-ops from hard-coded support to the new
IndexedAccessOpInterface, which also adds expand_shape folding support
for those ops.
Once other memref-dialect passes are migrated to use this interface,
this will allow us to break the dependency between the memref and gpu
dialects.
[flang][cuda] Only apply the implicit data attribute on the component for use_device (#192146)
For interoperability between CUDA Fortran and OpenACC, the OpenACC
host_data use_device clause needs to implicitly add the DEVICE attribute
to the object symbol for the duration of the region. When the object was
a component, we were adding the attribute to the base symbol, which is
not what we want.
Update the handling to copy the base symbol with a new DerivedTypeScope
and set the attribute on the component. A new test checks that the
attribute is indeed applied to the component.
Option to control signaling NaN support
This change implements the Clang command-line option `-fsignaling-nans`,
which is a counterpart of the GCC option with the same name. It allows a
user to control support for signaling NaNs. This option instructs the
compiler that signaling NaNs are to be treated according to IEEE 754:
they are quieted in arithmetic operations and raise the `Invalid`
floating-point exception. The opposite option, `-fno-signaling-nans`,
does the reverse: it indicates that signaling NaNs are handled
identically to quiet NaNs. If neither option is specified, no signaling
NaN support is assumed, except for functions that have the `strictfp`
attribute.
At the IR level, signaling NaN support is represented by the function
attribute "signaling-nans". Clang sets it when generating code for which
signaling NaNs are supported. If the target architecture does not
support signaling NaNs, Clang does not set this attribute.
The primary motivation for this change is the optimization of strictfp
[11 lines not shown]
[mlir][CSE] Pre-process trivially dead ops (NFC) (#191135)
This PR avoids calling `simplifyRegion` on dead region ops.
`simplifyRegion` attempts to perform CSE optimization on the ops within
the region, which is unnecessary for ops that are already trivially
dead.
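The idea can be sketched on a toy IR. `ToyOp`, `isTriviallyDead`, and `runToyCSE` are hypothetical names and not MLIR APIs; the sketch assumes that `hasSideEffects` already accounts for everything nested inside the op, and it only counts which regions would be handed to a region-simplification step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy operation: a use count, a side-effect flag, and one nested region.
struct ToyOp {
  int numUses = 0;
  bool hasSideEffects = false; // assumed to cover the nested region too
  std::vector<ToyOp> region;
};

// An op is trivially dead when nothing uses it and it has no side effects.
bool isTriviallyDead(const ToyOp &op) {
  return op.numUses == 0 && !op.hasSideEffects;
}

// Returns how many regions were visited for simplification.
std::size_t runToyCSE(std::vector<ToyOp> &ops) {
  // Pre-process: drop trivially dead ops first, so their regions are
  // never visited at all.
  ops.erase(std::remove_if(ops.begin(), ops.end(), isTriviallyDead),
            ops.end());

  std::size_t simplified = 0;
  for (ToyOp &op : ops) {
    if (!op.region.empty())
      simplified += 1 + runToyCSE(op.region); // stand-in for simplifyRegion
  }
  return simplified;
}
```

In this model a dead op that carries a large region costs only the erase, instead of a full walk over its body.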
[mlir][SparseTensor] add `numSymbols` information to simplify affine expressions (#191649)
Previously, the `translateShape` function hard-coded the `numSymbols`
parameter to 0. This makes the affine expression fail when the sparse
tensor encoding has symbols.
This PR fixes the issue by extracting and passing the `numSymbols`
information during translation. A regression test has also been added to
ensure this behavior remains supported.
Closes #191209
[AMDGPU] Report only local per-function resource usage when object linking is enabled (#192594)
With object linking the linker aggregates resource usage across TUs, so
compile-time pessimism and call-graph propagation duplicate the linker's
work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building
call-graph max/or expressions.
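A rough sketch of the difference between the two modes, using a toy module. `ToyFunction` and `reportedVGPRs` are hypothetical and unrelated to the real `AMDGPUMCResourceInfo` code; the sketch assumes an acyclic call graph and models only a single max-style resource.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy function: its own register usage plus the indices of its callees.
struct ToyFunction {
  int localVGPRs;
  std::vector<std::size_t> callees; // indices into the module vector
};

// Value reported for function `idx`.
// Assumption: the call graph has no cycles, so the recursion terminates.
int reportedVGPRs(const std::vector<ToyFunction> &module, std::size_t idx,
                  bool objectLinking) {
  assert(idx < module.size());
  const ToyFunction &f = module[idx];

  // With object linking, report only the local constant and leave the
  // aggregation across callees to the linker.
  if (objectLinking)
    return f.localVGPRs;

  // Otherwise propagate the maximum over the call graph at compile time.
  int result = f.localVGPRs;
  for (std::size_t callee : f.callees)
    result = std::max(result, reportedVGPRs(module, callee, false));
  return result;
}
```

For a caller that uses 8 registers and calls a function using 32, the propagated value is 32 while the local-only value stays at 8.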
[CodeGen] Fix non-determinism in MachineBlockHashInfo (#192826)
The previous implementation used `hash_value(MachineOperand)`, which
is not guaranteed to be stable across different executions because it
hashes pointers for certain operand types (like MBB, GlobalAddress,
etc).
Use the existing stableHashValue, which does not have this problem.
The rest of the file should be the same, but this may break profile
compatibility.
Changing the behavior for operands is not an issue, as the existing hash
is a low-quality RNG.
The code does not have test coverage; this will be fixed in #192911.
Fixes #173933.
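The underlying problem can be illustrated with a small sketch. `ToyGlobal`, `pointerHash`, and `stableHash` are hypothetical, and FNV-1a is swapped in here purely as an example of a content-based hash; it is not what `stableHashValue` does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Toy stand-in for an operand that refers to a named entity.
struct ToyGlobal {
  std::string name;
};

// Pointer-based hash: the result depends on where the object happens to
// live in memory, so it can change from one execution to the next.
std::size_t pointerHash(const ToyGlobal *g) {
  return std::hash<const ToyGlobal *>{}(g);
}

// Content-based hash (64-bit FNV-1a over the name): depends only on the
// bytes of the name, so it is the same in every run.
std::uint64_t stableHash(const ToyGlobal &g) {
  std::uint64_t h = 14695981039346656037ull; // FNV offset basis
  for (unsigned char c : g.name) {
    h ^= c;
    h *= 1099511628211ull; // FNV prime
  }
  return h;
}
```

Two distinct `ToyGlobal` objects with the same name always agree under `stableHash`, whereas `pointerHash` makes no such promise.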
AMDGPU/GlobalISel: RegbankLegalize rules for merge-like opcodes
Move RegbankLegalize handling for G_BUILD_VECTOR, G_MERGE_VALUES and
G_CONCAT_VECTORS from AMDGPURegBankLegalize to AMDGPURegBankLegalizeRules
by implementing rules for all supported types.
AMDGPU/GlobalISel: RegbankLegalize rules for G_BITCAST
Move RegbankLegalize handling for G_BITCAST from AMDGPURegBankLegalize to
AMDGPURegBankLegalizeRules by implementing rules for all supported types.
AMDGPU/GlobalISel: RegbankLegalize rules for undef and constants
Move RegbankLegalize handling for G_IMPLICIT_DEF, G_CONSTANT and G_FCONSTANT
from AMDGPURegBankLegalize to AMDGPURegBankLegalizeRules by implementing
rules for all supported types.
[Clang][AMDGPU] Deprecate `amdgpu-num-vgpr` and `amdgpu-num-sgpr`
For now we only emit a warning. The attributes still take effect for
regular compilation, but with object linking they are simply ignored.