[flang][OpenMP] Implement utility to locate OmpClause in ODS, NFC (#184866)
Simplify looking for a specific clause in OmpDirectiveSpecification.
This is an alternative to DirectiveStructureChecker::FindClause for when
the internal checker structures have not yet been updated in the AST
traversal.
[flang][openacc] Relax semantic check on cache directive (#184887)
The specification does not forbid using the colon notation to specify
the full array. The reference compiler accepts this, and our lowering
can already handle it.
[AMDGPU] add back the true16 pattern for cvt_pk_rtz (#184857)
I found that the `SupportedRoundMode` pattern for true16 mode was removed
in https://github.com/llvm/llvm-project/pull/177069 by mistake. This patch
adds it back and adds gfx11 to the test that runs true16 mode.
[CIR][AArch64] Add missing lowerings for vceqz_* NEON builtins
Implement the remaining CIR lowerings for the AdvSIMD (NEON)
`vceqz{|q|d|s}_*` intrinsic group (bitwise equal to zero).
The `vceqzd_s64` variant was already supported; this patch completes
the rest of the group.
Tests for these intrinsics are moved from:
test/CodeGen/AArch64/neon-misc.c
to:
test/CodeGen/AArch64/neon/intrinsics.c
The implementation largely mirrors the existing lowering in
CodeGen/TargetBuiltins/ARM.cpp.
`emitCommonNeonBuiltinExpr` is introduced to support these lowerings.
`getNeonType` is moved without functional changes.
[2 lines not shown]
[flang][acc] Handle ViewLike ops with OutlineRematerializationOpInterface in OffloadLiveInValueCanonicalization (#184218)
`fir::ConvertOp` implements both `ViewLikeOpInterface` and
`OutlineRematerializationOpInterface`. `fir.convert` is also used for
ptr-to-int conversions like `(!fir.ref<i32>) -> i64`. That is not really
a "view" — it converts a pointer to an integer — but
`ViewLikeOpInterface` is still attached, so `getOriginalValue` traces
through it to the underlying value.
When the underlying value is not a rematerialization candidate (e.g.,
`fir.alloca`, a block argument, or a `fir.call` result),
`isRematerializationCandidate` returns false and the `fir.convert` is
left as a live-in. This prevents `ACCImplicitData` from tracing back to
the original pointer to create the data mapping.
This PR:
1. Registers `fir::ConvertOp` with
`OutlineRematerializationOpInterface`.
2. Adds a fallback in `isRematerializationCandidate`: when the traced
[16 lines not shown]
[Clang][CIR][AArch64] NFC: Cleanups in AArch64 builtins lowering (#184404)
This patch performs small cleanups and fixes in the AArch64 builtins
lowering code, with the goal of aligning the CIR path more closely
with the existing Clang CodeGen implementation.
Changes include:
* Make sure that `noundef` is consistently matched using `{{.*}}`.
* Rename `AArch64BuiltinInfo` to `armVectorIntrinsicInfo` for better
consistency with the original CodeGen implementation.
* Simplify `emitAArch64CompareBuiltinExpr`, fix an incorrect
assert condition (missing `!`) and make sure to use the input `kind`
condition instead of hard-coding `cir::CmpOpKind::eq`.
* Improve and clarify comments.
No functional changes intended (NFC).
[mlir][acc] Add acc.compute_region and acc.par_width operations (#184864)
Introduce two new codegen operations to the acc dialect that model GPU
compute region execution and parallel launch configuration:
- acc.par_width: specifies a parallel dimension.
- acc.compute_region: wraps a region of code for GPU execution, capturing
launch configuration (from acc.par_width results) and input values as
block arguments.
These operations bridge the gap between high-level OpenACC compute
constructs (acc.parallel, acc.kernels, acc.serial) and gpu.launch. The
passes that do these transformations will soon follow.
---------
Co-authored-by: Scott Manley <rscottmanley at gmail.com>
[CIR] Add support for delete cleanup after new operators (#184707)
This adds support for calling operator delete when an exception is
thrown during initialization following an operator new call.
This does not yet handle the case where a temporary object is
materialized during the object initialization. That case is marked by
the "setupCleanupBlockActivation" diagnostic in deactivateCleanupBlock
and will be implemented in a future change.
[HLSL] Fix interleaved vector and matrix return types in AST dump
HLSL vector and matrix types were previously printed with their closing
syntax (', N>') in 'printAfter', causing them to interleave with function
parameters when used as return types (e.g., 'vector<float (args), 4>').
This change moves the HLSL vector and matrix closing syntax into
'printBefore' when 'UseHLSLTypes' is enabled, ensuring the type is
printed completely before the parameter list.
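The interleaving can be reproduced with a toy model of a two-phase type printer. This is a minimal sketch, not the Clang `TypePrinter` code: `VectorType`, `print_function_type`, and the `closing_in_before` flag are hypothetical names, and it assumes a function type is printed as the return type's "before" text, then the parameter list, then the return type's "after" text.

```python
from dataclasses import dataclass


@dataclass
class VectorType:
    """Hypothetical stand-in for an HLSL vector type such as vector<float, 4>."""
    element: str
    count: int


def print_before(ty: VectorType, closing_in_before: bool) -> str:
    # With the fix modeled here, the whole type is emitted in the "before" phase.
    if closing_in_before:
        return f"vector<{ty.element}, {ty.count}>"
    return f"vector<{ty.element}"


def print_after(ty: VectorType, closing_in_before: bool) -> str:
    # Previously the closing ', N>' was emitted in the "after" phase.
    return "" if closing_in_before else f", {ty.count}>"


def print_function_type(ret: VectorType, params: str, closing_in_before: bool) -> str:
    # Assumed layout: <before(ret)> (<params>)<after(ret)>
    before = print_before(ret, closing_in_before)
    after = print_after(ret, closing_in_before)
    return f"{before} ({params}){after}"


float4 = VectorType("float", 4)
old_output = print_function_type(float4, "args", closing_in_before=False)
new_output = print_function_type(float4, "args", closing_in_before=True)
```

In this model `old_output` has the parameter list wedged inside the vector type, while `new_output` prints the complete type first.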
Note that address space qualifiers are now printed after the type
(e.g., 'vector<float, 4>hlsl_device'). This is because
'canPrefixQualifiers' in 'TypePrinter.cpp' returns false for these types.
We cannot easily change this to check 'UseHLSLTypes' because
'canPrefixQualifiers' is a static method and does not have access to the
PrintingPolicy at that point.
Fixes interleaved output in HLSL AST tests.
Assisted-by: Gemini
Revert "[mlir][arith] Add `exact` to `index_cast{,ui}` (#183395)" (#184876)
This reverts commit 7ad2c6db54a0e77249f2edb3c589ccf4c930d455.
PR #183395 introduced the `exact` flag to `index_cast` and
`index_castui` and updated some canonicalization patterns.
These canonicalization patterns were found to be unsound. For example:
* `index_cast(index_cast(x)) -> x`
* where the first cast truncates x and the second widens it
the rewrite is unsound because the first cast **may** truncate the value
of x, losing information that the second cast cannot restore. The
`exact` flag was made to make this transformation sound. Its semantics
are that when the `exact` flag is present, then it is assumed that the
operand to index_cast does not lose information (i.e., fits perfectly in
the destination type).
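A small numeric model shows why folding the cast pair is only safe when the first cast is lossless. This is a sketch under stated assumptions, not MLIR code: it treats `index` as 64 bits wide and the intermediate type as 32 bits, uses signed two's-complement wrapping, and the helper names `cast_signed` and `is_exact` are hypothetical.

```python
def cast_signed(value: int, dst_bits: int) -> int:
    """Model a signed integer cast: keep the low dst_bits bits, reinterpret as signed."""
    low = value & ((1 << dst_bits) - 1)
    if low & (1 << (dst_bits - 1)):
        return low - (1 << dst_bits)
    return low


def is_exact(value: int, dst_bits: int) -> bool:
    """True when the cast loses no information, i.e. the value fits the destination."""
    return cast_signed(value, dst_bits) == value


def round_trip(value: int) -> int:
    # index (assumed 64-bit) -> i32 -> index
    return cast_signed(cast_signed(value, 32), 64)


small = 7        # fits in 32 bits: the round trip is the identity
large = 1 << 40  # does not fit in 32 bits: the first cast drops the set bit

small_result = round_trip(small)
large_result = round_trip(large)
```

Here `round_trip(small)` returns the input, but `round_trip(large)` does not, so rewriting the pair to `x` is only justified when `is_exact` holds for the first cast.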
In PR #183395, the canonicalization rule was rewritten such that would
[25 lines not shown]
[HLSL] Implement Texture2D::Gather and Texture2D::GatherCmp (#183323)
Add the Gather functions for Texture2D. Variations for all components
are added (Red, Blue, Green, Alpha). If targeting Vulkan then the
GatherCmp* function for a component other than 0 will result in an
error, as that will lead to invalid SPIR-V.
Part of https://github.com/llvm/llvm-project/issues/175630.
Assisted by: Gemini
Don't crash when given an empty input filename. (#184718)
Commands such as `clang -- ''` hit two different crash bugs: a buffer
overflow caused by using a `memcmp` that might be larger than the input,
and a bogus assert in the option parser when attempting typo correction.
[RISCV] Add RISCVISD opcodes for PSHL/PSRL/PSRA and lower to them. (#184836)
We only support splat shift amounts. Previously we checked if the shift
amount was a splat_vector and considered it legal.
I don't think there is a guarantee that the splat_vector will stick
around as a splat_vector. It's safer if we capture the splat and create
a dedicated node with a scalar shift amount.
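The idea of capturing the splat once and carrying a scalar shift amount can be sketched as follows. This is an illustrative model only, not the SelectionDAG code: `get_splat_value`, `lower_packed_shift`, and the tuple-shaped "node" are hypothetical, and vectors are modeled as plain Python sequences.

```python
from typing import Optional, Sequence, Tuple


def get_splat_value(amounts: Sequence[int]) -> Optional[int]:
    """Return the common element if every lane holds the same value, else None."""
    if not amounts:
        return None
    first = amounts[0]
    return first if all(a == first for a in amounts) else None


def lower_packed_shift(opcode: str, lanes: Sequence[int],
                       amounts: Sequence[int]) -> Tuple[str, Tuple[int, ...], int]:
    """Build a dedicated node that stores the shift amount as a scalar.

    Later stages then no longer depend on the amount operand still being
    recognizable as a splat vector.
    """
    splat = get_splat_value(amounts)
    if splat is None:
        raise ValueError("only splat shift amounts are supported")
    return (opcode, tuple(lanes), splat)


node = lower_packed_shift("PSHL", [1, 2, 3, 4], [3, 3, 3, 3])
```

A non-splat amount such as `[1, 2, 3, 4]` is rejected up front in this model instead of being discovered later.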
Refactor createIteratorLoop to use OMPIRBuilder utility functions and make end-of-block insertion robust.
- Replace manual splitBasicBlock/branch with splitBB and redirectTo()
- When insertion point is at BB.end() and the block is terminated, split
before the terminator so the original successor path is preserved
through omp.it.cont
- Add test for unterminated blocks
clang/AMDGPU: Do not emit __oclc_ABI_version references with environment (#184868)
Assume a sufficiently new code object version if the environment is set
to something indicating we should have a real library.
[lldb] Don't link TestingSupport as a component (#184310)
This doesn't work with dylib builds, because TestingSupport is not part
of the dylib. Instead, we should link it via LINK_LIBS, like other tests
already do.
(cherry picked from commit d1c563beee794b3a967786fd07c437ffc66fb7f0)
[lldb][Target] Allow eLanguageTypeAssembly to use ScratchTypeSystemClang (#183771)
After cleaning up some of our `LanguageType`/`SourceLanguage`
round-tripping (see `7f51a2a47d2e706d04855b0e41690ebafa2b3238`), a CU
with `DW_LANG_MIPS_Assembler` will get a language type of
`eLanguageTypeAssembly` (as opposed to `eLanguageTypeMipsAssembler`).
The reason is that there is no `DW_LNAME_` (DWARFv6 language code) for
`MIPS Assembler`, only for generic `Assembly`. So it's not possible to
round-trip cleanly between pre-DWARFv6 and DWARFv6 language codes, which
LLDB relies on for storing language types (and will lean into more
heavily in the future). This broke a special provision we have where we
allow `ScratchTypeSystemClang` to be used when evaluating expressions in
assembly CUs (i.e., CUs where the debug-info explicitly sets the
language to assembly).
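The lossy round trip and its effect on the special provision can be modeled with two lookup tables. This is a simplified sketch, not LLDB code: the tables cover only the assembly case described above, `round_trip` and `uses_clang_scratch` are hypothetical helpers, and the "allowed" sets are an assumed model of the check before and after this patch.

```python
# Pre-DWARFv6 language code -> DWARFv6 language name (assembly case only).
# There is no MIPS-specific DW_LNAME_ entry, so the mapping collapses to generic Assembly.
PRE_V6_TO_LNAME = {
    "DW_LANG_MIPS_Assembler": "DW_LNAME_Assembly",
}

# DWARFv6 language name -> LLDB language type.
LNAME_TO_LANGUAGE_TYPE = {
    "DW_LNAME_Assembly": "eLanguageTypeAssembly",
}


def round_trip(pre_v6_code: str) -> str:
    """Convert a pre-DWARFv6 code through the DWARFv6 name to a language type."""
    return LNAME_TO_LANGUAGE_TYPE[PRE_V6_TO_LNAME[pre_v6_code]]


# Assumed model of which assembly language types may use the Clang scratch type system.
ALLOWED_BEFORE_PATCH = {"eLanguageTypeMipsAssembler"}
ALLOWED_AFTER_PATCH = ALLOWED_BEFORE_PATCH | {"eLanguageTypeAssembly"}


def uses_clang_scratch(language_type: str, allowed: set) -> bool:
    return language_type in allowed


cu_language = round_trip("DW_LANG_MIPS_Assembler")
before = uses_clang_scratch(cu_language, ALLOWED_BEFORE_PATCH)
after = uses_clang_scratch(cu_language, ALLOWED_AFTER_PATCH)
```

In this model the MIPS-specific information is gone after the round trip, so a check keyed only on the MIPS language type stops matching until the generic assembly case is added.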
If we ever want to distinguish MIPS from other Assembly, the proper way
to do so is to introduce a `DW_LNAME_Mips_Assembler`. For now, this patch
adds another case for `eLanguageTypeAssembly` in
`GetScratchTypeSystemForLanguage`.
[9 lines not shown]
Revert "Add the ability to "allow another thread to see the private state" mode. (#184272)"
This reverts commit 97572c1860efeeb97b5940927cee72081b61810a.
This patch seems to cause TestWatchpointCommandPython.py to time out
on the ubuntu buildbots (but nowhere else that I can find so far.) The
timeout is weird too, the TEST FILE is timing out but the individual
tests aren't being shown and there's no other output. Grrr...
Anyway I'll revert this and then see if I can do some guessing about
how this change might cause the test to fail.
Revert "When hijacking events, don't let the user thread that was allowed"
This reverts commit a8af467fad7e5fff71643a3d6f2d06ac4f637e66.
This was a follow-on to 97572c1860efeeb97b5940927cee72081b61810a which was me
trying to guess why the ubuntu bots were failing with an entirely unhelpful
failure mode. I'll have to figure out how I can reproduce this somewhere so
I can look at it for real.
clang/AMDGPU: Do not emit __oclc_ABI_version references with environment
Assume a sufficiently new code object version if the environment is set to
something indicating we should have a real library.
[clang-doc] Introduce Serializer class
Serialization has mostly been done with static functions, but soon we
will need to share state, like allocator references. To avoid blowing up
our parameter lists, we can just wrap the local functions within a
class.