[mlir][llvmir] Fix crash when a CallSiteLoc has an UnknownLoc callee (#186860)
Avoids reading a null StringAttr when no file name is present by
manufacturing a default instead.
[CIR] Fix missing RegionBranchTerminatorOpInterface declarations
After https://github.com/llvm/llvm-project/pull/186832, operations with RegionBranchTerminatorOpInterface need to declare `getMutableSuccessorOperands`.
[libc]: implement 'iswpunct' entrypoint (#186968)
Added entrypoints:
- baremetal/arm
- baremetal/aarch64
- baremetal/riscv
- darwin/aarch64
- linux/aarch64
- linux/arm
- linux/riscv
- linux/x86_64
- windows
Also added a unit test for iswpunct.
Part of the issue: #185136
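
For readers unfamiliar with the function, here is a minimal sketch of how `iswpunct` is typically used from C++, assuming the default "C" locale. The helper name `countPunct` is hypothetical and is not part of the patch.

```cpp
#include <cwctype>

// Counts the punctuation characters in a null-terminated wide string.
// countPunct is a hypothetical helper used only for illustration.
int countPunct(const wchar_t *s) {
  int n = 0;
  for (; *s != L'\0'; ++s) {
    // iswpunct returns nonzero for punctuation wide characters.
    if (std::iswpunct(static_cast<std::wint_t>(*s)))
      ++n;
  }
  return n;
}
```

Calling `countPunct(L"a,b!c")` counts the comma and the exclamation mark; letters are not counted.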
[flang][OpenMP] Remove unused function declaration, NFC (#187101)
The function `GetNumGeneratedNestsFrom` has been removed, but repeated
local rebases stubbornly inserted the declaration back in.
[clang-format] Fix Macros configuration not working with try/catch expansions (#184891)
This is a superseding followup to my previous PR,
https://github.com/llvm/llvm-project/pull/183352.
In my previous PR, I proposed adding TryMacros and CatchMacros
configuration options, similar in spirit to IfMacros and ForEachMacros.
I did so because I noticed that configuration like
`Macros=["TRY_MACRO=try", "CATCH_MACRO(e)=catch(e)"]` did not format
configured macro(s) as try/catch blocks. @owenca confirmed in my
previous PR that this observed behavior is undesired, and we should
prefer to fix it rather than introduce new features.
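
As an illustration of the kind of source this affects, the sketch below defines two macros matching the `Macros` entries quoted above and uses them in place of `try`/`catch`. The function name `parseOrDefault` is hypothetical, and the sketch makes no claim about the exact layout clang-format produces.

```cpp
#include <stdexcept>

// Project-style macros corresponding to the Macros entries quoted above.
#define TRY_MACRO try
#define CATCH_MACRO(e) catch (e)

// Hypothetical function whose body relies on the macros instead of the
// try/catch keywords.
int parseOrDefault(bool fail) {
  TRY_MACRO {
    if (fail)
      throw std::runtime_error("parse failed");
    return 1;
  }
  CATCH_MACRO(const std::runtime_error &) {
    return -1;
  }
}
```

With the `Macros` option set as quoted, the intent is for the two blocks to be formatted the same way as a literal `try { ... } catch (...) { ... }`.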
This PR proposes a fix, described in detail in the commit message below
the break. In general terms, it deletes a heuristic from the lexing
phase, where it interacted poorly with the Macros option, and moves its
functionality to the parsing phase instead.
I describe a possibly cleaner fix in [a comment
[34 lines not shown]
[AMDGPU] Add structural stall heuristic to scheduling strategies
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
  number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
  of wait states until all hazards for an instruction are resolved,
  providing cycle-accurate hazard information for scheduling heuristics.
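
The idea of comparing candidates by stall cost can be sketched as follows. This is not the code from the patch: `StallInputs`, `structuralStallCycles`, and `hasLowerStallCost` are hypothetical names, and combining the two stall sources with a maximum is an assumption made for illustration.

```cpp
#include <algorithm>

// Hypothetical stand-ins for the two stall sources named above.
struct StallInputs {
  unsigned ResourceStallCycles; // wait caused by an unbuffered resource
  unsigned HazardWaitStates;    // wait states until hazards are resolved
};

// Assumption: an instruction can issue only once both conditions clear,
// so its stall cost is the larger of the two waits.
unsigned structuralStallCycles(const StallInputs &In) {
  return std::max(In.ResourceStallCycles, In.HazardWaitStates);
}

// Returns true if candidate A stalls for fewer cycles than candidate B.
// This works the same whether a candidate is pending or available.
bool hasLowerStallCost(const StallInputs &A, const StallInputs &B) {
  return structuralStallCycles(A) < structuralStallCycles(B);
}
```

Under this sketch, a pending instruction with a one-cycle stall and an available instruction with no stall are ranked on the same scale instead of being separated by a binary ready/not-ready test.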
[SLP][NFC] Refactor BinOpSameOpcodeHelper BIT enum (#187067)
Use a more readable syntax and increase the type width to avoid silent
errors if we reach 17 members.
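
The failure mode being avoided can be sketched as below, assuming the masks would otherwise be stored in a 16-bit integer. The enum and its member names are hypothetical and do not reproduce `BinOpSameOpcodeHelper`.

```cpp
#include <cstdint>

// Hypothetical one-bit-per-member flag enum with a 32-bit underlying type.
enum FlagBits : std::uint32_t {
  FlagFirst = 1u << 0,
  FlagSecond = 1u << 1,
  FlagSeventeenth = 1u << 16, // the 17th member needs more than 16 bits
};

// Shows what a 16-bit mask would hold after an implicit-style narrowing.
constexpr std::uint16_t truncatedTo16(std::uint32_t Value) {
  return static_cast<std::uint16_t>(Value);
}

// In 32 bits the 17th flag keeps its value; in 16 bits it becomes 0,
// which would make the flag silently indistinguishable from "no flags".
static_assert(FlagSeventeenth == 0x10000u, "17th flag fits in 32 bits");
static_assert(truncatedTo16(FlagSeventeenth) == 0, "17th flag is lost in 16 bits");
```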
[mlir][GPU] Set nsw/nuw when expanding out subgroup ID (#187099)
There's no world where the subgroup ID (or the intermediate values
needed to compute it) will be negative or will have signed overflow.
This commit adds flags accordingly, which is helpful as this is a rather
low-level rewrite that might run after the analyses that would
ordinarily add these flags.
[z/OS] Recognize EBCDIC archive magic (#186854)
`z/OS` archives use the same structural layout as traditional Unix
archives but encode all text fields in EBCDIC. The magic string is the
EBCDIC representation of `"!<arch>\n"` (hex: `5A 4C 81 99 83 88 6E 15`).
This patch adds recognition of the `z/OS` archive magic to
`identify_magic()` and defines the `ZOSArchiveMagic` constant. This is
the first in a series of patches adding `z/OS` archive support to LLVM.
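
A minimal, standalone sketch of such a magic check is shown below. It uses the byte values listed above; the names `kZOSArchiveMagic` and `startsWithZOSArchiveMagic` are hypothetical and are not the identifiers used in LLVM.

```cpp
#include <cstddef>
#include <cstring>

// EBCDIC encoding of "!<arch>\n", using the hex values listed above.
// The constant and function names are hypothetical.
constexpr unsigned char kZOSArchiveMagic[8] = {0x5A, 0x4C, 0x81, 0x99,
                                               0x83, 0x88, 0x6E, 0x15};

// Returns true if the buffer begins with the EBCDIC archive magic.
bool startsWithZOSArchiveMagic(const unsigned char *Data, std::size_t Size) {
  return Size >= sizeof(kZOSArchiveMagic) &&
         std::memcmp(Data, kZOSArchiveMagic, sizeof(kZOSArchiveMagic)) == 0;
}
```

A buffer holding the ASCII magic `!<arch>\n` does not match, because none of its bytes share the EBCDIC values.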
[DirectX] Fix assertion in PointerTypeAnalysis with empty global_ctors (#179034)
When `llvm.global_ctors` has no elements (e.g., when all resources are
unused in a shader library), its initializer is a `zeroinitializer`
(`ConstantAggregateZero`) rather than a `ConstantArray`. The previous
code used `cast<ConstantArray>` which asserts on incompatible types:
> "cast<Ty>() argument of incompatible type!"
This patch uses `dyn_cast` and returns early if the initializer is not a
`ConstantArray`, handling the edge case gracefully.
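
The pattern can be sketched without LLVM headers as follows. The three structs are simplified, hypothetical stand-ins for the LLVM classes, standard `dynamic_cast` plays the role of `dyn_cast`, and `countCtorEntries` is an illustrative name rather than a function from the patch.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-ins for LLVM's constant hierarchy (not the real types).
struct Constant {
  virtual ~Constant() = default;
};
struct ConstantAggregateZero : Constant {};
struct ConstantArray : Constant {
  std::vector<int> Elements;
};

// Counts the entries of an initializer. A checked cast returns null for a
// zeroinitializer, so the function returns early instead of asserting.
std::size_t countCtorEntries(const Constant *Init) {
  const auto *Array = dynamic_cast<const ConstantArray *>(Init);
  if (!Array)
    return 0;
  return Array->Elements.size();
}
```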
Fixes #178993.
Co-authored-by: Kaitlin Peng <kaitlinpeng at microsoft.com>
[mlir][GPU] Refactor, improve constant size information handling (#186907)
1. There was duplicate code between the integer range analysis's
handling of static dimension size information (ex. gpu.known_block_dim
attributes) and the handling during the lowering of those operations.
The code from integer range analysis was given a dialect-wide entry
point (and had its types fixed to be more accurate), which the lowering
templates now call.
2. The templated lowering for block/grid/cluster_dim now produces
precise ranges (indicating the constant value) where one is known, and
the lowerings in rocdl (including those for subgroup_id) have been fixed
appropriately.
3. While I was here, the gpu.dimension enum has been moved to GPUBase so
it lives next to the other enums.
4. The pattern that expands subgroup_id operations now adds any thread
dimension bounds it finds in context.
(Claude was used for an initial round of review; I did the main coding
myself.)
[3 lines not shown]