[BOLT][DebugInfo] Speed up DIEBuilder with DenseMap (#197655)
We replaced `std::unordered_map` with LLVM's `DenseMap` for the DIE maps
in DIEBuilder. Since this map is accessed frequently during DWARF
rewriting, the improved data layout translates directly into reduced
cache misses. As shown in the benchmark results below, this change yields
a 1.22x–1.27x speedup.
**Program from Bytedance**
| BatchSize | Baseline (s) | Optimized (s) | Speedup |
|---|---|---|---|
| 2 | 120.01 | 98.32 | 1.22x |
| 4 | 104.12 | 85.37 | 1.22x |
| 16 | 82.31 | 66.41 | 1.24x |
| 32 | 77.45 | 61.01 | 1.27x |
| 64 | 71.69 | 56.35 | 1.27x |
[LoopInterchange] Bail out when memory instruction ratio is high (#192954)
Currently, to save compile-time, LoopInterchange limits the number of
memory instructions and bails out early if it exceeds a threshold.
However, the dependence analysis phase in LoopInterchange has `O(N^2)`
complexity, where `N` is the number of memory instructions. This means
that even a small number of memory instructions can have a
non-negligible impact on compile time. In fact, I found such a case
(about a +5% compile-time regression), in which most of the instructions
in the loop are stores.
This patch replaces the heuristic which determines whether we should
continue the analysis or bail out to save compile time. The idea is that
if the ratio of the squared number of memory instructions to the total
number of instructions is small, LoopInterchange is allowed to continue
its analysis. The existing option `-loop-interchange-max-meminstr-count`
is removed.
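The ratio check described above can be sketched as follows. This is only an illustration of the idea, not the code from the patch: the function name `shouldContinueAnalysis`, the constant `kMaxSquaredMemInstRatio`, and its value are hypothetical, and the real pass may compute or compare the ratio differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical threshold, chosen only for illustration.
constexpr std::uint64_t kMaxSquaredMemInstRatio = 4;

// Returns true when the analysis may continue, i.e. when
// (numMemInsts^2 / numTotalInsts) <= maxRatio. The comparison is
// rearranged into a multiplication to avoid integer division; overflow
// is ignored because this is a sketch for small loop bodies.
bool shouldContinueAnalysis(std::uint64_t numMemInsts,
                            std::uint64_t numTotalInsts,
                            std::uint64_t maxRatio = kMaxSquaredMemInstRatio) {
  assert(numMemInsts <= numTotalInsts &&
         "memory instructions are a subset of all instructions");
  return numMemInsts * numMemInsts <= maxRatio * numTotalInsts;
}
```

With this assumed threshold, a loop with 10 memory instructions out of 100 passes the check (100 <= 400), while a store-heavy loop with 40 memory instructions out of 50 is rejected (1600 > 200), even though 40 might have been below a fixed count limit.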
Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=f344adcd2fb876d61f016fb92369a6530cc85a5b&to=6f7e5b0e4b35116728563913f2d98b7f9341409b&stat=instructions:u
www/freenginx: update njs 0.9.8 -> 0.9.9
Bump PKGREVISION.
Sponsored by: tipi.work
<ChangeLog>
nginx modules:
*) Security: a heap buffer overflow might occur in a worker process
when the "js_fetch_proxy" directive value contains nginx
variables derived from the client request ($http_*, $arg_*,
$cookie_*, etc.) and the location's JS handler invokes
ngx.fetch(). The issue was introduced in dea83189 (0.9.4).
*) Feature: added js_access directive.
*) Feature: added r.readRequestText(), r.readRequestArrayBuffer(),
[21 lines not shown]
[flang][AddAliasTags] Fix segfault when type contains `fir.boxproc`
`fir.boxproc` currently has no LLVM representation (its converter
returns `std::nullopt`). When `AddAliasTags` called `getTypeSizeAndAlignment`
on a type containing `fir.boxproc` (e.g. a sequence of a derived type with
procedure pointer components), `convertRecordType` and `convertSequenceType`
would crash trying to mlir::cast a null type.
For any type that might recursively contain a non-convertible type
(`fir.boxproc` in this case), `TypeConverter` now propagates an
empty optional `mlir::Type` and emits a debug warning that the conversion
failed. This avoids segfaults in code that assumed the type, or some
part of it, had been converted successfully.
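The propagation pattern can be illustrated with a small standalone sketch. It does not use the flang or MLIR APIs: `ToyType` and `convert` are hypothetical stand-ins, and the "lowered" type is just a string. The point is only that a failure in any nested component surfaces as an empty optional at the top level instead of being dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy type tree; these names are hypothetical and are not flang classes.
struct ToyType {
  enum Kind { Int, BoxProc, Record, Sequence } kind;
  // Record: component types. Sequence: exactly one element type.
  std::vector<ToyType> members;
};

// Returns a textual "lowered" type, or std::nullopt when the type or any
// type nested inside it has no representation.
std::optional<std::string> convert(const ToyType &t) {
  switch (t.kind) {
  case ToyType::Int:
    return std::string("i32");
  case ToyType::BoxProc:
    return std::nullopt; // no representation: report failure to the caller
  case ToyType::Record: {
    std::string out = "{";
    for (std::size_t i = 0; i < t.members.size(); ++i) {
      std::optional<std::string> member = convert(t.members[i]);
      if (!member)
        return std::nullopt; // propagate instead of using a null result
      if (i != 0)
        out += ", ";
      out += *member;
    }
    return out + "}";
  }
  case ToyType::Sequence: {
    assert(t.members.size() == 1 && "sequence has exactly one element type");
    std::optional<std::string> elem = convert(t.members.front());
    if (!elem)
      return std::nullopt; // propagate instead of using a null result
    return "[" + *elem + "]";
  }
  }
  return std::nullopt;
}
```

A caller then checks the optional before asking for a size or alignment, which mirrors the situation the commit describes for a sequence of a derived type with a procedure pointer component.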
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
www/freenginx-devel: update from 1.31.0 to 1.31.1
Sponsored by: tipi.work
<ChangeLog>
*) Feature: the "off" parameter of the "index" directive.
Thanks to Fabiano Furtado.
*) Bugfix: a segmentation fault might occur in a worker process if the
"rewrite" directive was used to change request arguments and other
directives of the ngx_http_rewrite_module were executed afterwards.
*) Bugfix: in the "set" directive.
*) Bugfix: a segmentation fault might occur in a worker process if the
ngx_http_charset_module was used to convert responses from UTF-8.
*) Bugfix: in the ngx_http_charset_module.
[12 lines not shown]
[X86] Update PMADDWD tests to more closely match middle-end vector.reduce.add codegen (#198993)
The middle-end detects vector.reduce.add patterns, so update the
CodeGen tests to use the intrinsics directly and add PhaseOrdering tests
to ensure the vector.reduce.add intrinsics are created.
www/freenginx-devel: update njs 0.9.8 -> 0.9.9
Bump PKGREVISION.
Sponsored by: tipi.work
<ChangeLog>
nginx modules:
*) Security: a heap buffer overflow might occur in a worker process
when the "js_fetch_proxy" directive value contains nginx
variables derived from the client request ($http_*, $arg_*,
$cookie_*, etc.) and the location's JS handler invokes
ngx.fetch(). The issue was introduced in dea83189 (0.9.4).
*) Feature: added js_access directive.
*) Feature: added r.readRequestText(), r.readRequestArrayBuffer(),
[21 lines not shown]
aarch64: pmap: misc improvements to pmap_test_mod_ref
- remove the need for pmap_debugva by using uvm_km_{alloc,free}
- deactivate curlwp so the kernel pmap is always active
- sprinkle pmap_update()
[LV] Convert gather loads with constant stride into strided loads (#147297)
This patch detects non-consecutive load accesses (i.e. gather) with a
constant stride, such as:
```
void stride(int* a, int *b, int n) {
for (int i = 0; i < n; i++)
a[i * 5] = b[i * 5] + i;
}
```
and converts them into strided loads when legal and profitable, using
experimental_vp_strided_load.
The new VPlan transformation, convertToStridedAccesses, hoists the
functionality of RISCVGatherScatterLowering into the vectorizer,
enabling a more precise cost estimation during vectorization.
Additionally, by leveraging SCEV for stride analysis, the vectorizer can
potentially detect more opportunities to optimize gathers into strided
loads.
This enables more efficient code generation for targets like RISC-V that
support strided loads natively.
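To make the difference between the two access forms concrete, here is a scalar sketch of their semantics. It is not vectorizer or intrinsic code: `gatherLoad` and `stridedLoad` are hypothetical helper names, and the stride is expressed in elements for readability (the actual intrinsic, as far as I know, takes its stride in bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: every lane carries its own independent index.
std::vector<int> gatherLoad(const int *base,
                            const std::vector<std::size_t> &indices) {
  assert(base != nullptr);
  std::vector<int> lanes;
  lanes.reserve(indices.size());
  for (std::size_t idx : indices)
    lanes.push_back(base[idx]);
  return lanes;
}

// Strided load: lane k reads base[k * strideElems], so one base pointer
// and one scalar stride describe the whole access.
std::vector<int> stridedLoad(const int *base, std::size_t strideElems,
                             std::size_t numLanes) {
  assert(base != nullptr);
  std::vector<int> lanes;
  lanes.reserve(numLanes);
  for (std::size_t lane = 0; lane < numLanes; ++lane)
    lanes.push_back(base[lane * strideElems]);
  return lanes;
}
```

For the `b[i * 5]` access in the example loop, a gather with indices 0, 5, 10, 15 reads the same elements as a strided load with a stride of 5 over four lanes. That equivalence is what lets the constant-stride case be expressed without a per-lane index vector.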
[X86] LowerVECREDUCE - don't attempt to handle vectors with non-pow2 element counts (#198989)
32-bit targets will attempt to lower vXi64 reductions prior to argtype legalization.
This is a crash fix; we can improve the handling in a future commit.