-
Reproducibility is undermined by missing concrete pipeline specifications (patch definition, projection/pixelization, preprocessing, mask construction, apodization, weighting, binning). Despite emphasizing a fully public-products approach, the paper does not provide enough numerical detail to reproduce the spectra/residuals or to test robustness to analysis choices (Sec. II, Sec. III.A–B, Sec. V). Ambiguities include the exact cropped AA patch boundaries and projection, pixel resolution, any large-scale mode treatment, mask-building thresholds from inverse-variance and cross-linking maps, apodization kernel/scale, and the exact bandpower bin edges/$\Delta\ell$ used for each quoted $\ell$-range.
Recommendation: Expand Sec. III.A–B (or add an Appendix) with a checklist-level methodological specification: (i) exact AA patch boundary definition (RA/Dec bounds or a referenced mask file), projection, and pixel resolution; (ii) map units and any preprocessing/filtering or mean/gradient removal; (iii) explicit mask rules with numerical thresholds on inverse-variance and cross-linking maps, how intersections across channels/arrays are formed, and apodization functional form $+$ scale; (iv) estimator weighting (uniform vs inverse-variance) and treatment of anisotropic depth; and (v) binning scheme with explicit bin edges/centers and which bins enter the reported $500$–$1600$, $500$–$2500$, and $600$–$2800$ summaries. If possible, also list the exact DR6.02 product filenames/IDs used.
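For concreteness, the mask rules in (iii) could be pinned down with a few lines of code. A minimal numpy/scipy sketch of a threshold-plus-cosine-apodization recipe (all thresholds and scales here are hypothetical placeholders, not values from the paper):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_apodized_mask(ivar, ivar_thresh, apod_pix):
    """Binary mask from an inverse-variance map, cosine-apodized over apod_pix pixels."""
    binary = (ivar > ivar_thresh).astype(float)
    # Distance (in pixels) from each kept pixel to the nearest masked pixel.
    dist = distance_transform_edt(binary)
    # Cosine taper: 0 at the mask edge, rising to 1 at apod_pix pixels inside.
    x = np.clip(dist / apod_pix, 0.0, 1.0)
    return binary * 0.5 * (1.0 - np.cos(np.pi * x))

# Toy example: a 64x64 patch with a low-depth stripe that gets masked out.
ivar = np.ones((64, 64)); ivar[:, :8] = 0.01
mask = build_apodized_mask(ivar, ivar_thresh=0.1, apod_pix=10)
```

Stating the recipe at this level (threshold value, taper functional form, taper scale, and how per-channel masks are intersected) would let readers regenerate the analysis mask exactly.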
-
Key reported percent-level metrics are not mathematically defined: “fractional difference,” mean-absolute vs RMS summaries, and the within-channel “split-cross scatter” are described verbally but not specified as unambiguous equations. In particular, the denominator choice in $\Delta C/C$ (A, B, or symmetrized) materially affects quoted $7$–$9\%$ and cross-frequency residuals, and the order of operations (compute bandpowers then form ratios vs ratio per-$\ell$ then bin) can also matter (Sec. III.D, Sec. IV.A–C; Figs. 1–3, 6; Table II).
Recommendation: Add explicit equations in Sec. III.D (or an Appendix) defining: (i) the binned pseudo-$C_{\ell}$ estimator used; (ii) the exact plotted “fractional difference” statistic for each comparison type (e.g., $(C_A-C_B)/[0.5(C_A+C_B)]$); (iii) how mean-absolute and RMS are computed over an $\ell$-range (including bin weights such as uniform vs $(2\ell+1)$); and (iv) the within-channel split-cross scatter statistic in Fig. 3 (e.g., per-bin std/MAD across the $6$ cross-spectra normalized to a reference spectrum, then averaged over $\ell$). State explicitly whether fractional differences are formed from binned bandpowers or computed per-$\ell$ and then binned.
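As an illustration of the level of specification we have in mind, here is a numpy sketch of one candidate convention (symmetrized denominator, $(2\ell+1)$ bin weights); this is our notation, not the authors':

```python
import numpy as np

def frac_diff(cA, cB):
    """Symmetrized fractional difference per bandpower bin (one candidate convention)."""
    return (cA - cB) / (0.5 * (cA + cB))

def summarize(resid, ell_centers, lo, hi, weights=None):
    """Weighted mean-absolute and RMS of a residual over an ell-range."""
    sel = (ell_centers >= lo) & (ell_centers <= hi)
    w = np.ones(sel.sum()) if weights is None else weights[sel]
    w = w / w.sum()
    mean_abs = np.sum(w * np.abs(resid[sel]))
    rms = np.sqrt(np.sum(w * resid[sel] ** 2))
    return mean_abs, rms

ells = np.arange(500, 2500, 50)
cA = 1.0 / ells**2          # toy bandpowers
cB = 1.02 / ells**2         # toy spectrum with a 2% amplitude offset
r = frac_diff(cA, cB)
ma, rms = summarize(r, ells, 500, 1600, weights=2 * ells + 1)
```

With definitions at this level, the quoted $7$–$9\%$ numbers become unambiguous regardless of denominator and weighting choices.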
-
Uncertainty estimation and significance reporting are not reproducible and may be statistically inconsistent given split-cross correlations. The manuscript uses “empirical split-cross scatter” as an uncertainty proxy and reports peak significances and $\chi^2/{\rm dof}$, but does not define how spectra are combined, how covariance is estimated/normalized with a finite number of cross-spectra, whether off-diagonal bin covariance is neglected, and how dependence among cross-spectra that share splits is handled (Sec. III.D, Sec. IV.D; Fig. 4). This makes numbers like $0.69\sigma$ and $\chi^2/{\rm dof}$ opaque and potentially misleading.
Recommendation: In Sec. III.D and Sec. IV.D, provide a precise statistical recipe: (i) how many cross-spectra enter each comparison ($6$, $12$, etc.), and whether all are used despite shared splits; (ii) how the mean spectrum and per-bin variance are computed (including finite-sample normalization); (iii) whether bin-to-bin covariance is assumed diagonal and why (or show a quick check that correlations are small for the adopted binning); (iv) how uncertainties on fractional residuals/ratios are computed (propagation vs direct scatter of residuals); and (v) exact definitions of $\chi^2$, dof, and “peak significance.” If correlations among cross-spectra are ignored, explicitly label the resulting $\sigma$-values as heuristic/conservative diagnostics rather than formal significances.
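To make the expectation concrete, a minimal sketch of items (ii) and (v) under the simplest assumptions (diagonal covariance, correlations among shared-split cross-spectra ignored; all choices here are illustrative, not the authors' documented recipe):

```python
import numpy as np

def combine_crosses(crosses):
    """Mean spectrum and error on the mean from n cross-spectra (bins along axis 1).
    Diagonal, finite-sample normalization: var(mean) = scatter^2 / n with ddof=1.
    Correlations from shared splits are ignored, so the errors are heuristic."""
    crosses = np.asarray(crosses)
    n = crosses.shape[0]
    mean = crosses.mean(axis=0)
    err = crosses.std(axis=0, ddof=1) / np.sqrt(n)
    return mean, err

def chi2_null(resid, err):
    """Chi-square of a residual against zero, assuming diagonal covariance."""
    return np.sum((resid / err) ** 2), resid.size  # (chi2, dof)

rng = np.random.default_rng(0)
crosses = 1.0 + 0.05 * rng.standard_normal((6, 20))  # 6 toy cross-spectra, 20 bins
mean, err = combine_crosses(crosses)
chi2, dof = chi2_null(mean - 1.0, err)
```

Whatever convention the authors actually used (ddof, $\sqrt{n}$ vs $\sqrt{n-1}$, shared-split handling), writing it out at this granularity is what makes $0.69\sigma$ and $\chi^2/{\rm dof}$ reproducible.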
-
The impact of flat-sky pseudo-$C_{\ell}$ approximations (cropped patch, approximate mask normalization, no mode-coupling matrix, no transfer-function correction) is only qualitatively acknowledged, leaving it unclear how much of the observed few-percent residual structure could be methodological rather than instrumental/foreground-driven (Sec. II, Sec. III.A–B, Sec. V). This is especially important because the headline $1$–$3\%$ and $7$–$9\%$ levels are derived entirely within this simplified estimator.
Recommendation: Add at least one quantitative validation of methodological bias: e.g., run a simple simulation ($\Lambda$CDM sky convolved with public beams $+$ representative anisotropic noise) through the same patch/mask pipeline and show recovery of input spectra and/or ratios over $500$–$1600$ and $600$–$2800$. Alternatively (or additionally), vary apodization length and patch boundaries and report how the main summary metrics shift. Provide order-of-magnitude bounds (e.g., “pipeline-induced multiplicative bias is $\leq X\%$ over $500$–$1600$”) so readers can contextualize observed residual magnitudes.
-
Common-beam harmonization is central but not documented at an implementation level consistent with the paper’s reproducibility goals, and the transform-space notation is potentially confusing. The text references applying $F_{\ell}$ in $a_{\ell m}$-space (pixell indexing) while the spectra are computed with flat-sky FFTs on cropped patches (Sec. III.C; also noted in Sec. I, Sec. V–VI). It is also unclear how $B_{\rm target}$ is chosen (broader beam vs fixed reference), whether pixel window functions are included, and how high-$\ell$ regularization/tapering is handled when dividing by small $B_{\rm source,\ell}$.
Recommendation: In Sec. III.C: (i) state explicitly whether beam harmonization is done on the full map before patching (spherical-harmonic/pixell $a_{\ell m}$) or in $2{\rm D}$ Fourier space on the patch, and keep notation consistent ($a_{\ell m}$ vs $T(\vec{\ell})$); (ii) define the target beam choice per test (within-channel, same-band, cross-frequency), including the rationale; (iii) specify inclusion of pixel window functions and any additional filtering/$\ell$-cuts/tapers; (iv) describe numerical steps unambiguously (map$\rightarrow$harmonics, multiply by $F_{\ell}$, inverse transform). Given the emphasis on the pixell indexing fix, include a short pseudocode snippet or minimal example (Appendix) and (optionally) a before/after plot demonstrating the impact of the corrected harmonization on one representative ratio.
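For (iv), even a flat-sky variant of the harmonization step would remove ambiguity. A self-contained 2D-Fourier sketch (our construction for illustration, not the paper's pixell $a_{\ell m}$ implementation; the taper scale and floor $\epsilon$ are assumed placeholders):

```python
import numpy as np

def harmonize_beam_flat(patch, pix_rad, B_source, B_target, ell_taper=8000, eps=1e-3):
    """Reconvolve a flat patch from B_source to B_target in 2D Fourier space.
    B_* are callables of ell; where B_source < eps the filter is set to zero
    rather than dividing by a tiny number, and a high-ell taper is applied."""
    ny, nx = patch.shape
    ly = np.fft.fftfreq(ny, d=pix_rad) * 2 * np.pi
    lx = np.fft.fftfreq(nx, d=pix_rad) * 2 * np.pi
    ell = np.sqrt(ly[:, None] ** 2 + lx[None, :] ** 2)
    bs, bt = B_source(ell), B_target(ell)
    F = np.where(bs > eps, bt / np.maximum(bs, eps), 0.0)
    F *= np.exp(-((ell / ell_taper) ** 4))  # regularizing taper (a choice, not from the paper)
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * F))

# Toy Gaussian beams: harmonize a 1.4' map to a broader 2.2' target beam.
gauss = lambda fwhm_arcmin: (lambda l: np.exp(-0.5 * l * (l + 1) *
        (np.radians(fwhm_arcmin / 60) / np.sqrt(8 * np.log(2))) ** 2))
patch = np.random.default_rng(1).standard_normal((128, 128))
out = harmonize_beam_flat(patch, np.radians(0.5 / 60), gauss(1.4), gauss(2.2))
```

Whether the paper's actual pipeline does this on the patch (as above) or on the full map via spherical-harmonic transforms is exactly the distinction the text currently leaves unresolved.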
-
The “beam/leak/passband expectation envelope” used to interpret day/night and cross-array diagnostics is under-specified, yet it plays a central role in qualitative conclusions about whether residuals are “expected” (Sec. III.D, Sec. IV.D–E; Fig. 4). The manuscript does not state how beam-split products are converted to $\ell$-dependent power envelopes, what leakage model/amplitude is assumed, how $\nu_{\rm eff}$/passband differences are translated into TT power expectations (including assumed foreground spectra), or how components are combined (linear vs quadrature vs max).
Recommendation: Provide a concrete, reconstructable recipe (Sec. III.D and/or Sec. IV.E): (i) how each beam-split $B_{\ell}$ variant is mapped to fractional TT power change vs $\ell$; (ii) the assumed leakage parameterization and numerical amplitude used; (iii) a simple passband/$\nu_{\rm eff}$-based model for how foreground TT power changes with $\nu$ (state assumed spectral indices/components); and (iv) the rule for combining terms into the final envelope. Include at least one explicit example curve (or representative numbers in $500$–$1600$ and $2000$–$3000$) so readers can rebuild the orange bands from public products.
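The combination rule in (iv) matters quantitatively, as a toy sketch shows (all term shapes and amplitudes below are invented for illustration, not taken from the paper):

```python
import numpy as np

def expectation_envelope(ell, frac_beam, frac_leak, frac_passband, combine="quadrature"):
    """Combine fractional TT-power expectation terms into one envelope.
    Each frac_* is an array of fractional power changes vs ell; the combination
    rule (quadrature vs linear sum vs max) must be stated, since it changes the band."""
    terms = np.abs(np.vstack([frac_beam, frac_leak, frac_passband]))
    if combine == "quadrature":
        return np.sqrt(np.sum(terms ** 2, axis=0))
    if combine == "linear":
        return np.sum(terms, axis=0)
    return np.max(terms, axis=0)

ell = np.arange(500, 3000, 100)
beam = 2e-6 * ell                              # toy beam-like term growing with ell
leak = np.full(ell.shape, 3e-3)                # toy constant leakage term
pb = np.full(ell.shape, 5e-3)                  # toy passband/nu_eff term
env_q = expectation_envelope(ell, beam, leak, pb, "quadrature")
env_l = expectation_envelope(ell, beam, leak, pb, "linear")
```

Since the linear sum always exceeds the quadrature sum, conclusions about whether residuals sit "inside" the envelope depend directly on this unstated choice.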
-
The large $150~{\rm GHz}$ cross-array residuals ($\sim 7$–$9\%$) are potentially the most consequential “user-facing” result, but the paper does not sufficiently decompose whether this is primarily an overall amplitude offset or a scale-dependent shape difference. Without this decomposition, it is hard to diagnose whether beam mismatch, filtering/transfer effects, or calibration-like differences dominate (Sec. IV.A–B, Sec. V).
Recommendation: Add a targeted diagnostic in Sec. IV.A/IV.B: fit and remove a single multiplicative factor between the two $150~{\rm GHz}$ cross-array spectra over $500$–$1600$ (clearly labeled as a descriptive fit, not a recalibration), then re-plot residuals to separate amplitude vs shape. Comment on whether the remaining residual grows with $\ell$ (beam-like) or has more complex structure (filtering/mode-coupling-like). If feasible with public products, include a sanity visualization of each array relative to a common reference (e.g., Planck TT on the same patch, explicitly labeled as non-likelihood-grade) to help contextualize which side is driving the discrepancy.
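The amplitude-vs-shape decomposition we are requesting is a one-line least-squares fit; a numpy sketch with toy spectra (the $0.92$ offset and tilt are invented to illustrate the separation, not measured values):

```python
import numpy as np

def fit_amplitude(cA, cB, sel):
    """Best-fit single factor A minimizing sum over sel of (cA - A*cB)^2.
    A descriptive fit, not a recalibration."""
    return np.sum(cA[sel] * cB[sel]) / np.sum(cB[sel] ** 2)

ells = np.arange(500, 2800, 50)
cA = 1.0 / ells**2
cB = 0.92 / ells**2 * (1 + 1e-5 * (ells - 500))  # toy: 8% offset plus a mild tilt
sel = (ells >= 500) & (ells <= 1600)
A = fit_amplitude(cA, cB, sel)
resid_shape = cA / (A * cB) - 1.0   # residual after removing the overall amplitude
```

In this toy case the post-fit residual within $500$–$1600$ is an order of magnitude smaller than the raw ratio residual, cleanly separating a calibration-like offset from a scale-dependent (beam- or filtering-like) shape term.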
-
Cross-frequency ($90\times 150$) residuals are attributed to $\nu_{\rm eff}$ mismatch and foreground color largely qualitatively. Given that Table I provides $\nu_{\rm eff}$ and the paper quotes few-to-$10\%$ residuals, the interpretation would be much stronger with even a back-of-the-envelope quantitative model and one robustness check (Sec. IV.B, Sec. V; Table I).
Recommendation: Augment Sec. IV.B/Sec. V with: (i) a simple parametric estimate translating $\nu_{\rm eff}$ differences into expected TT power changes for plausible foreground mixtures (e.g., power-law components for dust/CIB/radio with stated spectral indices, plus an estimate of foreground fraction in AA over $500$–$1600$); and (ii) one robustness test (e.g., stricter point-source/Galactic masking, or restricting to a lower-$\ell$ range where foregrounds are subdominant) to see whether cross-frequency mean-absolute residuals decrease as expected. This will help distinguish foreground-driven effects from potential beam/transfer/calibration systematics.
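Even the simplest version of (i) would sharpen the interpretation. A back-of-the-envelope sketch for a single power-law foreground component (the $\beta$, foreground fraction, and frequency shift below are placeholder numbers, not from the paper or Table I):

```python
import numpy as np

def fg_tt_ratio(nu1, nu2, nu_ref, beta, fg_frac):
    """Expected fractional change in total TT power from an effective-frequency
    shift, for one power-law foreground (brightness amplitude ~ nu^beta)
    contributing a fraction fg_frac of the TT power at nu_ref.
    First order in fg_frac; cross-spectrum scales as (nu1*nu2/nu_ref^2)^beta."""
    color = (nu1 * nu2 / nu_ref**2) ** beta
    return fg_frac * (color - 1.0)

# Toy numbers: 1% dusty/CIB-like foreground power (beta ~ 3.8) at 150 GHz,
# with one effective frequency shifted down by ~2 GHz.
delta = fg_tt_ratio(148.0, 150.0, 150.0, 3.8, 0.01)
```

Plugging the actual Table I $\nu_{\rm eff}$ values and a stated foreground fraction into an expression of this form would immediately show whether $\nu_{\rm eff}$ mismatch can plausibly account for the observed few-to-$10\%$ cross-frequency residuals.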
-
Source-free vs standard map validation is advertised as a central axis (Abstract/Intro), but Sec. IV.F currently reads as conceptual/methodological rather than presenting executed quantitative results comparable to the other tests (Sec. I–II, Sec. IV.F, Sec. V–VI). This creates a mismatch between stated goals and delivered evidence.
Recommendation: Either (a) add at least one concrete source-free vs standard comparison per major channel (e.g., pa5\_f090, pa5\_f150), with plots and a small table of mean-absolute and RMS fractional differences over clearly specified $\ell$-ranges (including a higher-$\ell$ band where point sources matter), or (b) explicitly re-scope the paper: remove/soften claims that this axis is completed in the present work (Abstract/Sec. II/Sec. VI) and describe it as future work or guidance.
-
Internal consistency of reported summary numbers appears questionable in at least one place (Table II and accompanying text range statement), and Table II is incomplete/unclear (rms column marked “—” while rms is referenced elsewhere). This undermines confidence in headline percent-level summaries (Sec. III.D; Sec. IV.B; Table II).
Recommendation: Audit Table II end-to-end: ensure column headers match populated columns, provide RMS values if referenced (or remove RMS from caption/text), and re-check the textual range statements derived from the table entries. Consider adding a single consolidated summary table (Sec. IV or Sec. VI) listing, for each channel/pair and each comparison type, mean-absolute and RMS over the same clearly defined $\ell$-range.