-
Set–coadd tests are not statistically independent if the coadd includes the tested set. In that case the null expectation $\sigma_{\rm norm}=1$ used to interpret set–coadd residual widths does not hold (Sec. 3.3; interpretation in Sec. 4.2). This directly affects the quantitative meaning of the reported $\sigma_{\rm norm}$ excess in the set–coadd panel(s) and could bias conclusions about “ivar underprediction” for those particular tests.
Recommendation: In Sec. 3.3, explicitly define how the “nighttime coadd” used in set–coadd comparisons is constructed: (a) coadd of all four sets (including the tested set), or (b) leave-one-out coadd excluding the tested set. Prefer (b) for a clean independence assumption. If (a) is kept, derive (and state) the expected variance of (set $-$ coadd) including the covariance term, and adjust the normalization/expectation for $\sigma_{\rm norm}$ accordingly. Update Fig. 2 / Sec. 4.2 text to interpret set–coadd values against the correct null expectation.
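For reference, the covariance term under option (a) has a simple closed form under assumptions that are ours and not necessarily the authors' pipeline: the coadd $c$ is a per-pixel inverse-variance-weighted mean of sets $m_j$ with independent noise of variance $\sigma_j^2 = 1/w_j$, and $\sigma_{\rm norm}$ is normalized by the quadrature sum of the set and coadd variances. A sketch of the derivation:

```latex
\begin{align}
c &= \frac{1}{W}\sum_j w_j m_j, \qquad w_j = \sigma_j^{-2}, \qquad W = \sum_j w_j, \\
\mathrm{Var}(c) &= \frac{1}{W}, \qquad
\mathrm{Cov}(m_i, c) = \frac{w_i \sigma_i^2}{W} = \frac{1}{W}, \\
\mathrm{Var}(m_i - c) &= \frac{1}{w_i} + \frac{1}{W} - \frac{2}{W}
                       = \frac{1}{w_i} - \frac{1}{W}, \\
\sigma_{\rm norm}^{\rm null} &= \sqrt{\frac{1/w_i - 1/W}{1/w_i + 1/W}}
                              = \sqrt{\frac{W - w_i}{W + w_i}} .
\end{align}
```

Under these assumptions, four sets of equal weight ($W = 4 w_i$) give $\sigma_{\rm norm}^{\rm null} = \sqrt{3/5} \approx 0.77$ rather than 1, so a set–coadd width compared against 1 would understate any excess. If the actual coadd weighting or normalization differs, the expression changes accordingly.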
-
Several core methodology choices are under-specified, which limits exact reproducibility and blurs interpretation: masking beyond “positive-ivar support,” treatment of low-ivar/edge pixels, whether any monopole/dipole (or low-order mode) removal is applied before differencing, and the exact weighting conventions for weighted means/standard deviations (Sec. 2; Sec. 3.1–3.4; Sec. 4.1–4.3). Because $\sigma_{\rm norm}$ is sensitive to sky support and large-scale modes, these details materially affect the reported numbers.
Recommendation: Add (or expand) a dedicated “Implementation details” subsection in Sec. 2–3 specifying: (i) the exact sky mask(s) used (or explicitly state none beyond common positive-ivar), including whether Galactic plane / bright-source masks are applied; (ii) any ivar thresholds or trimming (e.g., excluding boundary pixels, applying an ivar percentile cut) and how missing pixels are handled; (iii) whether mean/monopole (and/or dipole) is removed from maps or from each difference map prior to computing $\sigma_{\rm norm}$ and roughness; (iv) explicit formulas for weighted mean and weighted standard deviation (including which weight map is used: $w_A$, $w_B$, or the combined weight from Eq. (1), and whether any Bessel/effective-$N$ correction is applied). Provide a small snippet of pseudo-code or an appendix with the exact computation pipeline so others can reproduce the numbers in Tables 1–2 and Figs. 1–5 unambiguously.
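To illustrate the level of detail requested in (iii) and (iv), the following is a minimal sketch of one possible convention, not a reconstruction of the authors' pipeline. The function names, the choice of combined weight $1/(1/{\tt ivar}_A + 1/{\tt ivar}_B)$, the optional mean removal, and the absence of any effective-$N$ correction are all assumptions made here for illustration.

```python
import numpy as np

def weighted_mean(x, w):
    """Weighted mean over pixels with strictly positive weight."""
    good = w > 0
    return np.sum(w[good] * x[good]) / np.sum(w[good])

def weighted_std(x, w, remove_mean=True):
    """Weighted standard deviation; no Bessel/effective-N correction (assumption)."""
    good = w > 0
    mu = weighted_mean(x, w) if remove_mean else 0.0
    var = np.sum(w[good] * (x[good] - mu) ** 2) / np.sum(w[good])
    return np.sqrt(var)

def sigma_norm(map_a, ivar_a, map_b, ivar_b, remove_mean=True):
    """Width of the difference map in units of its ivar-predicted noise.

    Assumes independent noise in A and B, so the predicted per-pixel
    variance of (A - B) is 1/ivar_a + 1/ivar_b.
    """
    good = (ivar_a > 0) & (ivar_b > 0)           # common positive-ivar support
    var_d = 1.0 / ivar_a[good] + 1.0 / ivar_b[good]
    z = (map_a[good] - map_b[good]) / np.sqrt(var_d)
    w = 1.0 / var_d                              # assumed combined weight
    return weighted_std(z, w, remove_mean)
```

Stating the equivalent of each commented choice (support mask, weight map, mean removal, variance normalization) in the manuscript would remove the ambiguity.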
-
The paper’s language sometimes implies “ivar miscalibration,” but $\sigma_{\rm norm}>1$ conflates multiple effects beyond an overall ivar scaling error: correlated (non-white) noise, mapmaking transfer-function differences between splits, beam mismatch, and residual sky-signal leakage into differences (Sec. 3.1; Sec. 5). Without clarifying what DR6 ivar is intended to represent, it is ambiguous whether the results diagnose an ivar normalization problem or (more generally) the limits of a per-pixel white-noise model for map-to-map scatter.
Recommendation: In Sec. 3.1 and Sec. 5, explicitly list the assumptions required for $\sigma_{\rm norm}\approx1$: independence of noise between splits, Gaussianity, correct per-pixel variance $1/{\tt ivar}$, and identical effective beam/transfer function between the compared maps. Then, in Sec. 2/Sec. 5, summarize (with citations to Naess et al. 2025 and any DR6 documentation) how DR6 ivar maps are constructed and what they are/are not intended to capture (e.g., white-noise pixel variance vs inclusion/exclusion of correlated noise and filtering). Adjust phrasing throughout to distinguish: “ivar underpredicts the variance of difference maps under an independence+white-noise model” from the stronger claim “ivar is miscalibrated,” unless you add evidence isolating a pure scaling error.
-
Beam/transfer-function mismatch and scale dependence are not controlled for: the diagnostics are performed on raw maps at native pixelization with no explicit beam matching. In pixel-space residuals, small differences in effective beam or filtering between splits can inflate $\sigma_{\rm norm}$ and the roughness metric even if power-spectrum analyses remain well behaved (Sec. 1; Sec. 4.1–4.3).
Recommendation: Add one controlled, map-space scale test: repeat the headline $\sigma_{\rm norm}$ (day–night; at least one representative set–set; and std vs null-el1) after smoothing both maps to a common lower resolution (e.g., Gaussian $2'$–$5'$) and/or applying a mild low-pass filter, and report how $\sigma_{\rm norm}$ changes with smoothing scale. This can be presented as a small additional panel/figure or a short table in Sec. 4. If $\sigma_{\rm norm}$ remains high after smoothing, that supports large-scale/low-$\ell$ contributions; if it drops, that points to high-$\ell$/beam/transfer or pixel-scale structure.
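One way to set up such a test is sketched below, under simplifying assumptions that are ours: a flat-sky 2D array, periodic FFT smoothing with the FWHM given in pixels, and a null reference built from white-noise realizations drawn from the ivar maps and smoothed identically (since smoothing changes the predicted variance, the unsmoothed normalization no longer applies). All function names are hypothetical.

```python
import numpy as np

def gaussian_smooth(m, fwhm_pix):
    """Periodic FFT Gaussian smoothing of a 2D map; FWHM in pixels."""
    sigma = fwhm_pix / np.sqrt(8.0 * np.log(2.0))
    ky = np.fft.fftfreq(m.shape[0])[:, None]    # cycles per pixel
    kx = np.fft.rfftfreq(m.shape[1])[None, :]
    kernel = np.exp(-2.0 * np.pi**2 * sigma**2 * (kx**2 + ky**2))
    return np.fft.irfft2(np.fft.rfft2(m) * kernel, s=m.shape)

def sigma_norm_vs_scale(map_a, ivar_a, map_b, ivar_b, fwhm_list,
                        n_sim=20, seed=0):
    """Ratio of observed to simulated-null std of the smoothed difference map."""
    rng = np.random.default_rng(seed)
    good = (ivar_a > 0) & (ivar_b > 0)
    # Predicted per-pixel noise of (A - B), zero outside the common support.
    sig_d = np.zeros(map_a.shape)
    sig_d[good] = np.sqrt(1.0 / ivar_a[good] + 1.0 / ivar_b[good])
    diff = np.where(good, map_a - map_b, 0.0)
    out = {}
    for fwhm in fwhm_list:
        obs = np.std(gaussian_smooth(diff, fwhm)[good])
        sims = [
            np.std(gaussian_smooth(sig_d * rng.standard_normal(diff.shape),
                                   fwhm)[good])
            for _ in range(n_sim)
        ]
        out[fwhm] = obs / np.mean(sims)
    return out
```

A ratio that stays flat with smoothing scale would point to an overall variance offset, while a ratio that rises or falls with scale would indicate scale-dependent (correlated-noise, beam, or transfer-function) contributions.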
-
Spatial dependence and robustness to sky-signal contamination are only qualitatively addressed. Global $\sigma_{\rm norm}$ values over “positive-ivar” (and “top-10% ivar core” for null-el1) may hide localization to edges, specific scan regions, or foreground-contaminated areas, and may incorporate residual sky signal (e.g., foregrounds or variable sources) that does not cancel perfectly (Sec. 3.2–3.4; Sec. 4; Sec. 5).
Recommendation: Augment Sec. 4 with a lightweight set of robustness checks: (i) compute $\sigma_{\rm norm}$ (and null-el1 variance/roughness ratios) in several large sky tiles/patches and show the distribution or scatter across patches; (ii) repeat headline numbers with a conservative high-latitude/foreground-avoidance mask (or a simple bright-source mask / $|T|$ clipping) to bound sky-signal leakage; (iii) for null-el1, repeat ratios for alternative ivar-core thresholds (e.g., top $5\%$, $20\%$, $50\%$, and full positive-ivar) to demonstrate stability. These checks can remain fully map-domain and would substantially strengthen the attribution and user guidance.
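Check (iii) is inexpensive to implement. The sketch below assumes, purely for illustration, that a single ivar map defines both the core mask and the weights for the standard and null maps; the function names and the default thresholds are hypothetical, and the authors' actual weight choice may differ.

```python
import numpy as np

def top_ivar_mask(ivar, top_fraction):
    """Boolean mask of the highest-ivar pixels among those with ivar > 0."""
    pos = ivar > 0
    if top_fraction >= 1.0:
        return pos
    cut = np.quantile(ivar[pos], 1.0 - top_fraction)
    return pos & (ivar >= cut)

def weighted_var(x, w):
    """Weighted variance about the weighted mean."""
    mu = np.average(x, weights=w)
    return np.average((x - mu) ** 2, weights=w)

def ratio_vs_core(std_map, null_map, ivar,
                  fractions=(0.05, 0.10, 0.20, 0.50, 1.0)):
    """Weighted variance ratio (std / null) for several ivar-core thresholds.

    Returns a list of (fraction, n_pixels, variance_ratio) tuples.
    """
    rows = []
    for f in fractions:
        m = top_ivar_mask(ivar, f)
        w = ivar[m]                      # assumed shared weight map
        ratio = weighted_var(std_map[m], w) / weighted_var(null_map[m], w)
        rows.append((f, int(m.sum()), ratio))
    return rows
```

Tabulating the resulting ratios against the core fraction would show directly whether the top-10% choice is representative.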
-
Null-el1 comparisons need clearer specification of weights and of what the null-map ivar means. Null maps are constructed by differencing subsets, so their noise properties and appropriate weighting are not necessarily comparable to standard coadds. As written, it is unclear which ivar is used for the weighted variance/roughness and why that choice yields an apples-to-apples comparison (Sec. 3.4; Sec. 4.3; Table 2; Fig. 5).
Recommendation: In Sec. 3.4 / Sec. 4.3, explicitly state whether the null-el1 statistics use: (a) the null map’s ivar, (b) the standard coadd’s ivar, or (c) a combined/constructed weight map. Justify the choice given the null construction. Consider adding a parallel unweighted (or fixed-mask, fixed-weight) comparison over the same geometric region so that the std vs null differences in variance/roughness are not driven by differing weight definitions. Clarify in Table 2 and Fig. 5 captions what “weighted $\sigma_T$” and roughness are weighted by.
-
The manuscript provides no uncertainty estimates (error bars / confidence intervals) for $\sigma_{\rm norm}$, variance ratios, or roughness ratios, making it difficult to assess the significance of differences between frequencies, split types, or arrays, and to interpret apparent changes at the $\sim10$–$20\%$ level (Sec. 4; Tables 1–2; Figs. 1–5).
Recommendation: Add uncertainty estimates via a block jackknife/bootstrap over sky patches (e.g., CAR tiles or RA/Dec blocks) to account for spatial correlations and non-Gaussianity. Report error bars (or at least patch-to-patch scatter) for the headline $\sigma_{\rm norm}$ values and for the null-el1 ratios in Table 2 / Figs. 4–5. Include brief methodological details (tile size, number of patches, estimator) in Sec. 3 or figure captions.
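A delete-one-block jackknife of the kind suggested here can be sketched as follows. The rectangular tiling of a 2D pixel array, the function names, and the example statistic are assumptions for illustration; tiles with very unequal numbers of valid pixels would call for a weighted jackknife instead.

```python
import numpy as np

def weighted_std_stat(z, w):
    """Example statistic: weighted std of normalized residuals z."""
    good = w > 0
    mu = np.average(z[good], weights=w[good])
    return np.sqrt(np.average((z[good] - mu) ** 2, weights=w[good]))

def block_jackknife(stat_fn, arrays, n_blocks_y, n_blocks_x):
    """Delete-one-block jackknife of stat_fn over rectangular sky tiles.

    arrays: tuple of 2D arrays of identical shape, passed to stat_fn
    as flattened 1D arrays. Returns (full-sample value, jackknife error).
    """
    ny, nx = arrays[0].shape
    y_edges = np.linspace(0, ny, n_blocks_y + 1).astype(int)
    x_edges = np.linspace(0, nx, n_blocks_x + 1).astype(int)
    full = stat_fn(*[a.ravel() for a in arrays])
    loo = []
    for i in range(n_blocks_y):
        for j in range(n_blocks_x):
            keep = np.ones((ny, nx), dtype=bool)
            keep[y_edges[i]:y_edges[i + 1], x_edges[j]:x_edges[j + 1]] = False
            loo.append(stat_fn(*[a[keep] for a in arrays]))
    loo = np.asarray(loo)
    n = loo.size
    err = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return full, err
```

For example, `block_jackknife(weighted_std_stat, (z, w), 5, 5)` would return a $\sigma_{\rm norm}$-like value and its error from 25 tiles, where `z` is the normalized difference map and `w` the chosen weight map. The tile size should exceed the noise correlation length for the error to be meaningful.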