-
Validation benchmark is inconsistent and currently unclear (Quijote vs CAMB HaloFit). The Abstract and Sec. 2.4 refer to comparison against the “high-fidelity Quijote simulation suite,” while Sec. 3.2 and the plotted comparisons describe a nonlinear CAMB HaloFit prediction, with no explicit Quijote comparison shown. These are materially different references (simulation vs fitting formula), and the inconsistency undermines the accuracy claims and interpretation of the reported percent-level agreement (Abstract; Secs. 2.4, 3.2; Fig. 1/2 captions; Sec. 4).
Recommendation: Make the validation reference consistent throughout. If HaloFit is the main reference, revise the Abstract, Sec. 2.4, and Sec. 4 to say so, and explicitly acknowledge limitations of treating HaloFit as “truth” at the few-percent level (especially beyond linear scales). If Quijote is intended as the benchmark, add a dedicated comparison in Sec. 3.2 (ratio to Quijote $P(k)$ with quantitative metrics over a stated $k$-range), and document exactly which Quijote runs/outputs are used (resolution, method, box size, redshifts), plus how $P(k)$ is measured/matched (mass assignment, deconvolution, binning, shot noise) so the comparison is like-for-like.
-
Core PM gravity solve and integration details are underspecified, preventing reproducibility and obscuring the origin of scale-dependent power suppression. In Sec. 2.3 and Eq. (3), the Poisson solve is written as $\hat\phi(k) = -4\pi G\,\hat\rho(k)/k^2$ but the comoving cosmological form is not fully specified (physical vs comoving density; use of overdensity $\delta$; any factors of $a$ and $\bar\rho$; treatment of the $k=0$ mode). Additional key implementation choices are missing: continuous $k^2$ vs discrete lattice Green’s function, whether CIC deconvolution is applied in the force computation (not just in measuring $P(k)$), how forces are differenced (stencil/order), boundary conditions (presumably periodic), and whether any additional softening exists beyond the mesh (Sec. 2.3).
Recommendation: Expand Sec. 2.3 (or add an Appendix referenced from Sec. 2.3) with the exact equations and discrete operators implemented: (i) write the comoving Poisson equation being solved and define the source field ($\rho$, $\rho-\bar\rho$, or $\delta$), including any $a$-dependent prefactors; (ii) state explicitly how the $k=0$ mode is handled; (iii) specify the Fourier-space Green’s function (continuous $-1/k^2$ vs discrete Laplacian/lattice Green’s function) and any force-kernel filtering; (iv) state whether and how CIC is deconvolved in the force computation and how forces are interpolated back to particles; (v) specify the finite-difference stencil used to compute $\nabla\phi$ on the grid and the assumed periodic boundary conditions.
-
Time integration choices are not justified and no convergence evidence is provided. Sec. 2.3 states $200$ timesteps uniformly spaced in scale factor $a$ and uses a leapfrog method, but does not specify the leapfrog variant (KDK vs DKD), the exact update variables (peculiar velocity vs canonical momentum), how Hubble drag and $aH(a)$ enter the updates, or why $200$ steps is sufficient for the claimed large-scale accuracy. Without a basic timestep convergence test it is difficult to separate integration error from force-resolution error in Sec. 3.2.
Recommendation: In Sec. 2.3, provide the explicit leapfrog update equations, state KDK vs DKD, and clarify whether you integrate velocities or momenta (and how Hubble drag is included). Add a minimal convergence check in Sec. 3.2 (e.g., $N_\text{steps}=100/200/400$) reporting fractional differences in $P(k)$ at $z=0$ (and ideally also at an intermediate redshift) over the $k$-ranges used for claims (e.g., $k<0.03\, h/\rm{Mpc}$). If additional runs are not feasible, soften/qualify accuracy claims and cite prior PM timestep guidance consistent with your setup.
-
Performance claims are difficult to interpret without a measured baseline and complete timing methodology. Sec. 3.1 (and the Abstract) compare $\sim 20\,\text{s}$ on GPU to an “estimated” $\sim 5$ hours on CPU without specifying CPU code, hardware, threading, FFT library, or measurement procedure. The GPU timing itself is not fully defined (does it include IC generation, FFT plan creation, I/O, and $P(k)$ measurement, or only the time-stepping loop?). Without this, the implied orders-of-magnitude speedup could be misleading and is not reproducible (Sec. 3.1).
Recommendation: In Sec. 3.1, provide a clear and reproducible timing methodology: list GPU model $+$ memory, host CPU/RAM, OS, CUDA/driver versions, Warp version, FFT backend (e.g., cuFFT), and precision (float32/float64) for particles/FFTs. Define exactly what is included in the reported $\sim 20\,\text{s}$. Add a measured CPU baseline for the same algorithm (or a standard PM code) on specified CPU hardware with core count, compiler flags, threading (OpenMP/MPI), and FFT library; if not feasible, clearly label the $5$-hour value as a rough estimate, explain how it was obtained, and tone down language in the Abstract/Sec. 4 accordingly. A simple kernel-level breakdown (CIC deposit, FFT Poisson, force interp, drift/kick) would further strengthen Sec. 3.1.
-
Physical setup and initial-condition details are incomplete, limiting reproducibility and potentially affecting the reported large-scale offset. Across Secs. 2.1–2.2, the manuscript does not fully specify: the exact cosmological parameters (e.g., Quijote fiducial values), the tool and settings used to compute the linear power spectrum at $z=127$ (CAMB/CLASS version/config), how growth factors $f_1$ and $f_2$ are obtained, random seed/phase generation for the $10$-realization ensemble, and whether phases are matched to any external benchmark (if Quijote comparison is intended). These omissions make it hard to diagnose the $\sim 0.96$ large-scale ratio noted in Sec. 3.2 (possible normalization/growth mismatch vs measurement pipeline mismatch).
Recommendation: Add a self-contained table (Sec. 2.1) listing the cosmological parameters used and explicitly cite the source (Quijote fiducial, if applicable). In Sec. 2.2, state the linear-theory code $+$ version $+$ key settings (transfer function, normalization, massive neutrinos if any), how $f_1/f_2$ are computed, the grid used for the Gaussian random field and displacement fields, and the RNG/seed strategy for independent realizations. If comparing to Quijote, clarify whether initial phases are matched; if not, avoid implying a realization-by-realization comparison. In Sec. 3.2, comment explicitly on possible causes of the $\sim 4\%$ large-scale offset and add a sanity check (e.g., compare early-time $P(k)$ to linear theory at high $z$, or verify that $\sigma_8$ of the ICs matches the reference).
-
Power-spectrum measurement and uncertainty treatment lack key specifications (binning, Fourier conventions, aliasing control, ratio definition), and this affects both reproducibility and interpretation of “few-percent” agreement. Sec. 2.4 does not fully specify the Fourier normalization convention, the exact CIC window used for deconvolution, $k$-bin edges/spacing, treatment near Nyquist, or whether any anti-aliasing/interlacing is used. Sec. 3.2/figures do not unambiguously state whether ratios are computed as $\langle P\rangle/P_{\rm ref}$ or $\langle P/P_{\rm ref}\rangle$, and the shaded band risks being misread as error-on-the-mean rather than scatter across volumes (Sec. 2.4; Sec. 3.2; Fig. 2).
Recommendation: In Sec. 2.4, explicitly define $\delta(x)$, your FFT normalization, and write $P(k)$ consistently under that convention so the shot-noise subtraction term is unambiguous. Specify $k$-binning (bin edges, linear/log spacing, number of bins, mode weighting), Nyquist frequency, and whether modes near Nyquist are excluded. Provide the explicit CIC window $W_{\rm CIC}(\mathbf{k})$ used for deconvolution and describe any regularization near zeros. State whether you use interlacing or another aliasing-control technique; if not, discuss expected aliasing impact and restrict/flag affected $k$-ranges. In Sec. 3.2 and the Fig. 2 caption, state precisely how the mean/ratio and the shaded band are computed (standard deviation across $10$ realizations vs standard error on the mean), and consider showing both if you interpret mean offsets at the percent level.
-
Validation scope is narrow relative to the breadth of claims. Results focus on a single box size, single particle/mesh resolution, a single cosmology, and primarily $z=0$ $P(k)$ (Secs. 2.1, 3.2), while Sec. 4 suggests broad applicability for BAO studies and covariance estimation. Without at least minimal checks across redshift and/or resolution, it is unclear how robust the conclusions are for typical analysis workflows.
Recommendation: Either broaden validation or narrow/qualify claims. If feasible, add in Sec. 3.2: (i) $P(k)$ at one or two intermediate redshifts (e.g., $z=1$, $0.5$), and/or (ii) a simple resolution/mesh test (e.g., $256^3$ vs $512^3$ mesh at fixed particles, or vice versa) to demonstrate expected convergence trends. If additional runs are not feasible, revise Sec. 4 to explicitly limit claims to the demonstrated regime (large scales, this specific setup) and clarify which applications are plausible (e.g., large-scale-only statistics) versus not yet validated (halo-scale/non-Gaussian/covariances requiring accurate mode coupling).