-
The likelihood function used for inference is not written down explicitly, and critical measure/normalization details are ambiguous (Sec. 2.2–2.3). In particular, Eq. (2) specifies $f(r|M)$ only up to proportionality, without stating whether it is a 3D spatial density $f_\mathrm{3D}(x|M)$ (units $1/\mathrm{volume}$) or a 1D radial pdf $p(r|M)$ (units $1/\mathrm{length}$). The stated “intensity” $\lambda(r|M,\theta)=\langle N_\mathrm{sat}(M)\rangle\cdot f(r|M)$ is therefore ambiguous for a Poisson point-process likelihood. It is also unclear whether the likelihood includes a Poisson term for satellite counts per halo ($N_\mathrm{sat}|M$), or whether counts are treated as fixed and only positions are used. Finally, the hard fitting cut $r < 5~\mathrm{Mpc}/h$ must enter as truncation/renormalization; this is not shown, yet it directly impacts parameter estimates and the interpretation of reported biases (Sec. 3.1).
Recommendation: In Sec. 2.2–2.3, add the full conditional log-likelihood used for MLE. A clear option is to write, per halo $h$ with mass $M_h$, a Poisson term for the count and a product over satellite locations: $\ln L = \Sigma_h [ -\mu_h(\theta) + N_h \ln \mu_h(\theta) - \ln(N_h!) + \Sigma_{i=1}^{N_h} \ln f_\mathrm{trunc}(x_{hi}|M_h,\theta) ]$, where $\mu_h(\theta)=\langle N_\mathrm{sat}(M_h;\theta)\rangle\times P(r<r_\mathrm{cut}|M_h,\theta)$ if you model counts in the truncated region, and $f_\mathrm{trunc}$ is the spatial kernel normalized over the truncated domain $r<r_\mathrm{cut}$. Explicitly define whether $f$ is a 3D isotropic density (with normalization $\int f_\mathrm{3D}~d^3x = 1$) or a 1D radial pdf (with the implied $4\pi r^2$ Jacobian when mapping to 3D). State exactly how the $r<5~\mathrm{Mpc}/h$ cut is implemented (truncated kernel and/or truncated mean), and clarify independence assumptions (halos independent given the halo catalog; satellites i.i.d. within halos) that justify factorization.
-
Mock-catalog construction and “ground truth” are under-specified, limiting interpretability of recovery/bias and preventing reproducibility (Sec. 2.1, Sec. 3.1, Sec. 3.3). The paper does not clearly document how halos are generated (N-body vs analytic; box size; cosmology; mass definition; halo finder), the full HOD used to populate galaxies (central occupation, scatter, any non-Poisson satellite dispersion), the true satellite radial profile used in the mock (and whether it matches Eq. (2)), the true $\alpha_c(M)$ (constant vs mass-dependent and its parameters), and how luminosities/marks are assigned (including scatter and whether centrals/satellites differ).
Recommendation: Expand Sec. 2.1 (or add an Appendix) to fully specify the mock pipeline: simulation/cosmology details; halo definition (e.g., $M_\mathrm{200c}$ vs $M_\mathrm{vir}$), halo selection ($M_\mathrm{vir} \geq 10^{13}~M_\odot/h$), and any boundary conditions; exact central+satellite HOD functional forms and stochasticity (Poisson or otherwise); the injected satellite radial profile and its parameter values (including whether $\alpha_c$ is mass-dependent in the truth); and the luminosity assignment/mark model (functional form, scatter, and whether it is radius-dependent). Provide enough detail that an independent reader can regenerate the catalogs and reproduce Figures in Sec. 3.
-
Uncertainty quantification, parameter degeneracies, and validation of inference are missing, making it hard to assess the practical significance of the reported trends (Sec. 3.1–3.2, Sec. 4). The paper reports point estimates and an AIC difference, but gives no confidence intervals on ($M_\mathrm{sat}$, $\alpha_\mathrm{sat}$, $\alpha_0$, $\beta$), no correlation/degeneracy analysis (e.g., between $\beta$ and HOD parameters), and no coverage or mock-to-mock variability assessment despite having ten catalogs. This is especially important because the reported HOD bias ($\sim0.25$ dex in $M_\mathrm{sat}$) cannot be judged without uncertainties, and because $\Delta$AIC can appear large for large $N$ even if effect sizes are small per satellite.
Recommendation: In Sec. 3.1–3.2, report uncertainties and covariances for all fitted parameters (e.g., via the observed Fisher/Hessian at the MLE, bootstrap over halos, or per-mock fits with scatter across the ten catalogs). Add at least one diagnostic of degeneracies (a covariance matrix table or 2D contours in an appendix). For the mass-dependent concentration result, report $\beta \pm \sigma_\beta$ and show $68\%$ bands on $\alpha_c(M)$ vs $M$ in the relevant plot. Consider a simulation-based calibration / coverage check on the synthetic setup (even a lightweight version): generate multiple realizations at the fitted parameters and verify that the true parameters are recovered with correct coverage under the assumed likelihood.
-
The interpretation and use of AIC as “decisive evidence” for mass-dependent concentration need additional context and robustness checks (Sec. 2.3, Sec. 3.2, Sec. 4). With $\approx68$k satellites, small per-point likelihood improvements can yield very large $\Delta\mathrm{AIC}$. Additionally, AIC assumes an effectively independent sample; conditional independence given halos may be violated by substructure, exclusion, or mock-generation correlations, reducing the effective sample size and potentially inflating $\Delta\mathrm{AIC}$. Since the models are nested ($\beta=0$), complementary diagnostics (likelihood-ratio behavior, predictive performance) would strengthen the claim.
Recommendation: In Sec. 3.2, report the per-satellite (or per-halo) log-likelihood improvement between models (e.g., $\Delta\ln L/N$) alongside $\Delta$AIC to communicate effect size. Add at least one robustness check: (i) cross-validation across halos (fit on a subset of halos, evaluate predictive log-likelihood on held-out halos), or (ii) out-of-sample validation across mocks (fit on some catalogs, score on others). Optionally report BIC as a sensitivity check (not as a replacement). In Sec. 4, soften language around “decisive” thresholds and explicitly note the dependence of AIC strength on $N$ and on independence assumptions.
-
The radial kernel choice (exponential) and halo-centering assumptions are not sufficiently justified, and robustness to profile misspecification is not demonstrated (Sec. 2.2, Eq. (2); Sec. 3.2). An exponential satellite profile is non-standard relative to common NFW/gNFW satellite prescriptions in HOD work. If the mocks were generated with the same exponential, the strong AIC preference for $\beta$ may be partly a “matched-model” effect; if not, kernel mismatch can spuriously induce mass dependence in $\alpha_c(M)$.
Recommendation: Clarify in Sec. 2.1–2.2 whether the mock truth uses the same exponential form as Eq. (2). Then add a robustness experiment in Sec. 3.2: either (a) fit an alternative, more standard profile family (e.g., NFW with a concentration–mass relation, or a gNFW) and check whether $\beta$ remains $>0$, or (b) generate additional mocks with an NFW-like truth and fit the exponential model to test whether $\beta$ is spuriously inferred. Explicitly discuss the mapping between $\alpha_c$ and a physical scale radius (scale length $=R_{\rm vir}/\alpha_c$) to avoid confusion with standard NFW concentration conventions.
-
Conditioning vs full Neyman–Scott generative story is conceptually mixed, affecting interpretation of the residual analysis (Sec. 2.2–2.5, Sec. 3.4). The text alternates between (i) conditioning on an observed halo catalog (centers/masses treated as fixed covariates for the satellite likelihood) and (ii) describing halos as a homogeneous Poisson parent process whose failure is diagnosed by residuals at $5$–$10~\mathrm{Mpc}/h$. However, if halos are conditioned upon, no parent-process likelihood is being fit, and the enormous underprediction at $5$–$10~\mathrm{Mpc}/h$ is largely an expected consequence of fitting only a 1-halo component (and/or truncating at $5~\mathrm{Mpc}/h$), not uniquely a diagnosis of “parent Poisson” failure.
Recommendation: Revise Sec. 2.2–2.5 to clearly separate: (A) the conditional satellite model given the halo catalog (the likelihood you optimize), versus (B) any model for the halo (parent) process or 2-halo component (not currently included). Reframe Sec. 2.5 and Sec. 3.4 to state explicitly that the $5$–$10~\mathrm{Mpc}/h$ residuals demonstrate the breakdown of a pure 1-halo conditional model and motivate adding an explicit 2-halo term and/or a clustered parent process, rather than diagnosing a parent Poisson assumption unless that term is truly included in the likelihood. If you do include a parent term, write it down and justify the homogenous Poisson assumption.