-
Physical meaning of $\sigma_a$ within proxy bins is not sufficiently justified, which weakens interpretation of ODG as a Yarkovsky diagnostic (Secs. 2.4–2.5, 3.2–3.3). Unlike classic V-shape/envelope approaches, $\sigma_a$ uses the full interior distribution and can be strongly influenced by initial ejection velocities, resonant sculpting/asymmetries, interlopers, and observational truncation. As a result, a trend in $\sigma_a$ vs proxy may not uniquely or linearly reflect Yarkovsky drift, and failures like Flora (Sec. 3.2.2) could arise from dynamics/membership rather than proxy inadequacy.
Recommendation: Add a short analytic justification and/or a controlled numerical experiment demonstrating when $\sigma_a$(proxy) should increase monotonically (approximately linearly) with a Yarkovsky-sensitive proxy. At minimum, in Sec. 2.5 and/or Sec. 3.3: (i) state explicit assumptions (single collisional family, roughly symmetric drift about a center, limited resonance truncation, limited contamination); (ii) discuss how asymmetry/truncation changes $\sigma_a$ even under normal Yarkovsky drift; and (iii) for one well-behaved family (e.g., Maria or Eos) and one problematic one (Flora), add diagnostics linking departures from linearity to known dynamical structures (e.g., proximity to resonances, one-sided truncation).
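One possible shape for the controlled numerical experiment is a toy Monte Carlo. The sketch below is not a dynamical integration: it assumes a single family, a Gaussian initial spread, a drift amplitude proportional to $1/D$ with random sign, and an optional one-sided cut standing in for a resonance. The function name and all parameter values (`a0`, `v_disp`, `k`, `a_max`) are hypothetical and are not taken from the manuscript.

```python
import random
import statistics

def toy_sigma_a(D_km, n=4000, a0=2.55, v_disp=0.005, k=0.05, a_max=None, seed=1):
    """Sample SD of semimajor axis for one size bin in a toy drift model.

    Hypothetical model: a = a0 + Gaussian ejection spread + (+/-) k / D.
    If a_max is given, members at or beyond it are removed (one-sided truncation).
    """
    rng = random.Random(seed)
    a = []
    for _ in range(n):
        drift = rng.choice((-1.0, 1.0)) * k / D_km  # symmetric drift, amplitude ~ 1/D
        ai = a0 + rng.gauss(0.0, v_disp) + drift
        if a_max is None or ai < a_max:
            a.append(ai)
    return statistics.stdev(a)

# Without truncation the expected value is sqrt(v_disp**2 + (k / D)**2),
# so sigma_a grows toward small D but is not linear in log10(1/D).
s_small = toy_sigma_a(2.0)                # small bodies, large drift
s_large = toy_sigma_a(20.0)               # large bodies, small drift
s_trunc = toy_sigma_a(2.0, a_max=2.56)    # same small bodies, outward branch cut away
```

In this toy setting the truncated bin loses almost its entire outward-drifting branch, so its $\sigma_a$ collapses toward the ejection spread even though the drift law is unchanged. That is the kind of departure item (ii) asks the authors to discuss.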
-
Core methodological steps are underspecified, threatening numerical reproducibility and potentially biasing both $G$ and $R^2$: (a) binning and bin-merging (Sec. 2.4), (b) definition/uncertainty of $\sigma_a$ per bin and the regression weights (Sec. 2.5), and (c) the exact definition of $R^2$ in the weighted setting (Secs. 2.5, 3.2; Table 1). In particular, “inverse of the variance of $\sigma_a$” is ambiguous (estimator variance vs $\sigma_a^2$), and different weighted-$R^2$ conventions can change values and the occurrence/interpretation of negative $R^2$.
Recommendation: Expand Secs. 2.4–2.5 into an algorithmic specification sufficient to reproduce Table 1 and Figs. 1–6: (i) define initial bin edges (min/max bounds, closed/open conventions) and whether bins are equal-width in proxy or otherwise; (ii) provide the exact bin-merging rule (direction, nearest-neighbor tie-breaking, iteration order), and report the final number of bins per family/proxy after merging; (iii) define $\sigma_a$ precisely (sample vs population SD; proper vs osculating $a$—see separate issue below) and how $\mathrm{Var}(\sigma_a)$ is estimated (analytic formula or bootstrap); (iv) state the exact weight $w_i$ used; and (v) give the explicit weighted-$R^2$ formula used (weighted SSE/SST about a weighted mean, etc.). Consider adding brief pseudocode and reporting $N$ per final bin (in captions or supplement).
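To illustrate why item (v) matters, the following minimal sketch fits a weighted line and then evaluates $R^2$ under two conventions for the same fit. The bin values and weights are invented for illustration, and the function names are hypothetical; the manuscript's actual convention is not known to the reviewer.

```python
def wls_fit(x, y, w):
    """Weighted least-squares line y = G*x + C; returns (G, C)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    G = sxy / sxx
    return G, ym - G * xm

def r_squared(x, y, w, G, C):
    """1 - SSE/SST, both sums weighted by w, SST taken about the w-weighted mean."""
    sw = sum(w)
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sse = sum(wi * (yi - (G * xi + C)) ** 2 for wi, xi, yi in zip(w, x, y))
    sst = sum(wi * (yi - ym) ** 2 for wi, yi in zip(w, y))
    return 1.0 - sse / sst

# Hypothetical binned data: the last bin has a very small weight.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 0.0, 5.0]
w = [100.0, 100.0, 100.0, 0.01]

G, C = wls_fit(x, y, w)
r2_weighted = r_squared(x, y, w, G, C)                  # same weights as the fit
r2_unweighted = r_squared(x, y, [1.0] * len(x), G, C)   # unit weights, same fit
```

With an intercept in the model, the weighted form cannot fall below zero, whereas the unweighted form applied to the weighted fit can. Stating which form produced the negative values in Table 1 would remove the ambiguity.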
-
Linearity is assumed but not tested beyond reporting $R^2$, and $R^2$ alone is used as “proxy efficacy” (Secs. 2.5, 3.2–3.3). Even if $da/dt \propto 1/D$, $\sigma_a$ need not be linear in $\log_{10}(1/D)$ (or any log proxy), especially under truncation/asymmetry or heterogeneous physical properties. With only $\sim 10$ binned points (often fewer after merging), $R^2$ can be unstable and can reward overfitting/accidental linearity; Flora indicates clear model failure (Sec. 3.2.2).
Recommendation: In Sec. 2.5 and Results (Secs. 3.2–3.3), test at least two alternative model forms for representative families (e.g., Maria/Eos and Flora): (i) piecewise/segmented linear regression, and (ii) a simple non-linear alternative (e.g., quadratic term or a model in $1/D$ without log). Compare fits using AIC/BIC or cross-validated prediction error (even leave-one-bin-out). Report residual plots (or summarize curvature/heteroscedasticity). If linear ODG remains the headline statistic, justify it explicitly as a robust summary and delineate when it fails (e.g., multi-component families).
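As an indication of how light the requested comparison can be, here is a minimal sketch of leave-one-bin-out cross-validation for two candidate regressors, $\log_{10}(1/D)$ and $1/D$. The bin centres and dispersions are synthetic (constructed to be exactly linear in $1/D$), the fits are unweighted for brevity, and the helper names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    slope = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return slope, ym - slope * xm

def loo_rmse(x, y):
    """Leave-one-bin-out RMS prediction error of a straight-line fit."""
    sq_errs = []
    for k in range(len(x)):
        x_train = x[:k] + x[k + 1:]
        y_train = y[:k] + y[k + 1:]
        s, c = fit_line(x_train, y_train)
        sq_errs.append((y[k] - (s * x[k] + c)) ** 2)
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# Synthetic bins: diameters in km, dispersion exactly linear in 1/D.
D = [2.0, 3.0, 5.0, 8.0, 12.0, 20.0, 30.0]
sigma_a = [0.002 + 0.03 / d for d in D]

rmse_log = loo_rmse([math.log10(1.0 / d) for d in D], sigma_a)  # log proxy
rmse_inv = loo_rmse([1.0 / d for d in D], sigma_a)              # proxy without log
```

On these synthetic bins the log-proxy line predicts held-out bins noticeably worse than the $1/D$ line, even though a log-proxy fit to the same points could still return a respectable $R^2$. Reporting both errors per family would show whether the linear-in-log form is actually preferred by the data.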
-
Data provenance and the choice of dynamical elements are unclear, which undermines interpretation of $\sigma_a$ and leaves its susceptibility to short-period variations unassessed (Secs. 2.1–2.2). The manuscript does not clearly state whether semimajor axes are proper or osculating, how families are assigned (which HCM/family catalog), or what (if any) interloper filtering is applied. Because $\sigma_a$ is sensitive to membership contamination and to dynamical environment, these details materially affect the ODG results and proxy comparisons.
Recommendation: In Secs. 2.1–2.2: (i) explicitly state whether $a$ is proper (preferred) or osculating and cite the source catalog; (ii) cite the family-classification source and version/date; (iii) provide a brief description of any interloper mitigation (taxonomy/albedo cuts, “core” memberships) or state explicitly that none was performed; and (iv) add a short robustness check (or an appendix) repeating ODG for at least one family using a stricter membership subset (e.g., “core” members, if available) or with a simple outlier-robust dispersion (see next issue).
-
Robustness to binning choices, outliers, and interior contamination is not demonstrated (Secs. 2.4, 3.2–3.3). Equal-width proxy bins plus merging can make the results depend on the binning algorithm, particularly with skewed size distributions; $\sigma_a$ is also sensitive to outliers/interlopers. Consequently, the identity of the “best proxy” (Table 1) may not be stable, especially for marginal cases like Eunomia, where Proxy_PD wins but the differences may be small.
Recommendation: Add a sensitivity/robustness analysis (main text or supplement): (i) equal-count (quantile) bins vs equal-width; (ii) number of bins (e.g., $8/10/12/15$); (iii) minimum bin occupancy (e.g., $5$ vs $10$); and (iv) replace $\sigma_a$ with a robust alternative such as MAD (scaled to $\sigma$) or trimmed SD. Report whether (a) slopes $G$ and (b) the “best proxy” choice in Table 1 change under these variants. A bootstrap over the full pipeline (resample objects within family, recompute bins+fits) would also provide uncertainty on “best proxy” decisions.
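For item (iv), a scaled MAD is a one-line replacement for the sample SD. The sketch below uses the standard factor $1.4826$, which makes the MAD consistent with $\sigma$ for Gaussian data; the semimajor-axis values are invented, with one deliberate interloper, and the function name is hypothetical.

```python
import statistics

def robust_sigma(values):
    """Median absolute deviation scaled to match sigma for Gaussian data."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return 1.4826 * mad

# Hypothetical bin: five clustered members plus one interloper at 3.20 au.
a_bin = [2.50, 2.51, 2.52, 2.53, 2.54, 3.20]

sd_classic = statistics.stdev(a_bin)   # sample SD, dominated by the interloper
sd_robust = robust_sigma(a_bin)        # scaled MAD, largely insensitive to it
```

A single contaminating object inflates the sample SD by roughly an order of magnitude here while leaving the scaled MAD near the spread of the clustered members. Repeating the Table 1 fits with both estimators would show how much of each gradient rests on a few outliers.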
-
Proxy definitions and units/dimensionality are not fully rigorous: taking $\log_{10}$ of dimensional quantities ($D$, $P$, $P\cdot D$) is unit-dependent, affecting intercept $C$ and potentially confusing cross-study comparability (Sec. 2.3). Units for $D$ and $P$ are not consistently stated, and the dimensional interpretation of $G$ is not provided (Secs. 2.3, 2.5, 3.2.2). Additionally, the physical motivation for Proxy_P and Proxy_PD as “Yarkovsky-sensitive” is only heuristic; the dependence on spin rate enters through the thermal parameter and obliquity, not simply $1/P$.
Recommendation: In Sec. 2.3: (i) redefine proxies in dimensionless form (e.g., $\log_{10}(D_0/D)$, $\log_{10}(P_0/P)$, $\log_{10}((P_0 D_0)/(P D))$) or explicitly fix units and note that unit changes shift $C$ but not $G$; (ii) state the units used for $D$ and $P$ and the implied units/interpretation of $G$; (iii) add $2$–$4$ citations to Yarkovsky theory describing rotation-rate dependence via the thermal parameter and clarify that Proxy_P/Proxy_PD are heuristic. Optionally, add an exploratory alternative spin-inclusive proxy closer to theory (e.g., involving $\omega$ or $\sqrt{\omega}$) and report whether conclusions change.
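The unit dependence in item (i) can be made explicit in two lines. Assuming the fitted relation has the form $\sigma_a = G\,x + C$ with $x = \log_{10}(1/D)$, and writing a change of diameter unit as $D' = kD$ (the factor $k$ is introduced here only for illustration, e.g. $k = 1000$ for km to m):

$$
x' = \log_{10}(1/D') = \log_{10}(1/D) - \log_{10} k = x - \log_{10} k ,
$$

$$
\sigma_a = G\,x + C = G\,x' + \left(C + G \log_{10} k\right) .
$$

The slope $G$ is unchanged and carries the units of $\sigma_a$ per dex of proxy, while the intercept shifts by $G \log_{10} k$. In the dimensionless form $x = \log_{10}(D_0/D)$, $C$ becomes the fitted dispersion at the reference diameter $D = D_0$, which is easier to compare across studies.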
-
Selection effects from requiring both diameters and spin periods are not quantified, limiting interpretation of poor performance for spin-based proxies (Secs. 2.1–2.3, 3.1, 3.3, 3.5). The retained subset is likely biased toward larger/brighter objects and may under-sample the smallest (most drifted) members, potentially flattening $\sigma_a$–proxy relations and distorting comparisons among proxies.
Recommendation: In Sec. 3.1 (or a new subsection): report, per family, (i) total cataloged members vs those with $D$, vs those with $P$, vs those with both (final sample); (ii) distributions of $D$ and $P$ for retained vs full membership (where available); and (iii) a brief discussion of how missing small objects could bias $\sigma_a$(proxy). Explicitly scope the conclusion “spin period is ineffective” to the observed subset and note that incompleteness in $P$ may dominate Proxy_P/Proxy_PD performance.
-
Claims of robustness/universality are currently stronger than what is demonstrated, and a quantitative benchmark is missing (Secs. 1, 3.5, 4.1, 4.4). ODG avoids specifying a center, but it still assumes a coherent single-collision family with a monotonic dispersion–proxy relation; Flora and the excluded Nysa-Polana case suggest important limitations. The paper also does not directly compare ODG to traditional V-shape/envelope methods, so the claimed practical advantage remains largely qualitative.
Recommendation: Temper “universal/robust” phrasing in Sec. 1 and Discussion (Secs. 3.5, 4.1, 4.4), explicitly stating applicability conditions and citing Flora as a boundary case (Sec. 3.2.2). Add one direct benchmark: for at least one family, compare ODG outputs to a standard V-shape boundary fit (or literature values), and/or show via a Monte Carlo that ODG is stable under plausible center uncertainties that would affect apex-based methods.
-
Uncertainty reporting is incomplete for the main fitted quantities and the age–gradient correlation (Secs. 2.5, 2.6.2, 3.4; Table 1). Table 1 reports gradients without uncertainties even though the text mentions $\sigma_G$, so it is unclear whether $G$ differs significantly from zero for the weak fits. The Spearman test uses $N=6$ families and ignores uncertainties in both ages and $G$. Family age values are not consistently cited with uncertainties (Secs. 2.2, 3.2.1, 3.4; Table 1).
Recommendation: Add $\sigma_G$ (or $95\%$ CI) to Table 1 and describe how it is computed under the chosen weighting (Sec. 2.5). In Sec. 3.2.2, comment on which slopes are significantly non-zero. For the age–$G$ analysis (Secs. 2.6.2, 3.4): provide explicit literature citations and uncertainty ranges for each family age in/near Table 1, and propagate uncertainties via a simple Monte Carlo sampling of ages (and optionally $G$) to give a confidence interval for Spearman $\rho$. Also clarify which proxy’s $G$ is used per family and provide a sensitivity test excluding families with very low/negative $R^2$.
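The Monte Carlo propagation requested above could look roughly like the following sketch. It assumes Gaussian age errors and perturbs ages only; the six ages, their uncertainties, and the gradients are placeholder numbers in arbitrary units, not values from the manuscript, and all function names are hypothetical.

```python
import random

def ranks(v):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    am, bm = sum(ra) / n, sum(rb) / n
    cov = sum((p - am) * (q - bm) for p, q in zip(ra, rb))
    va = sum((p - am) ** 2 for p in ra)
    vb = sum((q - bm) ** 2 for q in rb)
    return cov / (va * vb) ** 0.5

def mc_spearman_interval(ages, age_err, G, n_draws=2000, seed=0):
    """Approximate 95% interval of rho when ages are redrawn within their errors."""
    rng = random.Random(seed)
    rhos = sorted(
        spearman([rng.gauss(m, s) for m, s in zip(ages, age_err)], G)
        for _ in range(n_draws)
    )
    return rhos[int(0.025 * n_draws)], rhos[int(0.975 * n_draws) - 1]

# Placeholder inputs for six families (arbitrary units, NOT manuscript values).
ages = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
age_err = [0.8] * 6
G = [0.60, 0.50, 0.45, 0.30, 0.20, 0.10]

rho_nominal = spearman(ages, G)
rho_lo, rho_hi = mc_spearman_interval(ages, age_err, G)
```

With only six families, age uncertainties comparable to the spacing between ages are enough to reorder the ranks in many draws, so the interval on $\rho$ can be wide even when the nominal value looks perfect. Drawing $G$ from its own uncertainty in the same loop would be a straightforward extension.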
-
The rationale for excluding Nysa-Polana is inconsistent between Methods and Results (Secs. 2.2, 3.1), and the Spearman statistics are reported inconsistently across sections (Abstract; Sec. 3.4; Sec. 4.3). These inconsistencies impair trust in the selection function and secondary statistics.
Recommendation: Make Secs. 2.2 and 3.1 consistent by explicitly stating: (i) whether Nysa-Polana meets the $\geq 100$-member completeness criterion, (ii) how many objects have $D$ and $P$, and (iii) whether structural complexity is the primary reason for exclusion (with $1$–$2$ citations). Re-audit and harmonize the reported Spearman ($\rho$, $p$) values across Abstract, Sec. 3.4, and Sec. 4.3, noting any differences in sample definition if applicable.