-
Selection effects, completeness, and dataset provenance are not documented or controlled, yet several central results (e.g., the outward increase in mean diameter; spectral-type fractions vs. $a$; resonance-region contrasts) are sensitive to detectability and heterogeneous catalog coverage (Sec. 2.1–2.2, Sec. 3.1–3.2, Sec. 3.4). Requiring both diameter and spectral type can impose a complex, distance- and albedo-dependent selection function that can mimic or amplify radial size/composition gradients. The manuscript also does not clearly specify the sources/versions of diameters and taxonomies, uncertainty scales, or conflict-resolution rules when multiple sources exist.
Recommendation: Add a dedicated subsection on data provenance and selection/completeness (expand Sec. 2.1–2.2 or add Sec. 2.2.1). (i) Provide a table mapping each key field ($D$, $a/e/i$, spectral type, $H$/albedo if used) to its source catalog(s), release/version, access date, and any quality-flag filters; state how conflicting values were resolved and typical uncertainties for $D$ and taxonomy. (ii) Add an attrition/flow table (or diagram) reporting counts after each merge/filter step (initial catalogs $\rightarrow$ merged $\rightarrow$ after requiring $D$ $\rightarrow$ after requiring type $\rightarrow$ after main-belt cuts in Sec. 2.2). (iii) Quantify how the final “$D$+type” sample differs from the parent diameter-only sample in $a$, $H$ (or $D$), and (if available) albedo; include a diagnostic plot such as $D$ (or $H$) vs. $a$ for both samples. (iv) Define at least one conservative “quasi-complete” subsample (e.g., a diameter threshold that is plausibly complete across $2.0$–$3.5$ AU for the adopted diameter catalog) and repeat the key gradient and resonance analyses in Sec. 3.2 and Sec. 3.4 to demonstrate robustness; clearly state in Sec. 3.5/Sec. 4 which conclusions are conditional on selection effects.
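To make item (ii) concrete, the attrition table could be generated along the following lines. This is only a minimal sketch: the field names (`a_au`, `D_km`, `taxon`), the toy records, and the use of $2.0$–$3.5$ AU as the main-belt cut are assumptions made for illustration, not the manuscript's actual schema or cuts.

```python
def attrition_table(records, steps):
    """Return [(label, remaining_count)] after applying each filter in order."""
    rows = [("merged catalog", len(records))]
    current = list(records)
    for label, keep in steps:
        current = [r for r in current if keep(r)]
        rows.append((label, len(current)))
    return rows


# Toy records; field names and values are hypothetical.
records = [
    {"a_au": 2.3, "D_km": 12.0, "taxon": "S"},
    {"a_au": 2.8, "D_km": None, "taxon": "C"},
    {"a_au": 3.1, "D_km": 45.0, "taxon": None},
    {"a_au": 3.9, "D_km": 30.0, "taxon": "D"},
    {"a_au": 2.6, "D_km": 8.0, "taxon": "C"},
]

# Filters are applied cumulatively, in the order the sample was built.
steps = [
    ("has diameter", lambda r: r["D_km"] is not None),
    ("has spectral type", lambda r: r["taxon"] is not None),
    ("2.0 <= a <= 3.5 AU", lambda r: 2.0 <= r["a_au"] <= 3.5),  # assumed cut
]

table = attrition_table(records, steps)
for label, n in table:
    print(f"{label:<22s} {n}")
```

Reporting the same table separately per semimajor-axis zone would show directly whether the "$D$+type" requirement removes objects preferentially at large $a$.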
-
The headline “mean size increases with semimajor axis” result is not sufficiently validated against confounding by selection and compositional/family structure (Sec. 3.2.1). Given strong type–$a$ zoning and potential completeness variation with distance, an apparent size–$a$ trend can arise without an intrinsic physical gradient.
Recommendation: Strengthen Sec. 3.2.1 with controlled and uncertainty-aware quantification: (i) Report an effect size with uncertainty (e.g., the slope of $\log_{10}(D)$ vs. $a$ with a $95\%$ CI), rather than only Pearson $r$ and $p \approx 0$. (ii) Repeat the trend analysis within major taxonomic complexes (e.g., S-only, C-only, X-only; or “S-complex/C-complex/X-complex/V”) to check whether the trend persists after conditioning on composition. (iii) Perform sensitivity analyses using multiple minimum-diameter cuts (e.g., $D>5/10/20$ km) and show whether the slope and binned means remain stable. (iv) Where possible, separate family-dominated regions from the background population (or at least show the trend excluding the largest few families if family labels are available). Summarize these robustness checks in a short table/appendix and update the wording in Sec. 3.5/Sec. 4 to reflect what remains significant after controls.
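As an illustration of item (i), the slope and its interval could be obtained with a simple percentile bootstrap. The sketch below is one possible approach, not a prescription: the synthetic data (a true slope of 0.3 dex per AU with 0.2 dex scatter), the function names, and the choice of 2000 resamples are all assumptions made for the example.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI for the OLS slope."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample objects with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if max(xs) == min(xs):  # degenerate resample, slope undefined
            continue
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lo = slopes[int((alpha / 2) * len(slopes))]
    hi = slopes[max(int((1 - alpha / 2) * len(slopes)) - 1, 0)]
    return ols_slope(x, y), lo, hi


# Synthetic stand-in for (a, log10 D); the true slope of 0.3 is an assumption.
rng = random.Random(42)
a = [rng.uniform(2.0, 3.5) for _ in range(300)]
log_d = [0.8 + 0.3 * ai + rng.gauss(0.0, 0.2) for ai in a]

slope, lo, hi = bootstrap_slope_ci(a, log_d)
print(f"slope = {slope:.3f} dex/AU, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Running the same function on each taxonomic complex and on each minimum-diameter cut would populate the robustness table requested in items (ii) and (iii).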
-
Clustering claims are not yet physically well anchored: clustering is performed in $(a,e,i)$ without clearly stating whether elements are proper or osculating, and the linkage to known asteroid families is asserted largely qualitatively with limited quantitative validation (Sec. 2.4.2, Sec. 3.3.2). Because families are typically defined in proper elements ($a_p$, $e_p$, $i_p$), clustering in osculating elements may mix or fragment families depending on epoch and secular evolution.
Recommendation: In Sec. 2.1–2.2 and Sec. 2.4.2, explicitly state whether $a/e/i$ are proper or osculating; if osculating, justify the choice and discuss limitations in Sec. 3.5. If proper elements are available, rerun DBSCAN/GMM in $(a_p,e_p,i_p)$ (or provide a comparison on a subset). In Sec. 3.3.2, add quantitative cluster–family validation using published family membership labels: report purity/completeness per major family and/or an overall metric (e.g., adjusted Rand index), plus a contingency table for the main families. Also add robustness checks showing how the number of clusters, noise fraction, and the identity of major clusters change under reasonable DBSCAN ${\epsilon}/{\rm min\_samples}$ variations and GMM component-number ranges; this will allow you to narrow claims to the clusters that are stable and physically interpretable.
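The cluster–family validation could be computed roughly as follows. This is a sketch under stated assumptions: the labels are toy values, `fam_A`/`fam_B` are placeholder family names, and DBSCAN noise is assumed to carry the conventional label `-1`. The adjusted Rand index follows the standard pair-counting definition and here treats noise as a label of its own.

```python
from collections import Counter
from math import comb

NOISE = -1  # conventional DBSCAN label for unclustered points (assumed)


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting adjusted Rand index between two labelings."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    sum_ij = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def purity_completeness(clusters, families, family):
    """Best-matching cluster for one family, with its purity and completeness."""
    matched = [c for c, f in zip(clusters, families) if f == family and c != NOISE]
    n_family = sum(f == family for f in families)
    if not matched:
        return None, 0.0, 0.0
    best, overlap = Counter(matched).most_common(1)[0]
    cluster_size = sum(c == best for c in clusters)
    # purity: share of the cluster that belongs to the family
    # completeness: share of the family recovered by the cluster
    return best, overlap / cluster_size, overlap / n_family


# Toy labels; in practice these would be DBSCAN output and published memberships.
clusters = [0, 0, 0, 1, 1, 1, NOISE, 0]
families = ["fam_A", "fam_A", "fam_A", "fam_B", "fam_B", "fam_A", "fam_B", "fam_B"]

for fam in sorted(set(families)):
    print(fam, purity_completeness(clusters, families, fam))
print("ARI:", adjusted_rand_index(clusters, families))
```

Tabulating these three numbers over a grid of DBSCAN settings would also serve the requested robustness check, since stable clusters should keep high purity and completeness across the grid.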
-
Resonance/Kirkwood-gap analysis is under-specified and currently risks confounding resonance effects with global radial gradients and taxonomy/family mix (Sec. 2.5, Sec. 3.4). The definitions of “Inside Gaps”, “Adjacent to Gaps”, and “Background” are not given as explicit semimajor-axis intervals, and pooled comparisons may inadvertently compare different $a$-regimes rather than isolating resonance proximity. Interpreting observed differences as size-dependent dynamical filtering (e.g., Yarkovsky delivery into resonances) therefore remains ambiguous.
Recommendation: In Sec. 2.5, list the exact $a$-intervals used for each resonance and for each category (“inside/adjacent/background”), including window widths and boundary conventions. In Sec. 3.4, reframe the analysis to use local, controlled comparisons: for each resonance separately, compare inside vs. adjacent within a narrow $a$-range matched in distance from the resonance center (and ideally matched in inclination/eccentricity range), and report effect sizes per resonance (difference in mean $\log D$ with CI; Cramér’s $V$ for type changes). Where feasible, repeat within major taxonomic complexes (S/C/X) and/or family vs. background to reduce confounding. Add a sensitivity analysis varying window widths/centers within literature-reasonable ranges and show that conclusions persist. If you retain the Yarkovsky interpretation, explicitly label it as a hypothesis consistent with the data and cite relevant dynamical work; otherwise soften the causal language (Sec. 3.5, Sec. 4).
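One way to make the window definitions explicit and reproducible is sketched below. The nominal centers follow from Kepler's third law, $a = a_J (q/p)^{2/3}$ for the $p$:$q$ resonance with Jupiter; the half-widths, the closed-boundary convention, the approximate value of $a_J$, and the toy objects are assumptions for illustration only and should be replaced by the values actually adopted in the manuscript.

```python
import math

A_JUPITER_AU = 5.204  # approximate semimajor axis of Jupiter


def resonance_center(p, q, a_jup=A_JUPITER_AU):
    """Nominal a (AU) where the asteroid completes p orbits per q Jupiter orbits."""
    return a_jup * (q / p) ** (2.0 / 3.0)


def classify(a, center, half_inside, half_adjacent):
    """Label an object by distance from the resonance center (closed boundaries)."""
    d = abs(a - center)
    if d <= half_inside:
        return "inside"
    if d <= half_adjacent:
        return "adjacent"
    return "background"


def delta_mean_log_d(objects, center, half_inside, half_adjacent):
    """Mean log10(D) inside minus mean log10(D) adjacent, for one resonance."""
    groups = {"inside": [], "adjacent": [], "background": []}
    for a, d_km in objects:
        label = classify(a, center, half_inside, half_adjacent)
        groups[label].append(math.log10(d_km))
    if not groups["inside"] or not groups["adjacent"]:
        return None  # comparison undefined without both groups
    mean = lambda v: sum(v) / len(v)
    return mean(groups["inside"]) - mean(groups["adjacent"])


for p, q in [(3, 1), (5, 2), (7, 3), (2, 1)]:
    print(f"{p}:{q} nominal center = {resonance_center(p, q):.3f} AU")

# Toy (a, D) pairs around the 3:1 resonance; half-widths are hypothetical.
objects = [(2.49, 10.0), (2.51, 10.0), (2.45, 100.0), (2.56, 100.0), (2.90, 50.0)]
center_31 = resonance_center(3, 1)
delta = delta_mean_log_d(objects, center_31, half_inside=0.02, half_adjacent=0.08)
print("inside minus adjacent, mean log10 D:", delta)
```

Because the comparison is restricted to a narrow window around each resonance, the global radial gradient contributes little to the difference; repeating the call over a range of half-widths gives the requested sensitivity analysis.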
-
Predictive modeling (XGBoost/RF) is reported at a high level without sufficient detail on the evaluation protocol, baselines, leakage controls, or uncertainty estimates, which limits the interpretability of the “limited predictive power” finding and its physical implications (Sec. 2.4.3, Sec. 3.3.3). Random train/test splits can leak family/cluster structure across splits (near-duplicates in orbital space), inflating performance; moreover, accuracy $\approx 53\%$ alongside macro-F1 $\approx 0.16$ suggests strong class imbalance or majority-class dominance that is not discussed.
Recommendation: Expand Sec. 2.4.3 and Sec. 3.3.3 to include: (i) explicit sample sizes for each task and per class; (ii) the exact split/CV scheme (fold count, stratification, repeats, random seeds) and whether hyperparameter tuning is nested; (iii) class-imbalance handling (class weights/resampling) and full metric reporting as mean$\pm$SD (regression: $R^2$/MAE/RMSE; classification: accuracy, macro- and weighted-F1, per-class precision/recall/F1) plus confusion matrices (appendix). Add simple baselines (e.g., predict mean $\log D$; predict majority type; or $a$-only logistic baseline) and compare against them. To mitigate leakage, consider blocked CV (e.g., by semimajor-axis bins) and/or group CV by family/cluster if labels exist; report how performance changes. Update interpretation in Sec. 3.5/Sec. 4 to distinguish “orbit-only features are insufficient” from stronger claims about stochasticity.
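The following sketch shows why a majority-class baseline is needed to interpret the reported metrics, and how folds could be blocked by semimajor axis. The class counts are invented for illustration (the majority fraction was deliberately set to 53% so the baseline accuracy is comparable to the reported value); they are not the manuscript's class distribution, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that occur."""
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def blocked_folds(a_values, n_folds):
    """Assign fold ids by contiguous semimajor-axis blocks of equal count."""
    order = sorted(range(len(a_values)), key=lambda i: a_values[i])
    folds = [0] * len(a_values)
    for rank, i in enumerate(order):
        folds[i] = rank * n_folds // len(a_values)
    return folds


# Invented class counts (assumption): 53% of objects in the majority class.
counts = {"S": 53, "C": 20, "X": 15, "V": 7, "D": 5}
y_true = [label for label, k in counts.items() for _ in range(k)]

# Baseline that always predicts the most common class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

base_acc = accuracy(y_true, y_pred)
base_f1 = macro_f1(y_true, y_pred)
print(f"majority baseline: accuracy = {base_acc:.2f}, macro-F1 = {base_f1:.3f}")

# Neighbours in a end up in the same fold, so near-duplicates cannot straddle a split.
print(blocked_folds([2.1, 3.4, 2.2, 3.3, 2.7, 2.8], n_folds=3))
```

With these toy counts the trivial predictor reaches 53% accuracy with a macro-F1 below 0.14, which is why the manuscript's numbers cannot be judged without the corresponding baseline on the real class distribution.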
-
Core methodological choices needed for reproducibility are missing or scattered, particularly numerical specifications for binning, KDE bandwidths, DBSCAN/GMM settings, and resonance-window definitions; additionally, statistical reporting focuses on tiny $p$-values with limited effect sizes/uncertainty, making practical significance hard to judge (Sec. 2.3–2.5, Sec. 3.2–3.4).
Recommendation: Consolidate and specify all analysis parameters in Sec. 2.3–2.5 (and/or an appendix): (i) explicit bin edges (or start/end + number of bins) for $a/e/i$ and boundary conventions; (ii) KDE kernel type, dimensionality (2D/3D), bandwidth selection procedure and final bandwidths, feature scaling, and boundary treatment; (iii) DBSCAN ${\epsilon}/{\rm min\_samples}$ used for final results and the exact $k$-distance configuration; (iv) GMM covariance type, initialization, convergence criteria, and component-number search range for AIC/BIC; (v) resonance intervals as per the resonance issue above. In Results (Sec. 3.2–3.4), supplement $p$-values with effect sizes and uncertainty (e.g., $\eta^2$ or rank-based analogs for Kruskal–Wallis; Cramér’s $V$ for chi-squared tests; confidence intervals for correlations and mean differences) and briefly address multiple-testing control if many binwise tests are performed.
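Two of the suggested effect-size summaries are straightforward to compute, as the sketch below illustrates. It uses the standard definitions $V = \sqrt{\chi^2 / (n \min(r-1, c-1))}$ for an $r \times c$ contingency table and the Fisher $z$-transform interval for a Pearson correlation (which assumes approximate bivariate normality); the example tables and the function names are assumptions made for illustration.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes no row or column is entirely zero.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return math.sqrt(chi2 / (n * k))


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for Pearson r via the Fisher z-transform (n > 3)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Toy type-by-region tables (rows: regions, columns: spectral types).
print("perfect association:", cramers_v([[10, 0], [0, 10]]))
print("no association:     ", cramers_v([[5, 5], [5, 5]]))

# Interval for a hypothetical correlation of 0.3 measured on 1000 objects.
print("r = 0.3, n = 1000:", pearson_ci(0.3, 1000))
```

For samples as large as the ones analyzed here, $V$ and the width of the correlation interval convey practical significance far better than a $p$-value that is numerically indistinguishable from zero.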