-
The central empirical aim (identifying genuine neuro‑cognitive decoupling/resilience in aging bats) is not achieved, because all cognitive outcomes used in the decoupling analyses are synthetic (normal draws). The reported ROI associations, p-values, and $R^2$ therefore cannot be interpreted biologically and may be misread as real discoveries (Abstract; Sec. 2.2; Sec. 3.2.1; Sec. 3.3.2; Sec. 3.4; Sec. 4.3–4.4).
Recommendation: Decide on and implement one of two coherent paper identities: (i) Methods/pipeline + simulation validation: explicitly reframe the manuscript as a technical paper and restructure Results so that any brain–behavior “findings” are presented only as simulation-based validation with known ground truth (see the issue on synthetic behavioral data below). (ii) Empirical bat aging study: repair behavioral extraction and rerun the full pipeline on real behavioral metrics, then (and only then) interpret ROI patterns. In either case, remove or heavily qualify inferential language in Sec. 3.3.2/Sec. 4 that reads like biological discovery, and add prominent labeling in-text and in figure captions wherever synthetic behavior is used.
-
Behavioral extraction failure is the practical bottleneck, but the manuscript provides only a high-level statement (NaNs/zero variance) without a technical post‑mortem, making it hard for others (or the authors) to reproduce, diagnose, or fix the pipeline (Sec. 2.2.1–2.2.2; Sec. 3.2.1).
Recommendation: Expand Sec. 2.2 and Sec. 3.2.1 with an actionable failure analysis: include representative raw log excerpts (rows/columns), enumerate parsing assumptions (timestamp formats, event codes, sheet names, delimiters, missing values), specify which assumptions were violated, and show intermediate sanity checks (e.g., number of visits per trial, time ordering). Provide a corrected or more robust parsing approach (e.g., schema validation, flexible timestamp parsing, explicit trial boundary detection) and a minimal manual-validation protocol (spot-check $N$ trials vs. video/hand labels if available). If permitted, share anonymized example logs and the parsing script.
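To make the request concrete, the following is a minimal sketch of what schema validation plus flexible timestamp parsing could look like. The column names (`timestamp`, `event`, `box_id`), the candidate timestamp formats, and the helper names are assumptions for illustration only; the manuscript does not describe the actual log layout, so the real schema must be taken from the raw files.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema and formats: replace with what the raw logs actually contain.
REQUIRED_COLUMNS = {"timestamp", "event", "box_id"}
TIMESTAMP_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y %H:%M:%S",  # day-first is an assumption; confirm against the logger
)


def parse_timestamp(text):
    """Try each candidate format; return None instead of raising on failure."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def check_log(handle):
    """Return (parsed_rows, problems) for one CSV log, reporting every issue found."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [], [f"missing columns: {sorted(missing)}"]
    rows, problems, previous = [], [], None
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        stamp = parse_timestamp(row["timestamp"] or "")
        if stamp is None:
            problems.append(f"line {line_no}: unparseable timestamp {row['timestamp']!r}")
            continue
        if previous is not None and stamp < previous:
            problems.append(f"line {line_no}: timestamp goes backwards")
        previous = stamp
        rows.append({"time": stamp, "event": row["event"], "box_id": row["box_id"]})
    return rows, problems


# Toy log with one out-of-order row and one unparseable row.
sample = (
    "timestamp,event,box_id\n"
    "2023-01-05 10:00:00,visit,3\n"
    "05/01/2023 09:59:00,visit,1\n"
    "not a time,visit,2\n"
)
rows, problems = check_log(io.StringIO(sample))
for message in problems:
    print(message)
```

Returning an explicit list of problems, rather than coercing bad rows to NaN, is the design choice that matters here: it would turn the current "NaNs/zero variance" symptom into a per-line diagnosis that can be reported in Sec. 3.2.1.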
-
Synthetic behavioral data are under-specified and internally inconsistent: the manuscript alternates between simulating “behavioral residuals” vs. simulating raw behavioral metrics and then residualizing them, which changes the mathematics and interpretation (Sec. 3.2.1 vs. Sec. 3.3.1; Sec. 2.4.1–2.4.2). The current approach is not a meaningful validation because random normal draws can still yield nominally significant results under multiple testing.
Recommendation: Add a dedicated Methods subsection (e.g., Sec. 2.2.3) specifying exactly what is simulated (raw metrics vs. residuals), distributions (means/SDs), constraints (non-negativity, time caps), correlation structure across behavioral metrics, and random seeds. Replace ad hoc normal draws with a formal simulation study aligned to the paper’s goal:
• Null simulations (no planted brain–behavior link) to quantify type‑I error under the full $6\times25$ testing regime.
• Signal simulations with planted effects in selected ROIs (and with correlated ROI predictors reflecting the empirical ROI correlation matrix from Sec. 3.3.1) to show sensitivity/specificity and effect recovery.
Report performance with and without multiple-comparisons correction (see next issue).
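As an illustration of the null simulation, the sketch below estimates how often at least one of the $6\times25$ uncorrected tests is nominally significant when no brain–behavior link exists. Everything in it is an assumption rather than the authors' pipeline: ROI predictors share a single latent factor (loading `shared`) as a stand-in for the empirical ROI correlation matrix, behavior is independent standard-normal noise, and p-values use the Fisher z approximation instead of the exact t-test.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p(r, n):
    """Two-sided p-value from the Fisher z approximation (not the exact t-test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def null_familywise_rate(n_animals=30, n_rois=25, n_metrics=6,
                         shared=0.7, n_sims=200, alpha=0.05, seed=0):
    """Fraction of null datasets with at least one uncorrected p < alpha."""
    rng = random.Random(seed)
    unique = math.sqrt(1.0 - shared ** 2)
    hits = 0
    for _ in range(n_sims):
        # One latent factor induces correlation among all ROI predictors.
        common = [rng.gauss(0, 1) for _ in range(n_animals)]
        rois = [[shared * c + unique * rng.gauss(0, 1) for c in common]
                for _ in range(n_rois)]
        # Behavior is pure noise: no planted brain-behavior effect.
        metrics = [[rng.gauss(0, 1) for _ in range(n_animals)]
                   for _ in range(n_metrics)]
        if any(approx_p(pearson_r(roi, m), n_animals) < alpha
               for roi in rois for m in metrics):
            hits += 1
    return hits / n_sims


rate = null_familywise_rate()
print(f"share of null datasets with >= 1 nominal hit: {rate:.2f}")
```

For reference, with 150 fully independent tests at $\alpha=0.05$ the chance of at least one false positive is $1-0.95^{150}$, which exceeds 0.999; correlation among ROIs lowers this somewhat but does not make uncorrected hits informative. A signal simulation would follow the same structure with a planted coefficient added to selected ROIs.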
-
Multiplicity and correlated predictors are not handled: the framework implies $\sim150$ ROI-by-metric brain–behavior tests ($6$ behavior metrics $\times$ (Global$+$24 ROIs)), with ROIs strongly correlated (Sec. 3.3.1), yet p-values are presented without an explicit correction strategy (Sec. 2.4.3; Sec. 3.3.2). With $N\approx30$–$33$, uncorrected significance is not interpretable.
Recommendation: In Sec. 2.4.3, specify a primary inferential plan for real data (e.g., FDR across all ROI tests per behavioral metric, or across all tests; or permutation-based max‑$T$ controlling family-wise error). In Sec. 3.3.2, either (a) report corrected results (even if illustrative) or (b) label all p-values as uncorrected and non-inferential. Consider adding a complementary multivariate strategy that reduces the multiple-testing burden and handles correlated ROIs (e.g., PCA/PLS on ROI $MD$ residuals; ridge/elastic-net with cross-validation) while keeping the residual-based “decoupling index” as the interpretability layer.
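For the FDR option, a minimal sketch of the Benjamini–Hochberg step-up procedure is given below. The function name and the toy p-values are illustrative and not taken from the manuscript; in practice an established statistics package would normally be used, and a permutation-based max‑$T$ procedure may be preferable given the strong ROI correlations.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests are rejected at FDR level q."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank k with p_(k) <= q * k / m.
        if p_values[idx] <= q * rank / m:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Toy p-values: four are below 0.05 uncorrected, three survive BH at q = 0.05.
toy_p = [0.2, 0.001, 0.045, 0.012, 0.025]
rejected = benjamini_hochberg(toy_p)
print(rejected)  # [False, True, False, True, True]
```

In this toy case the test with $p=0.045$ is nominally significant but does not survive correction, which is exactly the distinction Sec. 3.3.2 needs to make visible.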
-
Two-stage residualization (brain residuals and behavior residuals computed from fitted normative models, then regressed residual-on-residual) raises inference/overfitting concerns in small samples if uncertainty from stage-1 fits is ignored (“generated regressor” / potential optimism). The manuscript does not clarify equivalence to single-stage covariate adjustment or how uncertainty is propagated (Sec. 2.4.1–2.4.3; Sec. 3.3).
Recommendation: Clarify in Sec. 2.4 how the residual–residual regression relates to a single-stage model (e.g., $Behavior \sim ROI\_MD + Age + Sex + Colony$). Under standard OLS conditions, and when both stage-1 models use the same covariates, the two approaches give the same slope, but naive stage-2 standard errors do not account for the stage-1 fits. For robust inference, add one of:
• Cross-fitting: fit normative models in training folds and compute residuals in held-out folds before testing associations.
• Full-pipeline bootstrap: resample animals, refit normative models, recompute residuals, and refit decoupling models to obtain confidence intervals for $\beta$ and $R^2$.
Also state whether interactions/nonlinear age terms were considered (e.g., splines/quadratic), since mis-specified normative models can distort residuals.
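The full-pipeline bootstrap could be sketched as follows. This is a deliberately reduced illustration, not the authors' model: the normative models use age as the only covariate (sex and colony are omitted), the data are toy values with a planted residual slope of 0.8, and the interval is a simple percentile interval. The key point is that both stage-1 fits are repeated inside every resample, so their uncertainty propagates into the interval for $\beta$.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    slope, intercept = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def decoupling_slope(age, brain, behavior):
    """Stage 1: residualize both variables on age. Stage 2: residual-on-residual slope."""
    slope, _ = fit_line(residuals(age, brain), residuals(age, behavior))
    return slope


def bootstrap_ci(age, brain, behavior, n_boot=500, level=0.95, seed=0):
    """Percentile interval; stage-1 models are refit inside every resample."""
    rng = random.Random(seed)
    n = len(age)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample animals with replacement
        slopes.append(decoupling_slope([age[i] for i in idx],
                                       [brain[i] for i in idx],
                                       [behavior[i] for i in idx]))
    slopes.sort()
    low = slopes[int((1 - level) / 2 * n_boot)]
    high = slopes[int((1 + level) / 2 * n_boot) - 1]
    return low, high


# Toy data (hypothetical): 30 animals, planted residual slope of 0.8.
rng = random.Random(1)
age = [rng.uniform(1, 15) for _ in range(30)]
brain_noise = [rng.gauss(0, 1) for _ in range(30)]
brain = [0.5 * a + e for a, e in zip(age, brain_noise)]
behavior = [0.3 * a + 0.8 * e + rng.gauss(0, 0.3) for a, e in zip(age, brain_noise)]

estimate = decoupling_slope(age, brain, behavior)
low, high = bootstrap_ci(age, brain, behavior)
print(f"slope = {estimate:.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

Cross-fitting follows the same pattern, except that the stage-1 models are fit on training folds and the residuals are computed only for held-out animals.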
-
Behavioral task protocol and MRI acquisition/preprocessing are under-described, limiting reproducibility and interpretability (Sec. 2.2; Sec. 2.3). Missing details include key spatial-memory task design/definitions and core DTI acquisition parameters and preprocessing/QC steps (scanner/sequence, b-values, directions, voxel size, motion/eddy correction, tensor fitting, atlas registration, exclusion criteria).
Recommendation: Substantially expand Sec. 2.2 with a precise task description: apparatus geometry, number/identity of boxes, definition of trial/session, phase structure/timing, reward contingencies, criteria for progression, and handling of non-compliance/aborts; define edge cases for each metric (e.g., if correct box never visited). Expand Sec. 2.3 into (i) acquisition (scanner, field strength, sequence, b-values, number of diffusion directions, TR/TE, resolution, anesthesia/handling) and (ii) preprocessing/QC (denoising, motion/eddy/susceptibility correction, tensor estimation, software versions, registration method/metrics, QC thresholds, exclusions). Provide atlas provenance (Sec. 2.3.1) sufficient for ROI extraction reproduction.
-
Cohort size/composition is inconsistent across sections (e.g., $N=30$ in Sec. 2.1 vs. $N=33$ in Sec. 3.1/Sec. 3.2.2; sex/colony counts differ; the maximum epigenetic age is given as both 13.84 and 15.07). This makes it unclear which animals contribute to which models and figures.
Recommendation: Create one authoritative cohort accounting table (in Sec. 2.1 or as Table 1) listing $N$ for: (a) DTI available, (b) behavior logs available, (c) behavior metrics successfully extracted, (d) multimodal intersection used in each analysis. Include sex/colony breakdown and epigenetic-age range per subset. Update all text and figure captions in Sec. 3.1–3.3 to match, and briefly state exclusion reasons (missingness, QC failures).
-
ROI interpretability is blocked because ROIs are referred to only as ROI_1…ROI_24 without anatomical names; atlas details are minimal (Sec. 2.3.1; Sec. 3.3.2; Sec. 4.3). This limits both biological interpretation and future reuse.
Recommendation: Provide an ROI lookup table (main text or Appendix) mapping ROI indices to anatomical labels (and laterality), plus voxel counts/volumes. In Sec. 3.3.2 and Discussion, refer to ROIs as “$ROI_k$ (RegionName)”. Expand Sec. 2.3.1 with atlas origin (species-specific vs adapted), resolution, and registration QC examples (e.g., overlay snapshots).
-
Positioning/novelty is somewhat overstated relative to established human-literature residual frameworks (cognitive reserve, brain maintenance, brain-age/cognitive-age residuals). The manuscript currently under-engages with this lineage, making it hard to see what is fundamentally new beyond application to bats and epigenetic age (Sec. 1; Sec. 4.1).
Recommendation: In Sec. 1 and Sec. 4.1–4.4, add a concise related-work paragraph explicitly connecting to residual-based resilience/cognitive reserve frameworks and clarifying what is novel here (e.g., epigenetic-age normative model; bat model; atlas-based DTI pipeline). Moderate claims in Sec. 3.4/Sec. 4.4 to reflect proof-of-concept status until real behavioral metrics are available.