-
The manuscript’s interpretive claims about identifying “brain-aging signatures” and defining “Resilient/Vulnerable” phenotypes are not supported by the evidence because the predictive model is non-informative ($R^2 = -0.101$, i.e., worse than a constant mean prediction; Sec. 3.3.1) and the reported imaging–DNAm associations are weak (Sec. 3.2.2). Interpreting non-zero Elastic Net coefficients from a poorly performing, potentially unstable model as biological “signatures,” and describing residual-quartile groups ($N\approx8$ each) as phenotypes without uncertainty estimates or effect sizes, risks overinterpretation (Sec. 3.3.2–3.3.3, Sec. 3.4, Conclusions).
Recommendation: Reframe Sec. 3.3.2–3.3.3, Sec. 3.4, and Conclusions so coefficient patterns and residual-based grouping are explicitly presented as workflow demonstrations and hypothesis generation only (not evidence of signatures/phenotypes). If any region-level comparisons remain in the main text, add effect sizes and uncertainty (e.g., bootstrap CIs), clearly label them exploratory, and avoid phenotype language unless the predictor shows at least modest validity/calibration. Consider moving the residual-based phenotype subsection to an appendix as a worked example.
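As an illustration of the effect-size reporting requested above, the following is a minimal sketch of a percentile bootstrap CI for a standardized mean difference between two small groups. The function names (`cohens_d`, `bootstrap_effect_size`), the choice of Cohen's d, and the synthetic data are assumptions for illustration only; they are not taken from the manuscript.

```python
import numpy as np


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def bootstrap_effect_size(a, b, n_boot=5000, level=0.95, seed=0):
    """Point estimate and percentile bootstrap CI for Cohen's d."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Resample each group with replacement, keeping the group sizes fixed.
    boot = np.array([
        cohens_d(rng.choice(a, size=len(a)), rng.choice(b, size=len(b)))
        for _ in range(n_boot)
    ])
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(boot, [tail, 100 - tail])
    return cohens_d(a, b), lo, hi


# Synthetic stand-in for one region's intensity in two groups of 8 animals.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=8)
group_b = rng.normal(0.3, 1.0, size=8)
d, lo, hi = bootstrap_effect_size(group_a, group_b)
print(f"d = {d:.2f}, {95}% CI [{lo:.2f}, {hi:.2f}]")
```

With groups of about 8, intervals of this kind are typically wide, which is itself useful to show readers next to any region-level comparison.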
-
The neuroimaging feature used—atlas-based regional mean “signal intensity” from a single 3D volume—is not sufficiently defined, controlled, or justified. The manuscript does not clearly specify what the 3D image represents (e.g., $b_0$, FA/MD map, T1/T2, magnitude image), nor does it describe essential preprocessing steps (bias-field correction, skull stripping/brain masking, registration type, QC) or intensity normalization/harmonization. Without these, between-bat intensity differences may reflect scanner/session scaling, coil loading, registration errors, or other nuisance variation rather than biology (Sec. 2.3.1–2.3.2, Sec. 3.2.2).
Recommendation: Add a dedicated subsection in Sec. 2.3 detailing MRI acquisition and preprocessing: scanner/field strength, sequence/contrast, TR/TE, voxel size, whether these are raw images or diffusion-derived maps, and why the term “DTI” is appropriate (or remove it). Describe brain masking, bias-field correction, atlas registration (rigid/affine/nonlinear; software; interpolation), and QC (e.g., visual checks, registration failures). Implement and report an intensity normalization strategy across subjects (e.g., within-brain $z$-scoring, histogram matching, reference-tissue scaling) and re-run core analyses (Sec. 3.2.2, Sec. 3.3.1) to assess robustness. Provide a table listing the 24 atlas regions with anatomical names, voxel counts, and label IDs used in figures/text.
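To make the within-brain $z$-scoring option concrete, here is a minimal sketch under stated assumptions: the image is a NumPy array, a binary brain mask and an integer atlas label map are available in the same space, and the helper names (`zscore_within_brain`, `regional_means`) are hypothetical. It is one possible normalization, not the authors' pipeline.

```python
import numpy as np


def zscore_within_brain(volume, brain_mask):
    """Z-score voxel intensities using only voxels inside the brain mask."""
    vol = np.asarray(volume, dtype=float)
    mask = np.asarray(brain_mask, dtype=bool)
    values = vol[mask]
    sd = values.std()
    if sd == 0:
        raise ValueError("constant intensity inside the brain mask")
    out = np.full(vol.shape, np.nan)  # voxels outside the mask stay NaN
    out[mask] = (values - values.mean()) / sd
    return out


def regional_means(volume, labels, region_ids):
    """Mean intensity per atlas label, ignoring NaN (non-brain) voxels."""
    return {int(r): float(np.nanmean(volume[labels == r])) for r in region_ids}


# Synthetic check: a scanner-like gain/offset change should not alter features.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=(6, 6, 6))  # 0 = background, 1..3 = regions
volume = rng.normal(100.0, 15.0, size=labels.shape)
mask = labels > 0
rescaled = 3.7 * volume + 12.0  # same anatomy, different global scaling

f_orig = regional_means(zscore_within_brain(volume, mask), labels, [1, 2, 3])
f_resc = regional_means(zscore_within_brain(rescaled, mask), labels, [1, 2, 3])
print(all(np.isclose(f_orig[r], f_resc[r]) for r in f_orig))
```

This removes only global gain and offset differences between animals; it does not correct spatially varying bias fields, and it makes the whole-brain mean uninformative by construction, so a "global" predictor would need to be dropped or redefined.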
-
Both behavioral metrics (Exploration Entropy and Navigational Redundancy) are exactly zero for all bats and phases, which strongly suggests a data extraction/parsing bug or an empty-sequence failure mode. The manuscript nonetheless provides no diagnostic evidence to distinguish technical failure from true behavioral invariance (Sec. 2.2.1–2.2.3, Sec. 3.2.1). This undermines the multi-modal premise and leaves a key modality unresolved.
Recommendation: Extend Sec. 2.2 and Sec. 3.2.1 with a systematic behavioral pipeline audit: (1) report per-bat/per-phase counts of events, sequence lengths, and number of unique boxes visited; (2) show 2–3 representative raw table snippets (rows/columns) from the xlsx files and the parsed sequences to verify correct column names, action codes, timestamps, and box IDs; (3) validate entropy/redundancy on toy synthetic sequences that must yield non-zero values; (4) explicitly report how cases with zero post-discovery actions are handled; and (5) if a bug is found, recompute behavioral features and re-run multi-modal analyses. If the task truly yields near-deterministic behavior, add simpler robust summaries (e.g., latency to first discovery, total visits, perseveration/repeat count) as fallback features and discuss experimental causes.
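Item (3) above could be implemented along the following lines. This is a sketch only: it assumes Exploration Entropy is the Shannon entropy of the box-visit distribution, and it uses a repeat-visit fraction as a hypothetical stand-in for Navigational Redundancy, since the manuscript's exact definitions are not available to me. An empty sequence raises an error instead of silently returning zero.

```python
from collections import Counter
from math import log2


def shannon_entropy(sequence):
    """Shannon entropy (bits) of the empirical visit distribution."""
    if not sequence:
        # Fail loudly so parsing failures cannot masquerade as zero entropy.
        raise ValueError("empty sequence: check parsing before computing metrics")
    n = len(sequence)
    return sum((c / n) * log2(n / c) for c in Counter(sequence).values())


def repeat_fraction(sequence):
    """Fraction of visits that return to an already-visited box."""
    if not sequence:
        raise ValueError("empty sequence: check parsing before computing metrics")
    return 1 - len(set(sequence)) / len(sequence)


# Toy sequences with known answers.
uniform = ["A", "B", "C", "D"]   # four boxes, each visited once
constant = ["A", "A", "A", "A"]  # a single box visited repeatedly
alternating = ["A", "B", "A", "B"]

print(shannon_entropy(uniform))      # 2.0
print(shannon_entropy(constant))     # 0.0
print(shannon_entropy(alternating))  # 1.0
print(repeat_fraction(alternating))  # 0.5
```

If the authors' own functions return zero on inputs such as `uniform` or `alternating`, the fault lies in the metric code; if they return the expected values, attention should shift to the parsing step and the per-bat sequence lengths.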
-
Key aspects of the Elastic Net modeling and evaluation pipeline are under-specified and may be methodologically flawed for small $n$ ($N=31$) and correlated predictors. It is unclear whether (i) predictors were standardized within each training fold (to prevent leakage), (ii) the regularization strength (alpha) and the L1/L2 mixing parameter (l1_ratio) were selected via nested CV (inner tuning within each LOOCV training set), (iii) categorical variables (Sex, Origin) were encoded consistently, and (iv) missing values/outliers were handled (Sec. 2.4.2, Sec. 3.3.1). Coefficient interpretation is especially fragile without stability analysis.
Recommendation: Expand Sec. 2.4.2 to fully specify: exact design matrix columns; encoding of Sex/Origin; preprocessing/scaling; missing-data handling; software and versions; and hyperparameter search ranges. Implement leakage-free training by fitting scalers/encoders inside each training fold only. Prefer nested CV (inner CV/grid search for hyperparameters; outer LOOCV for evaluation). Add model sanity checks: permutation test (shuffle DNAm age and re-fit) and coefficient stability (bootstrap or selection frequency across folds). Report additional metrics (RMSE, correlation $r$, calibration slope/intercept) alongside $R^2$/MAE in Sec. 3.3.1.
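A leakage-free nested design of the kind recommended above might look like the following scikit-learn sketch. It assumes a purely numeric design matrix (categorical encoding would be added as a further pipeline step), and the hyperparameter grid, the inner 5-fold split, and the function names are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import (
    GridSearchCV, KFold, LeaveOneOut, cross_val_predict,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def nested_loocv_predictions(X, y, seed=0):
    """Outer LOOCV predictions with scaling and tuning refit in every fold."""
    pipe = Pipeline([
        ("scale", StandardScaler()),  # fitted on training rows only
        ("enet", ElasticNet(max_iter=10_000)),
    ])
    grid = {"enet__alpha": [0.01, 0.1, 1.0], "enet__l1_ratio": [0.2, 0.8]}
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, grid, cv=inner, scoring="neg_mean_absolute_error")
    # cross_val_predict clones and refits the whole search per outer fold.
    return cross_val_predict(search, X, y, cv=LeaveOneOut())


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Share of label-shuffled runs whose MAE is at least as good as observed."""
    rng = np.random.default_rng(seed)
    observed = mean_absolute_error(y, nested_loocv_predictions(X, y, seed))
    hits = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        null_mae = mean_absolute_error(y_perm, nested_loocv_predictions(X, y_perm, seed))
        hits += null_mae <= observed
    return observed, (1 + hits) / (1 + n_perm)


# Synthetic data with the cohort's sample size and one informative predictor.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(31, 10))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=31)
pred = nested_loocv_predictions(X_demo, y_demo)
print(f"R^2 = {r2_score(y_demo, pred):.2f}, MAE = {mean_absolute_error(y_demo, pred):.2f}")
```

Because $R^2$ and MAE are computed on pooled held-out predictions, they can be compared directly against the permutation baseline; the same held-out predictions can also feed the requested RMSE and calibration slope/intercept.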
-
Residual-quartile “Resilient/Vulnerable” phenotypes are not meaningful when derived from a model with negative out-of-sample $R^2$; the residuals mainly reflect model error/noise. Additionally, the paper does not address the biologically standard alternative, DNAm age acceleration relative to chronological age, because it is unclear whether chronological age is available or was used (Sec. 2.4.3, Sec. 3.3.3; the target column name also suggests a skin-tissue clock).
Recommendation: Revise Sec. 2.4.3 and Sec. 3.3.3 to avoid resilience/vulnerability labels unless a validated predictor exists. If chronological age exists, add it explicitly and analyze DNAm age acceleration (DNAmAge residualized on chronological age, with sex/origin covariates as appropriate), and report basic clock validation in this cohort (DNAm vs chronological correlation and error). If chronological age is unavailable, state this prominently and tone down any “biological age discrepancy” interpretation. If stratification is still desired for exploration, use model-free approaches (e.g., clustering/PCA on normalized imaging features) and then test association with DNAm age, clearly labeled exploratory.
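If chronological age is available, age acceleration and basic clock validation could be computed roughly as follows. This is a sketch assuming ordinary least-squares residualization with optional numeric covariates; the function names and synthetic data are hypothetical and only illustrate the calculation.

```python
import numpy as np


def age_acceleration(dnam_age, chron_age, covariates=None):
    """Residuals of DNAm age regressed on chronological age (+ covariates)."""
    y = np.asarray(dnam_age, dtype=float)
    columns = [np.ones_like(y), np.asarray(chron_age, dtype=float)]
    if covariates is not None:
        # Covariates must already be numeric (e.g., 0/1 codes for Sex, Origin).
        columns.append(np.asarray(covariates, dtype=float).reshape(len(y), -1))
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def clock_validation(dnam_age, chron_age):
    """Pearson r and median absolute error between DNAm and chronological age."""
    y = np.asarray(dnam_age, dtype=float)
    x = np.asarray(chron_age, dtype=float)
    return {
        "pearson_r": float(np.corrcoef(x, y)[0, 1]),
        "median_abs_error": float(np.median(np.abs(y - x))),
    }


# Synthetic cohort of 31 animals (illustrative values only).
rng = np.random.default_rng(3)
chron = rng.uniform(1.0, 20.0, size=31)
sex = rng.integers(0, 2, size=31)
dnam = 0.5 + 0.9 * chron + rng.normal(scale=1.0, size=31)

accel = age_acceleration(dnam, chron, covariates=sex)
print(clock_validation(dnam, chron))
print(f"mean acceleration = {accel.mean():.2e}")
```

By construction these residuals are uncorrelated with chronological age, which is what makes them interpretable as acceleration, in contrast to residuals from an imaging-based predictor with no demonstrated validity.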
-
There is a persistent mismatch between the paper’s stated goals (DW‑SV + advanced behavior + multi-modal prediction) and what was actually implemented (regional mean intensity + demographics), which risks overselling the contribution and confusing readers about what was tested vs. planned (Abstract, Introduction, Sec. 3.4, Conclusions).
Recommendation: Revise the Abstract, Introduction, and Conclusions to separate (i) planned aims (DW‑SV from 4D DWI; behavioral metrics) from (ii) achieved analyses (3D intensity + demographics). State explicitly that the dataset as provided cannot test the DW‑SV hypothesis. Reposition the manuscript as a feasibility/analysis pipeline report under real-world constraints, and list concrete requirements for future data collection (true 4D diffusion with bvals/bvecs; behavior design to elicit variability; larger $N$).
-
Small sample size ($N=31$) relative to the number of candidate predictors (24 regions + global + Sex + Origin; plus potential one-hot expansion) and multiple exploratory views (correlations, coefficient inspection, residual grouping) creates substantial risks of overfitting, unstable feature selection, and false positives. The manuscript acknowledges some limitations but does not quantify instability or address multiple comparisons systematically (Sec. 3.2.2–3.3.2, Sec. 3.4).
Recommendation: Add a focused limitations/statistics paragraph in Sec. 3.4 (or a dedicated Limitations subsection) quantifying the $n$-to-$p$ challenge and explicitly stating that region-level findings are exploratory. Apply multiple-comparison control where univariate tests are presented (e.g., FDR). Include stability/sensitivity analyses (bootstrap coefficients; repeated CV seeds where applicable; permutation baselines). Consider dimension reduction (e.g., PCA on normalized regional intensities) as a more stable exploratory alternative and report whether any principal components associate with DNAm age.
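For the multiple-comparison control suggested above, a Benjamini-Hochberg adjustment across the region-wise tests is straightforward. The sketch below is a plain NumPy implementation with a hypothetical function name and made-up $p$-values; an established library routine would serve equally well.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


# Made-up p-values for four regional correlations.
p_demo = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p_demo))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Reporting both raw and adjusted values for all 24 regions, rather than only those passing a threshold, would let readers judge how many nominal associations survive correction.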