-
Epigenetic age (DNAmAge) estimation is insufficiently documented and not validated in this cohort, limiting interpretability of both the “age” predictor and the CRS residualization that conditions on it (Sec. 2.1.1, Sec. 2.3.1, Sec. 3.2–3.3, Sec. 4). The variable name suggests a skin-based clock (e.g., `DNAmAgeBat.Rousettus.aegyptiacus_Skin`), but tissue provenance, assay platform, clock training/coverage, expected error (MAE), preprocessing, and the DNAmAge–chronological age relationship in this sample are not reported.
Recommendation: Add a dedicated Methods subsection describing: tissue source and collection timing relative to behavior/MRI; methylation assay and preprocessing/normalization; whether the clock is published or newly trained (training set/species, CpG count, model type); and clock performance (correlation, MAE) including a DNAmAge vs chronological-age plot for this cohort. Report chronological age summary stats (range/mean/SD) and its correlation with DNAmAge (Sec. 3.1), and discuss how clock uncertainty could attenuate associations (Sec. 4).
-
Behavioral metric definitions and the construction of CFI are central but are currently specified ambiguously and inconsistently, compromising reproducibility and potentially validity (Sec. 2.2.1–2.2.2). In particular: (i) the adaptation-score equations are mathematically unclear because the operator joining the accuracy term to $\text{Time\_to\_First\_Correct}$ is missing (division vs multiplication; Sec. 2.2.1, p. $4$); (ii) the short- and long-term adaptation scores are repeatedly mislabeled (both as $\text{Adaptation\_Score\_P1}$); (iii) $\text{Time\_to\_First\_Correct\_P2}$ is listed, but it is unclear whether or how it enters CFI; (iv) edge cases ($\text{Total\_Visits}=0$; never finding the correct location; $\text{time}=0$; capping at $10800$ s) and censoring are not handled explicitly; and (v) combining error fractions with time components creates unit/scale issues that may drive the z-scored composite.
Recommendation: Rewrite Sec. 2.2.1–2.2.2 with formal, unambiguous definitions (symbols, units, allowed ranges) for every raw measure and derived score; explicitly state the implemented adaptation formula (e.g., $(1-\text{PE}/\text{TV})/\text{Time\_to\_First\_Correct}$ vs $(1-\text{PE}/\text{TV})\times\text{Time\_to\_First\_Correct}$, with PE the perseverative errors and TV the total visits) and ensure the text matches the code. Fix naming/labeling (use Adaptation_Score_STM vs _LTM consistently). Specify guardrails for division by zero and define how “no correct visit” is treated (censoring vs imputation to $10800$ s), including counts per phase. Add a compact table summarizing all components and sign conventions (higher=better), and (ideally) provide pseudocode or a script in the Supplementary Materials.
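To illustrate the level of explicitness requested, the following is a minimal sketch of one possible guarded definition. It assumes the division form $(1-\text{PE}/\text{TV})/\text{Time\_to\_First\_Correct}$, which the manuscript does not confirm; the function name, argument names, and the choice to return a censoring flag rather than impute $10800$ s are hypothetical, not the authors' implementation.

```python
import math

CAP_S = 10800.0  # session cap in seconds, as stated in the manuscript


def adaptation_score(persev_errors, total_visits, time_to_first_correct, found_correct):
    """Return (score, censored) for one animal in one phase.

    Assumes the division form (1 - PE/TV) / T. This is an assumption:
    the manuscript must state which operator was actually implemented.
    """
    if total_visits <= 0:
        # No visits at all: the error fraction is undefined, so do not report 0.
        return math.nan, False
    if not 0 <= persev_errors <= total_visits:
        raise ValueError("persev_errors must lie in [0, total_visits]")
    if not found_correct:
        # Never found the correct location: flag as censored instead of
        # silently imputing CAP_S into the denominator.
        return math.nan, True
    if time_to_first_correct <= 0 or time_to_first_correct > CAP_S:
        raise ValueError("time_to_first_correct must lie in (0, CAP_S]")
    accuracy = 1.0 - persev_errors / total_visits  # unitless, in [0, 1]
    return accuracy / time_to_first_correct, False  # units: 1/s
```

Returning the censoring flag separately makes it straightforward to report per-phase counts of censored animals, whichever imputation rule the authors ultimately adopt.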
-
Behavioral outcomes appear to have restricted dynamic range/ceiling or floor effects, which can easily produce null associations even if true effects exist (Sec. 3.2; Fig. 3). The manuscript notes concentrations/limited variability but does not quantify the extent (e.g., proportion of zero errors, tied times, or near-ceiling learning scores). This undermines the interpretability of null DNAmAge–CFI and CRS–MD results.
Recommendation: In Sec. 3.2 (and/or a Supplement), report descriptive statistics for each raw component and phase score (mean/SD, median/IQR), plus proportions at bounds (e.g., $\%$ with $0$ perseverative errors; $\%$ with no incorrect visits; $\%$ censored at $10800$ s). Provide histograms/density plots for key components and a correlation matrix among component scores. Consider reporting an internal-consistency/reliability check for the composite (e.g., correlations among components; a simple omega/alpha with appropriate caveats) to justify CFI as a unified construct.
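The requested descriptives are simple to produce. The sketch below shows one way to summarize a single component together with the proportion of animals at a floor or ceiling value; the function name and the example values are hypothetical and are not data from the manuscript.

```python
import statistics


def describe_component(values, floor=None, ceiling=None):
    """Summarize one behavioral component, including proportions at bounds."""
    vals = [float(v) for v in values]
    if len(vals) < 2:
        raise ValueError("need at least two observations")
    q1, _, q3 = statistics.quantiles(vals, n=4)  # quartile cut points
    summary = {
        "n": len(vals),
        "mean": statistics.fmean(vals),
        "sd": statistics.stdev(vals),
        "median": statistics.median(vals),
        "iqr": q3 - q1,
    }
    if floor is not None:
        summary["prop_at_floor"] = sum(v == floor for v in vals) / len(vals)
    if ceiling is not None:
        summary["prop_at_ceiling"] = sum(v == ceiling for v in vals) / len(vals)
    return summary


# Hypothetical perseverative-error counts for six animals (illustration only).
example = describe_component([0, 0, 0, 1, 2, 5], floor=0)
```

The same call with `ceiling=10800` applied to the time components would give the proportion of animals censored at the session cap.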
-
Use of CRS as in-sample regression residuals is under-motivated and can obscure more direct, interpretable brain–behavior relationships (Sec. 2.3.1–2.3.3, Sec. 3.3–3.4). That CRS is “uncorrelated with predictors by design” is a mathematical property of OLS residuals (with intercept), not evidence of “age-independent cognition.” If the $\text{CFI}\sim\text{DNAmAge}$ model is weak/non-significant, CRS may be nearly identical to CFI but noisier, and the two-stage procedure creates a generated-regressor setup that is not discussed.
Recommendation: Clarify in Sec. 2.3.1 exactly how predictors were encoded (Sex/Origin reference levels; intercept; centering/scaling of DNAmAge), which $N$ was used to fit the CRS model, and which diagnostics/outlier checks were applied. In Sec. 3.4, add a primary or parallel “single-step” model that directly tests brain–behavior links with covariates, e.g., $\text{CFI} \sim \text{MD} + \text{DNAmAge} + \text{Sex} + \text{Origin}_{colony}$ (and similarly for ROI MD with multiplicity correction). If CRS remains central, reframe it as an “age-adjusted performance residual” with limitations, and consider out-of-sample residualization (e.g., cross-validation) as a robustness check.
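The residual property at issue can be checked numerically. The sketch below is a simplified, hypothetical illustration with a single predictor (DNAmAge only, omitting Sex and Origin) and simulated values; it is not a reanalysis of the authors' data. It shows that in-sample OLS residuals are uncorrelated with the predictor by construction, and that their correlation with the raw outcome equals $\sqrt{1-R^2}$, so a weak first-stage fit yields a CRS that is almost the same variable as CFI.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation computed from centered sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def ols_residuals(x, y):
    """Residuals from the simple OLS fit y ~ 1 + x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


rng = random.Random(1)
age = [rng.uniform(1, 10) for _ in range(41)]      # simulated stand-in for DNAmAge
cfi = [0.05 * a + rng.gauss(0, 1) for a in age]    # simulated CFI with a weak age effect
crs = ols_residuals(age, cfi)

r_age_cfi = pearson_r(age, cfi)
r_crs_age = pearson_r(crs, age)   # zero up to floating-point error, by construction
r_crs_cfi = pearson_r(crs, cfi)   # equals sqrt(1 - R^2) of the first-stage fit
```

Reporting the first-stage $R^2$ alongside the correlation between CRS and CFI would let readers judge how much the residualization actually changes the outcome variable.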
-
DTI acquisition, preprocessing, registration, and QC are too sparsely described to evaluate validity or enable replication (Sec. 2.1.4, Sec. 2.3.2). The current description suggests matching atlas and MD-map dimensions, but dimension matching does not ensure anatomical alignment. MD is also sensitive to partial volume (CSF contamination), which is particularly relevant for small brains and ROI averages.
Recommendation: Expand Sec. 2.1.4 to include scanner/sequence details (field strength; $b$-values; number of directions; voxel size; TR/TE; anesthesia/motion context), preprocessing steps (denoising, motion/eddy-current correction, susceptibility distortion correction, brain extraction), tensor-fitting software/algorithm, and explicit atlas registration (linear/nonlinear, reference space, QC criteria). Report QC/exclusion criteria and whether ROI erosion or tissue masking was used to mitigate partial volume. Provide enough detail that another lab could reproduce MD maps and ROI means.
-
ROI analysis strategy is likely underpowered and difficult to interpret with $N\approx33$ and $24$ ROIs, especially after FDR, and currently lacks effect-size uncertainty reporting (Sec. 3.4.2). Additionally, the ROI models use $\text{CRS} \sim \text{Regional\_MD\_Value}$ without scan-quality covariates; residualization handles DNAmAge/sex/colony but not imaging confounds (motion/SNR, session effects).
Recommendation: For Sec. 3.4.1–3.4.2, report standardized effect sizes and confidence intervals for global and ROI regressions (not only $p/q$-values), and consider robust regression or bootstrap CIs as a sensitivity check. If available, include scan QC covariates (e.g., motion, SNR, outlier counts). To reduce multiplicity and increase interpretability, consider (i) a priori ROIs motivated by spatial-memory circuitry, and/or (ii) dimensionality reduction (e.g., PCA of ROI MD) with pre-specified components.
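As one example of the suggested sensitivity check, the following sketch computes a percentile bootstrap confidence interval for a Pearson correlation by resampling animals with replacement. The function names, the default of 2000 resamples, and the fixed seed are illustrative assumptions; other bootstrap variants (e.g., BCa) may be preferable for small samples.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci_r(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Return (r, lower, upper) using a percentile bootstrap over subjects."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):                       # skip degenerate resamples
            draws.append(r)
    draws.sort()
    lower = draws[int((alpha / 2) * len(draws))]
    upper = draws[max(int((1 - alpha / 2) * len(draws)) - 1, 0)]
    return pearson_r(x, y), lower, upper
```

Applied per ROI, the resulting intervals convey effect-size uncertainty directly and complement the FDR-adjusted $q$-values.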
-
Interpretation of null results as evidence for preserved cognitive flexibility and “decoupling” from microstructural integrity is currently stronger than warranted given limited $N$, multiple comparisons, restricted behavioral range, and potentially modest age span (Sec. 3.5, Sec. 4). With the current design, null results are consistent with both biological resilience and insufficient sensitivity/type-II error.
Recommendation: Temper claims in Sec. 3.5 and Sec. 4 and explicitly present detectable-effect considerations: provide approximate minimum detectable correlations for $N=33$ (global MD) under the chosen thresholds, and discuss how ceiling/restricted range attenuates observed effects. Add a clearly labeled limitations paragraph distinguishing (i) what the study can conclude (within sampled ages and task sensitivity) from (ii) what it cannot. Optionally include a small simulation/post-hoc sensitivity analysis showing the effect size range that would likely be missed.
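A rough minimum detectable correlation can be obtained from the Fisher $z$ approximation, $r_{\min} = \tanh\!\big((z_{1-\alpha/2} + z_{\text{power}})/\sqrt{N-3}\big)$. The sketch below implements this; the two-sided $\alpha=0.05$ and $80\%$ power defaults are illustrative assumptions rather than the manuscript's stated thresholds, and the `n_tests` argument applies a Bonferroni-style division of $\alpha$ as a conservative stand-in for the FDR procedure.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80, n_tests=1):
    """Approximate smallest |r| detectable with the given power (Fisher z).

    Assumes bivariate normality; n_tests > 1 divides alpha Bonferroni-style,
    which is more conservative than Benjamini-Hochberg FDR.
    """
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / (2 * n_tests))  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    return tanh((z_alpha + z_power) / sqrt(n - 3))


r_global = min_detectable_r(33)             # single global-MD test
r_roi = min_detectable_r(33, n_tests=24)    # 24 ROIs, Bonferroni-style threshold
```

Under these assumptions the single-test value for $N=33$ is about $0.47$, so only moderate-to-large correlations would be reliably detected, and the threshold is higher still once the $24$ ROI tests are taken into account.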
-
Internal inconsistencies in sample sizes and analysis flow (and therefore in derived quantities such as CRS) reduce confidence in the reported results: (i) DTI processing mentions $28$ MD NIfTI files while the main analysis states $N=33$ with DTI; (ii) Methods imply CRS fit on “subjects with complete data,” but Results report $F(3,37)$ for CFI regression ($N=41$), leaving unclear which sample generated the CRS used in DTI analyses (Sec. 2.3.1; Sec. 3.2–3.3).
Recommendation: Add a concise CONSORT-style flow (or table) specifying $N$ at each stage (behavioral inclusion; DNAmAge availability; MRI availability; QC exclusions), and ensure all Ns match across text/figures. State explicitly whether the CRS residualization model is fit on $N=41$ or $N=33$, and (ideally) refit on the imaging subset if CRS is used as the dependent variable in imaging analyses (or justify using the full cohort and explain implications).
-
The behavioral task description is too sparse to assess whether the operationalization truly reflects “cognitive flexibility” versus exploration strategy, motivation, or procedural artifacts (Sec. 2.1.3; Introduction; Sec. 4). Key details are missing: apparatus geometry/box count and spacing, cue availability, phase durations and retention intervals (STM vs LTM), counterbalancing of correct locations, individual vs group testing, reward schedule, and motivational state (e.g., food restriction).
Recommendation: Expand Sec. 2.1.3 with a task description from the animal’s perspective: layout, number of options, cues, trial structure, phase start/stop rules, STM/LTM interval lengths, counterbalancing/randomization, and reward/motivation procedures. This will also help justify the mapping from perseverative errors and visits to “flexibility” in Sec. 1 and Sec. 4.