-
Cohort description is internally inconsistent across Abstract, Sec. 2.1/Table 1, Sec. 3.1, Sec. 4.2, and Fig. 1D (e.g., $N=31$ vs $N=33$; differing sex/colony counts; DNAm age ranges 6.6–15.1 vs 6.62–13.84/13.8). Sec. 3.1 also contains apparent template/table artifacts (e.g., irrelevant fields such as “New Surgeon,” “Single (Award/Alter.),” and implausible age ranges). These inconsistencies make it unclear which animals contribute to which analyses and undermine all reported inferential statistics, especially interaction models that are highly sensitive to missingness and sample size.
Recommendation: Provide a single, definitive cohort accounting and propagate it consistently everywhere (Abstract; Sec. 2.1; Table 1; Sec. 3.1; Sec. 4.2; Fig. 1D). Include: starting $N$, exclusion criteria (behavioral incompletion, imaging QC failures, missing DNAm age), and final $N$ per analysis type (behavior-only; brain-only; moderation). Add a concise CONSORT-style flow diagram or table listing per-metric $N$ (important for Sec. 3.2–3.4). Remove template artifacts in Sec. 3.1 and replace with a clean demographic summary (mean$\pm$SD, min–max for DNAm age; sex/colony counts). If different $N$s are unavoidable (e.g., some metrics undefined for some bats), report them explicitly for each regression in Sec. 3 and/or in Tables 2–4.
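One way to generate the per-analysis $N$ requested above is to derive every count from a single shared table. The following is a minimal Python sketch; the field names (`behavior_complete`, `imaging_qc_pass`, `dnam_age`), the inclusion rules, and the three toy records are hypothetical placeholders, not values from the manuscript.

```python
# Hypothetical per-animal records; None marks a missing DNAm age.
records = [
    {"id": "bat01", "behavior_complete": True,  "imaging_qc_pass": True,  "dnam_age": 7.2},
    {"id": "bat02", "behavior_complete": True,  "imaging_qc_pass": False, "dnam_age": 9.8},
    {"id": "bat03", "behavior_complete": False, "imaging_qc_pass": True,  "dnam_age": None},
]

# Assumed inclusion rules: the fields an animal needs for each analysis type.
ANALYSES = {
    "behavior_only": ("behavior_complete", "dnam_age"),
    "brain_only": ("imaging_qc_pass", "dnam_age"),
    "moderation": ("behavior_complete", "imaging_qc_pass", "dnam_age"),
}


def usable(record, fields):
    """True if every required field is present and neither None nor False."""
    for field in fields:
        value = record.get(field)
        if value is None or value is False:
            return False
    return True


def cohort_counts(records, analyses):
    """Number of animals contributing to each analysis type."""
    return {
        name: sum(usable(r, fields) for r in records)
        for name, fields in analyses.items()
    }


counts = cohort_counts(records, ANALYSES)
```

Reporting such a table once, and quoting its rows wherever $N$ appears, would prevent the Abstract, Table 1, and Results from drifting apart.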
-
The operational definition, statistical implementation, and strength of claims regarding “cognitive resilience” are currently misaligned with the evidence (Sec. 1; Sec. 2.4.3; Sec. 3.4.1–3.4.2; Sec. 4.3–4.4). The moderation equation is written inconsistently (the interaction term is not clearly shown in places), and the moderation search space is ambiguous (“promising trends” vs “all pairs”). The key ROI 19 interaction is significant only before correction (it does not survive FDR), yet it is described with strong language (“compelling/strong/buffering/mitigated”), which risks overinterpretation in a modest-$N$, cross-sectional design.
Recommendation: Define cognitive resilience explicitly in Sec. 1 with citations (e.g., preserved function relative to age-related burden), and state clearly that here it is operationalized as a DNAmAge $\times$ brain-metric interaction (cross-sectional proxy, not longitudinal/clinical resilience). Correct and standardize the moderation model equation in Sec. 1 and Sec. 2.4.3 to include DNAmAge, BrainMetric, and DNAmAge$\times$BrainMetric. In Sec. 2.4.3 and Sec. 3.4.1, specify exactly: (i) which behavior outcomes entered moderation, (ii) which brain metrics (24 ROIs $\times$ volume/intensity), (iii) whether any pre-screening occurred and if it was pre-registered or post hoc, (iv) the total number of interaction tests, and (v) the exact FDR family and procedure (e.g., BH at $q=0.05$). Throughout (Abstract; Fig. 1C; Sec. 3.4.2; Sec. 4.3–4.4), clearly label ROI 19 as exploratory and report both raw and FDR-adjusted $p$-values; replace causal/buffering wording with correlational language and emphasize hypothesis-generation.
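For concreteness, the standardized moderation model could take the following form. The notation ($Y_i$ for the behavioral outcome of bat $i$, coefficients $\beta_0$ to $\beta_3$, error $\varepsilon_i$) is ours and assumed for illustration; any covariates the authors use would enter as additional terms.

```latex
\[
Y_i = \beta_0 + \beta_1\,\mathrm{DNAmAge}_i + \beta_2\,\mathrm{BrainMetric}_i
      + \beta_3\,(\mathrm{DNAmAge}_i \times \mathrm{BrainMetric}_i) + \varepsilon_i
\]
```

Under this form the resilience claim rests on $\beta_3$ alone, so it is the estimate, CI, and raw and FDR-adjusted $p$-values for $\beta_3$ that should be reported for each tested pair.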
-
Mean b0 signal intensity is treated as a region-specific biological marker without sufficient acquisition/preprocessing detail or controls for non-biological intensity scaling (Sec. 2.3.1–2.3.2; Sec. 3.3.1; Sec. 4.3). b0 intensity is not inherently quantitative and can vary with coil sensitivity/bias field, receiver gain, session/scanner differences, subject positioning, EPI distortions (if DTI-EPI), motion, and partial voluming. Without explicit intensity normalization and distortion/bias handling, ROI intensity effects (including ROI 14 and ROI 19) could reflect acquisition/processing artifacts or global scaling rather than aging biology.
Recommendation: Expand Sec. 2.3.1 to report key MRI acquisition parameters: scanner model and field strength, sequence type (EPI DTI?), TR/TE, voxel size, number of directions/$b$-values, number of b0 volumes, and whether parameters were identical across bats/sessions. In Sec. 2.3.2, document preprocessing affecting intensity comparability: brain extraction, bias-field correction, denoising, motion/eddy correction, EPI distortion correction (topup/fieldmap) or rationale if absent, and any intensity normalization (e.g., divide ROI mean by whole-brain mean/median; $z$-score within subject; histogram matching). Add robustness checks: (i) include global b0 intensity (whole-brain mean/median) as a covariate in ROI-intensity models or analyze relative intensity (ROI/global), and (ii) report whether the ROI 14 age effect remains after such adjustment. Temper mechanistic interpretations in Sec. 4.3 (gliosis/iron/water etc.) as speculative unless supported by quantitative MRI or histology; frame as hypotheses for follow-up.
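The relative-intensity option in check (i) can be sketched as follows. This is a minimal Python illustration under our own assumptions: the function name, the choice of the whole-brain median as reference, and the toy voxel values are hypothetical, and the sketch addresses only global multiplicative scaling, not a spatially varying bias field.

```python
from statistics import mean, median


def relative_roi_intensity(roi_voxels, brain_voxels):
    """ROI mean b0 intensity divided by the whole-brain median (unitless)."""
    global_ref = median(brain_voxels)
    if global_ref <= 0:
        raise ValueError("whole-brain reference intensity must be positive")
    return mean(roi_voxels) / global_ref


# Toy example: identical anatomy acquired at two receiver gains (scan B = 2x scan A).
brain_a = [100.0, 110.0, 120.0, 130.0, 140.0]
roi_a = [130.0, 140.0]
brain_b = [2 * v for v in brain_a]
roi_b = [2 * v for v in roi_a]

rel_a = relative_roi_intensity(roi_a, brain_a)
rel_b = relative_roi_intensity(roi_b, brain_b)
```

In the toy data the raw ROI means differ by a factor of two while the relative intensities agree, which is the property needed before an ROI 14 or ROI 19 effect can be attributed to biology rather than scaling.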
-
ROI labels are largely numeric (ROI 14/19 central to conclusions) without anatomical names, laterality, or localization (Sec. 2.3.2; Sec. 3.3.1; Sec. 3.4.2; Sec. 4.3–4.4). This prevents readers from evaluating biological plausibility, comparing to prior literature (e.g., hippocampal/striatal/cortical systems for navigation/flexibility), or interpreting why specific ROIs might relate to perseveration or consolidation.
Recommendation: Add an atlas/ROI mapping table (preferably in Sec. 2.3.2 or Supplement with clear in-text pointers) listing ROI 1–24 with: anatomical name, laterality (if applicable), broad class (cortical/subcortical/white matter), and brief description of atlas provenance. Provide a figure showing the atlas in template space with ROI 14 and ROI 19 highlighted. In Sec. 3.3–3.4 and Discussion, refer to ROIs by anatomical names in addition to indices (e.g., “StructureName (ROI 14)”). If the atlas does not support reliable anatomical correspondence (e.g., composite parcels), state this limitation explicitly and correspondingly soften functional claims.
-
The modeling choices for the behavioral metrics (counts, times, proportions) are not convincingly justified given their likely distributions and the assumptions of linear regression (Sec. 2.2.2; Sec. 2.4.1–2.4.2; Sec. 3.2). Perseveration outcomes are small-range counts with likely zero inflation; latency measures are typically skewed/censored; Correct_Box_Preference is a proportion with variable denominators (“after first correct”) and an unclear log transform that may be invalid if zeros occur. These issues can bias estimates/$p$-values and complicate interpretation of effect sizes.
Recommendation: In Sec. 2.2.2, provide explicit formulas for each metric (including Correct_Box_Preference) and specify handling of edge cases (e.g., no post-discovery entries; failure to find correct box). In Sec. 2.4, justify the modeling family per metric and report transformations precisely (e.g., $\log_{10}(x+1)$, logit, arcsine-sqrt). Strongly consider re-fitting key outcomes with appropriate models: negative binomial/Poisson (or zero-inflated) GLMs for count perseveration; survival/AFT or appropriately transformed time models for latencies; beta regression or binomial (successes/total) models for proportions like Correct_Box_Preference. At minimum, add sensitivity analyses showing that the main inferences (especially Perseverative_Errors_STM age effect and any ROI 19 interaction trend) are robust across reasonable alternative model families and/or transformations.
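To illustrate the level of precision requested for transformations and edge cases, a minimal Python sketch follows. The function names are hypothetical, and the authors' actual metric definitions are not known to us; `safe_proportion` only shows how an undefined denominator could be flagged, and `squeezed_logit` uses a common adjustment that shrinks proportions away from 0 and 1 before taking the logit.

```python
import math


def log10_plus1(x):
    """log10(x + 1); defined at x = 0, unlike a plain log transform."""
    if x < 0:
        raise ValueError("expects a non-negative count or time")
    return math.log10(x + 1)


def safe_proportion(successes, total):
    """successes / total, or None when the denominator is zero (undefined)."""
    if total == 0:
        return None
    return successes / total


def squeezed_logit(p, n):
    """Logit of p after the adjustment p' = (p * (n - 1) + 0.5) / n.

    n is the sample size; the adjustment keeps p' strictly inside (0, 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    p_adj = (p * (n - 1) + 0.5) / n
    return math.log(p_adj / (1 - p_adj))
```

Whichever transformation is used, stating it at this level of detail (including what happens when a bat never finds the correct box) lets readers check that no observation was silently dropped or assigned an arbitrary value.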
-
Multiplicity handling and the “universe” of tests are not transparently defined, especially for moderation (Sec. 2.4.2–2.4.3; Sec. 3.4.1). The manuscript alternates between moderation on a subset of “promising trends” and an “extensive series across all brain–behavior pairs.” Without an explicit count of tested hypotheses per family, readers cannot interpret FDR-adjusted results, nor judge the evidential strength of uncorrected $p$-values (e.g., $p=0.004$ for ROI 19 interaction).
Recommendation: In Sec. 2.4, define separate hypothesis families and test counts (e.g., age$\to$behavior; age$\to$brain volumes; age$\to$brain intensities; brain$\times$age moderation per behavior metric). Report the exact number of tests included in each FDR correction and the procedure (e.g., BH). In Sec. 3.2–3.4, consistently label raw vs FDR-adjusted $p$-values and provide effect sizes with 95% CIs. If moderation tests were pre-screened, describe the screening rule, whether it used the same data (risking circularity), and consider presenting both (i) a fully exploratory all-pairs analysis with stringent correction and (ii) a smaller, pre-specified hypothesis set (if defensible) analyzed confirmatorily.
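Because the adjusted $p$-value depends directly on the family size, a short sketch of the Benjamini-Hochberg step-up adjustment may help make the point. This is a plain-Python illustration written for this report, not the authors' code; the function name `bh_adjust` is ours.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical family: one raw p = 0.004 among 48 tests, the rest unremarkable.
family = [0.004] + [0.5] * 47
adjusted = bh_adjust(family)
```

In this hypothetical family of 48 tests (e.g., 24 ROIs $\times$ 2 metric types for one behavior), a raw $p=0.004$ adjusts to about 0.19; in a family of 5 it could fall below 0.05. This is why the exact test count per family must be stated.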
-
Regional volume analyses appear not to control for total brain size/intracranial volume, limiting interpretation of null or localized volumetric findings (Sec. 2.3.2; Sec. 3.3; Sec. 4.2). Without adjusting for total brain volume (or equivalent), regional volume associations may reflect global size differences (including sex/allometry) rather than region-specific effects.
Recommendation: Compute total brain volume (or intracranial volume, if feasible from the same atlas/mask) and either (i) include it as a covariate in ROI volume regressions, or (ii) analyze normalized volumes (ROI/total brain) and justify the choice. Report both absolute and size-adjusted results (at least in Supplement) and state clearly in Sec. 2.4.2 and Sec. 3.3 which approach is primary. Also clarify in Sec. 2.3.2 whether volumes were computed in native or template space and whether voxel dimensions/resampling affect the computation.
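The two options can be written out as follows, fitted separately for each ROI $r$. The notation ($V_{r,i}$ for the volume of ROI $r$ in bat $i$, $\mathrm{TBV}_i$ for total brain volume, and the coefficient symbols) is ours and assumed for illustration.

```latex
\[
\text{(i)}\quad V_{r,i} = \beta_0 + \beta_1\,\mathrm{DNAmAge}_i + \beta_2\,\mathrm{TBV}_i + \varepsilon_{r,i}
\qquad
\text{(ii)}\quad \frac{V_{r,i}}{\mathrm{TBV}_i} = \gamma_0 + \gamma_1\,\mathrm{DNAmAge}_i + \eta_{r,i}
\]
```

The two are not interchangeable: (ii) assumes regional volume scales proportionally with total volume, whereas (i) estimates that relationship from the data, so the choice should be justified rather than left implicit.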
-
Core reproducibility details are missing or fragmented across behavioral, imaging, and statistical pipelines (Sec. 2.2–2.4), and key result tables are referenced but not adequately presented (Tables 2–4). The current description is insufficient for replication and for evaluating robustness (diagnostics, influential points, missingness patterns).
Recommendation: Strengthen Methods: (i) Sec. 2.2: arena geometry and box layout, phase durations and criteria, reward details, habituation/training, event coding, and explicit rules for aborted/incomplete phases; (ii) Sec. 2.3: atlas provenance and validation, registration direction and parameters (rigid/affine/nonlinear; cost function; interpolation), masking/erosion, and QC criteria; (iii) Sec. 2.4: software and package versions, covariate coding, missing-data handling (complete-case per model vs other), and routine diagnostics (residuals, heteroskedasticity, Cook’s distance). Provide complete model outputs in Tables 2–4 (or Supplement): $N$, $\beta\pm$SE, 95% CI, (adjusted) $R^2$ or pseudo-$R^2$, and both raw and FDR $p$-values. Include at least a leave-one-out or influence-based sensitivity analysis for the headline effects (Perseverative_Errors_STM; ROI 14 intensity). State whether code/data (or a de-identified derivative) will be shared.
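A leave-one-out check of the kind requested can be very short. The sketch below is a simplified Python illustration for a single-predictor regression; the headline models may include covariates, and the variable names and toy data are hypothetical.

```python
def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def leave_one_out_slopes(x, y):
    """Refit the slope n times, each time dropping one observation."""
    return [
        ols_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Toy data (hypothetical): four points on y = 2x plus one high-leverage outlier.
age = [1.0, 2.0, 3.0, 4.0, 10.0]
score = [2.0, 4.0, 6.0, 8.0, 0.0]

full_slope = ols_slope(age, score)
loo_slopes = leave_one_out_slopes(age, score)
```

In the toy data the full-sample slope is negative, but it becomes 2 once the single outlying animal is dropped. Reporting the range of leave-one-out estimates for Perseverative_Errors_STM and ROI 14 intensity would show whether any one bat drives those effects.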