-
Reproducibility-critical methodological details are missing or too vague across all three modalities (behavior, DTI, DNAmAge), preventing rigorous evaluation of confounds and limiting reuse/replication (Sec. 2.1–2.4). For the foraging task, key parameters are not fully specified (number of boxes, spatial layout, arena geometry, phase durations/termination, inter-phase timing, reward schedule, habituation/training, counterbalancing/randomization of correct locations, and criteria for valid trials/phases) (Sec. 2.1, Sec. 2.3, Sec. 3.2). For DTI, acquisition and preprocessing are under-described (scanner/field strength, sequence, b-values, directions, resolution, TR/TE, anesthesia and motion mitigation, distortion/eddy correction, tensor fitting, skull stripping, atlas origin, registration strategy, ROI extraction, and QC thresholds) (Sec. 2.4). For DNAmAge, the methylation platform, preprocessing/normalization, clock model and its calibration/validation for this species/tissue/age range, and QC are not provided (Sec. 2.1–2.2).
Recommendation: Substantially expand Methods. (i) In Sec. 2.1 and Sec. 2.3, fully specify apparatus and protocol (arena dimensions; number/layout of boxes; cues/lighting; phase durations/stop rules; reward type/amount/schedule; habituation/training; whether phases are same day; and how correct-box locations are randomized/counterbalanced across bats and phases). State inclusion/exclusion rules (e.g., minimum activity/entries) and whether scorers were blinded. (ii) In Sec. 2.4, report full DTI acquisition parameters and a step-by-step preprocessing/QC pipeline (software + versions; motion/eddy/distortion correction; denoising; skull stripping; tensor fitting; atlas/template provenance; registration direction and method; ROI extraction; partial-volume mitigation; and QC thresholds plus number excluded for imaging QC). (iii) In Sec. 2.1–2.2 (or a dedicated subsection), describe biopsy processing, methylation assay, normalization, the epigenetic clock model (training set/species/tissue), expected error, QC filters, and how DNAmAge was computed. If space is limited, move full details to Supplementary Material but keep a complete summary in the main text.
-
The behavioral metrics are the paper’s central contribution, but their construct validity, robustness, and full mathematical definition are not yet established (Sec. 2.3, Sec. 3.2–3.5). (a) Adaptation Efficiency (unique incorrect boxes before first correct) ignores repeated errors/perseveration and sequence/time structure; two bats with very different perseverative profiles can score identically. (b) Interference Index (old-correct entries / total entries) can change due to denominator variation (overall exploration/activity), making interpretation ambiguous without also analyzing numerator/denominator and activity covariates. (c) Edge cases are not defined (e.g., if a bat never finds the new correct box; if total entries are zero) (Sec. 2.3.1–2.3.2). (d) Reliability/consistency across phases is not assessed, yet the discussion sometimes reads as if these are stable cognitive traits (Sec. 3.2–3.5).
Recommendation: In Sec. 2.3.1–2.3.2, provide fully specified computation rules (what counts as an “entry”; whether counts stop at first correct; handling of rapid re-entries; whether repeated visits to the same wrong box are counted; and explicit definitions for ‘never correct’ and zero-denominator phases—e.g., NA with exclusion rules, or capped values). Report task parameters needed to interpret ranges (e.g., total number of boxes; whether the stated $0$–$5$ range is intrinsic or a cap) (Sec. 2.3; see also Sec. 3.2). Add a short ‘metric validation/psychometrics’ Results subsection: relate each metric to simpler behavioral quantities (total incorrect entries, latency to first correct, total entries/activity), and report cross-phase consistency (e.g., correlation of phase-2 vs phase-3 versions). Consider reporting complementary metrics that capture perseveration and timing (e.g., total incorrect entries before first correct; number of revisits; latency) alongside the proposed metrics to support construct validity.
-
The analytic cohort is complete-case ($n = 30$ from an initial $n = 41$), but missingness and potential selection bias are not characterized (Sec. 3.1). If missing DTI/DNAmAge/behavior is related to age, sex, colony, or performance (e.g., motion-prone animals, low-activity animals, scan failures), effect estimates and null conclusions can be biased, and generalizability is unclear.
Recommendation: In Sec. 3.1 (and/or a Supplementary table), provide a CONSORT-style flow: starting $N$, numbers missing behavior/DTI/DNAmAge, and reasons (motion/QC failure, incomplete task, assay failure, etc.). Add an included-vs-excluded comparison table on available variables (sex, origin colony, DNAmAge, any behavioral summaries, and chronological age if known) and test whether key variables differ between groups. If differences exist, acknowledge potential bias and consider sensitivity analyses (e.g., inverse probability weighting, or at minimum qualitative discussion).
-
The statistical modeling framework is incompletely described and, in places, poorly matched to the bounded/zero-inflated outcomes, limiting power and interpretability (Sec. 2.5–2.6, Sec. 3.3–3.4). The paper uses Spearman correlations for $\text{DNAmAge}$–behavior associations but ROI-wise ‘mass-univariate’ linear regressions with covariates for MD–behavior associations. However, Adaptation Efficiency is a bounded count (reported $0$–$5$) and Interference Index is a bounded proportion with many zeros; in both settings OLS assumptions can be violated and effect estimates can be inefficient. In addition, the moderation-analysis Methods text contains formatting/corruption artifacts and lacks a clear executable specification (Sec. 2.6.2), and covariate handling is inconsistent between $\text{DNAmAge}$–behavior correlations and MD–behavior models.
Recommendation: Rewrite Sec. 2.5–2.6 as a coherent, final analysis plan with explicit model equations and consistent covariates. For $\text{DNAmAge}$–behavior, either (i) report covariate-adjusted associations via regression (e.g., $\text{Behavior} \sim \text{DNAmAge} + \text{Sex} + \text{Colony}$) or (ii) justify unadjusted Spearman and state it clearly. For ROI analyses, consider outcome-appropriate GLMs: (a) Adaptation Efficiency as count/ordinal (Poisson/negative binomial or ordinal regression; include overdispersion checks), and (b) Interference as binomial using counts (old-correct entries out of total entries) rather than a raw ratio, optionally with random effects if repeated measures are used. If you retain OLS for comparability, add residual diagnostics and/or robust regression and explicitly justify that choice. In Sec. 2.6.2, remove numbering/table artifacts and provide the exact moderation model (e.g., $\text{Behavior} \sim \text{MD} + \text{DNAmAge} + \text{MD} \times \text{DNAmAge} + \text{Sex} + \text{Colony}$), plus the criteria for running it (pre-specified even under null main effects, versus only if main effects pass FDR).
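As an example of the explicitness requested, the two outcome-appropriate ROI models could be written as follows. This is a sketch under assumed notation, not the authors' specification: $Y_i$ is the Adaptation Efficiency of bat $i$, $K_i$ and $N_i$ are old-correct and total entries, $\text{MD}_{i,r}$ is mean diffusivity in ROI $r$, and the $\beta$ and $\gamma$ coefficients are placeholders. A multi-level Colony factor would need one coefficient per non-reference level.

```latex
% Count model for Adaptation Efficiency (Poisson shown; negative binomial if overdispersed)
% and binomial model for Interference, fitted separately for each ROI r.
\begin{align}
Y_i &\sim \text{Poisson}(\mu_i), &
\log \mu_i &= \beta_0 + \beta_1\,\text{MD}_{i,r} + \beta_2\,\text{Sex}_i + \beta_3\,\text{Colony}_i \\
K_i &\sim \text{Binomial}(N_i, p_i), &
\operatorname{logit}(p_i) &= \gamma_0 + \gamma_1\,\text{MD}_{i,r} + \gamma_2\,\text{Sex}_i + \gamma_3\,\text{Colony}_i
\end{align}
```

Writing Interference in terms of $K_i$ and $N_i$ keeps the denominator in the likelihood, so bats with few total entries contribute less information than bats with many.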
-
Null findings are interpreted relatively strongly despite small $n$ and heavy multiplicity ($25$ MD measures $\times$ $4$ outcomes $= 100$ tests), without quantitative sensitivity/power characterization (Sec. 3.3–3.5, Conclusions/Sec. 4). Under these conditions, ‘no significant associations’ is ambiguous: it may reflect truly small effects, model mismatch, nonlinearity, or limited power after correction. The manuscript would be much more informative if it quantified what effect sizes are (and are not) compatible with the data.
Recommendation: Add a dedicated sensitivity/power subsection (end of Sec. 2.6 or start of Sec. 3.5). Report minimum detectable effect sizes for: (i) $\text{DNAmAge}$–behavior correlations (with chosen $\alpha$), and (ii) ROI-wise MD–behavior tests under the applied FDR regime. In Results, report effect sizes with uncertainty (e.g., correlation/regression estimates with confidence intervals) rather than only $p$-values, and summarize the distribution of observed effects across ROIs (e.g., median and maximum $|\text{effect}|$). Consider equivalence tests or Bayesian analyses for $\text{DNAmAge}$–behavior to quantify evidence against moderate-to-large effects. In the Discussion/Conclusions, revise language to ‘consistent with preserved performance across this sample/age range under the tested models’ and avoid implying strong evidence of absence without these sensitivity bounds.
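A rough sensitivity bound of the kind requested can be obtained in a few lines. The sketch below uses the Fisher $z$ approximation for a two-sided correlation test. It is an approximation only: it is derived for Pearson correlations, it ignores the degrees of freedom used by covariates, and it uses a Bonferroni-style threshold ($\alpha / 100$) as a conservative stand-in for the FDR procedure. The function name and the $80\%$ power target are assumptions made for illustration.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest |r| detectable with the given power in a two-sided test (Fisher z approximation)."""
    if n <= 3:
        raise ValueError("Fisher z approximation requires n > 3")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile matching the target power
    # On the Fisher z scale the standard error of atanh(r) is 1 / sqrt(n - 3).
    return tanh((z_alpha + z_power) / sqrt(n - 3))


n_bats = 30   # complete-case cohort reported in Sec. 3.1
n_tests = 100  # 25 MD measures x 4 outcomes

mde_single_test = min_detectable_r(n_bats, alpha=0.05)
mde_conservative = min_detectable_r(n_bats, alpha=0.05 / n_tests)
```

Under these assumptions the single-test bound is roughly $|r| \approx 0.49$ and the conservative multiplicity-adjusted bound is roughly $|r| \approx 0.68$. The effective bound under Benjamini-Hochberg correction would fall somewhere between the two, which suggests that only large effects were detectable in the ROI-wise analyses.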
-
DNAmAge is treated as a key aging variable, but its biological meaning in this cohort is insufficiently supported without reporting calibration to chronological age, tissue limitations, and potential measurement error (Sec. 2.1–2.2, Sec. 3.1–3.3). It is currently unclear (i) whether chronological ages are known and how $\text{DNAmAge}$ tracks them in this sample, (ii) whether the clock is validated for $\textit{R. aegyptiacus}$ skin, and (iii) whether ‘age acceleration’ ($\text{DNAmAge}$ residualized on chronological age) could be a more interpretable predictor if chronological age is available.
Recommendation: In Sec. 3.1–3.3, report chronological age availability, its distribution, and $\text{DNAmAge}$–chronological age concordance (correlation and scatterplot). If chronological age is known, consider adding age-acceleration analyses ($\text{DNAmAge}$ residuals) and/or include chronological age alongside $\text{DNAmAge}$ to clarify whether $\text{DNAmAge}$ adds information beyond age. In Methods (Sec. 2.1–2.2), provide clock provenance/validation and expected error; discuss tissue specificity and attenuation due to measurement error in Sec. 3.5. If chronological age is unknown/estimated, state estimation method and limitations explicitly.
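For concreteness, ‘age acceleration’ in the sense used here is the residual from a simple linear regression of $\text{DNAmAge}$ on chronological age. The sketch below shows that computation with ordinary least squares. The function name and the four data points are hypothetical and serve only to show the arithmetic; they are not values from the manuscript.

```python
from statistics import fmean
from typing import List, Sequence


def age_acceleration(dnam_age: Sequence[float], chron_age: Sequence[float]) -> List[float]:
    """Residuals of DNAmAge regressed on chronological age (simple OLS)."""
    if len(dnam_age) != len(chron_age) or len(chron_age) < 2:
        raise ValueError("need two equal-length sequences with at least two animals")
    mean_x = fmean(chron_age)
    mean_y = fmean(dnam_age)
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological age has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: epigenetically older than expected for chronological age.
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]


# Hypothetical ages in years, for illustration only.
chron = [1.0, 2.0, 3.0, 4.0]
dnam = [1.5, 2.0, 3.5, 4.0]
accel = age_acceleration(dnam, chron)
```

Because the residuals are uncorrelated with chronological age by construction, using them as the predictor separates ‘older than expected’ from ‘older’, which is the distinction point (iii) asks the authors to address.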
-
The manuscript does not appear to be fully finalized: key tables contain placeholders/corrupted entries and some section formatting is broken, which undermines confidence in the presentation of results and makes the manuscript difficult to review and act on (Sec. 2.5, Sec. 3.1, Sec. 2.6.2). Examples include Table 1 containing ‘TBD’ entries, Table 2 showing corrupted headers and implausible values (e.g., Global MD Min/Max both $0.00$), and the malformed header and numbering artifacts in the moderation-analysis section.
Recommendation: Before resubmission, fully clean and validate the manuscript outputs. Replace ‘TBD’ with actual descriptive statistics or remove redundant tables; correct Table 2 headers and values and ensure MD values are in plausible ranges with units. Fix all section-numbering artifacts (e.g., stray ‘#’, malformed table-style headings), and cross-check consistency of cohort sizes across all tables/figures/text. Re-run scripts to regenerate tables/figures directly from the analysis pipeline to avoid manual transcription errors; note software versions and provide reproducibility details where possible.