-
Framing and interpretation overreach: the manuscript treats a null association ($\beta \approx 0.0073$, $p \approx 0.968$; adjusted $R^2 \approx 0.017$; Sec. 3.2) as evidence for “remarkable preservation”/slower atrophy with age (Sec. 1, 3.3, 4.3–4.4). With $n = 33$, cross-sectional data, an epigenetic-age range that is not contextualized relative to lifespan (Sec. 3.1), and assumption violations (Sec. 3.3), “no detectable association” is not equivalent to “preservation,” nor does it quantify an atrophy rate or support cross-species comparisons.
Recommendation: Reframe throughout (Abstract, Sec. 1, Sec. 3.3, Sec. 4.3–4.4) to state the main result as an absence of detectable association between DNAm age and TBV in this sample. Replace or qualify strong claims (e.g., “remarkable preservation,” “supports our hypothesis”) with effect-size/precision language. Report the unstandardized DNAm-age slope (mm$^3$ per DNAm-year) and a $95\%$ CI alongside the standardized $\beta$ (Sec. 3.2), and interpret which effect sizes remain compatible with the CI. Add a short “precision/power” paragraph (Sec. 3.3 or 4.3) describing minimum detectable effects given the observed TBV variance and age range. If you want to make a preservation-style claim, consider an equivalence/interval-null framing: define a smallest effect size of interest, such as $\pm X\%$ TBV per year, and discuss whether the CI excludes effects of that size. Explicitly note that absence of evidence for atrophy is not evidence of its absence, and that longitudinal data would be required to estimate within-individual change.
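For concreteness, the two quantities requested here can be written as follows. This is a sketch under assumptions: it presumes that both TBV and DNAm age were z-scored before fitting, and the symbols $b$, $s_{\mathrm{TBV}}$, $s_{\mathrm{age}}$, and $\Delta$ are notation introduced for this comment, not taken from the manuscript.

$$
b = \beta \,\frac{s_{\mathrm{TBV}}}{s_{\mathrm{age}}}, \qquad H_0:\ |b| \ge \Delta \quad \text{vs.} \quad H_1:\ |b| < \Delta
$$

Here $b$ is the unstandardized slope (mm$^3$ per DNAm-year) and $\Delta$ is the smallest effect size of interest in the same units. Under the usual two one-sided tests procedure at level $\alpha = 0.05$, equivalence would be concluded only if the $90\%$ CI for $b$ lies entirely within $(-\Delta, \Delta)$; a non-significant $p$-value alone does not establish this.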
-
Statistical inference is fragile given clear OLS assumption violations and outliers (Sec. 3.3; Fig. 5). The manuscript documents non-normal residuals and apparent influential points but still relies on standard OLS p-values/CIs as the primary basis for inference (Sec. 3.2–3.3), without robust SEs, robust regression, bootstrapping/permutation, or pre-specified sensitivity analyses.
Recommendation: Strengthen Sec. 2.4 (analysis plan) and Sec. 3.2–3.3 (results) with robustness checks: (i) report heteroskedasticity-robust SEs (e.g., HC3/HC4) for OLS; (ii) fit a robust regression (Huber or Tukey) and report the DNAm-age coefficient and CI; (iii) provide a nonparametric association (Spearman) and/or a permutation test for the DNAm-age slope; (iv) quantify influence (Cook’s distance/leverage/DFBETAs), define objective outlier criteria, and run sensitivity analyses with/without influential observations. Also check functional form (e.g., add a spline term or at least overlay LOESS in Fig. 4) to ensure a linear model is appropriate. Base the Discussion (Sec. 4.3–4.4) on the ensemble of robust/primary analyses rather than a single OLS p-value.
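As an illustration of check (iii), a permutation test for a slope could look roughly like the sketch below. It is deliberately simplified and rests on assumptions: it handles a single predictor only (no Sex or colony covariates; with covariates, a residual-permutation scheme such as Freedman–Lane would be needed), and the function names `ols_slope` and `permutation_p_value` are hypothetical, not the authors' code.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the slope of y on x.

    The outcome is shuffled relative to the predictor, which breaks any
    association while keeping both marginal distributions intact.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(ols_slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(ols_slope(x, y_perm)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one permutation,
    # so the p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the test makes no normality assumption about the residuals, agreement between this p-value and the OLS p-value would be reassuring, while a marked discrepancy would point to the assumption violations noted in Sec. 3.3.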
-
Outcome definition/validity: TBV computed as (number of non-zero voxels in skull-stripped averaged B0) $\times$ voxel volume (Sec. 2.2.3) is unconventional as a volumetric phenotype and may be sensitive to preprocessing artifacts (skull-stripping behavior, implicit thresholding, intensity scaling), EPI distortions, within-brain zeros, resampling/geometry differences, and scan/session effects. As written, it is unclear whether “non-zero voxels” is exactly equivalent to a binary brain mask, and whether all scans share identical voxel size and image dimensions.
Recommendation: In Sec. 2.2.3, explicitly define the binarization rule (e.g., $>0$ vs $>\epsilon$) and/or (preferably) compute TBV from an explicit binary brain mask rather than intensity non-zeros. Provide validation/QC: confirm (on all subjects or a representative subset) that non-zero counting matches mask-based volume; show example QC images (skull-stripping results) and report how failures were handled. Add acquisition and preprocessing details needed to judge comparability (scanner/field strength, sequence, voxel size, b-values, number of directions, resampling, distortion/motion correction, skull-stripping tool and parameters; Sec. 2.2.3). If scans differ by protocol/session, incorporate scan/batch covariates (or justify why not) and discuss potential bias. Consider adding a brief reliability/sensitivity check (e.g., TBV from first B0 vs averaged B0; sensitivity to threshold $\epsilon$) to show the phenotype is stable.
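The distinction between “non-zero,” “$>0$,” “$>\epsilon$,” and an explicit mask can be made concrete with a small sketch. The intensities, mask, and voxel size below are made-up toy values for illustration only, and all function names are hypothetical; a real check would read the image and mask from the authors' files.

```python
from math import prod


def volume_mm3(n_voxels, voxel_dims_mm):
    """Volume as voxel count times the volume of a single voxel."""
    return n_voxels * prod(voxel_dims_mm)


def count_nonzero(intensities):
    """Counts every voxel whose intensity differs from zero (including negatives)."""
    return sum(1 for v in intensities if v != 0)


def count_above(intensities, eps):
    """Counts voxels whose intensity is strictly greater than eps."""
    return sum(1 for v in intensities if v > eps)


def count_mask(mask):
    """Counts voxels flagged as brain in an explicit binary mask."""
    return sum(1 for m in mask if m)


# Toy flattened image: background zeros, one tiny positive value, one small
# negative value (as interpolation can produce), and two clear brain voxels.
toy_intensities = [0.0, 0.0, 1e-6, -0.5, 5.0, 10.0]
toy_mask = [0, 0, 0, 0, 1, 1]
toy_voxel_mm = (0.5, 0.5, 0.5)

tbv_nonzero = volume_mm3(count_nonzero(toy_intensities), toy_voxel_mm)
tbv_positive = volume_mm3(count_above(toy_intensities, 0.0), toy_voxel_mm)
tbv_eps = volume_mm3(count_above(toy_intensities, 1e-3), toy_voxel_mm)
tbv_mask = volume_mm3(count_mask(toy_mask), toy_voxel_mm)
```

In this toy case the four definitions give three different volumes, and only the $>\epsilon$ rule matches the mask. Reporting the same comparison on the real scans would show whether the choice of rule matters in practice.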
-
Scope mismatch due to missing behavioral outcomes: the paper’s stated integrative aim (epigenetic age $\leftrightarrow$ brain structure $\leftrightarrow$ spatial memory) is central in the Abstract/Introduction (Sec. 1) but cannot be evaluated because behavioral metrics could not be extracted (Sec. 2.2.2, 3.1). As a result, the manuscript is primarily a single association test (DNAm age vs TBV), and the cognitive narrative overpromises relative to the analyses delivered.
Recommendation: Align framing with actual content. Revise title/Abstract (and early Sec. 1) to position this as a preliminary/feasibility analysis of DNAm age vs global TBV, with behavioral analyses explicitly out of scope due to parsing failure. In Sec. 2.2.2 and 3.1, quantify the behavioral-data failure (how many files/animals affected, what exactly broke, what was attempted) and move detailed behavioral metric definitions to an Appendix to reduce distraction. In Sec. 4.1–4.4, clearly separate conclusions supported by current data (structural association test) from planned future work (cognition linkage), and outline a concrete recovery plan (schema validation, header harmonization rules, semi-manual extraction, unit tests for parsers) or justify why recovery is not feasible.
-
Epigenetic age measure is under-described and not calibrated/benchmarked within this cohort (Sec. 2.2.1, 3.1). The DNAm clock is described as “previously established and validated,” but key details are missing (training set, tissue(s), CpG count, accuracy/MAE, age range, normalization/QC, implementation). Additionally, if chronological age is known, the manuscript does not report DNAm–chronological correspondence or consider age-acceleration formulations; if chronological age is unknown, this limitation should be explicit because it affects interpretability of “advanced epigenetic age.”
Recommendation: Expand Sec. 2.2.1 with full clock provenance: cite the clock paper/tool, describe training data (tissue, age range, $n$), number of CpGs, reported performance (MAE/correlation), and your pipeline (normalization/QC/software versions). Clarify whether DNAm ages were computed anew or imported. If chronological ages exist, report DNAm age vs chronological age correlation and consider (at least in supplement) models using chronological age and/or DNAm age acceleration (DNAm residuals controlling for chronological age). If chronological ages do not exist, explicitly state this and temper interpretations of DNAm age as “biological age,” noting the additional uncertainty (including that the clock is trained on skin, not brain, which may limit inferences about brain aging).
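If chronological ages are available, the age-acceleration formulation mentioned above amounts to taking residuals from a regression of DNAm age on chronological age. A minimal sketch, assuming a simple linear fit and using hypothetical function names (`fit_line`, `age_acceleration`):

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def age_acceleration(chron_age, dnam_age):
    """Residual DNAm age after removing its linear dependence on chronological age.

    Positive values mean an animal's DNAm age is higher than expected
    for its chronological age within this cohort.
    """
    intercept, slope = fit_line(chron_age, dnam_age)
    return [d - (intercept + slope * c) for c, d in zip(chron_age, dnam_age)]
```

By construction these residuals are uncorrelated with chronological age in the sample, so a model using them as the predictor addresses a different question (relative epigenetic aging) than one using raw DNAm age.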
-
Cohort/design and missingness are under-specified, limiting interpretability and raising potential bias concerns (Sec. 2.1, 3.1, 3.3). Eight of $41$ bats are excluded due to missing imaging/TBV, but reasons for missingness and whether it relates to age/sex/colony are not documented. The age distribution is not contextualized relative to species lifespan, and potential confounders strongly tied to brain volume (body mass/head size, health status, scan/session effects) are not addressed beyond sex and origin colony (Sec. 2.4, 3.2).
Recommendation: In Sec. 2.1 and 3.1, document sampling/inclusion criteria, captive vs wild-derived status, and why $8/41$ lacked usable imaging (scan failure, QC exclusion, preprocessing failure). Provide a missingness check: compare DNAm age/sex/colony between included and excluded animals. Contextualize the DNAm-age range ($6.62$–$15.07$ years; Sec. 3.1) against typical/maximum lifespan and life stage in $\textit{R. aegyptiacus}$. If available, add morphometrics (mass, forearm length, head size) and relevant health indices as covariates, or at minimum report them and discuss their omission as a limitation; likewise, clarify whether all DTI data were acquired under a single protocol/session and, if not, incorporate batch/session/scanner covariates.
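For the missingness check on a continuous variable such as DNAm age, a standardized mean difference between included and excluded animals is one simple descriptive option. The sketch below assumes a pooled-SD definition and uses a hypothetical function name; with only eight excluded animals a significance test would have little power, so a descriptive effect size is likely more informative.

```python
import math
import statistics


def standardized_mean_difference(included, excluded):
    """Difference in group means divided by the pooled sample SD.

    Positive values mean the included animals have the larger mean.
    Both groups need at least two observations.
    """
    n1, n2 = len(included), len(excluded)
    v1 = statistics.variance(included)  # sample variance (n - 1 denominator)
    v2 = statistics.variance(excluded)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(included) - statistics.fmean(excluded)) / pooled_sd
```

Categorical variables (sex, colony) can be handled analogously by tabulating counts and proportions in the two groups.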
-
Reporting/reproducibility gaps: key descriptive statistics, full model outputs, and methodological details needed to evaluate and reproduce results are incomplete (Sec. 2.2.1, 2.2.3, 2.3.2, 3.1–3.2). Current reporting emphasizes standardized $\beta$ and a p-value, but does not present a full regression table (coefficients/SEs/CIs), complete descriptives for the $n=33$ subset, or clear statements about standardization conventions and reference categories.
Recommendation: Add (main text or supplement): (i) a descriptive table for the $n=33$ analysis subset (mean/SD/median/min/max for DNAm age and TBV; counts by sex/colony); (ii) the full regression table including unstandardized and standardized coefficients, SEs, $95\%$ CIs, p-values, and model fit metrics; (iii) a clear definition of z-scoring (computed over which sample; sample SD vs population SD; Sec. 2.4), and reference categories for $C(\text{Sex})$ and $C(\text{Origin\_colony})$ (Sec. 2.4.1). Include a brief code/data availability statement and QC criteria summary (even if data are restricted).
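On point (iii), the two common z-scoring conventions differ only in the denominator of the SD, but the choice should still be stated. A minimal sketch, with a hypothetical `zscore` helper whose `ddof` argument is named by analogy with common numerical libraries:

```python
import statistics


def zscore(values, ddof=1):
    """Z-score values using the sample SD (ddof=1) or population SD (ddof=0)."""
    if ddof not in (0, 1):
        raise ValueError("ddof must be 0 (population SD) or 1 (sample SD)")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values) if ddof == 1 else statistics.pstdev(values)
    return [(v - mean) / sd for v in values]
```

The resulting z-scores differ by a factor of $\sqrt{n/(n-1)}$, which is about $1.6\%$ at $n = 33$. If the outcome and the predictor are standardized under the same convention the standardized slope is unchanged, but coefficients for unstandardized covariates and any back-transformation to original units do depend on the choice.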