-
Core aim and framing mismatch: the manuscript repeatedly frames the study as integrating brain structure (DTI‑MRI) with epigenetic age and spatial cognition, and as testing DNAmAge $\times$ brain‑metric interactions (Sec. 1, Sec. 2.4, Sec. 2.5.2), but no usable MRI metrics were extracted due to the $b=0$ failure (Sec. 3.1, Sec. 3.4). The delivered results therefore do not test the central hypothesis about structural moderators of cognitive resilience, yet the Abstract/Introduction/Conclusions language still implies that neural structural correlates were analyzed.
Recommendation: Reframe the paper explicitly as a feasibility/QC/pilot report (or a “lessons learned” pipeline paper) plus a limited DNAmAge–behavior exploration, rather than a completed multimodal brain–age–cognition study. Concretely: (i) revise the Abstract and Conclusions to state up front that MRI feature extraction failed and that the planned interaction analyses were not performed; (ii) in Sec. 1, add a short paragraph distinguishing planned vs achieved analyses and narrowing the contribution; (iii) move detailed unrealized DTI‑metric/interaction model descriptions (Sec. 2.4, Sec. 2.5.2) to an “Intended analyses” subsection or Appendix, clearly labeled as not executed; and (iv) temper or remove claims about “structural signatures” and broad “cognitive resilience” if resilience is not operationalized with available data.
-
MRI failure diagnosis is under‑documented and may be premature: the reported ‘empty/all‑zero first $b=0$ volume’ prevents masking and all downstream diffusion metrics (Sec. 2.4.1–2.4.2, Sec. 3.1), but the manuscript does not provide sufficient forensic QC to determine whether this is truly absent signal vs. conversion/scaling/indexing errors, nor whether alternative salvage routes exist. In addition, using reconstructed/assumed bvals/bvecs (generic 30‑direction scheme) raises fundamental validity concerns even if masking were fixed (Sec. 2.4.1).
Recommendation: Add a dedicated MRI QC/forensics subsection (Sec. 2.4 and/or Sec. 3.1) with concrete evidence and diagnostics: (i) acquisition details (scanner/sequence, TE/TR, resolution, number and placement of b0s, gradient export availability); (ii) Bruker$\rightarrow$NIfTI conversion tool, version, settings, and validation steps; (iii) volume‑wise intensity summaries (min/max/mean, histograms) after NIfTI scaling is applied (e.g., verifying header scl_slope/scl_inter, dtype/NaNs), plus visual montages of several slices; (iv) verify whether the first volume is truly a $b=0$ (check bvals ordering vs actual volume signal; b0s can be interleaved or later); (v) attempt robust masking from the mean of all b0s or another reference image (and describe why alternatives failed); and (vi) explicitly justify or retract the use of generic bvecs; ideally, recover the true gradients from Bruker metadata (e.g., method file), or state clearly that quantitative DTI cannot be interpreted without correct bvecs/bvals. If the dataset is indeed unrecoverable, demonstrate this by showing the QC outputs above for a small set of representative subjects.
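To illustrate the kind of diagnostic requested in (iii) and (iv), a minimal sketch follows. It assumes each volume has already been read and flattened to a sequence of raw voxel values; the function names, the `rel_threshold` parameter, and the toy inputs are hypothetical and are not taken from the manuscript. It applies NIfTI-style scaling (a slope of 0 or NaN is treated as "no scaling"), summarizes each volume, flags all-zero volumes, and lists the brightest volumes as candidate b0s to compare against the bvals ordering.

```python
import math
from statistics import fmean


def apply_scaling(raw, scl_slope, scl_inter):
    """Apply NIfTI-style scaling; a slope of 0 or NaN means 'no scaling'."""
    if scl_slope == 0 or math.isnan(scl_slope):
        return [float(v) for v in raw]
    return [v * scl_slope + scl_inter for v in raw]


def summarize_volumes(volumes, scl_slope=1.0, scl_inter=0.0):
    """Return one summary dict per volume (each volume is a flat sequence)."""
    rows = []
    for idx, raw in enumerate(volumes):
        scaled = apply_scaling(raw, scl_slope, scl_inter)
        finite = [v for v in scaled if math.isfinite(v)]
        rows.append({
            "volume": idx,
            "min": min(finite) if finite else math.nan,
            "max": max(finite) if finite else math.nan,
            "mean": fmean(finite) if finite else math.nan,
            "n_nonfinite": len(scaled) - len(finite),
            # True only if there is finite data and every finite voxel is 0.
            "all_zero": bool(finite) and all(v == 0 for v in finite),
        })
    return rows


def candidate_b0_indices(rows, rel_threshold=0.8):
    """Indices of volumes whose mean is at least rel_threshold * brightest mean.

    Heuristic assumption: diffusion weighting attenuates signal, so the
    brightest volumes are the most plausible b0s.
    """
    means = [r["mean"] for r in rows if not math.isnan(r["mean"])]
    if not means or max(means) <= 0:
        return []
    top = max(means)
    return [r["volume"] for r in rows
            if not math.isnan(r["mean"]) and r["mean"] >= rel_threshold * top]
```

If the first volume is flagged `all_zero` while later volumes appear as b0 candidates, that would point to an ordering or indexing problem rather than absent signal; the heuristic is only a screening step and does not replace visual inspection.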
-
Behavioral parsing failures are insufficiently specified and currently undermine interpretability and replicability: most planned behavioral metrics are reported as zero/NaN due to parsing/log-format issues (Sec. 2.3, Sec. 3.2), yet the paper does not provide enough detail about the raw log structure, event coding, phase rules, or parsing logic to (a) assess whether the failures could be corrected, (b) validate that the remaining outcomes ($\text{Time\_to\_First\_Food}$, $\text{Switch\_Cost}$) are correctly computed, or (c) reproduce the extraction.
Recommendation: Expand Sec. 2.3 and Sec. 3.2 into a more rigorous behavioral methods + parsing/QC section: (i) fully specify the task (arena/box layout, reward contingencies per phase, session duration/timeouts, phase transition rules, how ‘correct box’ is defined/stored); (ii) document the raw log/Excel schema with an anonymized example (column names, data types, header rows, where phase IDs and ‘correct box’ appear, allowed action codes such as E/F and variants); (iii) describe parsing rules precisely (handling of missing/duplicate timestamps, capitalization/whitespace variants, sheet naming differences, per‑phase boundaries); (iv) perform and report a manual validation against ground truth on a subset (e.g., 5–10 sessions): compute entries/perseveration/exploration by hand and compare to script outputs; (v) if repair is feasible, re‑extract and report the full intended metric set; if not feasible, provide a table of the observed file-format variants/error modes and a clear justification of why reliable recovery is impossible. Also clarify censoring: if some bats never find food within a session, state how $\text{Time\_to\_First\_Food}$ is defined/treated (timeout value vs missing), as this affects appropriate statistical modeling.
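As a concrete example of the parsing and censoring rules the authors should state, a minimal sketch follows. The event layout (a list of `(timestamp_seconds, action_code)` pairs), the meaning assigned to the code `F` (food obtained), and the timeout value are assumptions made for illustration only; the manuscript does not specify them. The sketch normalizes capitalization and whitespace variants and returns an explicit censoring flag instead of silently substituting zero or NaN.

```python
def normalize_code(raw):
    """Collapse capitalization/whitespace variants, e.g. ' f ' -> 'F'."""
    if raw is None:
        return ""
    return str(raw).strip().upper()


def time_to_first_food(events, session_start, timeout_s, food_code="F"):
    """Return (time_s, observed) for one session.

    observed is False when no food event falls inside the session window;
    the time is then the timeout, i.e. a right-censored observation.
    Events with a missing timestamp are skipped.
    """
    times = [
        t - session_start
        for t, code in events
        if t is not None
        and normalize_code(code) == food_code
        and 0 <= t - session_start <= timeout_s
    ]
    if times:
        return min(times), True
    return timeout_s, False
```

For example, `time_to_first_food([(5.0, " e"), (12.5, "f "), (20.0, "F")], 0.0, 600.0)` returns `(12.5, True)`, whereas a session with no food event returns `(600.0, False)`. Keeping the flag alongside the time lets the statistical model treat timeouts as censored rather than as observed latencies.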
-
Small, uneven, and potentially non‑random sample sizes plus fragile inference: effective $N$ varies widely across phases (e.g., P1$\approx 30$, P2$\approx 21$, P3$\approx 35$) and $\text{Switch\_Cost}$ uses a much smaller paired subset ($\approx 17$) (Sec. 3.2, Sec. 3.3). The manuscript does not quantify missingness causes, test whether inclusion is related to DNAmAge/sex/colony, or provide power/sensitivity estimates. As a result, the null DNAmAge results cannot be read as evidence of no effect, and the colony effect ($p \approx 0.049$), which sits at the conventional significance threshold, is likely to be unstable (Sec. 3.3.2, Sec. 3.4, Conclusions).
Recommendation: Add a missingness and sensitivity section spanning Sec. 3.2–3.4: (i) provide a single summary table listing each metric/phase, the $N$ used, and explicit exclusion reasons; (ii) test whether missingness/inclusion is associated with DNAmAge, sex, or colony (e.g., logistic regression or contingency analyses), and discuss implications; (iii) report power/sensitivity (e.g., minimum detectable Spearman $|\rho|$ at given $N$; detectable standardized effects in regression); and (iv) rephrase interpretation throughout Sec. 3.4 and Conclusions to emphasize low power for DNAmAge effects and to label the colony effect as exploratory/hypothesis‑generating, not confirmatory.
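The sensitivity calculation in (iii) can be approximated in a few lines. The sketch below uses the Fisher $z$ transformation with a commonly used variance approximation for Spearman's coefficient, $1.06/(N-3)$; this is an approximation chosen for illustration, not the authors' method, and the function name and default $\alpha$ and power values are assumptions.

```python
import math
from statistics import NormalDist


def min_detectable_spearman(n, alpha=0.05, power=0.80):
    """Approximate smallest |rho| detectable with a two-sided test.

    Uses the Fisher z transform with Var(z) ~= 1.06 / (n - 3), a standard
    approximation for Spearman's rho; it is rough for very small n.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    se = math.sqrt(1.06 / (n - 3))
    return math.tanh((z_alpha + z_power) * se)


if __name__ == "__main__":
    # Approximate sample sizes quoted in the review above.
    for n in (17, 21, 30, 35):
        print(n, round(min_detectable_spearman(n), 2))
```

Under these assumptions, the minimum detectable $|\rho|$ is roughly 0.65 at $N = 17$ and roughly 0.47 at $N = 35$, so only large associations with DNAmAge could have been detected.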
-
Statistical modeling does not match distributional features and small-$N$ uncertainty: $\text{Time\_to\_First\_Food}$ is likely right‑skewed and potentially censored; $\text{Switch\_Cost}$ can be heavy‑tailed. The manuscript notes assumption violations in diagnostics (Sec. 3.3) but largely proceeds with standard linear regression, and the key colony result is based on a very small $N$ with multiple predictors and limited robustness checks.
Recommendation: Strengthen Sec. 2.5 and Sec. 3.3 with analyses appropriate to the outcomes and sample size: (i) consider log/log1p transforms for $\text{Time\_to\_First\_Food}$ and report whether conclusions change; (ii) use heteroskedasticity‑robust SEs (e.g., HC3/HC4) and/or robust regression as a sensitivity analysis; (iii) for $\text{Switch\_Cost}$ and small $N$, add non‑parametric or permutation/bootstrap inference (bootstrap CIs for coefficients; permutation test for colony effect), and report full coefficient tables with CIs (not only $p$‑values); (iv) explicitly address multiple testing across phases/outcomes/predictors (even if limited) or justify why no correction is applied; and (v) if timeouts/censoring exist, consider survival or Tobit‑style approaches, or at minimum clearly define how censoring was handled and its potential bias.
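As an illustration of the permutation inference suggested in (iii), a minimal sketch follows. It is a deliberate simplification: it tests an unadjusted two-colony difference in mean $\text{Switch\_Cost}$ and ignores the covariates in the authors' regression. The function name, the number of permutations, and the seed are assumptions.

```python
import random
from statistics import fmean


def permutation_test_mean_diff(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, p_value). Group labels are shuffled
    n_perm times; the p-value uses the (count + 1) / (n_perm + 1) form
    so it is never exactly zero.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance so floating-point ties count as "as extreme".
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

Reporting this permutation $p$-value next to the regression $p \approx 0.049$ would show whether the colony effect depends on the normality assumption; a covariate-adjusted version (for example, permuting residuals) would be closer to the model in the manuscript.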
-
Reproducibility materials and reporting are not yet aligned with the paper’s implicit ‘pipeline cautionary tale’ contribution: without code, data dictionaries, and QC outputs, readers cannot learn from or verify the failure modes and the limited analyses (Sec. 2.2–2.5, Sec. 3.1–3.3). Ethical/animal welfare details are also missing or too sparse for animal research (Sec. 2.1, Sec. 2.4).
Recommendation: Add a Reproducibility and Ethics package: (i) provide versioned code (behavior parsing + MRI preprocessing attempts + stats), software versions, and a brief runbook; (ii) include an anonymized behavioral log schema/example and a data dictionary for all variables; (iii) include representative MRI header dumps and QC figures (volume‑wise mean intensity plots; slice montages showing the ‘empty $b0$’ issue); (iv) document random seeds (if any) and exact inclusion/exclusion criteria; and (v) add an Ethics/Animal Welfare subsection in Sec. 2.1 describing approvals, housing/handling, and MRI procedures (e.g., anesthesia/restraint, stress mitigation), with approval IDs or explicit regulatory justification.