-
DTI ROI results are currently invalid due to catastrophic atlas-to-subject misalignment (Dice reported as $0.0$ across subjects) and physiologically implausible diffusion values (e.g., near-zero FA), yet the Methods still read as though ROI extraction yielded meaningful data products (Sec. 2.3.1–2.3.3; Sec. 3.2; Fig. 6–9). This undermines the central mechanistic aim (microstructural aging signatures) and also makes the reported null ROI age associations uninterpretable as “no effect”: they may simply reflect sampling of background or otherwise incorrect voxels.
Recommendation: Make the manuscript internally consistent by (i) explicitly labeling the ROI-derived DTI outputs as non-interpretable immediately where ROI extraction is described (end of Sec. 2.3.3, with a forward reference to Sec. 3.2), and (ii) either: (A) fix and re-run the diffusion preprocessing/registration/ROI pipeline, demonstrating non-zero overlap and plausible scalar ranges before any inferential statistics; or (B) if a fix is not feasible with the current data, reframe the paper as a methodological case study/post-mortem, and move/remove ROI age statistics and any downstream brain–behavior plans from the main Results (Sec. 3.2–3.3) to clearly marked “invalid exploratory output” or Supplementary material. In both cases, revise the Discussion (Sec. 4.3–4.4) to avoid language implying preserved microstructure.
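As a sketch of the QC gate implied by option (A), assuming per-subject Dice values and per-ROI mean FA have already been computed, the check below keeps any subject out of ROI statistics unless both pass. The thresholds (`MIN_DICE`, `MIN_MEAN_ROI_FA`), the `SubjectQC` structure, and the ROI names are hypothetical and would need to be justified for this species, atlas, and acquisition.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration only (not taken from the manuscript).
MIN_DICE = 0.70          # minimum overlap of registered atlas with subject brain mask
MIN_MEAN_ROI_FA = 0.10   # near-zero mean FA in an ROI suggests background sampling


@dataclass
class SubjectQC:
    subject_id: str
    dice: float                                      # whole-brain overlap after registration
    roi_mean_fa: dict = field(default_factory=dict)  # ROI name -> mean FA within ROI


def qc_failures(subj: SubjectQC) -> list:
    """Return human-readable reasons this subject fails QC (empty list = pass)."""
    reasons = []
    if not subj.dice >= MIN_DICE:  # written this way so NaN also fails
        reasons.append(f"Dice {subj.dice:.2f} < {MIN_DICE}")
    for roi, fa in sorted(subj.roi_mean_fa.items()):
        if not 0.0 <= fa <= 1.0:   # FA is bounded in [0, 1] by definition
            reasons.append(f"{roi}: FA {fa:.3f} outside [0, 1]")
        elif fa < MIN_MEAN_ROI_FA:
            reasons.append(f"{roi}: mean FA {fa:.3f} implausibly low")
    return reasons


def gate_cohort(subjects):
    """Split subjects into IDs eligible for ROI statistics and a dict of excluded IDs -> reasons."""
    passed, failed = [], {}
    for s in subjects:
        reasons = qc_failures(s)
        if reasons:
            failed[s.subject_id] = reasons
        else:
            passed.append(s.subject_id)
    return passed, failed
```

Reporting the per-subject `failed` reasons, rather than only an aggregate pass rate, would also document the failure mode for a post-mortem under option (B).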
-
The diffusion preprocessing/registration pipeline is under-specified and partially ambiguous, preventing diagnosis and reproducibility of the failure mode (Sec. 2.3.1–2.3.3; Abstract wording about scans being “stretched to uniform dimensions”; Sec. 3.2). Key missing details include: exact software/toolchain and versions; eddy/motion and susceptibility correction; b-vector handling; denoising/bias correction; brain extraction/masking; coordinate conventions (RAS/LPS), affines/headers; which images live in which space (native diffusion, $b_0$, FA space, template/atlas space); registration type (linear vs nonlinear), cost function and constraints; interpolation for label resampling; and precisely how Dice was computed (which masks; thresholding; binarization).
Recommendation: Expand Sec. 2.3.1–2.3.3 into a stepwise, reproducible pipeline description with explicit file spaces and transforms. At minimum, report for each relevant image (atlas labels, subject diffusion/$b_0$, FA map, brain mask): voxel size, dimensions, orientation convention, and affine; describe all preprocessing steps and parameters (eddy/motion, susceptibility, bvec rotation, denoising, bias-field correction if any); specify the registration path (e.g., subject $b_0\to$ template $b_0$/FA; atlas $\to$ template; or subject $\to$ atlas) and whether it is affine or nonlinear, with cost function and masking strategy. Define Dice computation precisely (whole-brain masks vs union of atlas labels; binarization/thresholds). Add a brief “post-mortem” subsection in Sec. 3.2 identifying the most likely cause(s) (e.g., header mismatch, orientation swap, wrong space, non-binarized masks) and concrete safeguards/QC gates to prevent recurrence.
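To make the header/orientation and Dice requests concrete, a minimal numpy sketch follows. `axis_codes` approximates orientation codes from a 4×4 voxel-to-world affine under the RAS+ world convention used by NIfTI (nibabel's `aff2axcodes` is a more robust equivalent), `same_grid` tests whether two images can be compared voxelwise without resampling, and `dice` binarizes both inputs before computing overlap, so an integer atlas label image is treated as the union of its labels. Function names and the default threshold are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def axis_codes(affine):
    """Approximate orientation codes (e.g. 'RAS', 'LPS') from a 4x4 voxel-to-world
    affine, assuming a RAS+ world convention as in NIfTI."""
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    rot = np.asarray(affine, dtype=float)[:3, :3]
    codes = []
    for j in range(3):                      # one code per voxel axis
        col = rot[:, j]
        i = int(np.argmax(np.abs(col)))     # dominant world axis for this voxel axis
        codes.append(labels[i][1] if col[i] > 0 else labels[i][0])
    return "".join(codes)


def same_grid(shape_a, affine_a, shape_b, affine_b, atol=1e-3):
    """True if two images share dimensions and voxel-to-world mapping, i.e. voxelwise
    comparison (Dice, ROI sampling) is meaningful without resampling."""
    return tuple(shape_a) == tuple(shape_b) and np.allclose(affine_a, affine_b, atol=atol)


def dice(mask_a, mask_b, threshold=0.5):
    """Dice = 2|A ∩ B| / (|A| + |B|) after binarizing both inputs.

    Integer label images (0 = background) become the union of all labels;
    probabilistic masks are thresholded at `threshold`."""
    a = np.asarray(mask_a) > threshold
    b = np.asarray(mask_b) > threshold
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch {a.shape} vs {b.shape}: resample first")
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return float("nan")                 # both masks empty: Dice undefined
    return 2.0 * int(np.logical_and(a, b).sum()) / denom
```

Running `axis_codes` and `same_grid` on the atlas labels, $b_0$, FA map, and brain mask would immediately flag an RAS/LPS flip or a label image sitting on the wrong grid, two of the candidate causes for a Dice of exactly $0.0$ in every subject.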
-
Sample sizes, inclusion criteria, and age ranges are inconsistent across the manuscript (e.g., references to an initial cohort of $41$, DTI $N=33$, “complete data” $N=31$, and conflicting max ages such as $15.07$ vs $13.8$; Sec. 2.1; Sec. 3.1–3.2; Fig. 1–3 captions/text). These inconsistencies impede interpretation, power assessment, and reproducibility.
Recommendation: Provide a consolidated cohort accounting (ideally a CONSORT-style/funnel table) in Sec. 2.1–2.4.1 and/or a new table/figure: counts at each stage separately for behavior and imaging; explicit inclusion/exclusion criteria and reasons (missingness, QC failures, unusable scans); and the exact $N$ used for each model family in Sec. 3.1–3.3. Reconcile and standardize the min/max age for each analytic subset (behavior-only, DTI-only, multimodal intersection) and ensure figure captions state the $N$ and age range for the plotted subset.
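One way to generate the requested funnel directly from the analysis records, so that captions cannot drift from the data actually modeled, is sketched below. The `Subject` fields and subset names are hypothetical placeholders for whatever flags the authors' data table contains.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    subject_id: str
    age: float            # primary age variable (state whether chronological or DNAm)
    has_behavior: bool
    has_dti: bool
    dti_qc_pass: bool = False


# Hypothetical analytic subsets; each predicate selects the subjects for one model family.
SUBSETS = {
    "enrolled": lambda s: True,
    "behavior": lambda s: s.has_behavior,
    "dti_acquired": lambda s: s.has_dti,
    "dti_qc_pass": lambda s: s.has_dti and s.dti_qc_pass,
    "multimodal": lambda s: s.has_behavior and s.has_dti and s.dti_qc_pass,
}


def funnel(subjects):
    """Return {subset: (N, min age, max age)} from the same records used in modeling."""
    table = {}
    for name, keep in SUBSETS.items():
        ages = [s.age for s in subjects if keep(s)]
        table[name] = (len(ages), min(ages, default=None), max(ages, default=None))
    return table
```

Each resulting row (N, minimum and maximum age) maps onto one line of the funnel table and one figure caption.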
-
Behavioral metric definitions are promising but not yet rigorous or validated enough to support the central interpretation that “no age effects” implies preserved cognition rather than limited sensitivity, floor/ceiling effects, or confounding by time-on-task or motivation (Sec. 2.2.1–2.2.3; Sec. 3.1; Sec. 4.3–4.4). Several metrics also have edge cases that can materially affect estimates (e.g., never visiting the correct box in a phase; exploitation $\approx 0$ leading to unstable ratios; how $\text{Absolute\_Time}$ is defined across phases; how phase boundaries are handled).
Recommendation: Strengthen Sec. 2.2.1–2.2.3 by adding precise operational definitions for each metric, including edge-case handling, ideally with pseudocode or a worked example in an Appendix. Explicitly define timing variables (e.g., whether latency is phase-relative or continuous across the session), rules for missing or undefined events (never finding the correct box; no ‘Lose’ events; zero denominators), and whether and how metrics are normalized by phase duration or number of visits. In Sec. 3.1, add basic psychometric/robustness reporting: metric distributions, missingness per phase, within-bat variability, correlations among metrics, and checks for floor/ceiling effects. If feasible, add sensitivity analyses using alternative formulations (e.g., log-latency or ratio-based switch cost; lose–shift conditional on having previously exploited the correct location; exploration normalized by visit count or time). Update Sec. 4.3–4.4 to separate more sharply “resilience” from “insufficient task sensitivity/limited age span.”
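A minimal sketch of explicit edge-case handling follows, assuming a hypothetical per-phase event log of `(time since phase onset, box id, outcome)` tuples with outcomes `"Win"` or `"Lose"`; the manuscript's actual log format may differ. Undefined cases return `None` rather than $0$, so they surface as missing data instead of silently entering the models, and `exploration_proportion` illustrates one bounded alternative to an explore/exploit ratio.

```python
from typing import List, Optional, Tuple

# Hypothetical event format: (time_s since phase onset, box_id, outcome in {"Win", "Lose"}).
Visit = Tuple[float, int, str]


def first_correct_latency(visits: List[Visit], correct_box: int) -> Optional[float]:
    """Phase-relative latency to the first visit of the correct box.
    Returns None (censored) if the box is never visited, not 0 or the phase length."""
    for t, box, _ in visits:
        if box == correct_box:
            return t
    return None


def lose_shift_rate(visits: List[Visit]) -> Optional[float]:
    """P(next visit is a different box | current visit was 'Lose').
    Returns None when no 'Lose' visit is followed by another visit in the phase."""
    shifts = opportunities = 0
    for (_, box, outcome), (_, next_box, _) in zip(visits, visits[1:]):
        if outcome == "Lose":
            opportunities += 1
            shifts += int(next_box != box)
    return shifts / opportunities if opportunities else None


def exploration_proportion(n_explore: int, n_exploit: int) -> Optional[float]:
    """Bounded alternative to an explore/exploit ratio: explore / (explore + exploit).
    Stays within [0, 1] when exploitation is ~0; None if there were no visits at all."""
    total = n_explore + n_exploit
    return n_explore / total if total else None
```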
-
Null findings are not supported with sufficient quantitative reporting and sensitivity/power considerations. The manuscript does not systematically provide effect sizes, uncertainty (CI), or exact $p/q$ values for age effects (behavior; DTI), and does not estimate the smallest detectable effects given $N$ and multiple-testing burden (Sec. 2.4.2–2.4.4; Sec. 3.1–3.2; Sec. 4.3–4.4). This makes it difficult to judge what magnitudes of aging effects the study can rule out.
Recommendation: Add summary tables (main text or Supplementary) for Sec. 3.1 and the intended Sec. 3.2 models listing: $\beta(\text{age})$, SE, $95\%$ CI, $p$-value, and (where applicable) FDR-adjusted $q$-value, plus $\beta(\text{sex})$ and coding details. Include a brief sensitivity/power analysis (post hoc is acceptable) indicating the minimum detectable standardized age effect sizes for behavioral metrics given $N \approx 31$ and for imaging given the number of tests ($24$ ROIs $\times 4$ metrics). Use these results to temper claims in Sec. 4.3–4.4 (e.g., “small-to-moderate declines cannot be excluded”).
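The sensitivity analysis can be approximated in a few lines. The sketch below uses the Fisher $z$ approximation for a partial correlation (standard error $1/\sqrt{N-3-k}$ with $k$ covariates) to estimate the smallest age effect detectable at a given power, with a Bonferroni-adjusted $\alpha$ as a conservative stand-in for the multiple-testing burden across ROIs, plus a Benjamini-Hochberg $q$-value helper for the summary tables. The defaults ($\alpha = 0.05$, power $0.80$) are conventional assumptions, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def min_detectable_r(n, n_covariates=1, alpha=0.05, power=0.80, n_tests=1):
    """Smallest (partial) correlation detectable with the given power in a two-sided
    test, via Fisher z with SE = 1/sqrt(n - 3 - k). n_tests > 1 applies a
    Bonferroni-adjusted alpha, which is conservative relative to FDR control."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (alpha / n_tests) / 2)
    z_power = z.inv_cdf(power)
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3 - n_covariates))


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

For $N = 31$ with sex as the only covariate, `min_detectable_r(31)` is roughly $0.49$, and `min_detectable_r(31, n_tests=24 * 4)` is roughly $0.68$, so age effects of moderate size ($r \approx 0.3$) would be substantially underpowered. The DTI-specific $N$ should be substituted for the imaging calculation.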
-
The manuscript’s “cognitive resilience score” (residual-based framework) is presented as a key conceptual contribution, but its role is confused because (i) no robust behavioral age decline is observed, and (ii) imaging is unusable; additionally, some wording suggests residuals are mathematically impossible to compute without significant decline (Sec. 2.4.4; Sec. 3.3). There is also a specification mismatch: age-effect models include sex, but residualization is described as $\text{Behavioral\_Metric} \sim \text{Age}$ (omitting sex), which can leave systematic sex differences in the residuals.
Recommendation: Rewrite Sec. 2.4.4 and Sec. 3.3 to distinguish planned vs executed analyses and to clarify that residuals are always computable but may not be interpretable as “resilience to decline” when the estimated age slope is $\sim 0$. Either (a) define the score as “age-adjusted performance” and keep interpretation cautious, or (b) propose alternative resilience operationalizations that do not require observable decline in this cohort (e.g., learning parameters from a reinforcement-learning model; performance conditional on task difficulty/phase transitions). Ensure the residualization model matches the covariates used elsewhere ($\text{Behavioral\_Metric} \sim \text{Age} + \text{Sex}$, and consider colony/origin if relevant; see below), or justify exclusions explicitly.
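A sketch of residualization with covariates that match the age-effect models, using plain OLS via `numpy.linalg.lstsq`; the function name `residualize` is illustrative. Calling it with and without sex on the same data shows the specification issue: because the model includes an intercept and a sex dummy, the $\text{Age} + \text{Sex}$ residuals have a mean of exactly zero within each sex, whereas $\text{Age}$-only residuals retain the sex difference.

```python
import numpy as np


def residualize(metric, *predictors):
    """OLS residuals of `metric` on an intercept plus the given predictors.

    Residuals are computable even when the age slope is ~0; in that case they
    index age-adjusted performance, not resilience to an observed decline."""
    y = np.asarray(metric, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta, beta


# Usage: resid, beta = residualize(metric, age, sex)
# Colony/origin could be added as further dummy-coded predictors.
```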
-
The age predictor is inconsistently described and named (chronological age vs DNA methylation age; variable name indicates a skin clock), which is conceptually central for an “aging” study and affects interpretation of regression results (Sec. 2.1; Sec. 2.4.2–2.4.4; Fig. 2–3; Sec. 3).
Recommendation: Standardize terminology and explicitly define the primary age variable in Sec. 2.1 and Sec. 2.4.2: whether it is chronological age, DNAm-predicted age, or both. If both exist, report their relationship (correlation, bias) and state which is used in each analysis. Replace opaque phrases/labels (e.g., “Skin+Sexaging analyses”) with clear wording throughout the Results and figure captions.
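If both age measures are available, the requested relationship can be summarized in a few lines. The sketch assumes paired lists of chronological and DNAm-predicted ages in the same units; the function name and output keys are illustrative.

```python
from statistics import mean, pstdev


def age_agreement(chron_age, dnam_age):
    """Agreement summaries to report when chronological and DNAm-predicted age both exist."""
    if len(chron_age) != len(dnam_age) or len(chron_age) < 2:
        raise ValueError("need paired ages for at least two subjects")
    diffs = [d - c for c, d in zip(chron_age, dnam_age)]
    mc, md = mean(chron_age), mean(dnam_age)
    cov = mean((c - mc) * (d - md) for c, d in zip(chron_age, dnam_age))
    # Pearson r (population moments); undefined if either age has zero variance.
    r = cov / (pstdev(chron_age) * pstdev(dnam_age))
    return {
        "pearson_r": r,
        "mean_bias": mean(diffs),                      # DNAm age minus chronological age
        "mean_abs_error": mean(abs(x) for x in diffs),
    }
```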