-
Core confound: the lab and field datasets cover different biological processes and developmental windows, which undermines the interpretability of the proposed “lab vs field regulatory strategy” contrast (Sec. 3.1–3.3, Sec. 3.5). Field isolates are strongly enriched for gametocytes and late rings and largely lack trophozoites/schizonts, whereas the lab data span the full asexual IDC with only limited sexual branching. Because trajectories/pseudotime are inferred separately within each subset, regulator$\rightarrow$module lags can differ simply because of (i) different underlying lineages (IDC vs gametocytogenesis), (ii) different root choices (early ring vs late ring), (iii) missing intermediate states and uneven sampling density, and (iv) differences in pseudotime warping/scaling, rather than because of a true mechanistic shift from “just-in-time” to “priming.”
Recommendation: Reframe the main comparative claim to avoid over-attributing differences to “lab vs field” per se unless you can compare the same biological transition. Concretely: (i) perform an apples-to-apples timing analysis restricted to a matched lineage present in both sources (e.g., late ring $\rightarrow$ early gametocyte and/or gametocyte maturation only, using the lab sexual branch if sufficient cells exist); (ii) report results both on the full datasets and on the matched subset(s), explicitly stating which stages are included; (iii) where matched analyses are not feasible due to limited overlap, temper the claim to: “field dataset (gametocytogenesis-enriched) shows longer inferred regulator–module separations than lab IDC,” and clearly label this as a hypothesis about priming that requires validation.
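For illustration, the matching step in (i) and the stage reporting in (ii) could be as simple as the sketch below. It assumes per-cell origin and stage labels are available as plain records; the stage names, the `matched_subset` helper, and the `min_cells` threshold are hypothetical and not taken from the manuscript.

```python
from collections import Counter

# Hypothetical stage labels; substitute the manuscript's own annotation scheme.
MATCHED_STAGES = {"late_ring", "early_gametocyte", "mature_gametocyte"}

def matched_subset(cells, stages=MATCHED_STAGES, min_cells=50):
    """Keep only cells from stages present in BOTH origins with at least min_cells each.

    `cells` is a list of dicts with keys 'origin' ('lab' or 'field') and 'stage'.
    Returns (retained cells, set of shared stages, per-(origin, stage) counts).
    """
    counts = Counter((c["origin"], c["stage"]) for c in cells if c["stage"] in stages)
    shared = {
        s for s in stages
        if counts[("lab", s)] >= min_cells and counts[("field", s)] >= min_cells
    }
    kept = [c for c in cells if c["stage"] in shared]
    return kept, shared, counts
```

Reporting the `counts` table next to every timing result makes explicit which stages contributed to each analysis, and an empty `shared` set is itself the signal that the tempered wording in (iii) is required.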
-
Methods are incomplete/inconsistent and contain clear text corruption in key pipeline steps, preventing reproducibility and making it difficult to assess whether lab–field differences reflect biology or processing artifacts (Sec. 2.1–2.3, Sec. 2.5; inconsistencies with Sec. 3.1). Sec. 2.1 has garbled input/metadata descriptions; Sec. 2.3 and Sec. 2.5 include duplicated/truncated sentences and do not cleanly specify HVG selection, PCA/UMAP/PAGA settings, the dynamic-gene model, module clustering, or the regulator cross-correlation procedure. Additionally, QC thresholds and outcomes are described inconsistently (Sec. 2.2 vs Sec. 3.1), and it is unclear whether the input is raw counts or already-normalized expression (risking double-normalization).
Recommendation: Rewrite Sec. 2.1–2.3 and Sec. 2.5 from the original source (not OCR) into a precise, stepwise, parameterized Methods description. At minimum include: (i) data provenance (platform, mapping/quantification, whether matrices are counts vs normalized values), and starting cells/genes per origin/sample; (ii) exact QC thresholds and the number removed at each step (per origin and per donor/strain), reconciling Sec. 2.2 with Sec. 3.1; (iii) exact normalization/log1p/scaling steps and parameters; (iv) HVG selection function/flavor and the final HVG count used in each analysis; (v) neighbors graph, PCA/UMAP settings, clustering algorithm/resolution, and PAGA/DPT settings including root selection; (vi) the statistical model used for pseudotime-dynamic genes and how the “top 500” were selected with multiple-testing correction; and (vii) the full candidate-regulator/lag pipeline with explicit formulas and thresholds. Remove duplicated/corrupted phrases so an independent group could reproduce the analysis.
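The per-step accounting requested in (ii) is easy to produce if QC is applied as an ordered list of filters. The sketch below assumes per-cell records with `n_genes` and `total_counts` fields; the thresholds, field names, and `apply_qc` helper are hypothetical placeholders, not the manuscript's values.

```python
from collections import Counter

# Hypothetical thresholds, for illustration only; the manuscript must state its own.
QC_STEPS = [
    ("min_genes", lambda c: c["n_genes"] >= 200),
    ("max_genes", lambda c: c["n_genes"] <= 5000),
    ("min_counts", lambda c: c["total_counts"] >= 500),
]

def apply_qc(cells, steps=QC_STEPS, group=lambda c: c["origin"]):
    """Apply QC filters sequentially and log cells removed per step and per group.

    `group` maps a cell to its reporting unit, e.g. origin or (origin, donor).
    Returns the surviving cells and a list of (step name, {group: n_removed}).
    """
    remaining = list(cells)
    log = []
    for name, keep in steps:
        removed = Counter(group(c) for c in remaining if not keep(c))
        remaining = [c for c in remaining if keep(c)]
        log.append((name, dict(removed)))
    return remaining, log
```

Because the filters run in order, each cell is attributed to the first step that removes it, so the logged numbers sum to the total removed and can be reconciled directly with the counts quoted in Sec. 3.1.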
-
Batch effects and integration across donors/isolates/experiments are not addressed sufficiently, yet origin-driven separation in UMAP/PAGA is interpreted biologically (Sec. 3.1–3.2). Without explicit assessment/mitigation, lab–field separation can be driven by technical factors (library chemistry, capture method, ambient RNA, sequencing depth, processing site/time) or donor/strain effects rather than biological adaptation. The manuscript does not state whether any batch correction was attempted, nor does it quantify whether origin separation persists within the same annotated stage after controlling for donor/strain and technical covariates.
Recommendation: Provide a clear batch/integration plan and report its impact. Specifically: (i) describe batch variables available (donor ID for field; strain/replicate/run/day-in-culture for lab) and how they were used; (ii) quantify origin separation within matched stages (e.g., late rings, gametocytes) before/after correction (e.g., kBET/LISI, classifier accuracy with cross-validation, or variance partitioning); (iii) apply and justify an integration approach suitable for scRNA-seq (e.g., Harmony on PCA, BBKNN, Scanorama, scVI), then re-check key qualitative conclusions (UMAP separation, PAGA topology, pseudotime trends) under integrated vs non-integrated processing; and (iv) if you choose not to correct, justify why (same technology, same pipeline, etc.) and explicitly present evidence that technical effects are minor.
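As a minimal, assumption-laden example of the quantification in (ii), a neighborhood purity score can be computed within each matched stage: for every cell, the fraction of its k nearest neighbors (Euclidean distance in PCA space) that share its origin, compared with the value expected under perfect mixing. This is a simplified stand-in in the spirit of LISI/kBET, not an implementation of either; the function names, the distance, and `k` are illustrative choices.

```python
import math
from collections import Counter

def same_origin_fraction(points, origins, k=10):
    """Mean fraction of each cell's k nearest neighbours that share its origin.

    `points` are per-cell embedding coordinates (e.g. PCA), `origins` the matching labels.
    Brute-force neighbour search, adequate for a per-stage sketch.
    """
    n = len(points)
    k = min(k, n - 1)
    total = 0.0
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        total += sum(origins[j] == origins[i] for j in neighbours) / k
    return total / n

def expected_fraction(origins):
    """Same-origin neighbour fraction expected if origins were perfectly mixed.

    A cell of origin o sees (count_o - 1) same-origin cells among the other n - 1.
    """
    n = len(origins)
    counts = Counter(origins)
    return sum(c * (c - 1) / (n - 1) for c in counts.values()) / n
```

Values close to `expected_fraction` indicate mixing, whereas values near 1 within a single annotated stage indicate origin separation. Computing the score per stage before and after integration would show directly whether correction removes the separation interpreted in Sec. 3.1–3.2.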
-
The central “just-in-time vs priming” timing result is not statistically or metrically validated, and the lag metric (“bins out of 100”) is not comparable across separately inferred pseudotime trajectories without additional controls (Sec. 2.5, Sec. 3.5; Figs. 7–10). The manuscript does not define the cross-correlation computation, the binning scheme, smoothing details, lag extraction rule, or uncertainty; nor does it test whether observed lags (e.g., $\sim 76$–$77$ bins in field) exceed null expectations. Because bin-based lag depends on pseudotime scaling/warping and trajectory length, the headline quantitative contrast could be an artifact of discretization and trajectory coverage.
Recommendation: Make the timing analysis mathematically explicit and statistically supported. Concretely: (i) define pseudotime discretization (equal-width vs equal-cell-count bins; bin edges), smoothing (window size, boundary handling), and the exact cross-correlation formula (normalization/mean subtraction; handling of missing bins); (ii) replace or complement “lag in bins” with a continuous-time measure (e.g., difference between regulator peak location and module activation/peak in continuous pseudotime from spline/GAM fits); (iii) provide uncertainty (bootstrap over cells and/or isolates; sensitivity to bin number and smoothing window); (iv) include null models (pseudotime permutation, matched mean/variance random genes, shuffled module assignments) and correct for multiple testing across gene–module pairs; and (v) explicitly test whether lag distributions differ between lab and field within matched lineage(s) (see confounding issue) using permutation tests or appropriate two-sample comparisons with confidence intervals.
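To show the level of explicitness meant by (i) and (iv), here is a minimal sketch using only the Python standard library. It assumes per-cell pseudotime `pt`, a regulator's expression, and a module score are available as plain lists. The equal-cell-count binning, the lag range, the shuffle-based null, and all function names are illustrative assumptions rather than the manuscript's procedure, and smoothing is omitted for brevity.

```python
import math
import random
import statistics

def binned_profile(pt, expr, n_bins=20):
    """Mean expression in equal-cell-count pseudotime bins (requires len(pt) >= n_bins)."""
    order = sorted(range(len(pt)), key=lambda i: pt[i])
    size = len(order) / n_bins
    return [statistics.fmean(expr[i] for i in order[int(b * size):int((b + 1) * size)])
            for b in range(n_bins)]

def pearson(x, y):
    """Pearson correlation with explicit mean subtraction; 0.0 if either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def best_lag(reg, mod, max_lag):
    """Lag L in [0, max_lag] maximising corr(reg[t], mod[t + L]); L > 0 means module follows."""
    scores = [(pearson(reg[:len(reg) - L], mod[L:]), L) for L in range(max_lag + 1)]
    corr, lag = max(scores)
    return lag, corr

def lag_with_null(pt, reg_expr, mod_score, n_bins=20, max_lag=5, n_perm=200, seed=0):
    """Observed lag and peak correlation plus a permutation p-value for that peak.

    Null: regulator expression is shuffled across cells, which destroys its relation
    to pseudotime while keeping its marginal distribution.
    """
    rng = random.Random(seed)
    mod = binned_profile(pt, mod_score, n_bins)
    lag, corr = best_lag(binned_profile(pt, reg_expr, n_bins), mod, max_lag)
    shuffled = list(reg_expr)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, null_corr = best_lag(binned_profile(pt, shuffled, n_bins), mod, max_lag)
        exceed += null_corr >= corr
    return lag, corr, (exceed + 1) / (n_perm + 1)
```

With equal-cell-count bins, a lag measures a shift in cell rank rather than a pseudotime distance, which is exactly why bin-based lags from separately inferred trajectories are not directly comparable. Wrapping `lag_with_null` in a resampling loop over cells or isolates would give the intervals asked for in (iii), and repeating it over several `n_bins` values would expose its sensitivity to discretization.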
-
Dynamic gene selection, module construction, and the “candidate master regulator” definition are heuristic and under-specified, risking the selection of technical artifacts or early markers rather than plausible regulators (Sec. 2.5, Sec. 3.4–3.5). The manuscript does not specify the pseudotime-dynamics model/test (GAM vs other), covariates (batch/donor, library size), multiple-testing correction, or the rationale for exactly six modules. The regulator selection favors low-abundance genes with sharp peaks, a profile that can enrich for dropout noise and may be sensitive to smoothing choices. The term “master regulator” implies a causal role that correlation and lag alone cannot establish.
Recommendation: Strengthen biological plausibility and methodological rigor of the regulator/module framework: (i) specify the exact model and hypothesis test used to rank/select dynamic genes (including FDR control), and report robustness to alternative model/parameters; (ii) justify six modules quantitatively (e.g., silhouette/gap statistic; stability under resampling) and report module sizes and functional enrichment (GO/KEGG where possible; curated malaria gene sets); (iii) constrain regulator candidates to plausible regulatory classes (e.g., ApiAP2 TFs, chromatin modifiers, RNA-binding proteins) or at least report enrichment of these classes among selected candidates versus background; (iv) control for technical covariates (library size/complexity; potential cell-cycle effects for asexual stages) in dynamic/regulator detection; (v) apply multiple-testing correction across regulator–module associations; and (vi) consistently rename to “candidate regulators” and present “master regulator” only as a hypothesis, ideally supported by overlap with known regulators (e.g., AP2-G/GDV1 axis; known gametocyte regulators) and/or external datasets.
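Points (iii) and (v) require only standard statistics. A minimal sketch, assuming the background is the set of dynamic genes and the regulatory-class annotation (e.g., ApiAP2 membership) is supplied by the authors: a one-sided hypergeometric test for over-representation among candidates, and Benjamini–Hochberg adjustment across regulator–module associations. The function names are illustrative.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n candidates)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

For example, with `N` dynamic genes of which `K` are annotated regulators, observing `k` annotated genes among `n` selected candidates gives `hypergeom_sf(k, N, K, n)` as the enrichment p-value. The choice of background matters: using all detected genes instead of dynamic genes would inflate apparent enrichment.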
-
Comparative lab–field differential expression (especially within matched stages) is described but not convincingly presented, yet the Discussion/Conclusions make broad origin-specific functional claims (Sec. 2.6–2.7, Sec. 3.6/Results coverage, Sec. 4). Given the confounding and potential batch effects, within-stage DE for overlapping stages (late rings and gametocytes) is essential to support claims about stress responses, immune evasion, sexual commitment, etc. The current Results do not provide counts of DE genes, effect sizes, FDR thresholds, donor-stratified consistency, or enrichment analyses supporting these statements.
Recommendation: Add (or expand) a dedicated Results subsection for origin comparisons on matched stages/lineages. Minimum expectations: (i) define stage labels and matching criteria; (ii) report DE method (e.g., pseudo-bulk per donor/strain, mixed models, or single-cell tests with donor as covariate), thresholds, and effect sizes; (iii) present donor-stratified consistency for field isolates; (iv) provide functional enrichment for robust DE sets; and (v) if trajectory-based DE (e.g., tradeSeq) is used, clearly specify the model (knots, covariates, contrasts) and summarize the major findings. If such analyses cannot be done robustly, soften/remove broad functional generalizations in Sec. 4 so every claim is traceable to a documented analysis.
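For the pseudo-bulk route in (ii) and the donor consistency in (iii), the sketch below shows the aggregation and a simple descriptive check. It assumes per-cell raw counts already restricted to one matched stage; the record layout, helper names, and log-CPM pseudocount are hypothetical. It is not a DE test: formal inference on the pseudo-bulk matrix would still need a count model such as those in edgeR or DESeq2.

```python
import math
import statistics
from collections import defaultdict

def pseudobulk(cells, genes):
    """Sum raw counts per (origin, donor); `cells` have 'origin', 'donor', 'counts' dicts."""
    bulk = defaultdict(lambda: defaultdict(int))
    for c in cells:
        profile = bulk[(c["origin"], c["donor"])]
        for g in genes:
            profile[g] += c["counts"].get(g, 0)
    return bulk

def log_cpm(profile, pseudocount=1.0):
    """log2 counts-per-million with a pseudocount (assumes a non-empty profile)."""
    total = sum(profile.values())
    return {g: math.log2(v / total * 1e6 + pseudocount) for g, v in profile.items()}

def field_donor_lfc(bulk, gene):
    """log2 fold change of `gene` for each field donor versus the pooled lab pseudo-bulk."""
    lab = defaultdict(int)
    for (origin, _), profile in bulk.items():
        if origin == "lab":
            for g, v in profile.items():
                lab[g] += v
    lab_level = log_cpm(lab)[gene]
    return {donor: log_cpm(profile)[gene] - lab_level
            for (origin, donor), profile in bulk.items() if origin == "field"}

def sign_agreement(lfcs):
    """Fraction of donors whose fold change has the same sign as the median fold change."""
    med = statistics.median(lfcs.values())
    return sum((v > 0) == (med > 0) for v in lfcs.values()) / len(lfcs)
```

Reporting `sign_agreement` alongside each claimed origin difference would show whether it holds across field donors or is driven by one or two isolates.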
-
In several places the manuscript does not read as a carefully finalized draft, which materially impedes review and risks undermining confidence in the work (Abstract/Sec. 1; Sec. 2–3). Examples include an unrelated physics keyword list, placeholder figure references (“Figure ??”), and other formatting/OCR artifacts; these occur alongside the core methodological corruption in Sec. 2.3 and Sec. 2.5.
Recommendation: Conduct a full pass to ensure the submitted version is publication-ready: remove irrelevant keywords and placeholder author/affiliation artifacts if present; replace all “Figure ??” with correct references; ensure all figures and captions are included and legible; and correct OCR-induced corruption throughout, prioritizing Sec. 2.1–2.5. Consider re-exporting directly from the source manuscript (LaTeX/Word) rather than via OCR to avoid reintroducing corruption.