-
Reproducibility is currently limited because key algorithmic and parameter details are missing or described hypothetically (e.g., “if not already log-transformed…”, “e.g.” thresholds, “empirically determined”), even though thresholding, smoothing, and peak-calling drive the main result (Sec. 2.1–2.4). Critical unspecified items include: the exact normalization used (and whether applied jointly or separately to Lab/Field); whether/which log transform was applied; concrete cell/gene QC thresholds and outcomes; HVG method and number of HVGs; number of PCs retained and selection criterion; neighborhood graph parameters; clustering resolution; PAGA pruning/thresholding; DPT settings; trajectory rooting rule; smoothing method and bandwidth/span; peak-finding algorithm and its parameters (prominence, height, min distance/width); handling of multiple peaks per gene; and the precise rule(s) for module membership, statistical tests, and multiple-testing corrections (Sec. 2.3–2.4).
Recommendation: Expand Sec. 2.1–2.4 into an auditable, deterministic specification (main text + supplement) and add a single consolidated parameter table. At minimum: (i) state the exact normalization (e.g., size-factor/CPM/TPM/Scanpy $\text{normalize\_total}$) and whether Lab/Field were normalized separately or together; (ii) state explicitly whether values are linear or $\log1p$ and carry that notation consistently; (iii) report concrete QC thresholds (min genes/counts per cell, min cells per gene, rRNA/mitochondrial thresholds if used, doublet handling) and the number/% cells/genes removed; (iv) specify HVG selection method, number of HVGs, scaling, and PCs used; (v) list PAGA/DPT parameters ($k$NN $k$, metric, clustering method/resolution, PAGA connectivity threshold, DPT settings) and whether identical settings were used for Lab and Field; (vi) define smoothing (method + bandwidth) and peak calling (algorithm + all parameters) and how multiple peaks are handled; (vii) define module calling with explicit statistical tests, effect sizes, FDR correction, thresholds, and the exact Boolean rule combining DE and lagged correlation (union vs intersection). Include software packages and versions.
-
Internal inconsistency around stage labels/marker genes and the definition of “pre-transition” bursts undermines the conceptual core. Methods describe transition-aware burst selection and alignment using life-cycle stage labels/marker genes (Sec. 2.2.3, Sec. 2.3.3), but Results state that provided $\text{life\_cycle\_stage}$ labels were “uninformative” and canonical IDC marker genes were absent from the gene list (Sec. 3.1). If transitions cannot be defined/validated, it is unclear how “immediately preceding a developmental transition” is operationalized, how the trajectory is rooted, and whether the method in practice reduces to detecting peaks anywhere along pseudotime.
Recommendation: Resolve the discrepancy explicitly. Either: (A) provide a label-free, fully specified method for rooting and defining transition windows (e.g., graph/geodesic landmarks, change-point detection on pseudotime density or cluster boundaries, or external reference projection), including numeric pseudotime ranges for transitions and pre-/post-burst windows (Sec. 2.3.3, Sec. 2.4.1); or (B) if robust transition definition is not possible with available genes/labels, reframe throughout (Abstract, Sec. 2.3, Sec. 3.2–3.4, Conclusions) as “transient bursts along pseudotime” rather than “pre-transition master regulators,” and temper causal/transition-specific language accordingly.
-
The central comparative claim (“profound divergence” / “fundamentally different regulatory architectures”) is driven almost entirely by the observation of zero overlap between the top $100$ burst-defined candidates in Lab vs Field (Sec. 3.4, Table 4), but the manuscript does not quantify (i) whether zero overlap is statistically unexpected given the tested gene universe and filtering, or (ii) whether it is robust to parameter choices, list size, or sampling variability. Given the heavy dependence on ranking by peak prominence/height and on low-expression filtering, the zero-overlap result could be fragile.
Recommendation: In Sec. 3.4 (and Sec. 2.5), add a dedicated robustness + significance section: (i) report the total shared gene universe evaluated and the number of genes passing each filter in Lab and Field; (ii) compute expected overlap under a null model (hypergeometric conditioned on the shared universe and candidate counts; and/or permutations that preserve marginal properties such as mean expression and dropout), reporting $p$-values and confidence intervals; (iii) show overlap as a function of list size (top $10/25/50/100/200/500$) and as a function of key thresholds (low-expression percentile; burst height/prominence cutoffs; smoothing bandwidth); (iv) assess stability via bootstrap/subsampling of cells within each group; (v) report rank correlations (or lack thereof) of burst scores genome-wide. Based on these results, either strengthen the claim or revise wording to “divergent burst-defined candidate lists under the chosen procedure.”
-
Potential confounders are not adequately controlled: Lab vs Field differ substantially in cell number ($36{,}520$ vs $6{,}866$; Table 1), may differ in sequencing depth/dropout and batch structure, and Field aggregates multiple genetically distinct isolates into one pooled trajectory. These factors can change HVG selection, manifold geometry, pseudotime density, smoothing behavior, peak prominence ranking, and perceived burst “sharpness,” and therefore could drive apparent divergence independent of environment (Sec. 2.1–2.2, Sec. 3.1–3.4).
Recommendation: In Sec. 2.1–2.2 and Sec. 3.1–3.4: (i) explicitly describe batch structure (runs/libraries/lanes; field-vs-lab processing differences) and whether any integration/batch-correction was applied; if not, add an analysis using an established approach (e.g., Harmony/Seurat integration/scVI) and reassess trajectories and candidate lists; (ii) control for cell-number imbalance by downsampling Lab to match Field and repeating candidate calling and overlap analyses; (iii) analyze Field isolates separately where feasible (per-isolate trajectories and candidate lists), quantify within-field overlap/heterogeneity, and test whether pooling reduces peak sharpness or alters rankings. If per-isolate power is limited, present it as a limitation and avoid over-interpreting pooled results.
-
Trajectory validity (IDC interpretation) is not convincingly established, yet it is central to biological interpretation of burst timing and transitions (Sec. 3.1). The manuscript notes that canonical marker genes are absent and stage annotations are uninformative, raising the risk that pseudotime reflects technical gradients (library size, rRNA content, stress) rather than developmental progression.
Recommendation: Strengthen Sec. 3.1 with quantitative validation: (i) project cells onto external IDC references (e.g., published bulk time-course IDC transcriptomes) via correlation/nearest-neighbor mapping using the available gene set; and/or test enrichment of IDC phase gene sets along pseudotime even if single canonical markers are missing; (ii) show correlations of pseudotime with technical covariates (UMIs, detected genes, rRNA/mitochondrial fraction where applicable) to rule out dominant technical gradients; (iii) verify robustness of global ordering with at least one alternative trajectory method (e.g., Slingshot/Monocle3) and/or parameter perturbations of $k$NN/clustering/PAGA pruning. If validation remains weak, explicitly qualify that IDC stage mapping is approximate and that results are hypothesis-generating.
-
“Candidate master regulators” are defined purely by expression dynamics (low overall expression + transient burst), but the manuscript provides little evidence that these candidates are plausible regulators rather than low-count/noisy genes. Candidates are mostly reported as PF3D7 IDs with limited functional annotation, and there is no enrichment test for known regulatory families (ApiAP2 TFs, chromatin regulators, RNA-binding proteins), nor basic checks that bursts are supported by broad cell-level signal rather than a small number of outlier cells (Sec. 3.2, Tables 2–3).
Recommendation: In Sec. 3.2 and supplement: (i) annotate candidates using PlasmoDB (gene names, domains, predicted localization, known phenotypes) and explicitly flag known/predicted regulatory proteins; (ii) test enrichment of regulatory classes among candidates vs the expressed-gene background (report odds ratios and adjusted $p$-values); (iii) report per-candidate expression support metrics (fraction of cells expressing, mean/median counts in peak window vs baseline, robustness to removing top-expressing cells) to reduce the chance that bursts reflect sparse dropout noise; (iv) adjust claims to “candidate regulators” unless additional evidence is provided.
-
Downstream module definition is under-specified and the resulting module sizes (e.g., ${>}1,000$ genes for some candidates; Sec. 3.3) raise concerns that modules may reflect broad pseudotime progression rather than specific regulator–target relationships. It is unclear how pre-/post-burst windows are set per gene, what DE model/test is used, how multiple testing is corrected, what lagged-correlation metric/lag range is used, and whether DE and lag criteria are combined (union/intersection). Extremely large reported $\log_2$FC values (e.g., ${>}24$; Sec. 3.3) also suggest potential baseline/pseudocount artifacts.
Recommendation: In Sec. 2.4 and Sec. 3.3: (i) precisely define module membership rules, windows, tests (including covariates), effect sizes, and FDR procedures; (ii) add controls to assess specificity (e.g., modules from random “burst times,” or from non-regulatory genes with similar peak times/mean expression); (iii) report module coherence metrics (average within-module correlation; reproducibility across subsamples; functional enrichment consistency); (iv) investigate and explain extreme $\log_2$FC values (pseudocount choice, linear vs log scale, near-zero baseline) and cap/regularize where appropriate. If modules remain large, explicitly interpret them as broad stage-associated programs rather than specific regulatory targets.
-
Functional enrichment analysis is described (Sec. 2.4.3) but not presented in a way that supports the manuscript’s narrative about “distinct regulatory strategies” in Lab vs Field. Without enrichment tables/plots and clear background/universe definitions, module interpretation remains anecdotal (Sec. 3.3–3.4).
Recommendation: Either present the promised enrichment results or remove/scale back the claims. Concretely: (i) for representative modules (and/or aggregated across top candidates), report GO/KEGG enrichment with adjusted $p$-values, effect sizes, and the exact background universe (e.g., all expressed genes in that group); (ii) compare Lab vs Field at the level of enriched terms/pathways (e.g., Jaccard similarity or correlation of $-\log_{10}$ FDR across terms), which can reveal convergent biology even when gene-level regulator lists differ; (iii) provide full enrichment tables in supplementary files and cite them in Sec. 3.3–3.4.