-
Strong confounding from dataset composition (stage coverage/mixtures), donor structure (only four field donors), and potential batch/processing effects is not adequately controlled, yet conclusions are framed as broad consequences of “laboratory adaptation” (Sec. 3.1–3.6; Conclusions Sec. 4.1). Field asexual coverage is incomplete (e.g., missing early rings/schizonts) and field samples are enriched for sexual/late gametocyte stages, while lab samples span the asexual cycle but may not cover late gametocytogenesis (Sec. 3.1; Table 2). Treating cells as independent replicates risks pseudo-replication and inflated significance, and technical differences (library chemistry/batches) could drive global shifts (e.g., “skew toward upregulation in field isolates”).
Recommendation: Across Sec. 3.3–3.6, restrict direct lab-vs-field comparisons to clearly shared and matched segments of development (same stages and, ideally, matched substates along within-stage pseudotime). Re-run key analyses with donor-aware inference: (i) pseudobulk DE per donor$\times$stage (field) and per strain/batch$\times$stage (lab, if available), using edgeR/DESeq2/limma-voom (or muscat), and report consistency across MSC donors; (ii) sensitivity analyses via subsampling/downsampling lab cells to match field cell counts and stage distributions before trajectory/network reconstruction; (iii) explicitly report whether lab and field were processed/sequenced together and, if not, include batch correction/integration (e.g., scVI/Harmony/BBKNN/Scanorama) and show that core findings persist. Add an explicit limitations paragraph in Sec. 3.6 or Conclusions Sec. 4.1 stating what can and cannot be generalized given missing field asexual stages, donor count, and compositional imbalance.
-
Batch correction/integration, QC, and covariate handling are insufficiently described to assess whether observed lab–field differences are biological vs technical (Sec. 2.1–2.3; Sec. 2.6; Sec. 3.1–3.3). Critical details are missing/fragmented: filtering thresholds (min/max genes per cell, other QC metrics), doublet/ambient-RNA handling, normalization units, HVG selection strategy, and whether integration across strains/patients/batches was performed and evaluated.
Recommendation: Add a dedicated Methods subsection (e.g., Sec. 2.3.x) listing exact QC thresholds and tools: min/max genes per cell, UMI/count thresholds, any mitochondria/rRNA metrics used (or justify why not), doublet detection method and cutoffs, ambient RNA mitigation (if any), normalization target (e.g., counts per 10,000), and the exact data layer used thereafter. If integration/batch correction is used, specify tool and parameters and provide quantitative diagnostics (batch mixing, variance explained). If not used, provide evidence that batch effects are small relative to stage/biology (e.g., embeddings colored by batch/donor; stage composition per batch) and temper claims accordingly (Sec. 3.1–3.3).
-
Differential expression (including the reported strong skew toward upregulation in field isolates) is not statistically calibrated for donor/strain structure and technical covariates, risking inflated significance and misinterpretation (Sec. 2.5; Sec. 3.3). Stage labels are coarse; within-stage maturation differences between in vivo and in vitro could dominate “within-stage” DE, and the manuscript does not demonstrate that lab and field cells represent comparable substates within each broad stage category.
Recommendation: In Sec. 2.5, state the exact DE method used (e.g., Scanpy `rank_genes_groups` settings, test type, correction) and which matrix (log1p vs scaled) was used. Re-run the headline DE results using pseudobulk (donor/strain as the replicate unit) and include covariates (library size; detection rate; batch) as appropriate. In Sec. 3.3, add diagnostics per stage: distributions of QC metrics (UMIs/genes detected), detection fractions, and within-stage pseudotime/subcluster composition by condition. Where within-stage heterogeneity is large (notably gametocytes), perform within-stage subclustering and compare matched substates (or compare along a shared within-stage pseudotime axis) rather than only coarse labels.
-
Trajectory inference and pseudotime construction are described inconsistently (Monocle 3 in Methods vs PAGA/diffusion pseudotime in Results) and lack key parameterization (root choice, neighbor graph parameters, clustering resolution, joint vs separate inference), creating uncertainty about all downstream GAM/module/regulatory timing analyses (Sec. 2.6–2.7 vs Sec. 3.4).
Recommendation: Reconcile Methods and Results by explicitly stating, for each trajectory (asexual lab, asexual field, sexual lab, sexual field): (i) the method used to build the graph (PAGA/Monocle 3/other), (ii) how pseudotime was assigned (DPT/Monocle pseudotime), (iii) how the root was selected (root cluster definition and rationale), and (iv) the key parameters (kNN neighbors, resolution, filtering/subsetting). If multiple methods were tried, report concordance (e.g., correlation of pseudotime orderings; stability of branchpoints) and use one consistent pseudotime variable for all timing claims (Sec. 2.7; Sec. 3.4; Sec. 2.9–2.10). Restrict asexual “progression dynamics” claims to the segments actually present in field data (Sec. 3.4; Sec. 3.6).
-
Regulatory network / “master regulator” inference is under-specified, not normalized for differing trajectory lengths/search spaces, and insufficiently validated; yet the manuscript uses strong language (“rewiring,” “master regulators”) based largely on link counts and heuristic precedence rules (Sec. 2.8–2.10; Sec. 3.5–3.6; Conclusions Sec. 4.1). Multiple-testing control, null/negative controls, and robustness to thresholding/binning are not presented, and link-count comparisons are not directly comparable across conditions without normalization.
Recommendation: Expand Sec. 2.8–2.10 to provide a precise, reproducible definition of: (i) the input representation used for modules and regulator inference (e.g., GAM-fitted curves sampled on $B$ pseudotime bins), (ii) binning ($B$, boundaries, equal-width vs equal-mass), smoothing parameters, and peak-calling criteria on a clearly defined scale (see also log/2-fold issue below), (iii) the statistical test used to call regulator$\to$target links (including time windowing/lag definition), and (iv) multiple-testing correction (FDR) and counts passing each filter. In Sec. 3.5, normalize network comparisons (edges per regulator; per module; per target; per pseudotime length) and add robustness analyses (vary thresholds/bins; subsample to matched cell counts). Add negative controls (permuted pseudotime; shuffled peaks/targets) to estimate expected edge counts under null. Benchmark against known regulators (ApiAP2 family; AP2-G/AP2-FG and other established gametocyte regulators) and, if feasible, motif enrichment in promoters of inferred targets. If rigorous validation is not feasible, reframe networks as hypothesis-generating co-expression/timing associations and soften causal terminology in Sec. 3.6 and Conclusions Sec. 4.1.
-
Interpretation of sexual-stage findings needs clearer separation of biology vs sampling/annotation: (i) “late-stage gametocytes absent from lab strains” may reflect the lab dataset design (timepoints/induction conditions) rather than an intrinsic absence in lab-adapted parasites; (ii) reported co-expression of male/female markers and conclusions about relaxed lineage separation may be confounded by misannotation, doublets, or ambient RNA (Sec. 3.3.3; Sec. 3.4.2; Sec. 3.6). The unstructured report also flags a likely marker interpretation/mapping error: Pfs25 is typically a female/sexual-stage marker rather than male-specific.
Recommendation: In Methods (Sec. 2.1–2.2) and Results (Sec. 3.1; Sec. 3.4), describe lab gametocyte induction and sampling (strains used; induction protocol; duration/timepoints) and explicitly rephrase the claim to “not captured in this lab dataset” unless you can show that late stages were expected but absent. Validate late-stage gametocyte identity with a panel of maturation markers and show monotonic trends along sexual pseudotime (Sec. 3.4). For sex annotations, verify gene ID$\leftrightarrow$name mapping and marker sex-specificity (including correcting any Pfs25 misstatement), compute quantitative male/female scores per cell using multiple canonical markers, report fractions of ambiguous/co-expressing cells per donor, and apply doublet/ambient-RNA checks targeted at mixed marker expression (Sec. 2.4; Sec. 3.3.3). Present alternative technical explanations alongside biological hypotheses and temper conclusions in Sec. 3.6/Conclusions unless reproducible across donors and markers.