-
Key thermodynamic and kinetic outcomes are not reported quantitatively, and no uncertainty quantification is provided (Sec. III.1–III.4; Sec. III.6; Sec. IV). Results are described with phrases such as “few kJ/mol”, “twofold faster”, “microsecond timescales”, and “excellent agreement”. The manuscript gives no explicit values for basin free energies, barrier heights, equilibrium populations, implied timescales, rate constants, or mean first-passage times (MFPTs), and no confidence intervals. Without numbers and uncertainties, the MSM and free-energy surface (FES) claims cannot be rigorously assessed or used as benchmarks.
Recommendation: Add a compact set of quantitative deliverables, preferably in 1–2 tables referenced in Sec. III.6 and reiterated in Sec. IV:
(i) equilibrium populations of each (micro/macro)state;
(ii) $\Delta G$ between folded/unfolded states and any intermediates;
(iii) barrier heights with a clearly defined measurement protocol;
(iv) the slowest implied timescales at the selected MSM lag time;
(v) folding/unfolding MFPTs and/or rates.
Provide uncertainty estimates using a block bootstrap over trajectory segments and/or Bayesian MSM sampling. Report how these uncertainties propagate to MFPTs and rates.
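The following is a minimal sketch of the block bootstrap. It assumes a per-frame macrostate label array `dtraj` (0 = unfolded, 1 = folded) and a temperature of 300 K. Both are illustrative assumptions; the manuscript does not report its temperature.

Populations here are simple frame fractions. In a full analysis, the `statistic` argument would re-estimate the MSM and return a population, $\Delta G$, or MFPT, so the same resampling propagates into the kinetic quantities. One caveat: concatenating resampled blocks creates artificial transitions at block junctions. Kinetic statistics are therefore better built from transition counts summed within each resampled block.

```python
import numpy as np

R_KJ_PER_MOL_K = 0.008314462618  # gas constant in kJ/(mol K)


def delta_g_fold(dtraj, temperature=300.0, folded=1, unfolded=0):
    """Delta G_fold = -RT ln(p_folded / p_unfolded) from frame-count populations (kJ/mol).

    The 300 K default is an illustrative assumption, not a value from the manuscript.
    """
    dtraj = np.asarray(dtraj)
    p_folded = np.mean(dtraj == folded)
    p_unfolded = np.mean(dtraj == unfolded)
    return -R_KJ_PER_MOL_K * temperature * np.log(p_folded / p_unfolded)


def block_bootstrap_ci(dtraj, statistic, n_blocks=20, n_boot=1000, level=0.95, seed=0):
    """Non-overlapping block bootstrap with a percentile confidence interval.

    Blocks must be much longer than the slowest relaxation time for the
    resampled blocks to be approximately independent.
    """
    rng = np.random.default_rng(seed)
    dtraj = np.asarray(dtraj)
    blocks = np.array_split(dtraj, n_blocks)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        picks = rng.integers(0, n_blocks, size=n_blocks)  # sample blocks with replacement
        replicates[b] = statistic(np.concatenate([blocks[i] for i in picks]))
    lo, hi = np.percentile(replicates, [50 * (1 - level), 50 * (1 + level)])
    return statistic(dtraj), lo, hi


# Usage on synthetic data: a two-state chain with exit probabilities 0.01 (unfolded)
# and 0.02 (folded) per frame, generated from geometric dwell times.
rng = np.random.default_rng(1)
n_visits = 2000
dwell = np.column_stack([rng.geometric(0.01, n_visits), rng.geometric(0.02, n_visits)]).ravel()
dtraj = np.repeat(np.tile([0, 1], n_visits), dwell)
dg, dg_lo, dg_hi = block_bootstrap_ci(dtraj, delta_g_fold)
print(f"dG_fold = {dg:.2f} kJ/mol, 95% CI [{dg_lo:.2f}, {dg_hi:.2f}]")
```

The block count trades two effects against each other. Too few blocks make the interval itself noisy. Too many make blocks shorter than the folding/unfolding relaxation time, which underestimates the uncertainty.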
-
Trajectory and simulation provenance are insufficiently specified, which prevents reproduction and hampers interpretation of the kinetics (Sec. II.1; Sec. IV). The manuscript does not report core simulation details: force field, solvent/water model, ions, temperature/pressure control, timestep/constraints, long-range electrostatics, protonation states, and initial structure/reference PDB/topology. It is also unclear whether the 10 $\mu$s trajectory represents equilibrium sampling with multiple folding/unfolding events, and how many transitions are observed. Both points are critical for the reliability of the kinetics.
Recommendation: In Sec. II.1 (or a dedicated “Simulation details and dataset provenance” subsection), fully specify the MD setup and the dataset source (including input structure/PDB ID, preparation protocol). Explicitly state the frame stride and total number of frames analyzed. Report how many folding/unfolding transitions (by your state definition) are observed and whether the trajectory appears stationary/equilibrated (e.g., block-wise populations). If the dataset is from a public benchmark, cite it and provide accession/DOI.
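Below is one way to count transitions and check stationarity. It assumes each frame has already been labeled by some structural criterion as unfolded core (0), folded core (1), or neither (-1). The criterion itself (e.g., RMSD or fraction-of-native-contacts thresholds) is hypothetical here.

The sketch counts only changes between core sets, a transition-based or core-set assignment. This keeps rapid recrossings of a single threshold from inflating the transition count.

```python
import numpy as np


def count_transitions(labels):
    """Count committed folding (0 -> 1) and unfolding (1 -> 0) events.

    labels: per-frame array with 0 = unfolded core, 1 = folded core, -1 = neither.
    Frames labeled -1 are dropped, so an event is counted only when the trajectory
    reaches the opposite core after leaving the previous one.
    """
    labels = np.asarray(labels)
    core_sequence = labels[labels >= 0]
    steps = np.diff(core_sequence)
    return int(np.sum(steps == 1)), int(np.sum(steps == -1))


def blockwise_fraction(labels, n_blocks, state=1):
    """Fraction of core-assigned frames in `state` for each contiguous block.

    Roughly constant values across blocks are consistent with (but do not prove)
    stationary sampling; a systematic drift suggests incomplete equilibration.
    """
    fractions = []
    for block in np.array_split(np.asarray(labels), n_blocks):
        core = block[block >= 0]
        fractions.append(np.mean(core == state) if core.size else np.nan)
    return np.array(fractions)


labels = np.array([0, 0, -1, 1, 1, -1, 0, -1, -1, 0, 1])
print(count_transitions(labels))  # (2, 1)
```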
-
MSM construction is underdetermined, and key statistical choices are missing or ambiguous (Sec. II.5; Sec. III.4.3; Figs. 18–19 as discussed). The reader cannot tell:
(i) which feature space is used to build the MSM (raw features vs PCA vs TICA vs Diffusion Maps, and which components);
(ii) the number of microstates used for Markovian discretization;
(iii) the exact MSM lag time(s) selected;
(iv) whether the estimator enforces detailed balance/reversibility, and how the stationary distribution is obtained;
(v) the counting scheme (sliding window vs non-overlapping) and any regularization/pseudocounts.
Additionally, the implied-timescale behavior described and depicted appears atypical in places (e.g., no clear plateau, or timescales exceeding the trajectory length). This calls Markovianity and the reliability of the long timescales into question.
Recommendation: Provide a single canonical MSM protocol (Sec. II.5) used for the final kinetic results, and ensure all figures and tables correspond to that protocol. It should list:
- the final feature set and preprocessing (alignment/standardization);
- the dimensionality (e.g., tIC1–tIC$m$);
- the microstate count and the clustering method/hyperparameters;
- the selected lag time $\tau$ (numerical value) and the range of $\tau$ tested;
- whether the MSM is reversible;
- the counting and regularization choices.
In Sec. III.4.3, mark the chosen $\tau$ on the implied-timescale plots and report the plateau region numerically. Explicitly restrict interpretation to timescales supported by the data, for example by discussing cases where implied timescales approach a substantial fraction of the trajectory length. Include uncertainty bands on the implied timescales if possible.
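The estimation choices listed above can be made concrete with a minimal NumPy sketch. It assumes a single discrete microstate trajectory `dtraj` whose states form one connected set. It combines three pieces:
- sliding-window counting at lag $\tau$;
- a reversible maximum-likelihood transition matrix from the standard fixed-point iteration;
- implied timescales $t_k = -\tau/\ln\lambda_k$.

This is an illustration, not the authors' pipeline. Established packages such as deeptime or PyEMMA implement these estimators with connectivity checks and Bayesian error bars. Sliding-window counts are correlated, which matters when they are used to compute uncertainties.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Sliding-window counts: every frame t contributes one count to C[s_t, s_{t+lag}]."""
    dtraj = np.asarray(dtraj)
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return C


def reversible_mle(C, max_iter=10_000, tol=1e-12):
    """Reversible maximum-likelihood transition matrix via fixed-point iteration.

    Iterates x_ij <- (c_ij + c_ji) / (c_i / x_i + c_j / x_j), where c_i and x_i are
    row sums. The symmetric X gives T_ij = x_ij / x_i and pi_i = x_i, so detailed
    balance pi_i T_ij = pi_j T_ji holds by construction. Assumes a connected C.
    """
    C_sym = C + C.T
    c_row = C.sum(axis=1)
    X = C_sym / C_sym.sum()
    for _ in range(max_iter):
        ratio = c_row / X.sum(axis=1)
        X_new = C_sym / (ratio[:, None] + ratio[None, :])
        X_new /= X_new.sum()
        converged = np.max(np.abs(X_new - X)) < tol
        X = X_new
        if converged:
            break
    pi = X.sum(axis=1)
    return X / pi[:, None], pi


def implied_timescales(T, lag, k=3):
    """t_k = -lag / ln(lambda_k) for the k largest non-stationary eigenvalues (in frames)."""
    lam = np.sort(np.linalg.eigvals(T).real)[::-1][1:k + 1]
    with np.errstate(divide="ignore", invalid="ignore"):
        ts = -lag / np.log(lam)
    return np.where(lam > 0, ts, np.nan)


# Usage: scan the lag time on a synthetic two-state chain (exit probabilities 0.01
# and 0.02 per frame, so the exact timescale is -1/ln(0.97), about 32.8 frames).
rng = np.random.default_rng(0)
n_visits = 2000
dwell = np.column_stack([rng.geometric(0.01, n_visits), rng.geometric(0.02, n_visits)]).ravel()
dtraj = np.repeat(np.tile([0, 1], n_visits), dwell)
for lag in (1, 2, 5, 10, 20):
    T, pi = reversible_mle(count_matrix(dtraj, 2, lag))
    print(lag, implied_timescales(T, lag, k=1))
```

Because the synthetic data are exactly Markovian, the printed timescale should stay roughly constant across lags. This is the plateau the recommendation asks the authors to demonstrate for their own discretization.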
-
Reproducibility of CV/feature construction, dimensionality reduction, and clustering is insufficient (Sec. II.2–II.4; Sec. III.2–III.3). Many essential details are missing: exact feature definitions and atom selections (e.g., which distances/contacts and which atoms), alignment/superposition choices, scaling/whitening, the final number of PCs/tICs/diffusion components retained, TICA lag time (distinct from MSM lag time), Diffusion Maps kernel form/normalization/connectivity, and the final clustering hyperparameters ($k$ for k-means; covariance model for GMM; $\epsilon$/min_samples for DBSCAN) used for the reported results.
Recommendation: Expand Sec. II.2–II.4 with a reproducibility-first parameterization: (i) explicit feature list (including atom selections and any contact definitions), alignment protocol, and preprocessing; (ii) PCA/TICA retained dimensionalities and the criteria used (variance explained; kinetic/variational score); (iii) TICA lag time(s) tested and chosen; (iv) Diffusion Maps details (kernel equation, bandwidth selection, graph construction kNN vs fully connected, normalization/symmetrization, diffusion time); (v) final clustering algorithm and hyperparameters per embedding. Add a concise “Methods parameter table” summarizing all key settings used in Sec. III results.
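The NumPy-only sketch below makes explicit the parameters that items (ii)–(iv) ask to be reported:
- for TICA, the lag (in frames, independent of the MSM lag) and the whitening step;
- for Diffusion Maps, the kernel form $k(x,y)=\exp(-\lVert x-y\rVert^2/\epsilon)$, the bandwidth $\epsilon$, a fully connected graph, the $\alpha$-normalization, and the diffusion time $t$.

These are one common set of conventions, chosen here for illustration; other works use, for example, $\exp(-\lVert x-y\rVert^2/4\epsilon)$ or kNN sparsification. They are not the manuscript's settings. The TICA function assumes a full-rank instantaneous covariance $C_0$. In practice, small eigenvalues of $C_0$ are truncated before whitening.

```python
import numpy as np


def tica(X, lag, n_components=2):
    """Symmetrized TICA. X: (n_frames, n_features); lag in frames (TICA lag, not MSM lag).

    Solves C_tau v = lambda C_0 v by whitening with C_0; assumes C_0 is full rank.
    Returns eigenvalues (largest first) and projected coordinates.
    """
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    C0 = 0.5 * (X0.T @ X0 + Xt.T @ Xt) / len(X0)
    Ctau = 0.5 * (X0.T @ Xt + Xt.T @ X0) / len(X0)
    s, U = np.linalg.eigh(C0)
    W = U / np.sqrt(s)  # whitening: W.T @ C0 @ W = I
    lam, V = np.linalg.eigh(W.T @ Ctau @ W)
    order = np.argsort(lam)[::-1][:n_components]
    return lam[order], X @ (W @ V[:, order])


def diffusion_map(X, epsilon, alpha=1.0, n_components=2, t=1):
    """Diffusion map with kernel exp(-|x - y|^2 / epsilon) on a fully connected graph.

    alpha = 1 divides out the sampling density (Laplace-Beltrami normalization).
    Returns the leading eigenvalues and diffusion coordinates lambda_k^t psi_k, k >= 1.
    """
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D2 / epsilon)
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q**alpha, q**alpha)
    d = K_alpha.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = diag(1/d) K_alpha, so eigh applies.
    lam, V = np.linalg.eigh(K_alpha / np.sqrt(np.outer(d, d)))
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]  # right eigenvectors of P
    return lam[: n_components + 1], psi[:, 1 : n_components + 1] * lam[1 : n_components + 1] ** t


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
n = 50_000
noise = rng.normal(size=(n, 2))
slow, fast = np.zeros(n), np.zeros(n)
for i in range(1, n):
    slow[i] = 0.99 * slow[i - 1] + noise[i, 0]  # slow AR(1) process
    fast[i] = 0.5 * fast[i - 1] + noise[i, 1]   # fast AR(1) process
X = np.column_stack([slow + fast, slow - 0.5 * fast])  # mixed observables
tica_eigs, tics = tica(X, lag=1)  # eigenvalues near 0.99 and 0.5

blobs = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(0.0, 0.3, (100, 2)) + [3.0, 0.0]])
dm_eigs, dm_coords = diffusion_map(blobs, epsilon=0.5)
```

In this toy case, TICA recovers the slow process from the mixed observables. The first diffusion coordinate takes opposite signs on the two well-separated blobs. Changing $\epsilon$ or $\alpha$ changes the embedding, which is why these values belong in the parameter table.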
-
Microstate vs macrostate definition and coarse-graining are not clearly separated or justified, making reported rates/MFPTs sensitive to ad hoc choices (Sec. II.3; Sec. II.5; Sec. III.3–III.4). The narrative alternates between 3–4 “clusters/states” and folded/unfolded macrostates, without stating whether macrostates are obtained by a formal kinetic coarse-graining (e.g., PCCA+/spectral clustering on MSM eigenvectors) or by heuristic structural labeling. Very coarse discretizations (3–4 states total) may also compromise Markovianity.
Recommendation: In Sec. II.5 (or a new Sec. II.5.1), explicitly define: (i) the microstate discretization used to estimate the MSM (typically many microstates), and (ii) the macrostate coarse-graining method (e.g., PCCA+), including the mapping from microstates to macrostates (table/diagram). In Sec. III.4.3, show that MFPTs/rates are stable under reasonable changes in microstate count and lumping scheme, or justify why a very small state model remains Markovian for your selected $\tau$.
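For illustration only, the sketch below does two things:
- It lumps microstates by the sign of the second right eigenvector of $T$. This is a crude two-state stand-in for PCCA+, not PCCA+ itself.
- It computes macrostate MFPTs directly from the microstate matrix by solving $m_i = 1 + \sum_j T_{ij} m_j$ for states outside the target, with $m_i = 0$ inside it.

Starting states are weighted by the stationary distribution restricted to the source set. This is one convention among several, and the chosen one should itself be stated. Multiplying the MFPT (in lag units) by $\tau$ gives physical time, and $1/\mathrm{MFPT}$ is a common two-state rate estimate.

```python
import numpy as np


def sign_split(T):
    """Two-set lumping by the sign of the second right eigenvector of T.

    A crude stand-in for PCCA+, adequate only when one slow process dominates.
    """
    evals, evecs = np.linalg.eig(T)
    psi2 = evecs[:, np.argsort(-evals.real)[1]].real
    return np.flatnonzero(psi2 >= 0), np.flatnonzero(psi2 < 0)


def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to one."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.abs(evecs[:, np.argmax(evals.real)].real)
    return pi / pi.sum()


def mfpt(T, source, target):
    """MFPT (in units of the MSM lag) from set `source` to set `target`.

    Starting microstates in `source` are weighted by the stationary distribution.
    """
    n = T.shape[0]
    others = np.setdiff1d(np.arange(n), target)
    A = np.eye(others.size) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.ones(others.size))
    pi = stationary_distribution(T)
    source = np.asarray(source)
    return float(pi[source] @ m[source] / pi[source].sum())


# Toy 4-microstate model with two metastable pairs, {0, 1} and {2, 3}.
T = np.array([
    [0.90, 0.09, 0.01, 0.00],
    [0.09, 0.90, 0.00, 0.01],
    [0.01, 0.00, 0.90, 0.09],
    [0.00, 0.01, 0.09, 0.90],
])
set_a, set_b = sign_split(T)
print(sorted(set_a.tolist()), sorted(set_b.tolist()))
print(mfpt(T, [0, 1], [2, 3]))  # about 100 lag times
```

Re-running `mfpt` after changing the microstate count or the lumping is the kind of stability check requested above. Because the MFPT is computed on microstates, it does not require the lumped two-state model itself to be Markovian.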
-
Robustness and sensitivity analysis is asserted but not quantified in terms of final physical conclusions (Sec. II.7; Sec. III.5; Sec. III.6). The manuscript focuses on internal metrics (variance explained, silhouette/BIC, eigengaps) without reporting how key outputs (populations, $\Delta G$, barriers, MFPTs) vary with CV choice, clustering/discretization, lag time, subsampling, and embedding hyperparameters.
Recommendation: Strengthen Sec. II.7 and Sec. III.5 by reporting sensitivity of the main physical deliverables: provide ranges (or error bars) for folded population, $\Delta G$, dominant barrier height, and folding/unfolding MFPTs across (i) alternative CV/embedding choices (conventional vs TICA vs Diffusion Maps), (ii) microstate counts/clustering hyperparameters, and (iii) MSM lag times within the implied-timescale plateau. Summarize these variations in a table/figure and state explicitly in Sec. III.6 which conclusions are robust vs. conditional.
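The variants could be summarized as sketched below. The sketch assumes each analysis variant (a CV/embedding choice, a microstate count, or a lag time within the plateau) has already produced a dictionary of deliverables. The per-deliverable tolerance separating "robust" from "conditional" is a hypothetical, user-chosen threshold, and the function and key names are illustrative.

```python
import numpy as np


def summarize_sensitivity(variants, tolerances):
    """Summarize how each deliverable varies across analysis variants.

    variants: dict mapping a variant label (e.g., "TICA, 200 microstates, lag 50")
        to a dict of deliverables such as {"dG_fold": ..., "mfpt_fold": ...}.
    tolerances: dict mapping deliverable name -> largest spread (max - min) that
        is still considered robust; these thresholds are a user choice.
    """
    summary = {}
    for name, tol in tolerances.items():
        values = np.array([v[name] for v in variants.values()], dtype=float)
        spread = float(values.max() - values.min())
        summary[name] = {
            "median": float(np.median(values)),
            "range": (float(values.min()), float(values.max())),
            "verdict": "robust" if spread <= tol else "conditional",
        }
    return summary
```

MFPTs and rates often vary multiplicatively across variants. For these, comparing spreads of their logarithms may be more informative than absolute spreads.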
-
Free-energy surface (FES) analysis risks over-interpretation given projection, binning/smoothing, and sampling limitations from a single trajectory (Sec. II.4; Sec. III.4.1; Figs. 12–17). The method (histogram-based Boltzmann inversion) is sensitive to bin size, pseudocounts/empty-bin handling, and whether the trajectory is equilibrated. Several FES plots are described as having unexpected symmetries/over-smoothing, and barrier statements (“few kJ/mol”) are not tied to a precise barrier definition (basin-to-saddle; minimum free-energy path; etc.).
Recommendation: In Sec. II.4 and the relevant figure captions, specify bin sizes, smoothing/KDE bandwidth (if any), and explicit handling of empty bins (masking vs pseudocount). Define how barrier heights are computed (with an algorithmic definition). Add a convergence assessment (e.g., block analysis of FES and $\Delta G$/barriers) and avoid claiming physical barriers from 2D projections unless supported by MSM-derived free energies, committor/TPT analysis, or consistency across multiple CV projections.
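A minimal sketch of the requested bookkeeping for a one-dimensional projection covers three steps:
- a histogram estimate $F(x) = -k_BT \ln p(x)$, with empty bins masked rather than filled with pseudocounts;
- an algorithmic basin-to-saddle barrier, defined as the highest $F$ between the two basin minima minus $F$ at the source minimum;
- a block estimate of the standard error of $\Delta G$.

Energies are in units of $k_BT$. The basin split point, bin edges, and number of blocks are hypothetical choices for synthetic double-well data. A barrier computed this way is a property of the projection, not necessarily of the underlying dynamics.

```python
import numpy as np


def fes_1d(x, bins, kT=1.0):
    """F(x) = -kT ln p(x) from a normalized histogram; empty bins are masked as NaN."""
    counts, edges = np.histogram(x, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    density = counts / (counts.sum() * np.diff(edges))
    F = np.full(density.shape, np.nan)
    filled = counts > 0
    F[filled] = -kT * np.log(density[filled])
    return centers, F - np.nanmin(F)


def basin_to_saddle_barrier(centers, F, split):
    """Barrier from basin A (x < split) to basin B (x >= split) along x:
    highest F between the two basin minima minus F at the minimum of A."""
    in_a = centers < split
    i_a = int(np.nanargmin(np.where(in_a, F, np.nan)))
    i_b = int(np.nanargmin(np.where(~in_a, F, np.nan)))
    lo, hi = sorted((i_a, i_b))
    return float(np.nanmax(F[lo : hi + 1]) - F[i_a])


def delta_g_basins(x, split, kT=1.0):
    """Delta G(A -> B) = -kT ln(P_B / P_A) from the fraction of samples on each side of `split`."""
    x = np.asarray(x)
    return float(-kT * np.log(np.mean(x >= split) / np.mean(x < split)))


def block_estimate(x, n_blocks, func):
    """Mean and standard error of func over contiguous blocks.

    Valid only if each block is much longer than the correlation time of x.
    """
    values = np.array([func(b) for b in np.array_split(np.asarray(x), n_blocks)])
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(n_blocks))


# Synthetic, uncorrelated samples from a symmetric double well (illustrative only).
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(np.where(rng.random(n) < 0.5, -1.0, 1.0), 0.3)
centers, F = fes_1d(x, bins=np.linspace(-2.05, 2.05, 42))
barrier = basin_to_saddle_barrier(centers, F, split=0.0)
dg_mean, dg_se = block_estimate(x, 10, lambda block: delta_g_basins(block, split=0.0))
print(f"barrier = {barrier:.2f} kT, dG = {dg_mean:.3f} +/- {dg_se:.3f} kT")
```

Repeating the calculation with different bin widths, and on separate trajectory halves, gives a direct check of how much a reported barrier depends on the estimator rather than on the physics.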
-
Scholarly framing and novelty are not convincingly established, and the bibliography is dominated by unrelated astrophysics/cosmology references (Sec. I; Sec. III.6; Sec. IV; References). Given that most pipeline components are standard in MD/MSM practice, the manuscript currently reads as a descriptive demonstration without a clear methodological contribution, and the inappropriate citations undermine confidence and prevent readers from locating relevant prior work.
Recommendation: Rewrite Sec. I and Sec. IV to clearly articulate the contribution (e.g., a structured comparison of conventional/linear/nonlinear embeddings under a unified robustness and scalability framework; or specific practical guidance for NTL9-like systems). Replace unrelated references with domain-appropriate MSM/TICA/diffusion maps/protein-folding literature and prior NTL9 studies. In Sec. III.6, include at least one concrete benchmark/ablation: compare kinetics/thermodynamics from your “full pipeline” versus a simpler baseline (e.g., MSM on RMSD/Q only; without diffusion maps; without robustness tuning), demonstrating measurable benefit.
-
Code/data availability is not addressed despite the paper positioning itself as a reproducible workflow (Sec. II; Sec. IV). Without access to scripts, parameters, and (at least) processed features/state assignments, the workflow is not practically reusable.
Recommendation: Add a Data and Code Availability statement (at the end of Sec. II or before Sec. IV) with a repository link/DOI. Provide analysis scripts/notebooks, environment information (package versions), and parameter files. If the raw trajectory cannot be shared, provide a minimal dataset that enables reproduction of the key figures: features, embeddings, clustering labels, and MSM counts/transition matrices.
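One low-effort way to capture environment information is to record interpreter and package versions alongside the results. The sketch uses only the Python standard library (`importlib.metadata`). The package list is a hypothetical example and should match whatever the analysis actually imports.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect the Python version, platform string, and installed package versions.

    Packages that are not installed are recorded as None rather than raising.
    """
    record = {"python": platform.python_version(), "platform": platform.platform(), "packages": {}}
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            record["packages"][name] = None
    return record


if __name__ == "__main__":
    # Example package names only; list what the analysis scripts actually use.
    print(json.dumps(environment_record(["numpy", "scipy", "mdtraj", "deeptime"]), indent=2))
```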