-
Diffusion coefficient values, ranges, and even exponents are internally inconsistent across the manuscript. This undermines the central quantitative claim of “$\sim$five-fold tunability” and leaves it unclear which diffusion definition is used (Sec. II.2/Table I vs Sec. III.1 vs Sec. IV; also cluster descriptions in Sec. III.4). For example, Table I reports $0.61$–$3.84 \times 10^{-5}$ cm$^2$/s (mean $2.15 \times 10^{-5}$), while Sec. III.1 and Sec. IV report $0.40$–$1.98 \times 10^{-5}$ cm$^2$/s (mean $1.17 \times 10^{-5}$); these ranges imply max/min ratios of about $6.3$ and $5.0$, respectively, so only the second is consistent with the “five-fold” claim. In addition, Cluster $9$ is reported as $0.55 \times 10^{-6}$ cm$^2$/s (Sec. III.4), which lies below both reported global minima and is inconsistent with the $10^{-5}$ scale used in the rest of the manuscript.
Recommendation: Recompute diffusion summary statistics (min/median/max/mean/SD, ideally also percentiles) directly from the exact master DataFrame used downstream (heatmaps, clustering, regression). Establish a single authoritative diffusion variable definition (units, scaling, and whether it is in-plane $D_{xy}$ vs $3$D $D$) and use it consistently in Table I (Sec. II.2), Sec. III.1, cluster summaries (Sec. III.4), and the Conclusions (Sec. IV). If multiple diffusion measures exist, label them explicitly (e.g., $D_{\parallel}$, $D_{3D}$) and keep their statistics separate. Verify and correct the Cluster $9$ exponent and any plot/table scaling factors (e.g., whether axes display “$\times 10^{-5}$”).
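To illustrate the kind of check intended here, a minimal sketch follows. It assumes all values of one diffusion variable are available as a plain list in a single unit; the function names and the numbers are hypothetical placeholders, not values taken from the manuscript's data.

```python
import math
import statistics

def summarize(values):
    """Summary statistics for one diffusion variable (all values in the same units)."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
    }

def flag_scale_outliers(values, decades=1.0):
    """Indices of values at least `decades` orders of magnitude away from the median."""
    med = statistics.median(values)
    return [i for i, v in enumerate(values)
            if abs(math.log10(v / med)) >= decades]

# Hypothetical placeholder values in cm^2/s (not the manuscript's data).
# The last entry mimics an exponent slip (1e-6 instead of 1e-5).
d_xy = [0.6e-5, 1.2e-5, 2.1e-5, 3.8e-5, 0.5e-6]

stats = summarize(d_xy)
suspect = flag_scale_outliers(d_xy)
print(stats)
print("entries with a suspicious scale:", suspect)
```

Running one such routine on the master table and pasting its output into Table I, Sec. III.1, Sec. III.4, and Sec. IV would remove the possibility of the text and tables drifting apart.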
-
Reproducibility and MD provenance are insufficiently described, and “public availability” is undermined by local absolute file paths (e.g., “/Users/osman_mbp/...”) and missing simulation/analysis parameters (Sec. II.1). Key details needed to assess diffusion reliability in confinement are absent: force fields and water/ion models, geometry and channel height, boundary conditions, thermostat/barostat and ensemble, timestep, equilibration/production lengths, sampling cadence, functionalization protocol (random vs patterned; one-/two-sided), and how diffusion is computed (directionality, MSD fit window, drift removal, uncertainty estimation).
Recommendation: Replace all machine-specific paths in Sec. II.1 with repository-neutral descriptions and provide an accessible archive (GitHub/Zenodo/DOI) containing (at minimum) analysis scripts and metadata mapping each system to its parameters and trajectory/source. Add a concise but complete MD methods paragraph in Sec. II.1 covering: system geometry/dimensions, graphene separation, functionalization placement protocol, water and ion models/parameters, force field details for graphene and functional groups, thermostat/barostat settings, timestep, equilibration/production durations, and any constraints. Add a clear diffusion-estimation protocol: whether $D$ is computed parallel to the walls (recommended for confinement) or in $3$D, the MSD time interval used for the linear fit, any block-averaging/CI or replicate strategy, and how uncertainties are handled (or explicitly state point estimates only). Temper any “publicly available” language if full data cannot be released.
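For concreteness, the diffusion-estimation protocol requested above could be documented along the following lines. This is only a sketch under stated assumptions: it uses the Einstein relation $\mathrm{MSD}(t) = 2 d D t$ (with $d = 2$ for in-plane motion), assumes time in ps and MSD in Å$^2$, and the function names, fit window, and synthetic data are illustrative rather than the authors' actual procedure.

```python
def fit_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den

def diffusion_from_msd(t_ps, msd_A2, t_min, t_max, dims=2):
    """Einstein estimate D = slope / (2 * dims) from the MSD inside [t_min, t_max].

    t_ps is in ps and msd_A2 in Angstrom^2; the result is in cm^2/s
    (1 Angstrom^2/ps = 1e-4 cm^2/s).
    """
    window = [(ti, mi) for ti, mi in zip(t_ps, msd_A2) if t_min <= ti <= t_max]
    t_w, m_w = zip(*window)
    slope = fit_slope(t_w, m_w)
    return slope / (2 * dims) * 1e-4

# Synthetic in-plane MSD: constant offset plus a linear regime with D = 0.2 A^2/ps
t = [float(i) for i in range(0, 101)]
msd = [0.5 + 4 * 0.2 * ti for ti in t]

d_xy = diffusion_from_msd(t, msd, t_min=20.0, t_max=80.0, dims=2)
print(f"D_xy = {d_xy:.3e} cm^2/s")
```

Stating `dims`, the fit window, and the unit conversion explicitly in Sec. II.1 would resolve whether the reported values are $D_{\parallel}$ or $D_{3D}$; applying the same routine to trajectory blocks would give a simple uncertainty estimate.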
-
The supervised ML model (XGBoost) is evaluated in-sample (trained and assessed on the same $91$ points) while being described as “excellent performance,” and SHAP-based importance is used to support design principles without quantifying generalization or stability (Sec. II.4, Sec. III.4–III.5). With a small dataset and one-hot categorical variables, this creates a substantial overfitting/circularity risk: strong-looking predicted-vs-actual plots may reflect memorization rather than robust trends.
Recommendation: Introduce a validation protocol in Sec. II.4: at minimum repeated $k$-fold cross-validation (or LOOCV) with reported MAE/RMSE/$R^2$ on held-out folds. Recompute/aggregate SHAP results across folds (training-only per fold) and report stability of the feature ranking (e.g., rank frequencies or mean$\pm$SD SHAP importance across resamples). Consider adding a simple baseline model (e.g., linear/GLM with main effects and selected interactions) to show that the qualitative hierarchy (salt, COOH, coverage) is not an artifact of boosted trees. If predictive generalization is not a goal, reframe the regression explicitly as descriptive and soften claims tied to “performance.”
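A sketch of the requested validation loop is given below. It is deliberately library-free and uses a one-feature linear fit as a stand-in for the regression model; the data are synthetic (only the sample size of $91$ mirrors the manuscript), and all function names are hypothetical. With XGBoost, the same structure applies: refit on the training folds only and score on the held-out fold.

```python
import math
import random

def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    """Least-squares intercept and slope for a single feature (stand-in model)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    slope = (sum((a - xm) * (b - ym) for a, b in zip(x, y))
             / sum((a - xm) ** 2 for a in x))
    return ym - slope * xm, slope

def repeated_kfold_rmse(x, y, k=5, repeats=10, seed=0):
    """Held-out RMSE for every (repeat, fold); the model is refit on training folds only."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in kfold_indices(len(x), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(x)) if i not in held_out]
            b0, b1 = fit_line([x[i] for i in train_idx],
                              [y[i] for i in train_idx])
            sq_err = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in test_idx]
            scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores

# Synthetic data: 91 points with a linear trend and Gaussian noise (sd = 0.1)
data_rng = random.Random(1)
x = [data_rng.uniform(0.0, 1.0) for _ in range(91)]
y = [2.0 - 1.5 * xi + data_rng.gauss(0.0, 0.1) for xi in x]

scores = repeated_kfold_rmse(x, y)
mean_rmse = sum(scores) / len(scores)
print(f"held-out RMSE: mean {mean_rmse:.3f} over {len(scores)} folds")
```

Reporting the mean and spread of such held-out scores, alongside the in-sample figures, would show directly how much of the “excellent performance” survives on unseen points.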
-
Clustering is presented as a $10$-state “interfacial water atlas,” but the implemented feature set effectively collapses to only two density-derived features (density_peak_height and density_peak_position) because RDF features and bulk_density parsing failed (Sec. II.4 vs Sec. III.6). With only two features and $91$ samples, $k=10$ risks over-partitioning a continuous trend into arbitrary bins. In addition, selecting $k=10$ because it maximizes the silhouette score at the upper bound of the tested range is not a robust model-selection justification (Sec. III.4.1, Fig. $9$).
Recommendation: Make the feature set used for clustering explicit and consistent in Sec. II.4 and Sec. III.4.1 (move the parsing-failure disclosure earlier than Sec. III.6). Provide the full silhouette (and preferably Davies–Bouldin) curves over a wider $k$ range and justify $k$ based on a plateau/elbow and interpretability rather than the maximum tested endpoint. Add robustness checks: re-run clustering with fewer $k$ (e.g., $3$–$6$), show whether the key physical dichotomy (mobile/disordered vs trapped/ordered) persists, and/or compare with alternative clustering (GMM/hierarchical/DBSCAN). Ideally, fix RDF/bulk_density extraction and rerun; if not feasible, substantially temper “$10$ distinct states/atlas” language and reframe as “density-profile-based regimes.”
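The over-partitioning concern can be illustrated with a small sketch. It implements the mean silhouette coefficient directly and compares a coarse and a fine labelling of the same synthetic two-feature points; the points and labels are hand-assigned placeholders (not output of the authors' clustering), and the function requires at least two clusters.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient for labelled points (Euclidean distance).

    Requires at least two distinct labels.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette is taken as 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Synthetic (peak_height, peak_position) pairs forming two well-separated groups
points = [(1.0, 3.0), (1.1, 3.1), (0.9, 3.0), (1.0, 3.2),
          (3.0, 3.5), (3.1, 3.4), (2.9, 3.6), (3.1, 3.6)]
labels_k2 = [0, 0, 0, 0, 1, 1, 1, 1]   # one label per group
labels_k4 = [0, 0, 1, 1, 2, 2, 3, 3]   # each group split arbitrarily in two

s2 = silhouette(points, labels_k2)
s4 = silhouette(points, labels_k4)
print(f"silhouette k=2: {s2:.2f}, k=4: {s4:.2f}")
```

Plotting this score over a wider $k$ range for the actual feature matrix, rather than reporting only the value at the tested upper bound, would show whether $k=10$ sits on a genuine maximum or plateau.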
-
Key control variables are not defined in transferable physical units, and this propagates into figure interpretability and generalizability: “coverage” is alternately described as a percentage and as an integer number of groups (Sec. II.2–II.3, Sec. III.2; Fig. 4–Fig. 6 captions), and salt is sometimes expressed as “NaCl pairs” without normalization by volume (Sec. II.1–II.3; multiple figures). This makes trends difficult to compare across geometries and undermines the “design principles” framing.
Recommendation: Define coverage canonically in one physical measure (preferred: groups per nm$^2$ or $\%$ of functionalizable sites) and provide an explicit mapping between “$N$ groups” and “$\%$” based on the graphene surface area and site count used. Standardize all axes/captions accordingly. Express salt in molarity or number density (ion pairs per nm$^3$) and optionally include the raw “NaCl pairs” in parentheses. Ensure all heatmaps and comparisons specify the fixed slice values with clear units (Sec. II.3, Sec. III.2.2).
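The requested mappings are simple unit conversions, sketched below. The sheet area, site count, and accessible volume are hypothetical placeholders; in particular, the volume used to normalize the ion count in confinement (geometric vs water-accessible) is a choice the authors would need to state.

```python
AVOGADRO = 6.02214076e23   # 1/mol (exact SI value)
LITRES_PER_NM3 = 1e-24     # 1 nm^3 = 1e-27 m^3 = 1e-24 L

def coverage_per_nm2(n_groups, sheet_area_nm2):
    """Functional groups per nm^2 of graphene surface."""
    return n_groups / sheet_area_nm2

def coverage_percent(n_groups, n_sites):
    """Percentage of functionalizable sites that carry a group."""
    return 100.0 * n_groups / n_sites

def molarity(n_ion_pairs, volume_nm3):
    """Salt concentration in mol/L from an ion-pair count and a volume in nm^3."""
    return n_ion_pairs / (AVOGADRO * volume_nm3 * LITRES_PER_NM3)

# Hypothetical system (placeholders, not the manuscript's geometry)
sheet_area_nm2 = 9.0    # 3 nm x 3 nm sheet
n_sites = 340           # functionalizable carbon sites on that sheet
volume_nm3 = 9.0        # 3 nm x 3 nm x 1 nm accessible volume

print(coverage_per_nm2(18, sheet_area_nm2), "groups/nm^2")
print(coverage_percent(17, n_sites), "% of sites")
print(round(molarity(10, volume_nm3), 3), "mol/L for 10 NaCl pairs")
```

A short table built this way, listing each “$N$ groups” and “NaCl pairs” setting next to its groups/nm$^2$, $\%$, and molarity equivalents, would let readers compare the trends with other geometries.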
-
The interaction-effect analysis (“synergistic/antagonistic” deviations from an additive baseline) is not defined with sufficient mathematical precision, contains at least one arithmetic inconsistency, and is conceptually disconnected from the SHAP framework used elsewhere (Sec. III.5; Fig. $13$). The additive baseline (what is averaged over, how categories are treated, whether it is a fitted model or marginal means) is unclear, and the interpretation sometimes aligns more with saturation/floor effects than “synergy.”
Recommendation: In Sec. III.5, explicitly define the baseline used to compute interaction terms (equation, averaging sets, categorical handling, and uncertainty if any). Correct the numerical example where $0.8-1.2$ is reported as $-0.3$ instead of $-0.4$ (Sec. III.5). Consider aligning interaction analysis with the ML model by reporting SHAP interaction values for the tree model, or alternatively fitting a simple linear/GLM model with and without interaction terms and comparing coefficients and fit; then describe effects as “positive/negative deviation from additivity” rather than “synergy” unless you define these terms rigorously.
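One possible explicit form of the baseline, offered as an assumption about what the authors may intend rather than a statement of their method, is the standard two-way marginal-means construction for two factors with levels $i$ and $j$ (the symbols below are introduced here for illustration):

```latex
% \bar{D}_{ij}: mean diffusion coefficient over all systems at levels (i, j)
% \bar{D}_{i\cdot}, \bar{D}_{\cdot j}: marginal means over the other factor
% \bar{D}_{\cdot\cdot}: grand mean over all systems
\begin{align}
  \hat{D}^{\mathrm{add}}_{ij} &= \bar{D}_{i\cdot} + \bar{D}_{\cdot j} - \bar{D}_{\cdot\cdot}, \\
  \Delta_{ij} &= \bar{D}_{ij} - \hat{D}^{\mathrm{add}}_{ij}.
\end{align}
% Arithmetic check for the difference quoted in Sec. III.5:
%   0.8 - 1.2 = -0.4
```

With such a definition, $\Delta_{ij} > 0$ and $\Delta_{ij} < 0$ can be reported neutrally as positive and negative deviations from additivity, and the averaging sets (which other variables are marginalized over, and whether the design is balanced) become part of the stated method.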
-
Claims of broad generality (e.g., “quantitative atlas,” broadly applicable “design principles”) are stronger than warranted by the study’s restricted domain: single channel/pore geometry, limited functional group set, only NaCl, and classical non-polarizable MD (Abstract, Sec. I, Sec. IV). The limitations discussion focuses on parsing issues but does not sufficiently bound the physical domain of validity (Sec. III.6).
Recommendation: In Sec. III.6 and Sec. IV, clearly delimit the domain of validity: specify the channel geometry/size, functionalization chemistries, electrolyte type/range, and interaction model limitations. Rephrase “atlas”/“design principles” as applicable within this parameter space, and identify which qualitative trends you expect to be robust vs sensitive to geometry, electrolyte identity, or polarization/quantum effects. This will improve credibility without diminishing the paper’s usefulness as a template workflow.