-
The data-generating physical/simulation model is insufficiently specified, preventing readers from knowing what equations/terms are actually “recoverable” (Sec. 2.1, Sec. 3.1). The manuscript describes a “three-dimensional periodic fluid system” but does not clearly state the governing PDEs, whether the velocity is incompressible, the role/definition of $\rho$ (density vs signed fluctuation vs tracer), boundary conditions, forcing, parameters (e.g., Reynolds/Péclet), numerical scheme, and (critically) the actual simulation time step and output cadence. Without this, it is difficult to evaluate claims about missing pressure gradients, expected advective/diffusive balance, or the severity of temporal undersampling.
Recommendation: Expand Sec. 2.1 (or add a dedicated subsection) to document the dataset provenance and the data-generating model: explicit PDEs (even if only “believed/assumed”), boundary conditions (periodic), forcing/initial conditions (as known), nondimensional parameters, and the true simulation time step $\Delta t_{\text{sim}}$ and snapshot interval $\Delta t_{\text{snap}}$. If the dataset is external and details are unknown, state what is unknown explicitly, cite the source, and reframe subsequent claims (pressure/CFL) as conditional hypotheses rather than established facts.
-
The core claim—temporal undersampling/aliasing dominates failure—is not quantitatively demonstrated, and the “$\Delta t = 1$” assumption is treated too literally (Sec. 2.2.2, Sec. 3.5, Conclusions). With only $10$ snapshots, finite-difference $\partial_t$ estimates can be dominated by truncation error and/or noise; moreover, if the physical time axis is unknown, scaling directly affects the regression targets and regularization path. The CFL $\approx 128$ argument is currently presented without a tight link to the true sampling interval and velocity scale.
Recommendation: In Sec. 2.2.2 and Sec. 3.5: (i) clearly distinguish “index time step” from physical/nondimensional time; if physical $\Delta t$ is unknown, present results in a scale-free way and treat $\Delta t$ as an unknown factor; (ii) provide a derivation of the effective CFL (or a dimensionless sampling ratio $\Delta t_{\text{snap}}/\tau_{\text{char}}$) using reported $\Delta x$, a characteristic velocity (mean/max $|u|$), and $\Delta t_{\text{snap}}$; and (iii) add quantitative derivative-diagnostic evidence, e.g., compare central vs one-sided vs higher-order differences, temporal smoothing (Savitzky–Golay/local polynomial in time), or regularized differentiation, and report how the distribution/variance of $\partial_t f$ and downstream $R^2$ change. Even a small sensitivity table would substantially strengthen the “undersampling is primary” conclusion.
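
As an illustration of items (ii) and (iii), the following minimal sketch computes a sampling CFL number and compares one-sided with central differences on a synthetic sinusoid sampled at 10 index-time points. The function names, the test signal, and the choice of two periods across the record are assumptions made here for illustration; none of them comes from the manuscript.

```python
import numpy as np

def effective_cfl(u_char, dt_snap, dx):
    """Grid cells crossed per snapshot interval at speed u_char."""
    return u_char * dt_snap / dx

def time_derivative_errors(n_snap=10, periods=2.0):
    """RMS error of finite-difference d/dt for sin(omega * t) on n_snap samples.

    Index time is used (dt = 1); `periods` is the assumed number of
    oscillation periods spanned by the whole record.
    """
    t = np.arange(n_snap, dtype=float)
    omega = 2.0 * np.pi * periods / n_snap
    f = np.sin(omega * t)
    exact = omega * np.cos(omega * t)

    forward = f[1:] - f[:-1]            # one-sided, first order, dt = 1
    central = (f[2:] - f[:-2]) / 2.0    # central, second order, dt = 1

    def rms(err):
        return float(np.sqrt(np.mean(err ** 2)))

    return {
        "forward": rms(forward - exact[:-1]),
        "central": rms(central - exact[1:-1]),
    }
```

Sweeping `periods` upward gives a rough picture of how quickly both estimates degrade as the signal approaches the sampling limit, which is the kind of sensitivity table requested above.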
-
Causal attribution is confounded: library inadequacy (pressure/projection), regression/model-class limitations, and derivative error are discussed together but not empirically disentangled (Sec. 2.2.1, Sec. 3.5). In particular, the manuscript highlights “missing pressure gradient” as a major limitation for momentum equations, yet does not test whether adding pressure-like information (or using a formulation that removes pressure) changes the outcome; similarly, it is unclear whether incompressibility holds ($\nabla\cdot u \approx 0$) and whether a vorticity/projection-based equation discovery would be more appropriate.
Recommendation: Restructure Sec. 3.5 to separate hypotheses and support each with at least one diagnostic/ablation: (a) check and report incompressibility/divergence statistics (distribution of $\nabla\cdot u$, relative magnitude vs $|\nabla u|$); (b) if pressure is available, add $\nabla p$ to the library and rerun; if not available, explicitly label the pressure discussion as a hypothesis and add a proxy experiment: regress the vorticity equation (pressure-free) or apply a solenoidal projection/Helmholtz decomposition in Fourier space (natural for periodic domains) before regression; (c) for $\rho$, explicitly test a passive-scalar advection–diffusion-only library ($u\cdot\nabla\rho$, $\Delta\rho$, optional source) to see whether “algebraic dominance” persists.
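
Diagnostics (a) and (b) could be sketched as follows for a velocity array of shape `(3, nx, ny, nz)` on a periodic cube. The array layout, the box length of $2\pi$, and all function names are assumptions for illustration; the manuscript's actual data layout is not stated.

```python
import numpy as np

def wavenumbers(shape, length=2.0 * np.pi):
    """Angular wavenumber grids for a periodic cube of side `length`.

    The Nyquist wavenumber is zeroed on even-sized axes so that first
    derivatives of real fields remain real after the inverse FFT.
    """
    axes = []
    for n in shape:
        k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
        if n % 2 == 0:
            k[n // 2] = 0.0
        axes.append(k)
    return np.meshgrid(*axes, indexing="ij")

def divergence_stats(u, length=2.0 * np.pi):
    """Return (RMS of div u, RMS of div u divided by RMS of |grad u|)."""
    k = wavenumbers(u.shape[1:], length)
    u_hat = np.fft.fftn(u, axes=(1, 2, 3))
    div = np.fft.ifftn(sum(1j * k[i] * u_hat[i] for i in range(3))).real
    grad_sq = sum(
        np.fft.ifftn(1j * k[j] * u_hat[i]).real ** 2
        for i in range(3)
        for j in range(3)
    )
    div_rms = float(np.sqrt(np.mean(div ** 2)))
    grad_rms = float(np.sqrt(np.mean(grad_sq)))
    return div_rms, div_rms / grad_rms

def solenoidal_projection(u, length=2.0 * np.pi):
    """Remove the curl-free part of u: u_hat - k (k . u_hat) / |k|^2."""
    k = wavenumbers(u.shape[1:], length)
    k2 = sum(ki ** 2 for ki in k)
    k2[k2 == 0.0] = 1.0  # k vanishes there, so the correction term is zero anyway
    u_hat = np.fft.fftn(u, axes=(1, 2, 3))
    k_dot_u = sum(k[i] * u_hat[i] for i in range(3))
    u_sol_hat = np.stack([u_hat[i] - k[i] * k_dot_u / k2 for i in range(3)])
    return np.fft.ifftn(u_sol_hat, axes=(1, 2, 3)).real
```

A relative divergence far above round-off on the raw field, which drops to round-off after projection, would suggest that the stored velocity is not discretely solenoidal and that the pressure discussion should be framed accordingly.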
-
The experimental protocol and evaluation are under-specified and may be optimistic/unstable given strong spatiotemporal correlations (Sec. 2.3, Sec. 3.4). The manuscript does not clearly state whether $R^2$ is computed in-sample, on held-out points, or via cross-validation; how train/test splitting is performed given only $10$ time slices; and whether results are stable across random subsamples/seeds. Random pointwise splits can leak information due to spatial correlation and may not reflect generalization across time.
Recommendation: In Sec. 2.3.3 and Sec. 3.4: (i) specify the exact split protocol (by time slice holdout is strongly preferred here; alternatively spatial-block holdout), and report both train and test $R^2$; (ii) report variability across multiple random seeds/subsamples (mean $\pm$ std of selected terms and $R^2$); and (iii) add at least one complementary diagnostic beyond pointwise $R^2$, such as RMSE/MAE and a spectral comparison (power spectrum of predicted vs true $\partial_t$ fields) to quantify the observed over-smoothing. If feasible, attempt a short-horizon forward integration/rollout from one snapshot (even if it fails), and report what fails (instability vs drift vs loss of small scales).
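
A time-slice holdout of the kind suggested in (i) might look like the sketch below. Ordinary least squares stands in for LassoCV to keep the example dependency-free, and the names `features`, `target`, and `t_index` (the snapshot index of each sampled point) are hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def leave_one_slice_out(features, target, t_index):
    """Fit on all snapshots but one; score on the held-out snapshot.

    Returns a list of (held_out_slice, train_r2, test_r2) tuples.
    """
    scores = []
    for held_out in np.unique(t_index):
        test = t_index == held_out
        train = ~test
        # Stand-in regressor: plain least squares instead of LassoCV.
        coef, *_ = np.linalg.lstsq(features[train], target[train], rcond=None)
        scores.append((
            int(held_out),
            float(r2(target[train], features[train] @ coef)),
            float(r2(target[test], features[test] @ coef)),
        ))
    return scores
```

Reporting the mean and spread of the test column alongside the train column would make any gap between in-sample fit and cross-time generalization visible.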
-
Methodological breadth is too narrow to support broad conclusions about “data-driven equation discovery” under sparse temporal sampling (Sec. 2.3, Sec. 3.2–3.4). The negative result is based essentially on one pipeline (pointwise feature library + LassoCV). It remains unclear whether the failure is intrinsic to the data regime or specific to LassoCV, scaling choices, or the pointwise strong-form formulation.
Recommendation: Add one controlled comparison (minimal but informative): e.g., STLSQ/SINDy thresholding, ridge + pruning, or a weak-form/WSINDy variant that reduces derivative sensitivity; and/or a Fourier/spectral formulation leveraging periodicity. Report whether the same qualitative pathology (algebraic-term dominance; near-zero advection/diffusion; low generalization) persists. This can be done as a small ablation in Sec. 3.2–3.4 without reframing the entire paper.
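
For reference, sequentially thresholded least squares (the STLSQ optimizer used in SINDy) is short enough to sketch directly. The threshold and iteration count below are illustrative defaults, not tuned values, and `theta` denotes the same feature library that would be passed to LassoCV.

```python
import numpy as np

def stlsq(theta, y, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares.

    Repeatedly zero coefficients with magnitude below `threshold`,
    then refit least squares on the surviving columns.
    """
    coef = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        active = ~small
        if not active.any():
            break
        coef[active] = np.linalg.lstsq(theta[:, active], y, rcond=None)[0]
    return coef
```

Because the threshold acts on raw coefficient magnitudes, the comparison with LassoCV is only meaningful if both methods see identically scaled features.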
-
A “positive control” is missing, so it is hard to know whether the pipeline is correctly implemented and whether the observed failure threshold is specific to this dataset (Sec. 3.5, Conclusions). As written, the paper implicitly generalizes from a single case with unknown governing equations and potentially missing key fields (pressure).
Recommendation: Include a compact validation experiment: generate (or cite) a known PDE system on a periodic grid (e.g., 3D/2D advection–diffusion for a scalar, or a simpler fluid surrogate), run the same feature construction and regression, and then downsample in time to $10$ snapshots. Show how identification quality degrades as snapshots are reduced, ideally isolating the “$10$ snapshots” regime. This would directly substantiate the central message while keeping the current dataset as the main case study.
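
A minimal version of such a positive control, assuming a 1D periodic advection–diffusion equation $\rho_t + c\,\rho_x = \nu\,\rho_{xx}$ with an exact three-mode solution, is sketched below. The equation, its parameters, the mode amplitudes, and the function names are all choices made here for illustration and are far simpler than the manuscript's 3D system.

```python
import numpy as np

def exact_solution(x, t, c=1.0, nu=0.05):
    """Exact periodic solution of rho_t + c rho_x = nu rho_xx (three modes)."""
    rho = np.zeros_like(x)
    for k, amp, phase in [(1, 1.0, 0.3), (2, 0.5, 1.1), (3, 0.25, 2.0)]:
        rho += amp * np.exp(-nu * k ** 2 * t) * np.sin(k * (x - c * t) + phase)
    return rho

def identify(dt_snap, n=64, c=1.0, nu=0.05):
    """Regress a central-difference rho_t on [rho_x, rho_xx].

    Spatial derivatives are spectral (exact for this signal); only the
    time derivative depends on the snapshot interval dt_snap.
    Returns the fitted coefficients, which should approach [-c, nu].
    """
    x = 2.0 * np.pi * np.arange(n) / n
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers on [0, 2*pi)
    rho_hat = np.fft.fft(exact_solution(x, 0.0, c, nu))
    rho_x = np.fft.ifft(1j * k * rho_hat).real
    rho_xx = np.fft.ifft(-(k ** 2) * rho_hat).real
    rho_t = (
        exact_solution(x, dt_snap, c, nu) - exact_solution(x, -dt_snap, c, nu)
    ) / (2.0 * dt_snap)
    library = np.column_stack([rho_x, rho_xx])
    coef, *_ = np.linalg.lstsq(library, rho_t, rcond=None)
    return coef
```

Calling `identify` over a range of `dt_snap` values traces how the recovered advection and diffusion coefficients drift away from `[-c, nu]` as the snapshot interval grows; with the defaults, `dt_snap = 1.0` corresponds to a sampling CFL number of roughly 10.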
-
Feature scaling, nondimensionalization, and collinearity are not analyzed rigorously, yet they are central to interpreting the learned equations (Sec. 2.2.1, Sec. 2.3, Sec. 3.3–3.5). The manuscript notes standardization and “unstandardization,” but does not provide the exact mapping (including intercept handling), and dimensional consistency of mixed terms (constants, $u$, $u^2$, derivative terms) is not verifiable. Collinearity is argued qualitatively (e.g., $u_{\text{mag}}$ vs component squares) without quantitative diagnostics.
Recommendation: In Sec. 2.3 and Sec. 3.5: (i) state clearly whether variables/features are nondimensional and/or normalized before feature construction; (ii) provide the exact coefficient unstandardization formula (including intercept adjustment); (iii) quantify collinearity (feature correlation matrix for key groups, condition number/VIF), and test a small decorrelation ablation (e.g., remove $u_{\text{mag}}$ if $u_x^2,u_y^2,u_z^2$ are present; or separate mean + fluctuations $u = \bar u + u'$ and build features on fluctuations). Report whether term selection and $R^2$ change materially.
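
For items (ii) and (iii), one possible form of the mapping and diagnostics is sketched below. It assumes z-score standardization of the features (and optionally of the target); whether the manuscript standardizes in this way is not stated, so the formula is a candidate to be confirmed, and the function names are hypothetical.

```python
import numpy as np

def unstandardize(coef_std, intercept_std, x_mean, x_std, y_mean=0.0, y_std=1.0):
    """Map a fit on z = (x - x_mean) / x_std back to raw features.

    If (y - y_mean) / y_std = b' + sum_j beta'_j z_j, then
      beta_j = y_std * beta'_j / x_std_j
      b      = y_mean + y_std * b' - sum_j beta_j * x_mean_j
    """
    coef_raw = y_std * coef_std / x_std
    intercept_raw = y_mean + y_std * intercept_std - np.sum(coef_raw * x_mean)
    return coef_raw, intercept_raw

def collinearity_report(features):
    """Condition number of the standardized features and per-feature VIF.

    VIF_j is the j-th diagonal entry of the inverse correlation matrix.
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return float(np.linalg.cond(z)), vif
```

Running `collinearity_report` with and without the suspected redundant column gives a direct before/after number for the decorrelation ablation.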
-
Reproducibility is limited by missing details on data access, preprocessing, derivative computation, and LassoCV configuration (Sec. 2.1–2.3). Given the paper’s value as a cautionary benchmark, the community benefit depends heavily on being able to replicate and extend the study.
Recommendation: Strengthen Sec. 2.1–2.3 with: dataset availability/source link (or explicit constraints), preprocessing steps (precision, periodic derivative handling, any filtering), exact subsampling strategy for the $100{,}000$ points (uniform/stratified over time), random seeds, and full LassoCV settings (folds, alpha grid, max_iter, tol, normalization). If possible, release code and either the dataset or a small subset plus scripts to rebuild derivatives/features.
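
As an example of the level of detail that would make the subsampling reproducible, a seeded draw stratified over time could be documented along these lines. The function name and signature are hypothetical, and the number of spatial points per snapshot is left as a parameter because the grid size is not given here.

```python
import numpy as np

def stratified_subsample(n_slices, n_spatial, n_total, seed=0):
    """Draw the same number of spatial points from every snapshot.

    Returns (t_index, s_index): the snapshot index and the flat spatial
    index of each sampled point. Sampling is without replacement within
    a snapshot and fully determined by `seed`.
    """
    if n_total % n_slices != 0:
        raise ValueError("n_total must be divisible by n_slices")
    per_slice = n_total // n_slices
    rng = np.random.default_rng(seed)
    t_index = np.repeat(np.arange(n_slices), per_slice)
    s_index = np.concatenate([
        rng.choice(n_spatial, size=per_slice, replace=False)
        for _ in range(n_slices)
    ])
    return t_index, s_index
```

With the manuscript's 10 snapshots and 100,000 points this would draw 10,000 points per snapshot; `np.unravel_index` converts the flat spatial indices back to grid coordinates.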