-
The final discovered PDE system is not presented explicitly and cleanly for all fields, limiting interpretability and reproducibility (Sec. 3.2.1–3.2.2, Sec. 4). The $v_x$ equation appears truncated (dangling “+ 4”) and includes apparently duplicated terms, while the $v_y$ and $v_z$ equations are described only qualitatively as having “similar structures”. The density equation is shown but not clearly separated into trustworthy and artifact components.
Recommendation: Add a dedicated subsection (e.g., Sec. 3.2.3) and a table listing the final discovered PDEs for $\rho$, $v_x$, $v_y$, $v_z$ exactly as used for validation: each selected term, coefficient, and operator notation, with no truncation/duplication. Explicitly state which equation(s)/terms are excluded from simulation (if any). For $\rho$, clearly label physically plausible continuity-like terms vs the anti-diffusive artifact and state whether the reported $\rho$ PDE is intended as a model or a diagnostic failure case.
-
Coefficient magnitudes and physical interpretations are currently not verifiable due to feature normalization and ambiguous scaling of derivatives/time (Sec. 2.3, Sec. 3.2.1). The library columns are normalized to unit $L^2$ norm (Sec. 2.3), but later coefficients are interpreted as if attached to unnormalized operators (viscosity, “pressure” scaling, etc.). Additionally, the manuscript’s time-step inference from an advection coefficient is inconsistent with the stated finite-difference definition (Eq. (1), Sec. 3.2.1).
Recommendation: State explicitly whether reported coefficients (Sec. 3.2, Eqs. (4)–(5)) are in the normalized-feature basis or converted back to the unnormalized operator basis. If conversion is performed, provide the exact unnormalization formula (including any scaling of the target $\partial_t f$), and report both normalized and de-normalized coefficients (or at least key terms) with units/nondimensional form. Re-derive the relationship between learned coefficients and physical coefficients starting from Eq. (1); remove or correct the $\Delta t \approx 1/5.5$ claim unless an explicit rescaling step makes that inversion valid. If underlying physical units are unknown, frame coefficients as “effective” and avoid parameter identification claims.
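To make the requested basis conversion concrete, here is a minimal sketch (synthetic data, not the manuscript's actual pipeline) of the unnormalization step when library columns are scaled to unit $L^2$ norm before the fit; the key identity is that coefficients learned against normalized columns must be divided by the column norms to attach to the unnormalized operators:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
Theta = rng.normal(size=(1000, 5))            # candidate library (unnormalized operators)
xi_true = np.array([0.0, 2.0, 0.0, -0.5, 0.0])
dudt = Theta @ xi_true + 0.01 * rng.normal(size=1000)

# Normalize each library column to unit L2 norm (as described in Sec. 2.3).
norms = np.linalg.norm(Theta, axis=0)
Theta_n = Theta / norms

model = LassoCV(cv=5).fit(Theta_n, dudt)

# Coefficients learned in the normalized basis attach to Theta_n, not Theta.
# To interpret them against the unnormalized operators, divide by the norms:
xi_physical = model.coef_ / norms
```

If the target $\partial_t f$ is also rescaled, the analogous factor must be multiplied back in; reporting both `model.coef_` and `xi_physical` removes the ambiguity.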
-
Sparse regression/model selection details are under-specified, and the procedure appears to include post-hoc/manual collinearity filtering that can materially affect the discovered PDE (Sec. 2.4, Sec. 3.2.1). In addition, standard cross-validation can leak information for spatiotemporal fields if folds are formed by randomly sampling grid points, inflating apparent generalization.
Recommendation: Expand Sec. 2.4 to fully specify: LassoCV configuration ($\alpha$ grid/range, number of folds, scoring, convergence tolerances, random seed), whether data are subsampled/batched given $\approx 8 \times 128^3$ rows, and the exact train/validation splitting strategy. Use blocked CV that respects correlation structure (e.g., folds by time index; or spatial block CV; or both) and report how results change relative to random-row CV. Replace qualitative “collinear statistical artifact” removal with a documented, algorithmic rule (e.g., correlation threshold, VIF/condition number cutoff, or grouped variables), or adopt elastic net / group lasso / reparameterization to reduce researcher degrees of freedom. Provide a brief robustness report: term-selection frequency and coefficient variability under different seeds, folds, and subsampling.
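The blocked-CV variant suggested above can be sketched with scikit-learn's `GroupKFold`, grouping rows by snapshot index so that whole time slices are held out together (toy stand-in data; the real library and row counts would come from the $8 \times 128^3$ dataset):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(1)
n_snapshots, n_points = 8, 500                # stand-in for 8 snapshots x 128^3 points
Theta = rng.normal(size=(n_snapshots * n_points, 4))
dudt = Theta @ np.array([1.0, 0.0, -0.3, 0.0]) + 0.05 * rng.normal(size=n_snapshots * n_points)

# Group every row by its snapshot (time) index, so folds hold out entire
# snapshots instead of leaking spatially correlated neighbors across folds.
groups = np.repeat(np.arange(n_snapshots), n_points)

cv = GroupKFold(n_splits=4)
folds = list(cv.split(Theta, dudt, groups=groups))
model = LassoCV(cv=folds, max_iter=5000).fit(Theta, dudt)
```

Spatial block CV would follow the same pattern, with `groups` built from coarse spatial blocks rather than time indices.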
-
Identifiability is threatened by a redundant candidate library (exact/near-exact dependencies such as Laplacian vs individual second derivatives, and multiple equivalent forms of transport/flux terms), making coefficient attribution unstable even if prediction is adequate (Sec. 2.3, Sec. 3.2.1). This is compounded by OLS refitting on correlated selected features, which can amplify variance.
Recommendation: Revise Sec. 2.3 to reduce exact linear dependencies (e.g., include either $\nabla^2 f$ or $\partial^2 f/\partial x_i^2$ terms, not both; avoid simultaneously including multiple algebraically equivalent transport forms unless constrained). Consider a physics-structured library (e.g., conservative-form terms like $-\nabla\cdot(\rho v)$, $-\nabla\cdot(\rho v\otimes v)$, gradient/divergence invariants) to reduce collinearity. Report diagnostics (e.g., condition number of selected design matrix; pairwise correlations; coefficient path stability). If you keep multiple components of $\nabla(\nabla\cdot v)$, either enforce a shared coefficient (isotropic bulk viscosity) or explicitly present it as a general linear combination (see also Sec. 3.2.1 discussion).
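The suggested diagnostics are cheap to compute. A sketch, using a deliberately redundant toy library in which the “Laplacian” column is the exact sum of the two second-derivative columns:

```python
import numpy as np

def library_diagnostics(Theta, names):
    """Condition number and worst pairwise correlation of a candidate library."""
    Theta_n = Theta / np.linalg.norm(Theta, axis=0)   # compare columns on equal footing
    cond = np.linalg.cond(Theta_n)
    corr = np.corrcoef(Theta_n, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
    return cond, (names[i], names[j], corr[i, j])

# Toy library with an exact linear dependency: laplacian = u_xx + u_yy.
rng = np.random.default_rng(2)
u_xx, u_yy = rng.normal(size=(2, 1000))
Theta = np.column_stack([u_xx, u_yy, u_xx + u_yy])
cond, worst = library_diagnostics(Theta, ["u_xx", "u_yy", "laplacian"])
# cond is astronomically large, flagging the redundancy before any regression.
```

Reporting these two numbers for both the full library and the selected submatrix would directly support the identifiability claims.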
-
The density ($\rho$) equation discovery is currently presented in a way that undermines the claim of recovering a coupled system: fit quality is low ($R^2 \approx 0.30$) and the learned PDE contains an unphysical negative diffusion term (Sec. 3.2.2, Sec. 4). The manuscript acknowledges the artifact but does not systematically test mitigations or clearly decide whether $\rho$ should be modeled at all from this dataset.
Recommendation: In Sec. 3.2.2, explicitly mark the anti-diffusive $\nabla^2\rho$ term as physically inconsistent and (unless you can justify it) exclude it from any predictive model. Run at least one mitigation experiment targeted to the stated failure mode: (i) improved temporal differentiation (higher-order, smoothing, Savitzky–Golay, or regularized differentiation), (ii) constrained regression enforcing non-negative diffusion, (iii) revised library excluding second-order $\rho$ terms, and/or (iv) reweighting loss to account for small $\rho$ variance. Report the resulting $\rho$ equation and whether validation improves. If reliable $\rho$ dynamics cannot be recovered, state this clearly and restrict the validated model to momentum (with $\rho$ treated as constant/known).
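Mitigation (ii), sign-constrained regression, can be prototyped with `scipy.optimize.lsq_linear` by bounding the Laplacian coefficient to be non-negative (toy stand-in data; the term names are illustrative, not the manuscript's actual library):

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(3)
n = 2000
adv = rng.normal(size=n)          # stand-in advection term, e.g. -(v . grad) rho
lap = rng.normal(size=n)          # stand-in diffusion term, grad^2 rho
Theta = np.column_stack([adv, lap])
drho_dt = -0.8 * adv + 0.05 * rng.normal(size=n)   # true dynamics: no diffusion

# Bounds: advection coefficient free, Laplacian coefficient constrained >= 0,
# so the fit cannot produce an unphysical anti-diffusive (negative) term.
res = lsq_linear(Theta, drho_dt, bounds=([-np.inf, 0.0], [np.inf, np.inf]))
c_adv, c_diff = res.x
```

If the constrained fit drives the diffusion coefficient to zero while barely changing $R^2$, that is itself evidence that the anti-diffusive term is a fitting artifact rather than dynamics.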
-
Data provenance and the true data-generating equations/parameters are insufficiently described, weakening physical claims (Sec. 2.1, Sec. 3.2.1). In particular, interpreting $-c\nabla\rho$ as a pressure-gradient surrogate requires an equation of state or a known formulation; otherwise it may simply be an empirical gradient proxy. Without knowing nondimensionalization, forcing, viscosity, Mach/Reynolds numbers, and the actual snapshot spacing, it is hard to judge whether the discovered PDE matches Navier–Stokes or an “effective” model.
Recommendation: In Sec. 2.1, provide the dataset provenance: the solver/model used to generate the data (compressible vs incompressible; any equation of state; forcing; viscosity; etc.), nondimensionalization, and the actual time spacing between snapshots if known. Report relevant regime indicators (Re, Ma) or explicitly state they are unknown. In Sec. 3.2.1, either (i) compare discovered coefficients/terms quantitatively to the known governing equations after consistent rescaling, or (ii) reframe conclusions as learning an effective PDE in arbitrary units and interpret $\nabla\rho$ as an empirical gradient field rather than “pressure” unless justified.
-
Validation is currently too coarse for a turbulence setting: means/standard deviations, RMSE, and slice visuals do not assess scale-dependent dynamics and can miss significant spectral/structural errors (Sec. 2.6, Sec. 3.3). RMSE growth alone is also difficult to interpret without normalization and residual diagnostics.
Recommendation: Augment Sec. 2.6 and Sec. 3.3 with turbulence-relevant diagnostics: kinetic energy spectrum $E(k)$ and its evolution; enstrophy or vorticity-magnitude PDFs; divergence statistics $\langle(\nabla\cdot v)^2\rangle$ to quantify compressibility; spatial correlation functions or structure functions; and time series of total kinetic energy. Report residual diagnostics for the regression stage (e.g., spectrum of residual $\partial_t v - \Theta\xi$, correlation of residual with candidate terms). Include at least one baseline comparison (e.g., advection+Laplacian only) to show the marginal value of additional discovered terms.
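For reference, the shell-averaged kinetic energy spectrum $E(k)$ requested above takes only a few lines on a periodic cube; this is a sketch assuming a cubic grid and unit-spacing integer wavenumbers (binning conventions vary between codes):

```python
import numpy as np

def energy_spectrum(vx, vy, vz):
    """Shell-averaged kinetic energy spectrum E(k) on a periodic n^3 cube."""
    n = vx.shape[0]
    # Kinetic energy density per Fourier mode (Parseval-normalized).
    e_hat = 0.5 * sum(np.abs(np.fft.fftn(v) / n**3) ** 2 for v in (vx, vy, vz))
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)
    k_bins = np.arange(0.5, n // 2, 1.0)        # shells centered on integer k
    shell = np.digitize(k_mag.ravel(), k_bins)
    E = np.bincount(shell, weights=e_hat.ravel(), minlength=len(k_bins) + 1)
    return np.arange(len(k_bins) + 1), E

# Sanity check: a single-mode field puts all its energy in one shell.
n = 32
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
vx = np.sin(4 * x)[:, None, None] * np.ones((n, n, n))
zero = np.zeros((n, n, n))
ks, E = energy_spectrum(vx, zero, zero)
```

Comparing $E(k)$ of the simulated and reference fields at matched times would expose scale-dependent errors that RMSE and slice visuals cannot.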
-
The numerical integration used for validation is under-specified, and it is unclear how numerical artifacts (aliasing, stabilization) and problematic terms (e.g., anti-diffusion in $\rho$) are handled (Sec. 2.5, Sec. 3.3). Reproducibility and credibility of the validation depend on these details.
Recommendation: Expand Sec. 2.5 with an explicit algorithm: periodic boundary conditions; spatial discretization (spectral/pseudospectral vs finite differences); how nonlinear products are computed; whether de-aliasing (e.g., 2/3 rule) or filtering is used; which terms are treated implicitly in Crank–Nicolson vs explicitly in Euler; how mixed derivatives are evaluated; and the simulation time step $\Delta t_{\rm sim}$ with stability/CFL considerations. Explicitly state whether the anti-diffusive $\rho$ Laplacian is included; if included, explain why the simulation remains stable; if excluded, say so. Provide a simple $\Delta t_{\rm sim}$ sensitivity/convergence check on one metric (e.g., energy or spectrum) to rule out time-discretization artifacts.
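The 2/3-rule de-aliasing mentioned above amounts to zeroing the top third of wavenumbers after transforming nonlinear products; a sketch, assuming a cubic periodic grid:

```python
import numpy as np

def dealias_mask(n):
    """Boolean 2/3-rule mask for an n^3 pseudospectral grid: drop |k| >= n/3."""
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    keep = np.abs(k) < n / 3.0
    return keep[:, None, None] & keep[None, :, None] & keep[None, None, :]

def dealiased_product(f, g, mask):
    """Compute f*g in physical space, then filter the aliased high modes."""
    prod_hat = np.fft.fftn(f * g) * mask
    return np.real(np.fft.ifftn(prod_hat))

n = 24
mask = dealias_mask(n)

# Example: sin(5x)^2 = 0.5 - 0.5*cos(10x); the k = 10 mode exceeds the
# cutoff n/3 = 8, so the de-aliased product collapses to the constant 0.5.
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
f = np.sin(5 * x)[:, None, None] * np.ones((n, n, n))
prod = dealiased_product(f, f, mask)
```

Stating whether the validation solver applies such a filter (and to which terms) would resolve much of the ambiguity about stabilization.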
-
Positioning/novelty relative to established PDE-discovery literature is incomplete, which makes it hard to assess what is new beyond applying a known sparse-regression pipeline to a 3D turbulent dataset (Sec. 1, Sec. 4).
Recommendation: Add a related-work subsection (e.g., Sec. 1.1) explicitly citing and contrasting with PDE-FIND/SINDy-style sparse regression, weak-form PDE discovery, and neural/hybrid approaches. Clarify what the paper contributes beyond prior work (e.g., specific 3D turbulence data regime, spectral derivative choices, simulation-based validation, analysis of near-constant field failure), and temper novelty claims where the method largely follows established frameworks.