-
Core methodological gap: the PDE specification and PINN/training setup are not described at a level that allows reproduction or interpretation, and it is unclear whether the reported ID–$\nu$ effect is physical or an artifact of training/conditioning (Sec. 1, Sec. 2.1–2.2, Sec. 3.1). In particular, the manuscript does not unambiguously state (i) the exact Burgers equation used (scalar vs vector, spatial dimensionality, forcing), domain, nondimensionalization, and initial/boundary conditions; (ii) whether a single conditional PINN is trained across all viscosities (with $\nu$ as an input) or 25 separate models are trained; (iii) network architecture details (depth/width/activations; where the 10D latent is taken—bottleneck vs intermediate layer); (iv) loss terms and weights (PDE residual vs IC/BC vs data), collocation strategy, optimizer/schedule, epochs/stopping; and (v) solution accuracy versus a reference solver across $\nu$.
Recommendation: Expand Sec. 2.1–2.2 into a fully specified experimental protocol: write the explicit PDE(s) and all IC/BCs; define the domain and whether $\nu$ is dimensionless; state clearly whether you train one conditional model across $\nu$ or multiple models and how $\nu$ enters the network; provide the full architecture (including the latent layer location and why $10$D was chosen); provide the exact loss and weights, sampling of collocation points, optimizer/schedule, training length and stopping; and report PINN accuracy per $\nu$ (e.g., $L_2$ / $L_\infty$ error against a numerical solver, and/or PDE residual statistics). Without this, the latent-space analysis cannot be meaningfully evaluated.
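For the per-$\nu$ accuracy report, something as simple as the following would suffice. This is a minimal sketch: the function name `solution_errors`, the domain bounds, and the placeholder fields are assumptions for illustration, not taken from the manuscript.

```python
import numpy as np

def solution_errors(u_pinn, u_ref):
    """Relative L2 and absolute L-infinity error between two fields on the same grid."""
    u_pinn = np.asarray(u_pinn, dtype=float)
    u_ref = np.asarray(u_ref, dtype=float)
    diff = u_pinn - u_ref
    # np.linalg.norm on a 2D array defaults to the Frobenius norm,
    # i.e. the discrete L2 norm over all grid points.
    rel_l2 = np.linalg.norm(diff) / np.linalg.norm(u_ref)
    l_inf = np.max(np.abs(diff))
    return float(rel_l2), float(l_inf)

# Placeholder fields on a 101 x 103 grid (hypothetical domain and solution);
# real use would load the reference-solver and PINN outputs for each viscosity.
x = np.linspace(-1.0, 1.0, 101)
t = np.linspace(0.0, 1.0, 103)
u_ref = np.sin(np.pi * x)[:, None] * np.exp(-t)[None, :]
u_pinn = u_ref + 0.01  # synthetic constant offset, not a real PINN result

rel_l2, l_inf = solution_errors(u_pinn, u_ref)
print(f"relative L2 error: {rel_l2:.4f}, L-infinity error: {l_inf:.4f}")
```

Tabulating these two numbers for each of the viscosities, next to the corresponding ID estimate, would show directly whether ID tracks fit quality.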
-
The “intrinsic dimensionality” results are not credible as presented: reported IDs substantially exceed the latent embedding dimension (e.g., $\sim 40$ in $\mathbb{R}^{10}$) without rigorous justification, debugging/sanity checks, or uncertainty quantification (Sec. 2.3, Sec. 3.2–3.3). Under standard manifold definitions, ID cannot exceed the ambient dimension; persistent $\text{ID} > 10$ strongly suggests either an estimator pathology (implementation error, preprocessing/metric problems, density inhomogeneity, duplicates, boundary effects) or that the quantity is being used as an “effective complexity index” rather than an intrinsic dimension.
Recommendation: Strengthen Sec. 2.3 and Sec. 3.2–3.3 with (i) an explicit definition of what you claim to measure—either true intrinsic/manifold dimension (and then explain why $>10$ can occur only as estimator failure/bias) or explicitly rename the output as an “effective dimension/complexity proxy” when it exceeds $10$; (ii) a pipeline sanity check on synthetic datasets embedded in $\mathbb{R}^{10}$ with known intrinsic dimension and comparable sample size (e.g., linear subspaces, noisy spheres, Swiss roll), reporting bias/variance and frequency of $\text{ID}>10$; (iii) bootstrap/jackknife/subsampling uncertainty estimates for each $\nu$ (error bars/bands on $ID(\nu)$); and (iv) explicit implementation details (distance metric, tie/duplicate handling, numerical precision, nearest-neighbor algorithm). If $\text{ID}>10$ persists, you must frame it carefully and corroborate the trend with additional measures (see below).
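The synthetic sanity check in (ii) could look like the sketch below. Because the manuscript does not say which neighbor-based estimator it uses, the Levina-Bickel maximum-likelihood estimator serves here as a stand-in; the function names, sample size, and $k=20$ are likewise assumptions chosen for illustration.

```python
import numpy as np

def embed_linear_subspace(rng, n, intrinsic_dim, ambient_dim=10):
    """Sample n Gaussian points on a random linear subspace of R^ambient_dim."""
    z = rng.normal(size=(n, intrinsic_dim))
    # Reduced QR gives an (ambient_dim x intrinsic_dim) matrix with orthonormal
    # columns, so the embedding preserves pairwise distances.
    q, _ = np.linalg.qr(rng.normal(size=(ambient_dim, intrinsic_dim)))
    return z @ q.T

def mle_intrinsic_dimension(points, k=20):
    """Levina-Bickel MLE of intrinsic dimension, averaged over all points."""
    sq_norms = np.sum(points**2, axis=1)
    # Squared distances via the Gram matrix. This is memory-friendly but loses
    # precision for near-duplicate points, which is itself worth checking.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.maximum(sq_dists, 0.0, out=sq_dists)
    np.fill_diagonal(sq_dists, np.inf)  # exclude each point from its own neighbors
    # Distances to the k nearest neighbors, sorted ascending per row.
    knn = np.sqrt(np.sort(np.partition(sq_dists, k - 1, axis=1)[:, :k], axis=1))
    # log(T_k / T_j) for j = 1..k-1, where T_j is the j-th neighbor distance.
    log_ratios = np.log(knn[:, -1][:, None] / knn[:, :-1])
    # (k - 2) in the numerator is the bias-corrected form of the local estimate.
    local_dim = (k - 2) / np.sum(log_ratios, axis=1)
    return float(np.mean(local_dim))

rng = np.random.default_rng(0)
for true_dim in (2, 5, 8):
    pts = embed_linear_subspace(rng, n=1500, intrinsic_dim=true_dim, ambient_dim=10)
    est = mle_intrinsic_dimension(pts, k=20)
    print(f"true dimension {true_dim}: estimated {est:.2f}")
```

Running the authors' own pipeline on such data, at the sample size used in the paper, would show whether estimates above $10$ can arise at all for well-behaved inputs.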
-
Neighbor-based ID estimators are applied to highly structured, grid-sampled point clouds ($101\times 103$ evaluations of a smooth map $(x,t)\mapsto L$), violating i.i.d. sampling assumptions and potentially inducing strong biases due to spatial/temporal correlations, anisotropic spacing, boundary effects, and near-duplicate latent vectors (Sec. 2.2–2.3, Sec. 3.1–3.2). This could create spurious non-monotonicity or inflate estimates (including $\text{ID}>10$).
Recommendation: Add robustness checks targeted at sampling structure (Sec. 3.2): (i) compute ID after random subsampling (e.g., $10\%$, $25\%$, $50\%$, $75\%$) and after decorrelated sampling (e.g., farthest-point sampling in latent space, or stratified sampling across $(x,t)$); (ii) evaluate latent vectors at off-grid $(x,t)$ (jittered or random points) to test sensitivity to grid regularity; (iii) report distance histograms, minimum-distance/duplicate rates, and whether activations saturate in some regimes; and (iv) confirm that the qualitative ID–$\nu$ curve (peak and high-$\nu$ downturn) survives these controls.
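Two of these controls are cheap to implement. The sketch below shows greedy farthest-point sampling in latent space and a quantized duplicate-rate check; the function names and the tolerance are illustrative assumptions, and the duplicate check only approximates a true nearest-neighbor test, since points that straddle a rounding boundary can be missed.

```python
import numpy as np

def farthest_point_sample(points, m, start=0):
    """Greedy farthest-point sampling; returns indices of m well-spread points."""
    chosen = np.empty(m, dtype=int)
    chosen[0] = start
    # Distance from every point to the nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, m):
        nxt = int(np.argmax(min_dist))
        chosen[i] = nxt
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

def duplicate_rate(points, tol=1e-8):
    """Fraction of points that coincide with another point after rounding to tol."""
    quantized = np.round(points / tol)
    n_unique = np.unique(quantized, axis=0).shape[0]
    return 1.0 - n_unique / points.shape[0]

# Synthetic stand-in for a latent point cloud: 50 distinct points in R^10
# plus 10 exact copies, so the expected duplicate rate is 10/60.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 10))
latents = np.vstack([base, base[:10]])

idx = farthest_point_sample(latents, m=20)
print("duplicate rate:", duplicate_rate(latents))
print("selected indices:", idx)
```

Recomputing ID on `latents[idx]` for several values of `m` would indicate how much of the estimate depends on grid regularity and near-duplicates.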
-
The central phenomenon is presented as a property of “the PINN latent space,” but experiments appear to rely on a single network instance/latent layer/dimension, with no robustness across random seeds, architectures, latent sizes, or even layer choice (Sec. 2.1–2.3, Sec. 3.1–3.3). This makes it unclear whether the non-monotonic curve and peak location are generic or accidental (optimization artifact/local minimum, hyperparameter effect, layer-specific geometry).
Recommendation: Add an explicit robustness section (Sec. 3.2–3.3): (i) retrain with multiple random seeds and show the mean $\pm$ std of $ID(\nu)$ across seeds; (ii) test at least one alternative architecture or latent dimensionality (e.g., $5$D/$20$D) and/or extract latents from different layers to see whether the trend is stable; (iii) if compute is limited, do these tests on a subset of viscosities (low/mid/high) but report variability. Clearly separate which conclusions are stable versus model-dependent.
-
The paper does not establish a quantitative link between $ID(\nu)$ and either (a) physical complexity of the underlying Burgers solutions or (b) training/approximation difficulty, leaving the main interpretation underdetermined (Sec. 3.1–3.4, Sec. 4). The RG-like narrative especially requires distinguishing “physics-driven simplification” from “network-driven representation changes.”
Recommendation: Augment Sec. 3.1–3.4 with correlational analyses against: (i) physical metrics computed from a reference solver or high-quality PINN output (e.g., gradient norms/total variation, shock indicators, spectral energy vs wavenumber, enstrophy-like measures depending on the PDE form); and (ii) learning/fit metrics (PINN error vs $\nu$, PDE residual norms, BC/IC residuals). Additionally, include at least one conceptually different latent complexity measure (e.g., PCA participation ratio/effective rank, local PCA dimension, or singular-value decay) to see whether the *trend* (especially high-$\nu$ decrease) agrees across metrics. Use these to argue whether ID is tracking physical degrees of freedom or training pathologies.
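As one example of a conceptually different complexity measure, the PCA participation ratio can be computed in a few lines. The sketch below uses synthetic data only; the function name is an assumption. Unlike a neighbor-based estimate, this quantity is bounded above by the ambient dimension by construction, which makes it a useful cross-check on any $\text{ID}>10$ result.

```python
import numpy as np

def participation_ratio(latents):
    """PCA participation ratio: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    centered = latents - latents.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to the
    # covariance eigenvalues; the proportionality constant cancels in the ratio.
    eig = np.linalg.svd(centered, compute_uv=False) ** 2
    return float(eig.sum() ** 2 / np.sum(eig**2))

rng = np.random.default_rng(0)

# Isotropic cloud filling all of R^10: participation ratio close to 10.
isotropic = rng.normal(size=(5000, 10))

# Cloud confined to a random 2D plane inside R^10: participation ratio close to 2.
q, _ = np.linalg.qr(rng.normal(size=(10, 2)))
planar = rng.normal(size=(5000, 2)) @ q.T

print("isotropic:", participation_ratio(isotropic))
print("planar:", participation_ratio(planar))
```

If the participation ratio of the PINN latents shows the same peak and high-$\nu$ downturn as the ID curve, the trend is less likely to be an estimator artifact.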
-
The RG-like “flow” interpretation is currently presented too strongly relative to the evidence: no explicit coarse-graining transformation, scale analysis, semigroup/composition property, or fixed-point-like behavior is demonstrated (Sec. 2.5, Sec. 3.4, Sec. 4). As written, the data mainly support “non-monotonic representational complexity vs $\nu$” plus a plausible diffusion-smoothing intuition at high $\nu$.
Recommendation: Either (A) operationalize the RG analogy with at least one concrete test (Sec. 3.4): define an explicit coarse-graining on inputs/solutions (spatial filtering/downsampling) and track how latent representations and their effective dimension change under that map, or test whether latent statistics exhibit a flow with approximate composition across $\nu$; or (B) reframe the RG discussion as a heuristic analogy, explicitly acknowledging alternative explanations (training difficulty, saturation, estimator artifacts) and moderating Abstract/Sec. 4 language to “suggestive/consistent with” rather than implying an RG mechanism has been established.
-
Foundational notation/model mismatch: the manuscript calls the problem “2D Burgers equation,” but the data/notation indicate only one spatial coordinate (101 points in $x$) plus time (Sec. 2.1–2.2; notation $L(x,t;\nu)$). If the PDE is truly 2D in space, the setup is missing $y$ and the field definition (scalar vs vector velocity). If it is 1D-in-space Burgers, the term “2D” is misleading.
Recommendation: Make the PDE dimensionality consistent throughout. If it is 1D spatial Burgers, rename accordingly and use $u(x,t)$. If it is 2D spatial Burgers, define $u(x,y,t)$ (or $(u,v)$), specify the $y$-grid and domains, and update dataset tensor shapes/notation (Sec. 2.1–2.2, Sec. 3.1).
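For concreteness, the two readings would be written as follows. Both forms assume the unforced viscous Burgers equation; whether a forcing term is present is not stated in the manuscript and should be specified alongside the initial and boundary conditions.

```latex
% Option A: one spatial dimension, scalar field u(x, t)
\[
  \partial_t u + u\,\partial_x u = \nu\,\partial_{xx} u
\]

% Option B: two spatial dimensions, vector field \mathbf{u} = (u, v)(x, y, t)
\[
  \partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\,\mathbf{u} = \nu\,\nabla^{2}\mathbf{u}
\]
```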
-
The statistical modeling in Sec. 2.4 and Sec. 3.3 emphasizes monotone summaries (global Spearman $\rho$, linear/log fits) despite the key result being strongly non-monotonic (peak + downturn). This risks mischaracterizing the main phenomenon and does not quantify the peak location/uncertainty or the significance of the high-$\nu$ decrease.
Recommendation: Revise Sec. 2.4 and Sec. 3.3 to treat monotone fits as baselines only, and add non-monotone modeling and inference: spline/GP smoothing, quadratic or piecewise-linear change-point models, and tests comparing monotone vs non-monotone fits (information criteria). Report uncertainty on peak location and on the high-$\nu$ downturn using the ID uncertainty estimates (bootstrap bands). Consider reporting rank correlations separately on low$\rightarrow$mid and mid$\rightarrow$high $\nu$ regimes.
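A minimal version of the monotone versus non-monotone comparison is sketched below, using AIC for linear and quadratic fits in $\log_{10}\nu$. The data are synthetic placeholders (a noisy parabola over 25 hypothetical viscosities), and the function names are assumptions; a spline or change-point model would follow the same pattern.

```python
import numpy as np

def aic_polyfit(x, y, degree):
    """Least-squares polynomial fit and its Gaussian AIC (up to an additive constant)."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n = len(y)
    rss = float(np.sum(resid**2))
    n_params = degree + 2  # polynomial coefficients plus the noise variance
    return n * np.log(rss / n) + 2 * n_params, coeffs

def peak_location(quad_coeffs):
    """Vertex of a*x^2 + b*x + c; a maximum only when a < 0."""
    a, b, _ = quad_coeffs
    return -b / (2.0 * a)

# Synthetic placeholder curve, NOT the manuscript's data:
# a noisy peak at log10(nu) = -2 sampled at 25 viscosities.
rng = np.random.default_rng(1)
log_nu = np.linspace(-4.0, 0.0, 25)
id_values = 12.0 - 2.0 * (log_nu + 2.0) ** 2 + rng.normal(0.0, 0.3, size=log_nu.size)

aic_lin, _ = aic_polyfit(log_nu, id_values, 1)
aic_quad, quad = aic_polyfit(log_nu, id_values, 2)
print("AIC linear:", aic_lin, "AIC quadratic:", aic_quad)
print("estimated peak in log10(nu):", peak_location(quad))
```

Repeating the quadratic fit on bootstrap resamples of the ID estimates would give the requested uncertainty on the peak location.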