This section audits symbolic/analytic mathematical consistency (algebra, derivations, dimensional/unit checks, definition consistency).
Maths relevance: light
The paper contains a small number of explicit mathematical formulas (standardization and regression metrics) and mostly qualitative descriptions of the QTT decomposition and feature construction. The central mathematical mechanism (QTT factorization) is described narratively, without formal equations, which limits how far the core method can be audited symbolically. The audit found internal inconsistencies in the preprocessing definitions and cross-references, as well as an apparently stray, undefined equation fragment.
✔ Log-transform definitions (mass, $v_{\rm max}$) (Sec. 3.1.1, p.2)
✖ Extraneous undefined equation fragment (Sec. 3.1.1, p.2, between the log-transform lines)
✔ Standardization equation (Eq. (1), Sec. 3.1.2, p.2)
✖ Normalization-statistics source inconsistency (Sec. 3.1.2, p.2 vs Sec. 4.2, p.5)
✔ Mean pooling aggregation formula (Sec. 3.4.1, p.3)
✔ Max pooling aggregation definition (Sec. 3.4.2, p.3)
✔ Concatenation aggregation and padding (Sec. 3.4.3, p.3)
✔ Baseline feature dimension count (Sec. 4.4.1, p.6)
✔ MSE metric formula (Sec. 3.6.1, p.4)
✔ MAE metric formula (Sec. 3.6.2, p.4)
✔ $R^2$ metric formula (Sec. 3.6.3, p.4)
✔ Example reshape for QTT ($k=1$) (Sec. 4.3, p.6)
⚠ Fixed-length QTT feature vector claim (flatten+concatenate cores) (Sec. 3.3, p.3; also referenced in Sec. 4.3, p.6)
✖ Cross-reference to table/section (Sec. 4.6, p.10)
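The ⚠ on the fixed-length claim can be probed numerically. The sketch below is an illustration under stated assumptions, not the paper's implementation: `tt_svd` and `feature_length` are hypothetical helpers, and a plain rank-capped TT-SVD stands in for whatever QTT routine the paper actually uses. It suggests that the length of a flatten+concatenate feature vector is set by the number of binary modes and the rank cap, so the vector would be fixed-length only if every subgraph is padded to the same power of two and the ranks are held fixed.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Rank-capped TT-SVD (hypothetical helper, not the paper's code).

    Returns a list of cores with shapes (r_prev, n, r_next).
    """
    dims = tensor.shape
    cores = []
    r_prev = 1
    mat = tensor
    for n in dims[:-1]:
        # Unfold so that the current mode (and incoming rank) index the rows.
        mat = mat.reshape(r_prev * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        # Assumption: a hard cap on the bond rank, no tolerance-based truncation.
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, n, r))
        # Carry the remainder (singular values times right factors) forward.
        mat = s[:r, None] * vt[:r]
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores


def feature_length(cores):
    """Length of the vector obtained by flattening and concatenating all cores."""
    return sum(core.size for core in cores)


rng = np.random.default_rng(0)
# Two hypothetical padded inputs: 8 entries (three binary modes)
# and 16 entries (four binary modes), both with rank cap 2.
small = rng.standard_normal((2, 2, 2))
large = rng.standard_normal((2, 2, 2, 2))
print(feature_length(tt_svd(small, max_rank=2)))
print(feature_length(tt_svd(large, max_rank=2)))
```

Under these assumptions the two inputs yield feature vectors of different lengths, so the claim holds only per fixed padded size and rank setting. If ranks were instead chosen by a truncation tolerance, the length could also vary between inputs of the same shape.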
This section audits numerical/empirical consistency: reported metrics, experimental design, baseline comparisons, statistical evidence, leakage risks, and reproducibility.
All executed internal arithmetic and logic checks on the provided numeric statements passed. Verified items include: the dataset reduction logic (from 300 to an effective $N=5$); equality of “5 unique subgraphs” and “5 distinct trees”; averages lying within the stated min/max ranges for $k=1,2,3$; power-of-two padding examples relative to the maximum node counts; reshape element-count preservation; baseline feature dimensionality ($3$ aggregations $\times$ $4$ node features $=12$); identification of the best QTT configuration by MSE and by $R^2$ from the provided table values; the baseline vs. best-QTT $R^2$ difference ($0.048$); PCA two-component explained-variance totals; monotonic trends of $R^2$ and MSE with $k$ at rank $2$; and the reconstruction-MSE improvement from rank $2$ to rank $3$ for each $k$.
✔ C1_dataset_reduction_300_to_5 (p.5 (Sec. 4.2 Data Preprocessing and Subgraph Extraction Yield); also Abstract p.1)
✔ C2_subgraphs_unique_equals_trees_processed (p.5 (Sec. 4.2))
✔ C3_k1_nodes_avg_with_min_max (p.5 (Sec. 4.2))
✔ C4_k2_nodes_avg_with_min_max (p.5 (Sec. 4.2))
✔ C5_k3_nodes_avg_with_min_max (p.5 (Sec. 4.2))
✔ C6_padding_next_power_of_two_k1 (p.5 (Sec. 4.2))
✔ C7_padding_next_power_of_two_k2 (p.5 (Sec. 4.2))
✔ C8_padding_next_power_of_two_k3 (p.5 (Sec. 4.2))
✔ C9_k1_tensor_reshape_product (p.6 (Sec. 4.3))
✔ C10_baseline_feature_dimensionality_12 (p.6 (Sec. 4.4.1 Baseline Model))
✔ C11_table1_best_mse_identification (p.6 (Table 1) and p.6-8 (Fig.9 caption text))
✔ C12_table1_best_r2_identification (p.6 (Sec. 4.4.3) and Table 1)
✔ C13_baseline_vs_best_qtt_r2_comparison (p.6 (Sec. 4.4.1 and 4.4.3); p.12 (Conclusions))
✔ C14_pca_baseline_explained_variance_sum (p.8 (Sec. 4.5.2 Dimensionality Reduction (PCA)))
✔ C15_pca_qtt_explained_variance_sum (p.8 (Sec. 4.5.2 Dimensionality Reduction (PCA)))
✔ C16_rank2_r2_trend_with_k (p.10 (Sec. 4.6.1 Impact of $k$) referencing Table 1)
✔ C17_rank2_mse_trend_with_k (p.6 (Table 1))
✔ C18_qtt_recon_mse_rank_improves_k1 (p.6 (Sec. 4.3))
✔ C19_qtt_recon_mse_rank_improves_k2 (p.6 (Sec. 4.3))
✔ C20_qtt_recon_mse_rank_improves_k3 (p.6 (Sec. 4.3))
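Several of the checks above (C3 to C10) are simple arithmetic and can be sketched as follows. This is a minimal illustration, not the audit's actual tooling: the function names are hypothetical, and the numeric inputs in the usage lines are made-up placeholders rather than values from the paper. Only the $3 \times 4 = 12$ baseline count is taken from the audited text.

```python
import math


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def average_within_range(minimum: float, average: float, maximum: float) -> bool:
    """An average of values must lie between their minimum and maximum."""
    return minimum <= average <= maximum


def reshape_preserves_count(original_shape, new_shape) -> bool:
    """A reshape is valid only if the total number of elements is unchanged."""
    return math.prod(original_shape) == math.prod(new_shape)


def baseline_feature_dim(n_aggregations: int, n_node_features: int) -> int:
    """One aggregated value per (aggregation, node feature) pair."""
    return n_aggregations * n_node_features


# Placeholder inputs (NOT from the paper): a subgraph with at most 13 nodes
# and 4 features per node, padded and reshaped into binary modes.
padded = next_power_of_two(13)
print(padded)
print(average_within_range(3, 7.4, 13))
print(reshape_preserves_count((padded, 4), (2, 2, 2, 2, 4)))

# Count stated in the audited text: 3 aggregations x 4 node features.
print(baseline_feature_dim(3, 4))
```

Keeping each check as a small pure function makes it straightforward to rerun them against the paper's reported values once those are transcribed.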