This section audits symbolic/analytic mathematical consistency (algebra, derivations, dimensional/unit checks, definition consistency).
Maths relevance: light
The paper contains limited formal mathematics and no explicitly numbered equations. The main analytic elements are (i) definitions of binary/grouped POA variables, (ii) log-transformations of skewed outcomes with small offsets, and (iii) linear (OLS) and classification model specifications described in prose. The most important internal-consistency issues are a mismatch in how the POA predictor is coded (a single binary indicator in one place, two dummies against a third baseline level in another) and ambiguous, shifting definitions of the log-transformed outcomes ($\ln(x)$ vs $\ln(x+c)$).
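To illustrate why the $\ln(x)$ vs $\ln(x+c)$ ambiguity matters, the following is a minimal sketch, not the paper's code. The offset `C = 1.0` and the sample values are hypothetical, since the audit does not restate the constants the paper uses. It shows that the two definitions differ by $\ln(1 + c/x)$, which is largest for small outcomes, and that only the offset form is defined at zero.

```python
import math

# Hypothetical offset; the paper's actual constants are not restated here.
C = 1.0

def log_plain(x):
    """Natural log without offset; raises ValueError for x <= 0."""
    return math.log(x)

def log_offset(x, c=C):
    """Natural log with an additive offset, ln(x + c)."""
    return math.log(x + c)

# Toy outcome values (illustrative only, not taken from the paper).
values = [1.0, 10.0, 1000.0]

# Gap between the two definitions equals ln(1 + c/x) and shrinks as x grows.
gaps = [log_offset(x) - log_plain(x) for x in values]

for x, gap in zip(values, gaps):
    print(f"x={x:>7}: ln(x+c) - ln(x) = {gap:.4f}")
```

Because the gap depends on $x$, coefficients estimated on one definition are not interchangeable with those from the other, so the text and figure captions should name the same transformation throughout.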
✔ Log-transform with offsets (definition) (Sec. 2.2, Resource Utilization Outcomes, p.3)
✖ Dependent variable naming vs transformation (Sec. 2.5.1, p.4; Sec. 3.4 and Figs. 16–17 captions, pp.8–9)
✔ Binary POA grouping for main comparison (Sec. 2.2, POA_PRINC_DIAG definition, p.2)
✔ Classification target definition (Sec. 2.4.1, p.3)
✔ PR-AUC baseline vs prevalence consistency (Sec. 3.1 (prevalence 1.63%), p.6; Sec. 3.3 (baseline PR-AUC 0.016), p.6)
✔ Engineered POA count features are principal-only (Sec. 3.1, pp.5–6 (Figure 7 discussion))
✔ Feature leakage description (tautological predictor) (Sec. 3.3, pp.6–7)
✖ Regression POA predictor coding (Methods vs Results) (Sec. 2.5.2, p.4 vs Sec. 3.4.2, p.9)
⚠ Interpretation of POA coefficients as N vs Y/U/W contrast (Sec. 3.4.2, p.9)
✔ Interaction term degeneracy under mutually exclusive dummies (End of Sec. 3.4.2, p.9)
✔ Use of 'vast majority' for mutual exclusivity (End of Sec. 3.4.2, p.9)
✖ Semantic inconsistency in calibration figure description (Figure 15 caption/text, p.8)
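The POA coding items above can be made concrete with a small sketch. It is an illustration under assumptions, not the paper's specification: the toy codes and the choice of which levels receive dummies (`N` and `Y`, with `U`/`W` as baseline) are hypothetical. It contrasts a single binary indicator with a two-dummy coding and shows that the product of two mutually exclusive dummies is identically zero, so an interaction between them cannot be estimated.

```python
# Toy POA codes for illustration only (not data from the paper).
POA_CODES = ["Y", "N", "U", "W", "Y", "N", "Y"]

def binary_coding(code):
    """One column: 1 for N, 0 for the grouped Y/U/W level."""
    return [1 if code == "N" else 0]

def two_dummy_coding(code):
    """Two columns (hypothetical choice): N and Y, with U/W as baseline."""
    return [1 if code == "N" else 0, 1 if code == "Y" else 0]

binary_rows = [binary_coding(c) for c in POA_CODES]
dummy_rows = [two_dummy_coding(c) for c in POA_CODES]

# Interaction of the two dummies: always zero when levels are exclusive.
interaction = [d_n * d_y for d_n, d_y in dummy_rows]

print("binary columns:", len(binary_rows[0]))
print("dummy columns: ", len(dummy_rows[0]))
print("interaction:   ", interaction)
```

The two codings yield a different number of POA coefficients (one vs two) with different reference groups, which is why a coefficient from the two-dummy model cannot be read directly as an N vs Y/U/W contrast.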
This section audits numerical/empirical consistency: reported metrics, experimental design, baseline comparisons, statistical evidence, leakage risks, and reproducibility.
Fifteen text-based numeric statements were checked using parsing, logical bounds/order checks, and simple derived relationships. All checks passed; no inconsistencies were detected within the validated scope.
✔ C1_total_records_2018 (Page 4 / Results 3.1 (first paragraph))
✔ C2_age_stats_internal_consistency (Page 4 / Results 3.1 (age summary statistics paragraph))
✔ C3_los_summary_stats_consistency (Page 5 / Results 3.1 (resource utilization outcomes paragraph))
✔ C4_charges_summary_stats_consistency (Page 5 / Results 3.1 (resource utilization outcomes paragraph))
✔ C5_skewness_kurtosis_nonnegative_check (Page 5 / Results 3.1 (resource utilization outcomes paragraph))
✔ C6_added_constants_for_log_transform (Page 3 / Methods 2.2 (Resource Utilization Outcomes bullet))
✔ C7_pr_auc_baseline_vs_prevalence (Page 6 / Results 3.3 (PR curves paragraph) + Page 6 (minority class prevalence))
✔ C8_auc_roc_ordering_check (Page 6 / Results 3.3 (model metrics bullets))
✔ C9_pr_auc_ordering_check (Page 6 / Results 3.3 (model metrics bullets))
✔ C10_f1_score_range_check (Page 6 / Results 3.3 (model metrics bullets))
✔ C11_avg_unique_principal_dx_leq_1 (Page 6 / Results 3.1 (engineered features paragraph))
✔ C12_loge_los_coeff_ordering (Page 9 / Results 3.4.2 (Regression model results, $\log_e$ LOS))
✔ C13_loge_charges_coeff_signs (Page 9 / Results 3.4.2 (Regression model results, $\log_e$ Charges))
✔ C14_r2_bounds (Page 9 / Results 3.4.2 (Regression model results))
✔ C15_implied_percent_from_baseline_pr_auc (Page 6 / Results 3.3 (PR curves paragraph))
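As an illustration of the kind of derived-relationship and bounds checks listed above, here is a minimal sketch. It is not the tooling actually used for the audit; the helper names and the rounding tolerance are assumptions. It uses the two figures quoted in this report (minority-class prevalence 1.63% and baseline PR-AUC 0.016) and relies on the fact that a no-skill classifier's expected PR-AUC equals the positive-class prevalence.

```python
# Figures quoted in this report.
PREVALENCE_PCT = 1.63        # minority-class prevalence, in percent
REPORTED_BASELINE = 0.016    # baseline PR-AUC, reported to 3 decimals

def in_unit_interval(value):
    """Bounds check for metrics that must lie in [0, 1]."""
    return 0.0 <= value <= 1.0

def matches_to_decimals(implied, reported, decimals=3):
    """True if `implied` could round to `reported` at the given precision.

    The half-unit tolerance is an assumption about how the paper rounded.
    """
    return abs(implied - reported) <= 0.5 * 10 ** (-decimals)

# A no-skill classifier has expected PR-AUC equal to the prevalence.
implied_baseline = PREVALENCE_PCT / 100.0

baseline_ok = matches_to_decimals(implied_baseline, REPORTED_BASELINE)
bounds_ok = in_unit_interval(REPORTED_BASELINE)

print(f"implied baseline PR-AUC: {implied_baseline:.4f}")
print(f"consistent with reported value: {baseline_ok}")
print(f"within [0, 1]: {bounds_ok}")
```

The same helpers extend to the other range checks (for example F1 and $R^2$ bounds) by passing the corresponding reported values.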