-
Emergent-condition prediction (HAD_EMERGENT_CONDITION) shows perfect or near-perfect discrimination (AUC-ROC, AUC-PR, and F1 $\approx 1.0$) across models and subgroups (Sec. 3.2; Table 1; Figures 15–16), indicating severe target leakage or circular construction. Although the authors acknowledge this, the manuscript does not isolate which predictors encode post-admission or discharge-only information (e.g., diagnosis-group features potentially built from all diagnoses, POA-missingness artifacts, downstream fields), nor does it provide a leakage-free reformulation. A primary stated objective therefore remains unresolved (Sec. 2.3.1, Sec. 3.2, Sec. 3.5, Conclusions).
Recommendation: Add an explicit leakage audit and a corrected modeling attempt. Concretely: (a) In Sec. 2.3.1 (or an appendix), enumerate every predictor used in the classifier (including one-hot diagnosis-group indicators, counts, missingness flags, POA-uncertainty flags, any physician/hospital identifiers, and any discharge disposition–like fields), tagging each with its provenance and whether it is available at admission. (b) Verify in code (ideally with unit-test style checks) that any diagnosis-derived features for the emergence model are computed exclusively from POA$='Y'$ diagnoses (and not from the union of all diagnoses due to a merge/mapping step). (c) Re-run the emergence task with a strictly admission-time feature set (demographics; admission type/source; payer; principal diagnosis and comorbidities derived only from POA$='Y'$; counts of POA$='Y'$ diagnoses). (d) Use evaluation splits that help detect facility-level/coding artifacts (e.g., hospital-held-out split and/or temporal split across months) and report performance there. If leakage-free prediction is not feasible from discharge abstracts alone, explicitly reframe the emergence-prediction objective (Sec. 3.2, Conclusions) as a cautionary demonstration rather than a substantive predictive contribution.
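As a rough illustration of what the check in (b) and the split in (d) could look like, the sketch below uses a hypothetical record layout (a `hospital_id` plus a list of `(diagnosis_group, poa_flag)` pairs per stay). All function and field names are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter

# Hypothetical layout: each stay is a dict with a hospital id and a list of
# (diagnosis_group, poa_flag) pairs. None of these names come from the manuscript.


def admission_time_dx_counts(diagnoses):
    """Count diagnosis groups using only diagnoses flagged POA='Y'."""
    return Counter(group for group, poa in diagnoses if poa == "Y")


def uses_non_poa_information(diagnoses, feature_counts):
    """True if any feature count exceeds what POA='Y' diagnoses alone support."""
    allowed = admission_time_dx_counts(diagnoses)
    return any(n > allowed[group] for group, n in feature_counts.items())


def hospital_holdout_split(stays, holdout_hospitals):
    """Put every stay from the held-out hospitals in the test set."""
    train = [s for s in stays if s["hospital_id"] not in holdout_hospitals]
    test = [s for s in stays if s["hospital_id"] in holdout_hospitals]
    return train, test


stays = [
    {"hospital_id": "H1", "dx": [("circulatory", "Y"), ("infection", "N")]},
    {"hospital_id": "H2", "dx": [("respiratory", "Y"), ("respiratory", "N")]},
    {"hospital_id": "H1", "dx": [("circulatory", "Y")]},
]

# Unit-test style check: features built from POA='Y' only must pass.
for stay in stays:
    clean = admission_time_dx_counts(stay["dx"])
    assert not uses_non_poa_information(stay["dx"], clean)

# The suspected bug: features built from the union of all diagnoses must fail.
leaky = Counter(group for group, _ in stays[0]["dx"])
assert uses_non_poa_information(stays[0]["dx"], leaky)

# Hospital-held-out split: no hospital appears on both sides.
train, test = hospital_holdout_split(stays, holdout_hospitals={"H2"})
assert {s["hospital_id"] for s in train}.isdisjoint(s["hospital_id"] for s in test)
```

Running a check of this kind on the actual feature matrix, rather than on the feature-construction code alone, would also catch leakage introduced by a later merge or mapping step.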
-
The operational definition of “emergent condition” as any POA$='N'$ diagnosis is clinically heterogeneous and appears overwhelmingly dominated by obstetric (ICD-10 Chapter O) codes (Sec. 2.2.4, Sec. 3.1). This raises a ‘bigger picture’ concern: the target may primarily capture routine delivery-related coding rules rather than hospital-acquired morbidity, undermining interpretability of both the emergence task and the conclusion that emergent morbidity adds negligible information for LOS/charges (Sec. 3.3, Sec. 3.5, Conclusions).
Recommendation: Refine and stratify the endpoint in Sec. 2.2.4 and re-run the core analyses. Minimum set of revisions: (a) Separate obstetric vs non-obstetric analyses (e.g., exclude Chapter O diagnoses and/or identify delivery-related encounters via principal diagnosis/procedure where possible) and report prevalence for each stratum in Sec. 3.1. (b) Define one or more clinically coherent complication endpoints (e.g., CMS HAC-like groups, PSI-like complications, or a curated list of complication CCSR groups where POA is intended to distinguish complications from comorbidity) and repeat the emergence and utilization analyses on these endpoints. (c) Report how conclusions (incremental $R^2$; effect directions) change when excluding or separately modeling pregnancy-related stays, and discuss explicitly in Sec. 3.5/Conclusions how Chapter O dominance affects the headline results.
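The stratified prevalence report requested in (a) could be produced along the following lines. This is a minimal sketch that assumes each stay is available as a list of `(icd10_code, poa_flag)` pairs and that Chapter O membership can be approximated by the leading letter of the code. The toy stays and helper names are illustrative only.

```python
def has_emergent(diagnoses, exclude_obstetric=False):
    """True if the stay has at least one POA='N' diagnosis.

    With exclude_obstetric=True, POA='N' codes starting with 'O'
    (ICD-10 pregnancy/childbirth chapter) are ignored.
    """
    for code, poa in diagnoses:
        if poa != "N":
            continue
        if exclude_obstetric and code.upper().startswith("O"):
            continue
        return True
    return False


def prevalence(stays, exclude_obstetric=False):
    """Share of stays meeting the emergent-condition definition."""
    hits = sum(has_emergent(s, exclude_obstetric) for s in stays)
    return hits / len(stays)


# Toy data for illustration; not drawn from the manuscript.
toy_stays = [
    [("O70.0", "N")],                  # only an obstetric POA='N' code
    [("I10", "Y"), ("N17.9", "N")],    # non-obstetric POA='N' code
    [("E11.9", "Y")],                  # no POA='N' code
    [("O70.0", "N"), ("N17.9", "N")],  # both kinds
]

overall = prevalence(toy_stays)
non_obstetric = prevalence(toy_stays, exclude_obstetric=True)
print(f"all POA='N': {overall:.2f}, excluding Chapter O: {non_obstetric:.2f}")
```

Reporting both numbers side by side in Sec. 3.1 would show directly how much of the endpoint is attributable to Chapter O codes.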
-
Interpreting POA$='N'$ as ‘developed during hospitalization’, and using language that implies ‘impact’ on LOS/charges, are not well supported by the data-generating process: POA is a documentation/coding flag without onset time; POA$='N'$ can reflect delayed recognition, coding variation, or POA uncertainty rather than true in-hospital acquisition (Sec. 1, Sec. 2.2.4, Sec. 3.3, Sec. 3.5, Conclusions). This creates a risk of causal overinterpretation and, for utilization models, of adjusting for post-treatment variables (emergent diagnoses are recorded at discharge and are downstream of care intensity/LOS/charges).
Recommendation: Tighten the inferential framing throughout. (a) In the Introduction and Sec. 2.2.4, explicitly state that POA$='N'$ does not provide a timestamp and is not equivalent to ‘hospital-acquired’ without additional clinical validation. (b) In Sec. 3.3 and Conclusions, replace causal language (‘impact’, ‘contribution’, ‘drives’) with association/prediction language, and clearly label models that include emergent features as ‘hindsight’/discharge-informed rather than admission-time models. (c) If you want an ‘incremental impact’ interpretation, either (i) redesign around a causal estimand (e.g., mediation-aware framing) or (ii) restrict to complication definitions with stronger face validity (see Major Issue 2) and emphasize descriptive associations only.
-
Hospital and subgroup comparisons based on the emergent-condition prediction model—especially hospital observed-to-expected ($O/E$) ratios and subgroup AUC$=1.0$ results—are not interpretable given leakage and likely heterogeneity in coding completeness/intensity across hospitals (Sec. 3.4.1–3.4.2; Figures 15–16). Even if leakage is fixed, hospital comparisons will remain highly sensitive to POA coding practices and ‘non-informative POA’ handling.
Recommendation: Revise Sec. 3.4 so that it does not imply valid model-performance or hospital-performance comparisons. (a) Remove or quarantine (clearly labeled as invalid artifacts) any $O/E$ ratios and subgroup AUC summaries derived from the leaky model; do not interpret them as hospital performance or risk-adjusted comparisons. (b) If hospital-level results are retained after fixing leakage and endpoint definitions, add uncertainty intervals and consider hierarchical/shrinkage approaches; explicitly discuss coding completeness as a confounder (and, where feasible, report POA completeness metrics by hospital and their correlation with HAD_EMERGENT_CONDITION). (c) Consider switching hospital-level reporting to leakage-free descriptive rates (with minimal adjustment) framed as coding/case-mix exploration, not quality measurement.
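One simple way to attach shrinkage and uncertainty to $O/E$ ratios, as suggested in (b), is a Gamma-Poisson model. The sketch below assumes $O \sim \text{Poisson}(\theta E)$ with a $\text{Gamma}(k, k)$ prior on $\theta$ (prior mean 1). The prior strength `prior_strength` and the function name are illustrative choices, and a full hierarchical model may well be preferable.

```python
import random


def shrunken_oe(observed, expected, prior_strength=10.0, draws=4000, seed=0):
    """Gamma-Poisson shrinkage of an O/E ratio toward 1.

    Assumes O ~ Poisson(theta * E) with theta ~ Gamma(shape=k, rate=k), so the
    posterior is Gamma(shape=k + O, rate=k + E). prior_strength (k) is an
    illustrative value, not one taken from the manuscript.
    Returns the posterior mean and a Monte Carlo 95% interval.
    """
    shape = prior_strength + observed
    rate = prior_strength + expected
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return shape / rate, (lower, upper)


# Two hypothetical hospitals with the same raw O/E of 2.0 but different volume.
small_mean, small_ci = shrunken_oe(observed=2, expected=1.0)
large_mean, large_ci = shrunken_oe(observed=200, expected=100.0)
print(f"small hospital: {small_mean:.2f}, 95% interval {small_ci}")
print(f"large hospital: {large_mean:.2f}, 95% interval {large_ci}")
```

The low-volume hospital is pulled strongly toward 1 and gets a wide interval, while the high-volume hospital stays close to its raw ratio, which is the behavior that makes small-hospital outliers less misleading.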
-
Key resource-utilization conclusions rely on very coarse emergent-condition features (binary presence and counts of POA$='N'$ diagnoses) and on models with modest fit and heteroscedastic residuals (especially for LOS; Sec. 3.3). The manuscript does not quantify uncertainty around the incremental $R^2$ changes (often tiny deltas) nor provide effect sizes on original scales, limiting the strength and interpretability of the ‘negligible incremental value’ conclusion (Sec. 3.3, Sec. 3.5, Conclusions).
Recommendation: Strengthen Sec. 3.3 with effect sizes and uncertainty. (a) Report out-of-sample deltas explicitly (e.g., $+0.0004$ in $R^2$) and include bootstrap or cross-validation confidence intervals for baseline vs full model performance. (b) Provide interpretable effect sizes: for linear models, coefficients/SEs/CIs for emergent features and translate to days (LOS) and dollars (charges) where possible; for tree models, provide partial dependence or SHAP marginal effects with uncertainty (or at least stratified averages). (c) Explore richer but still interpretable emergent features tied to clinically coherent subsets (Major Issue 2), and consider interactions with baseline severity/service line. (d) Consider alternative outcome models more appropriate for skew/heteroscedasticity (e.g., log-LOS, negative binomial for LOS; Gamma/Tweedie for charges; quantile regression) and report whether conclusions are robust.
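For (a), a paired bootstrap over held-out records is one straightforward way to put an interval around the incremental $R^2$. The sketch below assumes out-of-sample predictions from the baseline and full models are already available for the same test records. The toy numbers and function names are illustrative only.

```python
import random


def r_squared(y, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, pred))
    return 1.0 - ss_res / ss_tot


def bootstrap_delta_r2(y, pred_base, pred_full, n_boot=2000, seed=0):
    """Point estimate and percentile 95% CI for R^2(full) - R^2(base).

    Records are resampled jointly so both models are scored on the same
    bootstrap sample (a paired comparison).
    """
    rng = random.Random(seed)
    n = len(y)
    point = r_squared(y, pred_full) - r_squared(y, pred_base)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y[i] for i in idx]
        if len(set(y_b)) < 2:  # R^2 is undefined when the resample has no variance
            continue
        base_b = [pred_base[i] for i in idx]
        full_b = [pred_full[i] for i in idx]
        deltas.append(r_squared(y_b, full_b) - r_squared(y_b, base_b))
    deltas.sort()
    return point, (deltas[int(0.025 * n_boot)], deltas[int(0.975 * n_boot) - 1])


# Toy held-out outcomes and predictions; not values from the manuscript.
y_test = [1, 2, 3, 4, 5, 6, 7, 8]
baseline_pred = [2, 2, 4, 4, 6, 6, 8, 8]
full_pred = [1.5, 2, 3.5, 4, 5.5, 6, 7.5, 8]

delta, (ci_low, ci_high) = bootstrap_delta_r2(y_test, baseline_pred, full_pred)
print(f"delta R^2 = {delta:.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}]")
```

If the interval for a delta such as $+0.0004$ comfortably includes zero, the ‘negligible incremental value’ statement is better phrased as ‘not distinguishable from no improvement’.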
-
The very high reported missingness for PAT_AGE ($\sim 59.3\%$), together with its handling via midpoint mapping/median imputation plus a missingness indicator (Sec. 2.2.2, Sec. 3.1), raises a ‘bigger picture’ data-quality concern: it may reflect masked/redacted age coding (e.g., neonates/very old) or an extraction/mapping bug. If systematic, it can distort subgroup analyses, fairness interpretations, and any age gradients in LOS/charges and emergent condition rates.
Recommendation: Validate and document the age field and missingness mechanism. (a) Cross-check PUDF documentation and show a pre-/post-mapping table of age code distributions (Sec. 2.2.2, Sec. 3.1). (b) Distinguish truly missing vs masked/interval-coded age categories; consider modeling age as categorical bands (including a ‘masked/unknown’ category) rather than midpoint+imputation when missingness is structural. (c) Add sensitivity analyses restricting to records with non-missing/usable age and report whether the main utilization conclusions (incremental value of emergent features; $R^2$ patterns) change.
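The pre-/post-mapping table in (a) and the categorical handling in (b) could be sketched as follows. The code-to-band mapping and the masked-value marker below are hypothetical placeholders; the real values must be taken from the PUDF data dictionary.

```python
from collections import Counter

# HYPOTHETICAL mapping and masked marker, for illustration only.
# The actual PAT_AGE codes must come from the PUDF documentation.
AGE_BANDS = {"01": "0-17", "02": "18-44", "03": "45-64", "04": "65+"}
MASKED_CODES = {"*"}


def to_age_band(raw):
    """Map a raw age code to a band, keeping the failure modes distinct."""
    if raw is None or raw.strip() == "":
        return "missing"      # field truly absent
    code = raw.strip()
    if code in MASKED_CODES:
        return "masked"       # deliberately suppressed value
    # Codes not in the mapping get their own bucket so a mapping bug is visible.
    return AGE_BANDS.get(code, "unmapped")


raw_codes = ["01", "02", "*", None, "  ", "99", "04", "*"]

# Pre-/post-mapping cross-tabulation: (raw code, band) -> count.
crosstab = Counter((raw, to_age_band(raw)) for raw in raw_codes)
band_counts = Counter(to_age_band(raw) for raw in raw_codes)

for (raw, band), n in sorted(crosstab.items(), key=lambda kv: str(kv[0])):
    print(f"{raw!r:>6} -> {band:<8} {n}")
```

A large ‘unmapped’ bucket would point to an extraction or mapping bug, whereas a large ‘masked’ bucket would indicate structural suppression that median imputation should not paper over.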