-
The central scientific claim (that CVAE reconstruction improves step counting from $25~{\rm Hz}$ data) is not supported by valid evidence. The OxWalk distribution used lacks ground-truth step annotations, so neither step-centered segmentation nor absolute MAE/MAPE step-count evaluation can be performed, and all reported quantitative results are acknowledged to be invalid (Sec. 2.1–2.2, Sec. 2.4, Sec. 3.2–3.4, Sec. 4).
Recommendation: Choose one consistent scope and carry it through: (a) Methods + empirical study: obtain or create step annotations (an alternative OxWalk release, if one exists; manual labeling on a subset; a synchronized reference sensor; or another dataset with step labels) and rerun the entire pipeline in a clean environment; or (b) Case study: fully reframe the paper so that the primary contribution is the data/provenance/workflow post-mortem, and state explicitly (Abstract, Sec. 1, Sec. 4) that the step-counting efficacy hypothesis remains untested. If staying with OxWalk *without* step labels, also consider adding an intermediate, label-free evaluation (Major Issue 2) so that the CVAE reconstruction component is empirically assessed even if absolute step-count accuracy is not.
-
The manuscript treats missing step labels as blocking *all* model training, but step annotations are not strictly necessary for training a $25~{\rm Hz} \to 100~{\rm Hz}$ reconstruction model if aligned paired signals exist; labels are primarily required for step-centered window sampling and evaluation against ground-truth step counts (Sec. 2.2–2.4, Sec. 3.2). This conceptual coupling weakens the methodology and the paper’s usefulness even as a proposal.
Recommendation: Decouple (i) reconstruction training from (ii) step-count evaluation:
  - In Sec. 2.2–2.3, define an alternative windowing scheme that does not require step events (e.g., uniformly sampled aligned windows from walking bouts, or sliding windows across the full stream with stratification by participant and activity if available).
  - Add reconstruction metrics that do not require step labels (e.g., time-domain MAE/RMSE on acceleration/SVM, correlation, spectral/PSD similarity, coherence) and report them in Sec. 3 if feasible.
  - For step counting, if no ground truth exists, explicitly label any comparison as *proxy* (e.g., comparing peak counts on reconstructed $100~{\rm Hz}$ against peak counts on the true $100~{\rm Hz}$ using the same detector) and keep it separate from claims about real-world step-count accuracy.
This yields an executable study even under limited labels, while clearly stating what remains unvalidated.
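To make the label-free evaluation concrete, the following is a minimal sketch of the kind of metrics and proxy comparison meant here. It assumes 1-D signals (for example the signal vector magnitude, if that is what SVM denotes in the manuscript) sampled at $100~{\rm Hz}$. The function names, the Welch segment length, and the peak-detector settings (`min_step_interval_s`, `prominence`) are illustrative assumptions, not values taken from the manuscript.

```python
import numpy as np
from scipy.signal import find_peaks, welch


def reconstruction_metrics(x_true, x_rec, fs=100.0):
    """Label-free agreement between a true and a reconstructed 1-D signal."""
    x_true = np.asarray(x_true, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    err = x_rec - x_true
    # Identical Welch settings for both signals so the spectra are comparable.
    nperseg = min(256, x_true.size)  # assumed segment length
    _, p_true = welch(x_true, fs=fs, nperseg=nperseg)
    _, p_rec = welch(x_rec, fs=fs, nperseg=nperseg)
    eps = 1e-12  # avoids log(0) in empty frequency bins
    log_diff = np.log10(p_true + eps) - np.log10(p_rec + eps)
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "pearson_r": float(np.corrcoef(x_true, x_rec)[0, 1]),
        "log_psd_rmse": float(np.sqrt(np.mean(log_diff ** 2))),
    }


def proxy_peak_count(x, fs=100.0, min_step_interval_s=0.3, prominence=0.1):
    """Peak count from one fixed detector: a proxy, not a ground-truth step count."""
    distance = max(1, int(round(min_step_interval_s * fs)))
    peaks, _ = find_peaks(np.asarray(x, dtype=float),
                          distance=distance, prominence=prominence)
    return int(peaks.size)
```

Running `proxy_peak_count` with identical settings on the true and the reconstructed $100~{\rm Hz}$ signal gives a proxy agreement figure only; it says nothing about accuracy against real steps and should be reported under that label.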
-
Workflow contamination by legacy artifacts (pre-existing .npz feature tensors and step_count_evaluation_results.csv) undermines the credibility of the entire results section, yet the current description is not sufficiently forensic/reproducible for readers to learn from it or verify the claims (Sec. 3.3–3.4).
Recommendation: Expand Sec. 3.3–3.4 into an auditable post-mortem:
  - Enumerate exact artifact filenames/paths loaded by each script (e.g., train\_cvae.py, synthesis/evaluation scripts), including how paths are resolved and any fallbacks.
  - Provide concrete evidence (timestamps, directory listings, hashes/checksums, tensor shapes, dataset version identifiers) demonstrating mismatch between intended inputs and loaded artifacts.
  - Add explicit “failure mode” and “fix” steps (e.g., require-empty output dirs; fail-fast if expected raw-data files/labels are missing; embed dataset version + commit hash into every artifact; containerize; pin dependencies; write-protect artifact directories).
  - Consider moving invalid plots/tables to an appendix labeled as *artifact outputs from contaminated runs*, and add a short checklist readers can reuse (Sec. 3.5 or Sec. 4).
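As an illustration of the "fix" steps above, here is a minimal sketch using only the Python standard library. It shows a require-empty output directory guard and a provenance manifest (checksum, dataset version, commit hash) written next to each artifact and verified before loading. All function names, the manifest layout, and the `.manifest.json` suffix are hypothetical and are not taken from the authors' code.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large tensors need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_empty_dir(out_dir: Path) -> None:
    """Fail fast if a run would write into a directory holding older outputs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    leftovers = sorted(p.name for p in out_dir.iterdir())
    if leftovers:
        raise RuntimeError(f"{out_dir} is not empty; refusing to reuse {leftovers}")


def manifest_path(artifact: Path) -> Path:
    # Hypothetical naming convention: manifest sits beside the artifact.
    return artifact.parent / (artifact.name + ".manifest.json")


def write_manifest(artifact: Path, dataset_version: str, commit: str) -> Path:
    """Record checksum and provenance for an artifact that was just produced."""
    record = {
        "artifact": artifact.name,
        "sha256": sha256_of(artifact),
        "dataset_version": dataset_version,
        "commit": commit,
    }
    path = manifest_path(artifact)
    path.write_text(json.dumps(record, indent=2))
    return path


def verify_manifest(artifact: Path, dataset_version: str, commit: str) -> None:
    """Refuse to load an artifact whose provenance does not match this run."""
    path = manifest_path(artifact)
    if not path.exists():
        raise RuntimeError(f"no manifest for {artifact}; provenance unknown")
    record = json.loads(path.read_text())
    expected = {
        "sha256": sha256_of(artifact),
        "dataset_version": dataset_version,
        "commit": commit,
    }
    for key, value in expected.items():
        if record.get(key) != value:
            raise RuntimeError(f"{artifact}: manifest mismatch on '{key}'")
```

A training script written this way would call `require_empty_dir` before writing anything and `verify_manifest` before loading any cached tensor, so a stale file left by an earlier run stops the pipeline instead of silently feeding it.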
-
Dataset provenance and validation are insufficiently specified: the paper does not clearly document the exact OxWalk version/source/date, what files/columns were expected vs actually present, and what checks were performed to rule out labels stored under different names or in separate files (Sec. 2.1, Sec. 3.2).
Recommendation: In Sec. 2.1, add a concise dataset audit table:
  - dataset release identifier (URL/DOI), date accessed, checksum (if available);
  - directory/file listing (at least for one participant) and example headers/columns;
  - explicit statement of what was searched for (e.g., 'step' column; separate annotation files; alternative column names) and results of that search.
Then add a short, explicit *data validation protocol* (must-pass checks) that precedes any modeling (e.g., non-empty label fields; plausible step rates; alignment between $25~{\rm Hz}$ and $100~{\rm Hz}$ streams).
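One way such a must-pass protocol could look is sketched below. It assumes each recording provides timestamp arrays (in seconds) for both streams and a per-sample 0/1 step indicator on the $100~{\rm Hz}$ stream. The function name, the failure codes, the plausible step-rate range of 0.5 to 3 steps per second, and the 5% sampling-rate tolerance are assumptions for illustration; the authors should substitute values justified for their data.

```python
import numpy as np


def validate_recording(t_hi, t_lo, step_flags, fs_hi=100.0, fs_lo=25.0,
                       step_rate_range=(0.5, 3.0), tol=0.05):
    """Return the list of failed checks; an empty list means all checks passed."""
    failures = []
    t_hi = np.asarray(t_hi, dtype=float)
    t_lo = np.asarray(t_lo, dtype=float)
    if t_hi.size < 2 or t_lo.size < 2:
        return ["stream_too_short"]

    # Check 1: labels exist, are non-empty, and imply a plausible step rate.
    if step_flags is None or len(step_flags) == 0:
        failures.append("labels_missing")
    else:
        n_steps = float(np.sum(step_flags))
        if n_steps == 0:
            failures.append("labels_empty")
        else:
            rate = n_steps / (t_hi[-1] - t_hi[0])  # steps per second
            if not step_rate_range[0] <= rate <= step_rate_range[1]:
                failures.append("step_rate_implausible")

    # Check 2: the sampling rate estimated from timestamps matches the nominal one.
    for name, t, fs in (("fs_hi", t_hi, fs_hi), ("fs_lo", t_lo, fs_lo)):
        estimated = 1.0 / np.median(np.diff(t))
        if abs(estimated - fs) / fs > tol:
            failures.append(f"{name}_mismatch")

    # Check 3: both streams start and end within one low-rate sample period.
    period = 1.0 / fs_lo
    if abs(t_hi[0] - t_lo[0]) > period or abs(t_hi[-1] - t_lo[-1]) > period:
        failures.append("streams_misaligned")
    return failures
```

Under this sketch, modeling would proceed only for recordings where the returned list is empty, and the per-recording failure codes could be tabulated directly in Sec. 2.1.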
-
Methods are described in a way that blends intended protocol with what was actually executed, which can mislead readers given that the reported outputs are invalid (Sec. 2.1–2.5 vs Sec. 3).
Recommendation: Systematically label procedures as Planned vs Executed:
  - Either split Sec. 2 into “Planned Methods” and add a new “Actual Execution / Deviations” subsection before Sec. 3, or annotate each subsection (Sec. 2.2–2.5) with an explicit status note.
  - Ensure Sec. 3 contains only verified observations (e.g., missing labels, artifact loading behavior) and does not read like a completed performance evaluation.
-
Key technical specifications are underspecified, limiting replicability even as a proposal: CVAE conditioning definition, architecture details, loss terms (KL), training hyperparameters, alignment between $25~{\rm Hz}$ and $100~{\rm Hz}$ windows, and overlap-add stitching are not implementation-ready (Sec. 2.3–2.4).
Recommendation: Augment Sec. 2.3–2.4 with a concrete specification:
  - Define the conditional model explicitly (e.g., $q(z|X_{\rm high},X_{\rm low})$, $p(X_{\rm high}|z,X_{\rm low})$, and whether $p(z|X_{\rm low})$ is used).
  - Provide layer-by-layer architecture (Conv1D channels, kernel/stride, activations, latent dim), optimizer and schedule, epochs, batch size, normalization/dropout, and any $\beta$-VAE weighting.
  - State the exact KL formula and aggregation/normalization.
  - Clarify synchronization: whether $25~{\rm Hz}$ is a native stream aligned by timestamps to $100~{\rm Hz}$ or is derived by downsampling; document the exact resampling/downsampling method.
  - Specify overlap-add details: hop length, overlap fraction, any tapering window, and boundary handling.
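To show the level of detail requested for overlap-add stitching, the sketch below makes each choice explicit: the hop length, a tapering window, and boundary handling. It is one possible scheme, not the manuscript's method. It assumes equal-length reconstructed windows spaced `hop` samples apart, a Hann taper with its zero endpoints removed, and normalization by the summed taper weights so that samples covered by a single window are returned unchanged.

```python
import numpy as np


def overlap_add(windows, hop, taper=None):
    """Stitch equal-length windows spaced `hop` samples apart into one signal."""
    windows = np.asarray(windows, dtype=float)
    n_win, win_len = windows.shape
    if not 1 <= hop <= win_len:
        raise ValueError("hop must be in [1, win_len] so that no sample is uncovered")
    if taper is None:
        # Hann taper without its zero endpoints, so every weight is positive.
        taper = np.hanning(win_len + 2)[1:-1]

    out = np.zeros(hop * (n_win - 1) + win_len)
    weight = np.zeros_like(out)
    for i, window in enumerate(windows):
        start = i * hop
        out[start:start + win_len] += window * taper
        weight[start:start + win_len] += taper

    # Dividing by the summed weights turns the sum into a weighted average,
    # which also handles the boundaries where only one window contributes.
    return out / weight
```

With a window length of 8 and a hop of 4, for example, the overlap fraction is 50%. Stating these three numbers together with the taper would make the stitching step in Sec. 2.4 reproducible.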
-
Related work is not systematically covered, making it hard to judge novelty and to place the CVAE reconstruction idea within existing step-counting, low-frequency sensing, and time-series super-resolution literature (Sec. 1–2).
Recommendation: Add a Related Work section (between Sec. 1 and Sec. 2):
  - classical step counting (thresholding/peak methods) and learning-based alternatives;
  - impacts of sampling rate reduction on gait/step detection;
  - time-series super-resolution / reconstruction for wearable sensors;
  - VAEs/CVAEs (and diffusion/other generative models if relevant) in gait or activity analysis.
Clearly state what is novel here (e.g., placement-specific reconstruction + downstream counting pipeline) while qualifying that step-count accuracy gains are not yet validated.