Spaces:
Running
docs(decisions): promote cold-start falsified-assumption and audit-path incident entries, add three-regimes latency refinement
Browse filesTwo entries previously on hf-deploy only:
- "Cold-start gate fired β assumption falsified, fix deferred to v1.1"
documents the N=3 cold-start measurement (median 113s), the falsified
corpus-loading assumption, and the re-pre-committed v1.1 contingency
- "Audit-path bug β streams masked a request-crashing failure (2026-04-15)"
documents the streaming-endpoint failure-masking finding and the fix
Also folds in a post-hoc refinement (2026-04-22): three latency regimes
observed (cold-start 89-129s, wake-from-idle ~6.7s, steady-state warm ~5s),
tightening the README's "~2 min cold / ~5s warm" phrasing and narrowing
the v1.1 first-request-penalty projection to +6-13s over current behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- DECISIONS.md +262 -0
|
@@ -620,6 +620,147 @@ numbers needed to make the decision; the decision is documented here
|
|
| 620 |
so future-me remembers the threshold and doesn't optimize prematurely
|
| 621 |
on a hunch.
|
| 622 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 623 |
## False-premise questions come in two flavors
|
| 624 |
|
| 625 |
When authoring golden-dataset questions whose premise is wrong, the
|
|
@@ -1854,3 +1995,124 @@ threshold, or (c) mark the file explicitly in a
|
|
| 1854 |
option (a) as the follow-up, because it preserves the production
|
| 1855 |
detector regex without weakening and keeps the test's fidelity to
|
| 1856 |
the actual attack surface.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 620 |
so future-me remembers the threshold and doesn't optimize prematurely
|
| 621 |
on a hunch.
|
| 622 |
|
| 623 |
+
## Cold-start gate fired β assumption falsified, fix deferred to v1.1 at the right cause
|
| 624 |
+
|
| 625 |
+
The preceding "Cold-start contingency" entry pre-committed a lazy-load
|
| 626 |
+
fix (FastAPI eager, K8s lazy on first request) if the measured cold
|
| 627 |
+
start exceeded 60 seconds. Measurement falsified the entry's core
|
| 628 |
+
assumption: **corpus loading is not the dominant cold-start cost**.
|
| 629 |
+
The committed fix addresses ~1 % of the observed overshoot. Executing
|
| 630 |
+
it verbatim would honor the gate's letter but not its intent β theater
|
| 631 |
+
dressed as discipline. This entry documents the measurement, the
|
| 632 |
+
falsified assumption, and the new contingency pre-committed at the
|
| 633 |
+
actual cause.
|
| 634 |
+
|
| 635 |
+
**Measurement (N=3, 2026-04-15, HF Spaces target deployment):**
|
| 636 |
+
|
| 637 |
+
| Sample | Cold start | Silent Python init | Visible phase |
|
| 638 |
+
|---|---|---|---|
|
| 639 |
+
| N=1 | 113 s | ~101 s | ~12 s |
|
| 640 |
+
| N=2 | 89 s | ~70 s | ~19 s |
|
| 641 |
+
| N=3 | 129 s | ~115 s | ~14 s |
|
| 642 |
+
|
| 643 |
+
- Median 113 s, mean ~110 s, range 89β129 s (spread ~40 s)
|
| 644 |
+
- **Gate fire is unambiguous at both tails.** Even the fastest sample
|
| 645 |
+
(89 s) is ~48 % over the 60 s threshold; the slowest (129 s) is
|
| 646 |
+
~115 % over. No boundary ambiguity.
|
| 647 |
+
- **Sample-size justification.** N=3 is acknowledged as a small sample.
|
| 648 |
+
It is adequate here because (a) the gate-fire conclusion is stable
|
| 649 |
+
across both tails, (b) the "silent Python init dominates variance"
|
| 650 |
+
finding is stable across all three samples (silent phase varies
|
| 651 |
+
70 β 115 s across runs; visible phase varies only 12 β 19 s), and
|
| 652 |
+
(c) the cost of additional samples (manual HF Space restart + ~2 min
|
| 653 |
+
wait + log extraction per sample) exceeds the marginal information
|
| 654 |
+
gain once both tails fire the gate and the variance pattern is stable.
|
| 655 |
+
N=4 would tighten the confidence interval on the median but does not
|
| 656 |
+
change either the gate-fire conclusion or the falsified-assumption
|
| 657 |
+
finding.
|
| 658 |
+
- **Variance source named.** HF Spaces shared-infrastructure CPU / IO
|
| 659 |
+
contention during Python module imports. The silent-init phase
|
| 660 |
+
varies 45 s across samples (70 β 115 s); the visible phase is stable
|
| 661 |
+
(12β19 s). That is the signature of host-level contention on a
|
| 662 |
+
shared physical node, not code-level variability. An
|
| 663 |
+
exclusively-owned container would plausibly show a tighter bound.
|
| 664 |
+
- **Raw log captures** (preserved so this entry can be cross-checked
|
| 665 |
+
against the underlying evidence without re-running the measurement):
|
| 666 |
+
`measurements/2026-04-15-coldstart-n1.log`, `-n2.log`, `-n3.log`.
|
| 667 |
+
|
| 668 |
+
**Where the cost lives.** At the median (113 s):
|
| 669 |
+
|
| 670 |
+
- **Silent Python init phase β ~90 s (β 80 % of total):** interpreter
|
| 671 |
+
start, module imports (`torch`, `transformers`, `langchain`, `faiss`,
|
| 672 |
+
`fastapi`, `httpx`, the full dependency closure), and initial model
|
| 673 |
+
weight loading (`all-MiniLM-L6-v2` embedder, cross-encoder
|
| 674 |
+
reranker). Not logged β no observability inside the import chain.
|
| 675 |
+
- **Visible startup phase β ~15 s (β 15 % of total):** injection
|
| 676 |
+
classifier init (~10 s, includes the "classifier skipped" warning),
|
| 677 |
+
FastAPI corpus load (< 1 s, +0.9 MB RSS), K8s corpus load (< 1 s,
|
| 678 |
+
+25.8 MB RSS), reranker warmup (~2 s).
|
| 679 |
+
|
| 680 |
+
**The K8s corpus load β which the pre-committed fix was designed to
|
| 681 |
+
defer β contributes under 1 second of the 113-second median.**
|
| 682 |
+
Deferring it saves roughly 1 % of the overshoot. FastAPI corpus load
|
| 683 |
+
is the same order of magnitude. Corpus loading is simply not where the
|
| 684 |
+
cost lives on this deployment.
|
| 685 |
+
|
| 686 |
+
**Why we are not executing the pre-committed fix.** The preceding
|
| 687 |
+
contingency was written under an empirical assumption about cost
|
| 688 |
+
attribution (corpus loading is the dominant cost). Measurement
|
| 689 |
+
falsified the assumption. Implementing the fix anyway would be a
|
| 690 |
+
mechanical execution of a recipe whose premise has been disproven β
|
| 691 |
+
it checks the gate-honoring box while failing to address the cause.
|
| 692 |
+
That is structurally identical to relaxing-by-redefinition ("60 s was
|
| 693 |
+
too tight"), just in the opposite direction: **relaxing by execution**.
|
| 694 |
+
The pre-commitment rule's purpose is to prevent motivated reasoning
|
| 695 |
+
about the gate, not to mandate mechanical compliance with a recipe
|
| 696 |
+
whose empirical foundation has collapsed.
|
| 697 |
+
|
| 698 |
+
The honest action is (1) accept the measurement as the v1 baseline,
|
| 699 |
+
(2) document the falsified assumption explicitly (this entry),
|
| 700 |
+
(3) re-pre-commit a new contingency at the actual dominant cost with
|
| 701 |
+
an explicit trigger condition so the decision is not relitigated at
|
| 702 |
+
review time, and (4) update the user-facing README surface to reflect
|
| 703 |
+
the measured cold-wake number rather than the optimistic pre-deploy
|
| 704 |
+
estimate.
|
| 705 |
+
|
| 706 |
+
**v1.1 contingency β pre-committed:**
|
| 707 |
+
|
| 708 |
+
> **If HF Spaces traffic produces more than N cold wakes per day**
|
| 709 |
+
> (N to be determined from observed usage patterns after launch, **not
|
| 710 |
+
> estimated in advance**), defer eager loading of (a) the cross-encoder
|
| 711 |
+
> reranker, (b) the sentence-transformers embedder, and (c) the
|
| 712 |
+
> injection classifier tier to first-relevant-request.
|
| 713 |
+
>
|
| 714 |
+
> **Estimated work:** 4β6 hours (lazy-init wrappers + first-request
|
| 715 |
+
> caching + integration tests for the warm/cold transition).
|
| 716 |
+
>
|
| 717 |
+
> **Expected tradeoff:** cold wake ~113 s β ~50β60 s (approaches the
|
| 718 |
+
> original 60 s target); **first request after any cold wake incurs
|
| 719 |
+
> +8β15 s** additional latency (model weights load synchronously in
|
| 720 |
+
> the request path), after which subsequent warm requests return to
|
| 721 |
+
> normal ~5 s latency.
|
| 722 |
+
>
|
| 723 |
+
> **Trigger is usage-justified, not estimate-justified.** Until real
|
| 724 |
+
> traffic data justifies the work, there is nothing to optimize β a
|
| 725 |
+
> recruiter demo that gets one cold wake per day does not pay for
|
| 726 |
+
> 4β6 hours of engineering plus the new first-request-latency failure
|
| 727 |
+
> mode. The trigger threshold N is left unnamed deliberately: naming a
|
| 728 |
+
> number in advance would invite the same falsification pattern this
|
| 729 |
+
> entry is documenting.
|
| 730 |
+
|
| 731 |
+
**Methodology lesson.** When a pre-committed contingency is written
|
| 732 |
+
under an empirical assumption, the contingency only holds if the
|
| 733 |
+
assumption survives measurement. If measurement falsifies the
|
| 734 |
+
assumption, the correct action is to document the falsification,
|
| 735 |
+
accept the observed baseline, and re-pre-commit at the actual cause.
|
| 736 |
+
The wrong action is to execute the original recipe anyway, which
|
| 737 |
+
trades one form of motivated reasoning (threshold relaxation) for
|
| 738 |
+
another (recipe compliance). The underlying discipline β "pre-commit
|
| 739 |
+
your gates and honor them" β does not mean "mechanically run the
|
| 740 |
+
pre-committed fix regardless of what it addresses." It means "honor
|
| 741 |
+
the gate's *intent*, which is to prevent motivated reasoning about
|
| 742 |
+
pass/fail."
|
| 743 |
+
|
| 744 |
+
**Post-hoc refinement (2026-04-22) β three latency regimes observed.**
|
| 745 |
+
Follow-up warm-latency measurements after the initial cold-start
|
| 746 |
+
characterization distinguish a middle regime that the README's
|
| 747 |
+
"~2 min cold / ~5 s warm" phrasing collapses:
|
| 748 |
+
|
| 749 |
+
| Regime | Latency | Cause |
|
| 750 |
+
|---|---|---|
|
| 751 |
+
| Cold-start (container spin-up) | 89β129 s | Python imports, model load (per the table above) |
|
| 752 |
+
| Wake-from-idle (first `/ask` after container is up) | ~6.7 s (β +2 s over warm) | Residual lazy init on first request |
|
| 753 |
+
| Steady-state warm | ~5 s (mean 4.74 s, n=5) | LLM API round-trip + retrieval |
|
| 754 |
+
|
| 755 |
+
Source: n=1 first-hit-after-cold-wake @ 6.7 s, then n=5 steady-state
|
| 756 |
+
warm @ 4.07, 4.69, 4.78, 5.06, 5.11 s (2026-04-22). The README claim
|
| 757 |
+
captures regimes 1 and 3; the ~2 s first-hit penalty between them is
|
| 758 |
+
a refinement of the warm-cold boundary, not a contradiction. The
|
| 759 |
+
v1.1 projection of "+8β15 s first request after any cold wake"
|
| 760 |
+
measures the delta from this ~2 s current baseline, not from a
|
| 761 |
+
zero-penalty starting point β the true v1.1 UX cost over current
|
| 762 |
+
behavior is closer to +6β13 s.
|
| 763 |
+
|
| 764 |
## False-premise questions come in two flavors
|
| 765 |
|
| 766 |
When authoring golden-dataset questions whose premise is wrong, the
|
|
|
|
| 1995 |
option (a) as the follow-up, because it preserves the production
|
| 1996 |
detector regex without weakening and keeps the test's fidelity to
|
| 1997 |
the actual attack surface.
|
| 1998 |
+
|
| 1999 |
+
## Audit-path bug β streams masked a request-crashing failure (2026-04-15)
|
| 2000 |
+
|
| 2001 |
+
During v1 deploy smoke testing, every non-stream `POST /ask` request
|
| 2002 |
+
and every injection-blocked request on the HF Space returned
|
| 2003 |
+
`{"detail": "Internal server error"}` (HTTP 500) instead of the
|
| 2004 |
+
intended 200 / 403. Normal queries via `POST /ask/stream` appeared to
|
| 2005 |
+
work correctly from the dashboard, so the bug was invisible until a
|
| 2006 |
+
direct curl hit a non-stream endpoint. Discovered and fixed same-day.
|
| 2007 |
+
|
| 2008 |
+
**Root cause.** `agent_bench/security/audit_logger.py:60` called
|
| 2009 |
+
`self.path.parent.mkdir(parents=True, exist_ok=True)` the first time
|
| 2010 |
+
a request wrote an audit record. The default path `logs/audit.jsonl`
|
| 2011 |
+
resolved to `/home/user/app/logs/audit.jsonl` at runtime. The
|
| 2012 |
+
Dockerfile's `WORKDIR /home/user/app` creates that directory as
|
| 2013 |
+
**root:root mode 0755** because Docker's WORKDIR directive does not
|
| 2014 |
+
honor `--chown`. Subsequent `COPY --chown=user` lines only change
|
| 2015 |
+
ownership of the copied files, not the directory itself. At runtime
|
| 2016 |
+
under `USER user` (uid 1000), the process had `r-x` on WORKDIR β it
|
| 2017 |
+
could read and execute, but not create new subdirectories. First
|
| 2018 |
+
audit write β `PermissionError: [Errno 13] Permission denied: 'logs'`.
|
| 2019 |
+
|
| 2020 |
+
**Why `.cache/` worked and `logs/` didn't.** `.cache/` is created at
|
| 2021 |
+
build time by the two `RUN python scripts/ingest.py ...` steps
|
| 2022 |
+
(running as root) and then explicitly chowned by
|
| 2023 |
+
`RUN chown -R user:user .cache/`. `logs/` was created lazily at
|
| 2024 |
+
runtime by the audit logger, as a non-root user, in a directory owned
|
| 2025 |
+
by root. Directory-creation permission wall.
|
| 2026 |
+
|
| 2027 |
+
**Why the streaming endpoint masked it.** `/ask/stream` calls
|
| 2028 |
+
`_write_audit` at the *end* of the event generator
|
| 2029 |
+
(`routes.py:438`), after all stage events and the final answer have
|
| 2030 |
+
already been yielded over SSE. A failure there is already too late
|
| 2031 |
+
to affect the client's view of the response β the client sees the
|
| 2032 |
+
answer, the stream ends, and the audit entry is silently missing.
|
| 2033 |
+
Non-stream `/ask` calls `_write_audit` synchronously before `return`
|
| 2034 |
+
(line 263), so the failure propagates up through the middleware
|
| 2035 |
+
exception handler and becomes a visible 500. Injection-blocked
|
| 2036 |
+
requests on both endpoints also audit synchronously before returning
|
| 2037 |
+
(lines 193, 302) and produce the same visible 500. Normal dashboard
|
| 2038 |
+
use hits `/ask/stream` β symptom invisible; smoke testing
|
| 2039 |
+
non-stream `/ask` β symptom obvious.
|
| 2040 |
+
|
| 2041 |
+
**Traceback confirmed from the HF Space runtime log:**
|
| 2042 |
+
|
| 2043 |
+
```
|
| 2044 |
+
File "/home/user/app/agent_bench/serving/routes.py", line 302, in ask_stream
|
| 2045 |
+
_write_audit(
|
| 2046 |
+
File "/home/user/app/agent_bench/serving/routes.py", line 558, in _write_audit
|
| 2047 |
+
audit_logger.log(record)
|
| 2048 |
+
File "/home/user/app/agent_bench/security/audit_logger.py", line 60, in log
|
| 2049 |
+
self.path.parent.mkdir(parents=True, exist_ok=True)
|
| 2050 |
+
PermissionError: [Errno 13] Permission denied: 'logs'
|
| 2051 |
+
```
|
| 2052 |
+
|
| 2053 |
+
**Two-fix patch.** Both applied in the same deploy cycle:
|
| 2054 |
+
|
| 2055 |
+
1. **Dockerfile (proximate).** Commit `ca34ccb`:
|
| 2056 |
+
`RUN mkdir -p logs && chown -R user:user .cache/ logs/` at build
|
| 2057 |
+
time, before `USER user`. Ensures the runtime user owns the
|
| 2058 |
+
audit-log directory from container start.
|
| 2059 |
+
2. **`AuditLogger.log()` (architectural, defense-in-depth).** Commit
|
| 2060 |
+
`25e0f1b`: wrap the entire write body in `try/except Exception`,
|
| 2061 |
+
log the failure via structlog as `audit_write_failed` with the
|
| 2062 |
+
exception type and path, return normally.
|
| 2063 |
+
|
| 2064 |
+
**Why both, not either alone.**
|
| 2065 |
+
|
| 2066 |
+
- **Only fix 1:** unblocks the Space but leaves the architectural
|
| 2067 |
+
bug β an audit logger that can crash the app is misdesigned
|
| 2068 |
+
regardless of filesystem permissions. The SECURITY.md OWASP LLM10
|
| 2069 |
+
writeup would be structurally weak.
|
| 2070 |
+
- **Only fix 2:** stops the crashes, but audit writes would still
|
| 2071 |
+
silently fail on HF Spaces because `logs/` remains unwritable. The
|
| 2072 |
+
OWASP LLM10 claim would be *live-untrue* on the demo surface.
|
| 2073 |
+
- **Both:** demo is honest (audit log actually written on HF Space),
|
| 2074 |
+
architecture is sound (audit failures degrade gracefully), and the
|
| 2075 |
+
OWASP LLM10 writeup has live evidence backing both claims.
|
| 2076 |
+
|
| 2077 |
+
**Verification on the live Space after the patch.** `GET /health`
|
| 2078 |
+
flipped from `"degraded"` to `"healthy"` with `provider_available:
|
| 2079 |
+
true`; `POST /ask` normal returned 200 with grounded answers citing
|
| 2080 |
+
real sources from both corpora (was 500); `POST /ask` with the
|
| 2081 |
+
`ignore_previous` injection pattern returned 403
|
| 2082 |
+
`"Request blocked: potential prompt injection detected"` (was 500);
|
| 2083 |
+
the runtime log shows `injection_detected pattern=ignore_previous
|
| 2084 |
+
tier=heuristic` followed by a 403 status and no `audit_write_failed`
|
| 2085 |
+
entries, confirming the audit write succeeded end-to-end.
|
| 2086 |
+
|
| 2087 |
+
**Methodology lesson β streaming endpoints can hide backend failures.**
|
| 2088 |
+
Any request-path side effect (audit logging, metrics, cache write,
|
| 2089 |
+
persistence) that runs at the end of a streaming generator will fail
|
| 2090 |
+
silently from the client's perspective, because content is already
|
| 2091 |
+
committed to the wire before the failure fires. Testing such side
|
| 2092 |
+
effects against their failure modes *through* a streaming happy path
|
| 2093 |
+
is not sufficient β the happy path masks the failure. Every
|
| 2094 |
+
request-path side effect needs at least one test that verifies it
|
| 2095 |
+
fails loudly (or, where the design says graceful, degrades loudly)
|
| 2096 |
+
when its underlying dependency is broken, **exercised synchronously,
|
| 2097 |
+
not via the streaming code path.**
|
| 2098 |
+
|
| 2099 |
+
For this project, `AuditLogger.log()` should gain a unit test that
|
| 2100 |
+
asserts it returns normally (not raises) when its target directory is
|
| 2101 |
+
unwritable β i.e., a regression test for the `try/except` fix above.
|
| 2102 |
+
And the dependency injection of the audit logger in the route
|
| 2103 |
+
handlers should get an integration test that exercises both the
|
| 2104 |
+
streaming and non-streaming paths against the same failure injection,
|
| 2105 |
+
to assert the streaming path does not hide what the non-streaming
|
| 2106 |
+
path surfaces. Tracked as a v1.1 test-coverage follow-up.
|
| 2107 |
+
|
| 2108 |
+
**Why this wasn't caught locally.** Existing
|
| 2109 |
+
`tests/test_audit_logger.py` tests use pytest's `tmp_path` fixture,
|
| 2110 |
+
which is always writable by the test process. The failure mode only
|
| 2111 |
+
manifests when the runtime user cannot create the log directory, a
|
| 2112 |
+
deployment-environment condition not reproducible in a unit test
|
| 2113 |
+
against a local filesystem owned by the developer. An integration
|
| 2114 |
+
test that runs the Docker image end-to-end and hits `/ask` against
|
| 2115 |
+
the actual container filesystem would have caught it pre-deploy.
|
| 2116 |
+
Such a test is out of scope for v1 (adds ~5 min to CI plus Docker
|
| 2117 |
+
build infrastructure) but is the right long-term mitigation for this
|
| 2118 |
+
class of bug.
|