Space: Nomearod / agentbench
Directory: tests (259 kB)

4 contributors

History: 68 commits

Latest commit by Nomearod: docs+test: round-2 incident response — Google API key format scrub (4dc3e01, 12 days ago)

File                          Size      Updated            Last commit
test_langchain_baseline/      -         16 days ago        fix: stream stage events live, thread source_chunks, fix LangChain wrapper
__init__.py                   0 Bytes   about 1 month ago  feat: Day 1 — repo scaffolding, provider abstraction, config, tests
conftest.py                   6.19 kB   14 days ago        refactor: address batch-2 review feedback
test_agent.py                 8.57 kB   16 days ago        feat: enrich SearchTool metadata with scores, previews, PII count
test_app_corpus_map.py        8.69 kB   14 days ago        fix: batch-3 adversarial review findings
test_audit_logger.py          5.45 kB   26 days ago        fix: ruff lint — import sorting, unused imports, line length, naming
test_config_corpora.py        3.48 kB   14 days ago        fix: batch-3 adversarial review findings
test_corpus_routing.py        17.3 kB   14 days ago        fix: batch-3 adversarial review findings
test_evaluation.py            10.2 kB   12 days ago        feat(eval): Week 1 step 5 — 25-question K8s golden dataset + grounded_refusal fix
test_golden_schema.py         5.85 kB   13 days ago        feat: extend GoldenQuestion with source_pages and source_sections
test_injection_detector.py    5.55 kB   14 days ago        security: fail-closed on secret extraction and env var leakage
test_landing_page_inject.py   4.34 kB   14 days ago        fix: batch-3 adversarial review findings
test_memory.py                8.85 kB   about 1 month ago  feat: add SQLite conversation sessions with session_id
test_meta_corpus.py           3.56 kB   14 days ago        fix: batch-3 adversarial review findings
test_output_validator.py      8.78 kB   12 days ago        docs+test: round-2 incident response — Google API key format scrub
test_pii_redactor.py          4.72 kB   26 days ago        fix: ruff lint — import sorting, unused imports, line length, naming
test_prompt_template.py       4.75 kB   14 days ago        refactor: address batch-2 review feedback
test_provider.py              33.1 kB   about 1 month ago  feat: Anthropic Haiku benchmark + README with provider comparison
test_rag.py                   13.6 kB   16 days ago        style: fix ruff lint — import sorting, line length
test_reranker_scores.py       2.9 kB    16 days ago        style: fix ruff lint — import sorting, line length
test_search_metadata.py       3.27 kB   16 days ago        feat: enrich SearchTool metadata with scores, previews, PII count
test_security_config.py       4.18 kB   26 days ago        fix(security): validate injection tier names, normalize URLs
test_security_integration.py  8.4 kB    16 days ago        test: update security integration mock for _orchestrator_done event
test_security_types.py        1.33 kB   26 days ago        feat(security): add SecurityVerdict and OutputVerdict types
test_selfhosted_provider.py   26.1 kB   26 days ago        feat: infrastructure sprint — vLLM/Modal, Helm, Terraform (#8)
test_serving.py               20.1 kB   16 days ago        style: fix ruff lint — import sorting, line length
test_stream_route_events.py   7.39 kB   16 days ago        style: fix ruff lint — import sorting, line length
test_stream_stages.py         6.39 kB   14 days ago        fix: batch-3 adversarial review findings
test_tools.py                 12.3 kB   12 days ago        docs(eval): Fix 2 SearchTool query expansion — attempted and reverted