Spaces: Nomearod/agentbench
agentbench/tests (branch: main, 259 kB)
4 contributors
History: 68 commits
Nomearod
docs+test: round-2 incident response - Google API key format scrub
4dc3e01
12 days ago
test_langchain_baseline
fix: stream stage events live, thread source_chunks, fix LangChain wrapper
16 days ago
__init__.py
Safe
0 Bytes
feat: Day 1 - repo scaffolding, provider abstraction, config, tests
about 1 month ago
conftest.py
Safe
6.19 kB
refactor: address batch-2 review feedback
14 days ago
test_agent.py
Safe
8.57 kB
feat: enrich SearchTool metadata with scores, previews, PII count
16 days ago
test_app_corpus_map.py
Safe
8.69 kB
fix: batch-3 adversarial review findings
14 days ago
test_audit_logger.py
Safe
5.45 kB
fix: ruff lint - import sorting, unused imports, line length, naming
26 days ago
test_config_corpora.py
Safe
3.48 kB
fix: batch-3 adversarial review findings
14 days ago
test_corpus_routing.py
Safe
17.3 kB
fix: batch-3 adversarial review findings
14 days ago
test_evaluation.py
Safe
10.2 kB
feat(eval): Week 1 step 5 - 25-question K8s golden dataset + grounded_refusal fix
12 days ago
test_golden_schema.py
Safe
5.85 kB
feat: extend GoldenQuestion with source_pages and source_sections
13 days ago
test_injection_detector.py
Safe
5.55 kB
security: fail-closed on secret extraction and env var leakage
14 days ago
test_landing_page_inject.py
Safe
4.34 kB
fix: batch-3 adversarial review findings
14 days ago
test_memory.py
Safe
8.85 kB
feat: add SQLite conversation sessions with session_id
about 1 month ago
test_meta_corpus.py
Safe
3.56 kB
fix: batch-3 adversarial review findings
14 days ago
test_output_validator.py
Safe
8.78 kB
docs+test: round-2 incident response - Google API key format scrub
12 days ago
test_pii_redactor.py
Safe
4.72 kB
fix: ruff lint - import sorting, unused imports, line length, naming
26 days ago
test_prompt_template.py
Safe
4.75 kB
refactor: address batch-2 review feedback
14 days ago
test_provider.py
Safe
33.1 kB
feat: Anthropic Haiku benchmark + README with provider comparison
about 1 month ago
test_rag.py
Safe
13.6 kB
style: fix ruff lint - import sorting, line length
16 days ago
test_reranker_scores.py
Safe
2.9 kB
style: fix ruff lint - import sorting, line length
16 days ago
test_search_metadata.py
Safe
3.27 kB
feat: enrich SearchTool metadata with scores, previews, PII count
16 days ago
test_security_config.py
Safe
4.18 kB
fix(security): validate injection tier names, normalize URLs
26 days ago
test_security_integration.py
Safe
8.4 kB
test: update security integration mock for _orchestrator_done event
16 days ago
test_security_types.py
Safe
1.33 kB
feat(security): add SecurityVerdict and OutputVerdict types
26 days ago
test_selfhosted_provider.py
Safe
26.1 kB
feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
26 days ago
test_serving.py
Safe
20.1 kB
style: fix ruff lint - import sorting, line length
16 days ago
test_stream_route_events.py
Safe
7.39 kB
style: fix ruff lint - import sorting, line length
16 days ago
test_stream_stages.py
Safe
6.39 kB
fix: batch-3 adversarial review findings
14 days ago
test_tools.py
Safe
12.3 kB
docs(eval): Fix 2 SearchTool query expansion - attempted and reverted
12 days ago