Meta-UAT
Collection
Weight space learning experiments (interpreting behavior through activation signatures)
This model was trained to identify, from neuron activation signatures, which patterns a subject model was trained on.
The model predicts which of the following 14 patterns the subject model was trained to classify as positive:
- palindrome
- sorted_ascending
- sorted_descending
- alternating
- contains_abc
- starts_with
- ends_with
- no_repeats
- has_majority
- increasing_pairs
- decreasing_pairs
- vowel_consonant
- first_last_match
- mountain_pattern

| Pattern | Precision | Recall | F1 Score |
|---|---|---|---|
| palindrome | 17.2% | 79.1% | 28.2% |
| sorted_ascending | 36.3% | 77.4% | 49.4% |
| sorted_descending | 16.7% | 92.0% | 28.2% |
| alternating | 23.3% | 74.4% | 35.5% |
| contains_abc | 29.7% | 90.6% | 44.8% |
| starts_with | 13.8% | 79.7% | 23.5% |
| ends_with | 35.5% | 75.3% | 48.3% |
| no_repeats | 14.3% | 70.1% | 23.8% |
| has_majority | 63.3% | 48.7% | 55.1% |
| increasing_pairs | 18.4% | 84.3% | 30.3% |
| decreasing_pairs | 16.7% | 83.0% | 27.9% |
| vowel_consonant | 14.3% | 31.6% | 19.7% |
| first_last_match | 32.5% | 67.5% | 43.9% |
| mountain_pattern | 12.7% | 82.5% | 22.1% |
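
The F1 column can be recomputed from the precision and recall columns, since F1 is the harmonic mean of the two. The snippet below is a minimal sketch of that check, not the evaluation code used for this model: the helper name `f1_score` and the `rows` dictionary are illustrative, and the numbers are copied from three rows of the table above. Because the table values are rounded to one decimal place, the recomputed F1 may differ from the reported one by about 0.1 percentage points.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Both inputs must be on the same scale (both percentages or both fractions);
    the result is on that same scale.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
rows = {
    "sorted_ascending": (36.3, 77.4, 49.4),
    "has_majority": (63.3, 48.7, 55.1),
    "first_last_match": (32.5, 67.5, 43.9),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed:.2f}%, reported = {reported}%")
```

For most patterns recall is much higher than precision, which means the model recovers most true instances of a pattern but also flags many subject models that were not trained on it. `has_majority` is the only row where precision exceeds recall.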