Evidence First
Published scores require documented sources, evidence levels, confidence ratings, and limitations.
Assessment Program v1.0
A reusable program for evidence-based, reviewable assessments of AI models, agents, organizations, schools, cities, government systems, and products.
Purpose
Foundation assessments translate the ten Framework dimensions into a structured public-interest review. They are not endorsements, rankings, or marketing claims.
Assessors must disclose conflicts, separate evidence from interpretation, and complete review before publication.
Each of the ten Framework dimensions is scored from 1 to 5 and carries equal weight, contributing 10% of the overall assessment.
Assessments are revisited after material changes, incidents, new evidence, or scheduled review dates.
Workflow
The program allows preparation materials to be public before scores are ready, so scope and evidence requirements can be inspected early.
Evidence
Level A and Level B evidence can support scoring. Level C informs questions and risk flags. Level D cannot be a primary scoring basis.
Level A
Official documentation, technical reports, regulatory filings, audit reports, and published standards.
Level B
Academic papers, independent research, and reputable third-party assessments.
Level C
News articles, public reporting, user reports, and incident context.
Level D
Cannot be a primary basis for scoring; useful only as a research gap or monitoring note.
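As a rough illustration of the evidence rules above, the following Python sketch checks whether a set of sources is strong enough to back a score. The names `EvidenceLevel`, `SCORING_LEVELS`, and `can_support_score` are hypothetical and are not taken from the program's repository; the rule it encodes (at least one Level A or Level B source) is one reasonable reading of the text, not an official definition.

```python
from enum import Enum


class EvidenceLevel(Enum):
    """The four evidence levels named in the program description."""
    A = "A"  # official documentation, filings, audits, published standards
    B = "B"  # academic papers, independent research, third-party assessments
    C = "C"  # news, public reporting, user reports, incident context
    D = "D"  # research gap or monitoring note only


# Assumption: only Level A and Level B can serve as a scoring basis.
SCORING_LEVELS = {EvidenceLevel.A, EvidenceLevel.B}


def can_support_score(levels):
    """Return True if at least one cited source is Level A or Level B."""
    return any(level in SCORING_LEVELS for level in levels)


if __name__ == "__main__":
    # A Level C source alongside a Level A source is acceptable.
    print(can_support_score([EvidenceLevel.C, EvidenceLevel.A]))
    # Level C and Level D alone can raise questions but not back a score.
    print(can_support_score([EvidenceLevel.C, EvidenceLevel.D]))
```

Under this reading, Level C and Level D sources can still be recorded next to a score as context, but a dimension with no Level A or Level B source would stay unscored.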
Scoring
Each dimension receives an evidence-backed score from 1 to 5. With ten dimensions, the raw assessment score ranges from 10 to a maximum of 50.
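The arithmetic can be sketched in a few lines of Python. The function name `summarize_scores` and the placeholder dimension names are hypothetical, and expressing the result as a percentage of the maximum is an assumption for illustration; the source only specifies the 1 to 5 scale, the equal 10% weights, and the maximum raw score of 50.

```python
DIMENSION_COUNT = 10   # ten Framework dimensions
MIN_SCORE = 1
MAX_SCORE = 5
MAX_RAW = DIMENSION_COUNT * MAX_SCORE  # 50


def summarize_scores(scores):
    """Return (raw_total, percent_of_maximum) for a dict of dimension scores.

    `scores` maps a dimension name to an integer from 1 to 5 and must
    contain exactly ten entries, one per Framework dimension.
    """
    if len(scores) != DIMENSION_COUNT:
        raise ValueError(f"expected {DIMENSION_COUNT} dimensions, got {len(scores)}")
    for name, value in scores.items():
        if not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
            raise ValueError(f"score for {name!r} must be an integer from 1 to 5")
    raw = sum(scores.values())
    # Equal weights mean each dimension contributes 10% of the total,
    # so the weighted result is simply the raw total over the maximum.
    percent = raw * 100 / MAX_RAW
    return raw, percent


if __name__ == "__main__":
    # Placeholder names; the real dimension names come from the Framework.
    example = {f"dimension_{i}": 3 for i in range(1, 11)}
    print(summarize_scores(example))  # (30, 60.0)
```

Because every dimension has the same weight, the weighted result and the raw total carry the same information; the percentage is only a convenience for display.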
Assessment #001
Status: Preparation Phase. Scope, evidence requirements, and assessment questions are being defined. No scores are published.
Program Documents
The public website explains the program. The repository contains the full governance guide, evidence standards, scoring workbook, matrix template, publication template, and ChatGPT preparation record.