How We Grade
The AI sees the same rubric you see. No hidden criteria, no secret formulas. This page documents exactly how every essay is evaluated — and the same methodology applies across every simulation on the platform.
Theoretical Foundation
Grounded in research.
Not just built — designed.
Every design decision in this assessment system maps to established pedagogical research, ensuring the grading methodology serves learning — not just measurement.
Formative Assessment
Rubric criteria are displayed while students write, not hidden until after submission. This applies Black & Wiliam's (1998) finding that achievement improves when students understand evaluation criteria before performing tasks.
Feed-Forward Feedback
Per-criterion AI feedback after each weekly submission informs the next performance rather than merely evaluating the last, the "Where to next?" function that Hattie & Timperley (2007) call feed-forward. Each cycle's feedback becomes input for the next week's decision-making.
Iterative Experiential Cycles
Each multi-week simulation repeats Kolb's experiential learning cycle — experience, reflection, conceptualization, experimentation — with compounding consequences, directly addressing Kayes's critique of single-iteration designs.
Productive Failure
Students who struggle with complex problems before receiving instruction can outperform those who receive instruction first on conceptual understanding and transfer (Kapur, 2016). The compounding metric system, where early decisions create downstream consequences, applies this principle.
Scaffolded Complexity
Three difficulty tiers progressively reduce scaffolding as student capability increases (fewer advisor uses, tighter crisis thresholds). This structured withdrawal of support follows the scaffolding model of Wood, Bruner & Ross (1976) and keeps students working within Vygotsky's Zone of Proximal Development.
Automated Essay Scoring
Automated essay scoring (AES) research reports that AI-assisted evaluation can agree with human raters about as closely as human raters agree with each other when rubric criteria are clearly defined (Shermis & Burstein, 2013). Our transparent, criterion-level rubric design builds on this research.
Situated Cognition
Brown, Collins & Duguid (1989) established that knowledge is most effectively acquired within authentic contexts. The simulation's CEO role, stakeholders with quantified traits, and industry-sourced articles create a situated learning environment where strategic reasoning is embedded in realistic organizational dynamics.
Stakeholder Salience
The stakeholder system's influence, hostility, flexibility, and risk-tolerance dimensions draw on Mitchell, Agle & Wood's (1997) stakeholder salience framework of power, legitimacy, and urgency, creating the organizational complexity that makes decision-making consequential and grading contextually rich.
The Rubric
Four criteria.
100 points total.
Every essay is scored on these four dimensions — the same criteria displayed on the decision page while students write their responses.
Evidence Quality
25 pts: Cite specific data, statistics, or case studies from Intel articles using source codes (AIM, APX, WFT)
What the AI evaluates:
Does the response reference concrete data points, industry benchmarks, or research findings? Are sources identified by code? Vague references to 'studies' without specifics score lower.
Reasoning Coherence
25 pts: Present a logical argument connecting chosen strategy settings to evidence and outcomes
What the AI evaluates:
Is there a clear thesis or decision rationale? Do the paragraphs build on each other logically? Are cause-and-effect relationships articulated, not just asserted?
Trade-off Analysis
25 pts: Acknowledge sacrifices, identify biggest risks, and explain contingency plans
What the AI evaluates:
Does the response name what is being given up? Are risks specific (not generic)? Is there a 'Plan B' or mitigation strategy? One-sided arguments score lower.
Stakeholder Consideration
25 pts: Address how decisions affect 2-3+ stakeholder groups and balance competing interests
What the AI evaluates:
Are at least two distinct stakeholder perspectives named? Does the response acknowledge tension between groups? Are trade-offs between stakeholders addressed rather than ignored?
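The four criteria above can be pictured as a small data structure. The following is a minimal sketch, not the platform's implementation: the names `Criterion`, `RUBRIC`, and `total_score` are hypothetical, and only the criterion names and their 25-point weights come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    max_points: int


# The four criteria and their 25-point weights, as listed on this page.
RUBRIC = (
    Criterion("Evidence Quality", 25),
    Criterion("Reasoning Coherence", 25),
    Criterion("Trade-off Analysis", 25),
    Criterion("Stakeholder Consideration", 25),
)


def total_score(scores: dict[str, int]) -> int:
    """Sum per-criterion scores after checking each one is within range."""
    total = 0
    for criterion in RUBRIC:
        points = scores[criterion.name]  # raises KeyError if a criterion is missing
        if not 0 <= points <= criterion.max_points:
            raise ValueError(f"{criterion.name}: {points} is out of range")
        total += points
    return total


# Illustrative scores only, not real grading data.
example = {
    "Evidence Quality": 22,
    "Reasoning Coherence": 20,
    "Trade-off Analysis": 18,
    "Stakeholder Consideration": 21,
}
print(total_score(example))  # 81
```

Keeping each criterion as its own record makes it easy to report a score out of 25 per dimension as well as a total out of 100.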
Scoring Bands
Clear expectations.
Published thresholds.
Every score maps to a published band — students know exactly where they stand and what it takes to improve.
Per-Criterion Bands (25 points each)
Overall Quality Thresholds
Excellent: Exceptional depth with specific data citations, multi-stakeholder analysis, and risk mitigation
Good: Solid analysis with clear reasoning, relevant evidence, and recognition of competing interests
Adequate: General understanding demonstrated but missing depth, specificity, or balanced perspective
Poor: Insufficient evidence, unclear reasoning, or does not address the prompt requirements
The Process
How the AI evaluates
your essay.
From submission to final grade — a transparent, five-step process where every decision point is visible.
Student Submits Essay
The response is submitted through the weekly decision page, where the rubric and recommended sources are always visible.
AI Evaluates Against Rubric
The AI independently scores each of the four criteria using the same rubric published to students — no hidden criteria, no secret formulas.
Per-Criterion Feedback
Each criterion receives a numeric score (out of 25) and written feedback identifying specific strengths and areas for improvement.
Overall Quality Assessment
Scores are totaled and mapped to a quality label (Excellent, Good, Adequate, Poor) using the published thresholds.
Instructor Reviews
The instructor sees every AI score alongside the original essay. They can adjust scores, add comments, and override any grade before finalizing.
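The fourth step, mapping a total to a quality label, can be sketched as follows. This is an illustration under stated assumptions: the function name `quality_label` is hypothetical, and the cutoff values in `PLACEHOLDER_CUTOFFS` are placeholders chosen for the example, not the platform's published thresholds. Only the four labels come from this page.

```python
from typing import Sequence

# Quality labels named on this page, from highest to lowest.
LABELS = ("Excellent", "Good", "Adequate", "Poor")


def quality_label(total: int, cutoffs: Sequence[int]) -> str:
    """Return the quality label for a 0-100 total.

    `cutoffs` holds the minimum totals for Excellent, Good, and Adequate,
    in descending order. Any total below the last cutoff is Poor.
    """
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    if len(cutoffs) != 3 or list(cutoffs) != sorted(cutoffs, reverse=True):
        raise ValueError("cutoffs must be three descending values")
    for label, minimum in zip(LABELS, cutoffs):
        if total >= minimum:
            return label
    return LABELS[-1]


# PLACEHOLDER cutoffs for illustration only; NOT the published thresholds.
PLACEHOLDER_CUTOFFS = (90, 75, 60)
print(quality_label(81, PLACEHOLDER_CUTOFFS))  # Good
```

Passing the cutoffs in as a parameter keeps the mapping logic separate from the threshold values, so the same function works whatever the published bands are.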
Calibration
Calibrated against
exemplar responses.
The grading engine is calibrated so that the scores match what experienced faculty would assign. We validate against exemplar essays to ensure consistency.
Excellent responses cite specific statistics, address 3+ stakeholder groups with contingency plans, and provide multi-layered risk analysis. These consistently score in the 93–96 range across repeated evaluations.
Good responses present clear reasoning with relevant evidence but may miss a stakeholder group or provide generic rather than specific risk mitigation.
Adequate responses show understanding of the topic but rely on general statements without specific data, acknowledge fewer trade-offs, or skip stakeholder analysis.
Quality Assurance
Consistent and
reproducible.
The AI evaluates each criterion independently, reducing the halo effect common in holistic grading. The same essay produces consistent scores across multiple evaluations.
Human Authority
AI assists.
Instructors decide.
AI scores are formative — they give students immediate feedback and help instructors work efficiently. But the instructor always has the final word.
Optional Feature
Curved scoring.
Opt-in only.
When different weeks have different difficulty levels, curved scoring normalizes results so students aren't penalized for tackling harder scenarios.
How it works
Note: When curved scoring is disabled, all curved score columns, chart datasets, and PDF references are hidden. Raw scores are the only scores displayed.
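This page does not spell out the normalization formula, so the following is only one plausible sketch of how a curve could work, assuming a simple mean shift: each week's raw scores are moved so the week's average matches a common target. The function name `curve_week` and the target value are hypothetical.

```python
from statistics import mean


def curve_week(raw_scores: list[float], target_mean: float) -> list[float]:
    """Shift one week's scores so their mean equals target_mean.

    ASSUMPTION: a mean shift is used here purely for illustration; the
    platform's actual curve may differ. Results are clamped to 0-100,
    which can move the final mean slightly when clamping applies.
    """
    if not raw_scores:
        return []
    shift = target_mean - mean(raw_scores)
    return [min(100.0, max(0.0, score + shift)) for score in raw_scores]


easy_week = [80.0, 90.0, 85.0]  # mean 85
hard_week = [60.0, 70.0, 65.0]  # mean 65
target = 75.0                   # hypothetical common target

print(curve_week(easy_week, target))  # [70.0, 80.0, 75.0]
print(curve_week(hard_week, target))  # [70.0, 80.0, 75.0]
```

In this toy example, students with the same relative standing in an easy week and a hard week end up with the same curved score, which is the intent described above.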
Student Experience
What students see
at every step.
Transparency isn't just about publishing criteria — it's about making them visible in the moment they matter most.
While Writing
The weekly decision page displays the full rubric and recommended source articles alongside the essay input. Students can reference criteria and source codes while composing their response.
After Submission
The week results page shows a per-criterion score breakdown with the same quality labels (Excellent, Good, Adequate, Poor) and written feedback on each dimension.
Integrity Safeguards
How we keep AI grades honest.
Every submission is anchored to a vetted exemplar, scored in two independent AI passes, and fact-checked against a curated evidence corpus before any score reaches the student.
95-point exemplar anchor
A vetted human-graded response sets the 95 mark. Scores above 95 require a written justification from the model.
Two-rater consensus
Two independent passes at different temperatures must agree within 3 points per criterion. Divergent scores trigger a tiebreaker pass.
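The agreement check can be sketched as below. The 3-point tolerance comes from the text above; the function names and the example scores are hypothetical, and how the tiebreaker pass resolves the final score is not described here, so it is left out.

```python
TOLERANCE = 3  # maximum allowed gap per criterion, from the text above


def divergent_criteria(pass_a: dict[str, int], pass_b: dict[str, int]) -> list[str]:
    """Return the criteria where the two passes differ by more than TOLERANCE."""
    if pass_a.keys() != pass_b.keys():
        raise ValueError("both passes must score the same criteria")
    return [name for name in pass_a if abs(pass_a[name] - pass_b[name]) > TOLERANCE]


def needs_tiebreaker(pass_a: dict[str, int], pass_b: dict[str, int]) -> bool:
    """A tiebreaker pass is triggered if any criterion diverges."""
    return bool(divergent_criteria(pass_a, pass_b))


# Illustrative scores only.
first_pass = {
    "Evidence Quality": 22,
    "Reasoning Coherence": 20,
    "Trade-off Analysis": 18,
    "Stakeholder Consideration": 21,
}
second_pass = {
    "Evidence Quality": 21,
    "Reasoning Coherence": 24,
    "Trade-off Analysis": 17,
    "Stakeholder Consideration": 20,
}

print(divergent_criteria(first_pass, second_pass))  # ['Reasoning Coherence']
print(needs_tiebreaker(first_pass, second_pass))    # True
```

A gap of exactly 3 points counts as agreement in this sketch, matching the phrase "within 3 points".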
Evidence verification
Cited statutes, codes, and case studies are checked against a whitelist. Evidence Quality is capped by the count of verified citations.
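A minimal sketch of the whitelist check and score cap follows. Several details are assumptions made for the example: the square-bracket citation format, the `points_per_citation` cap schedule, and all function names are hypothetical. Only the source codes AIM, APX, and WFT and the 25-point maximum come from this page.

```python
import re

# Source codes named in the rubric on this page; a real whitelist would be larger.
WHITELIST = {"AIM", "APX", "WFT"}

# ASSUMPTION: citations are written in square brackets, e.g. "[AIM]".
CITATION_PATTERN = re.compile(r"\[([A-Z]{3})\]")


def verified_citations(essay: str) -> set[str]:
    """Return the distinct cited codes that appear in the whitelist."""
    return set(CITATION_PATTERN.findall(essay)) & WHITELIST


def cap_evidence_score(raw_score: int, essay: str, points_per_citation: int = 10) -> int:
    """Cap the Evidence Quality score by the number of verified citations.

    ASSUMPTION: each verified citation unlocks `points_per_citation` points,
    up to the 25-point maximum. The real cap schedule is not given on this page.
    """
    cap = min(25, points_per_citation * len(verified_citations(essay)))
    return min(raw_score, cap)


essay = "Churn fell 12% [AIM] while costs rose [XYZ]. See also [WFT]."
print(sorted(verified_citations(essay)))  # ['AIM', 'WFT']
print(cap_evidence_score(24, essay))      # 20
```

The unrecognized code `XYZ` is ignored, so a response cannot raise its Evidence Quality ceiling by citing sources that are not in the corpus.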
AI-writing screening
Heuristic stylometric analysis flags responses with hallmark AI patterns for instructor review before any grade is released.
Withholding & review queue
Flagged submissions are confirmed received but their scores are withheld pending instructor release in the Needs Review queue.
Cohort calibration report
Instructors see distribution statistics, divergence rates, and flag breakdowns to detect grade inflation across a cohort.
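As a rough illustration of what such a report might compute, the sketch below summarizes a cohort's totals and the share of submissions whose two passes diverged. The function name `cohort_report`, the choice of statistics, and the sample data are all hypothetical.

```python
from statistics import mean, median, pstdev


def cohort_report(totals: list[int], tiebreaker_flags: list[bool]) -> dict[str, float]:
    """Summarize a cohort's totals and how often the two passes diverged.

    `tiebreaker_flags[i]` is True when submission i needed a tiebreaker pass.
    """
    if not totals or len(totals) != len(tiebreaker_flags):
        raise ValueError("need a non-empty list of totals with one flag per total")
    return {
        "mean": mean(totals),
        "median": median(totals),
        "std_dev": pstdev(totals),  # population standard deviation
        "divergence_rate": sum(tiebreaker_flags) / len(tiebreaker_flags),
    }


# Illustrative data only.
report = cohort_report([70, 80, 90, 80], [False, True, False, False])
for key, value in report.items():
    print(f"{key}: {value:.2f}")
```

A cohort mean that drifts upward over time, or a rising divergence rate, would be the kind of signal an instructor could use to spot grade inflation or unstable scoring.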
References
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74. doi:10.1080/0969595980050102
Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32-42. doi:10.3102/0013189X018001032
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112. doi:10.3102/003465430298487
Kapur, M. (2016). Examining productive failure, productive success, and unproductive failure in learning. Educational Psychologist, 51(2), 289-299. doi:10.1080/00461520.2016.1155457
Kayes, D. C. (2002). Experiential learning and its critics: Preserving the role of experience in management learning and education. Academy of Management Learning & Education, 1(2), 137-149. doi:10.5465/amle.2002.8509336
Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development. Prentice-Hall.
Mitchell, R. K., Agle, B. R., & Wood, D. J. (1997). Toward a theory of stakeholder identification and salience. Academy of Management Review, 22(4), 853-886. doi:10.5465/amr.1997.9711022105
Shermis, M. D., & Burstein, J. (Eds.). (2013). Handbook of automated essay evaluation: Current applications and new directions. Routledge.
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2), 89-100. doi:10.1111/j.1469-7610.1976.tb00381.x