When a team is small enough to run through the list one person at a time, this is a simple framework for calibration. You can filter by level, manager, etc., or just go straight through.
Define a set of criteria to evaluate each employee against, then use those evaluations to determine a final rating.
Bucket employees evenly into 3 bins, then split each of those into 3 bins to arrive at 9 bins of employees. Then assign ratings to the bins.
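As a rough illustration of the 3-then-3 split, here is a minimal Python sketch. It assumes the calibrators have already put employees in order from strongest to weakest, which the description above does not spell out; the function names `split_evenly` and `nine_bins` are hypothetical, and assigning ratings to the resulting bins is left to the group.

```python
def split_evenly(items, parts):
    """Split an ordered list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks absorb one leftover item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def nine_bins(ordered_employees):
    """Assumption: `ordered_employees` is already sorted strongest to weakest."""
    bins = []
    for third in split_evenly(ordered_employees, 3):
        # Split each of the 3 top-level bins into 3 again, giving 9 bins total.
        bins.extend(split_evenly(third, 3))
    return bins


if __name__ == "__main__":
    employees = [f"E{i}" for i in range(1, 19)]  # made-up names
    for number, members in enumerate(nine_bins(employees), start=1):
        print(number, members)
```

When the headcount is not a multiple of 9, some bins end up one person larger than others (or empty for very small teams), so the group may still want to adjust the boundaries by hand.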
Managers set their ratings first, and then the group gets together to compare ratings and fill in the Final Ratings.
Another form of multi-level calibration (similar to #4), but done using a draft-style stacking approach instead. Similar to how sports leagues draft incoming athletes, this process selects employees from the available pool one by one.
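The draft process can be pictured as a loop over a shrinking pool. The Python sketch below is only an illustration under assumptions: `draft_order` and the `choose_next` callback are hypothetical names, and the callback stands in for whatever discussion the group uses to agree on each pick.

```python
def draft_order(pool, choose_next):
    """Pick employees one at a time from the remaining pool.

    `choose_next` is a hypothetical callback representing the group's
    decision; it receives the remaining employees and returns one of them.
    The order of picks becomes the stack order.
    """
    remaining = list(pool)
    picks = []
    while remaining:
        pick = choose_next(remaining)
        if pick not in remaining:
            raise ValueError(f"{pick!r} is not in the available pool")
        remaining.remove(pick)
        picks.append(pick)
    return picks


if __name__ == "__main__":
    # Made-up scores used here only to simulate the group's choices.
    scores = {"a": 3, "b": 5, "c": 4}
    print(draft_order(scores, lambda remaining: max(remaining, key=scores.get)))
```

In a real session the picks would come from people rather than a scoring function; the sketch only shows how the pool shrinks and how the stack order accumulates.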
Have every calibrator rate every employee, then use the composite signal to order employees and determine a final rating.
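One way to form a composite signal is to average each employee's scores across calibrators and sort by that average. The Python sketch below assumes numeric scores and a simple mean; both are assumptions, since the description does not say how the signal is combined, and `composite_order` is a hypothetical name.

```python
from statistics import mean


def composite_order(ratings):
    """Order employees by their mean score across calibrators, highest first.

    `ratings` maps calibrator -> {employee: numeric score}. Using the mean
    as the composite is an assumption; a median or weighted average would
    slot in the same way.
    """
    employees = set()
    for scores in ratings.values():
        employees.update(scores)
    composite = {
        employee: mean(
            scores[employee] for scores in ratings.values() if employee in scores
        )
        for employee in employees
    }
    # Sort by score descending, then by name so ties come out in a stable order.
    return sorted(composite.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up calibrators, employees, and scores.
    ratings = {
        "ana": {"x": 4, "y": 2},
        "ben": {"x": 5, "y": 3},
    }
    print(composite_order(ratings))
```

The resulting order is a starting point for discussion; mapping positions in the order to final ratings is still a group decision.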