Grid-Aware Compute Scheduling

Abstract

AI compute is becoming a grid actor. Training jobs, batch inference, indexing, evaluation, and non-real-time analytics represent vast quantities of flexible compute — workloads that must eventually be run but that do not need to run at any particular moment. Yet most AI infrastructure treats all workloads as equally urgent, submitting them as fast as hardware allows regardless of grid conditions.

The EM Foundation's contribution is to frame compute timing as a continuity ethics question: AI data centers that draw from shared grid infrastructure have an obligation to schedule flexible workloads in ways that preserve grid stability for critical services, reduce peak stress during high-demand periods, and align optional compute with moments when renewable generation is available and grid capacity is ample.

This paper presents a grid-aware scheduling policy simulator using public grid data, a comparison of ordinary FIFO scheduling against grid-aware scheduling across measurable outcomes, and an open-source contribution invitation. It also presents a policy memo framing grid-aware scheduling as a demand-response participation model suitable for regulatory engagement.

II. The Core Formula

Grid-Aware Scheduling Score: Conceptual Definition

GAS = RenewableFraction
    - PeakDemandPenalty
    - CarbonIntensityPenalty
    + LatencyTolerance

Where:

RenewableFraction   = current renewable generation / total generation in region
                      Range: 0–1 (higher = better time to run)

PeakDemandPenalty   = max(0, (current_demand - baseline_demand) / peak_demand)
                      = 0 at or below baseline; increases during peak periods

CarbonIntensityPenalty = current_carbon_intensity / max_carbon_intensity
                         (from public grid carbon intensity APIs)

LatencyTolerance    = job_flexibility_score
                      RC-1 type jobs (batch training): high tolerance = 1.0
                      Real-time inference: zero tolerance = 0.0
                      Intermediate jobs: 0.3–0.7

Scheduling rule: run job immediately if GAS > urgent_threshold
                 queue job if GAS < defer_threshold and latency_tolerance > 0
                 run queued jobs when GAS recovers above defer_threshold

Public data sources: EIA API, ElectricityMaps API, CAISO OASIS, ERCOT API
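The definition and scheduling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the Foundation's reference implementation: the `GridSnapshot` container, the function names, and the default threshold values are assumptions introduced here. The source does not say what happens when GAS falls between the two thresholds, so the sketch assumes such jobs run, which is consistent with the rule that queued jobs resume once GAS recovers above `defer_threshold`.

```python
from dataclasses import dataclass


@dataclass
class GridSnapshot:
    """Hypothetical container for one regional grid reading."""
    renewable_mw: float
    total_generation_mw: float
    demand_mw: float
    baseline_demand_mw: float
    peak_demand_mw: float
    carbon_intensity: float       # any unit, e.g. gCO2/kWh
    max_carbon_intensity: float   # same unit as carbon_intensity


def gas_score(grid: GridSnapshot, latency_tolerance: float) -> float:
    """GAS = RenewableFraction - PeakDemandPenalty - CarbonIntensityPenalty + LatencyTolerance."""
    renewable_fraction = grid.renewable_mw / grid.total_generation_mw
    peak_penalty = max(
        0.0, (grid.demand_mw - grid.baseline_demand_mw) / grid.peak_demand_mw
    )
    carbon_penalty = grid.carbon_intensity / grid.max_carbon_intensity
    return renewable_fraction - peak_penalty - carbon_penalty + latency_tolerance


def decide(gas: float, latency_tolerance: float,
           urgent_threshold: float = 0.8, defer_threshold: float = 0.3) -> str:
    """Return "run" or "queue". Threshold defaults are placeholders, not tuned values."""
    if latency_tolerance <= 0:
        return "run"      # zero-tolerance jobs are never deferred
    if gas > urgent_threshold:
        return "run"
    if gas < defer_threshold:
        return "queue"
    return "run"          # assumption: the band between the thresholds runs


if __name__ == "__main__":
    grid = GridSnapshot(600, 1000, 900, 800, 1000, 200, 800)
    score = gas_score(grid, latency_tolerance=1.0)
    print(round(score, 2), decide(score, 1.0))
```

Because LatencyTolerance is added to the score exactly as written in the definition, a highly tolerant job scores higher than an intolerant one under the same grid conditions; any simulator built on this sketch should choose its thresholds with that in mind.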

III. Experiment Design

The simulation uses publicly available grid data from the US Energy Information Administration API and regional grid operators. No proprietary data is required.

Data Sources

EIA API: hourly generation by source for major US grid regions. ElectricityMaps API: carbon intensity by region (free tier available). CAISO OASIS (California) or ERCOT API (Texas): real-time and historical load data. A 30-day historical window provides sufficient variation for meaningful comparison.

Job Generation

Synthetic AI workload mix: 15% urgent inference (zero latency tolerance), 35% flexible inference (up to 2-hour tolerance), 30% batch evaluation (up to 12-hour tolerance), 20% training (up to 48-hour tolerance). Total compute demand sized to represent a mid-sized AI deployment at approximately 10-50 MW equivalent.
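One way to generate this synthetic mix is weighted random sampling over the four job types. The sketch below is illustrative only: the `Job` type, the `generate_jobs` function, the uniform arrival model, and the 30-day horizon default are assumptions, while the shares and tolerance limits are the ones listed above.

```python
import random
from dataclasses import dataclass

# (job type, share of jobs, maximum tolerated delay in hours), from the mix above
JOB_MIX = [
    ("urgent_inference", 0.15, 0.0),
    ("flexible_inference", 0.35, 2.0),
    ("batch_evaluation", 0.30, 12.0),
    ("training", 0.20, 48.0),
]


@dataclass
class Job:
    kind: str
    submit_hour: float
    max_delay_hours: float


def generate_jobs(n: int, horizon_hours: float = 24 * 30, seed: int = 0) -> list[Job]:
    """Draw n jobs with types sampled by share and uniform arrival times (an assumption)."""
    rng = random.Random(seed)
    kinds = [kind for kind, _, _ in JOB_MIX]
    shares = [share for _, share, _ in JOB_MIX]
    max_delay = {kind: delay for kind, _, delay in JOB_MIX}
    jobs = [
        Job(kind, rng.uniform(0, horizon_hours), max_delay[kind])
        for kind in rng.choices(kinds, weights=shares, k=n)
    ]
    return sorted(jobs, key=lambda job: job.submit_hour)


if __name__ == "__main__":
    for job in generate_jobs(5):
        print(job)
```

A fixed seed keeps the FIFO and grid-aware runs comparable, since both policies then see the same job stream.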

Two Scheduling Policies

FIFO baseline: Jobs submitted immediately as hardware becomes available. No grid awareness.

Grid-aware: Urgent jobs run immediately. Flexible, batch, and training jobs scored by GAS and scheduled during high-renewable, low-demand windows within their latency tolerance.
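The difference between the two policies can be illustrated with a deliberately simplified placement rule. The sketch assumes an hourly list of precomputed scores (`score_by_hour`, a hypothetical input) and ignores hardware capacity limits, which a real simulator would need to model; the function names are also illustrative.

```python
from typing import Sequence


def fifo_start(submit_hour: int, max_delay_hours: int,
               score_by_hour: Sequence[float]) -> int:
    """FIFO baseline: start at submission, with no grid awareness."""
    return submit_hour


def grid_aware_start(submit_hour: int, max_delay_hours: int,
                     score_by_hour: Sequence[float]) -> int:
    """Pick the best-scoring hour within the job's latency tolerance.

    Ties resolve to the earliest hour because max() returns the first maximum.
    A job with max_delay_hours == 0 always starts at submit_hour.
    """
    last_hour = min(submit_hour + max_delay_hours, len(score_by_hour) - 1)
    window = range(submit_hour, last_hour + 1)
    return max(window, key=lambda hour: score_by_hour[hour])


if __name__ == "__main__":
    scores = [0.2, 0.1, 0.9, 0.4, 0.95]
    print(fifo_start(0, 3, scores), grid_aware_start(0, 3, scores))
```

Choosing the single best hour uses perfect knowledge of the window; substituting forecast scores for `score_by_hour` would give the more realistic forecast-driven variant.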

Measurement Outcomes

Peak load contribution during grid stress periods. Renewable alignment (fraction of flexible compute run during high-renewable hours). Job delay for flexible workloads (time from submission to execution). Carbon proxy impact (estimated CO2 equivalent based on grid carbon intensity at execution time). Dropped job risk (jobs that cannot be deferred without exceeding latency tolerance).
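Three of these outcomes reduce to short aggregate calculations over the executed schedule. The sketch below assumes each run is recorded as a `(submit_hour, start_hour, energy_mwh)` tuple and that "high-renewable" means a renewable fraction at or above a configurable cutoff; the tuple layout, the 0.5 default cutoff, and the function names are assumptions, not definitions from the paper.

```python
from typing import Sequence

Run = tuple[int, int, float]  # (submit_hour, start_hour, energy_mwh)


def renewable_alignment(runs: Sequence[Run], renewable_fraction: Sequence[float],
                        high_threshold: float = 0.5) -> float:
    """Share of energy executed in hours whose renewable fraction meets the cutoff."""
    total = sum(energy for _, _, energy in runs)
    if total == 0:
        return 0.0
    aligned = sum(energy for _, start, energy in runs
                  if renewable_fraction[start] >= high_threshold)
    return aligned / total


def mean_delay_hours(runs: Sequence[Run]) -> float:
    """Average time from submission to execution."""
    if not runs:
        return 0.0
    return sum(start - submit for submit, start, _ in runs) / len(runs)


def carbon_proxy_kg(runs: Sequence[Run], intensity_g_per_kwh: Sequence[float]) -> float:
    """Estimated kg CO2e: 1 MWh at 1 g/kWh is 1000 g, i.e. 1 kg."""
    return sum(energy * intensity_g_per_kwh[start] for _, start, energy in runs)


if __name__ == "__main__":
    fractions = [0.2, 0.7, 0.9]
    intensity = [500.0, 300.0, 100.0]
    runs = [(0, 0, 10.0), (0, 1, 10.0), (1, 2, 20.0)]
    print(renewable_alignment(runs, fractions))
    print(mean_delay_hours(runs))
    print(carbon_proxy_kg(runs, intensity))
```

Running the same functions on the FIFO schedule and the grid-aware schedule gives the side-by-side comparison the experiment calls for.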

IV. SLA Preservation and Compute Elasticity Classification

SLA preservation is the primary objection to grid-aware scheduling. Any infrastructure operator will ask: what happens to latency SLAs when you defer workloads? The answer depends entirely on the elasticity classification of the deferred job. The GAS formula includes a LatencyTolerance parameter for this reason — jobs with zero tolerance are never deferred regardless of grid conditions.

The Foundation proposes a four-tier compute elasticity classification:

Class	Examples	Max Deferral	GAS Treatment
E0 — Inelastic	Real-time inference, live API requests, safety-critical monitoring	0 — never deferred	Always scheduled immediately; GAS not applied
E1 — Low elasticity	Interactive inference, sub-second response SLAs	Up to 5 minutes	Deferred only during extreme grid stress; short window
E2 — Moderate elasticity	Batch inference, evaluation runs, scheduled reports	Up to 2 hours	Scheduled in renewable windows within SLA window
E3 — High elasticity	Training jobs, large-scale indexing, offline analytics	Up to 24 hours	Fully GAS-optimized; scheduled at peak renewable, minimum carbon
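The table can be encoded directly as a lookup that the scheduler consults before deferring anything. This is a sketch under stated assumptions: the `Elasticity` enum and `may_defer` function are illustrative names, and "extreme grid stress" is passed in as a boolean because the paper does not define how it is detected.

```python
from datetime import timedelta
from enum import Enum


class Elasticity(Enum):
    """Maximum deferral per class, taken from the table above."""
    E0 = timedelta(0)
    E1 = timedelta(minutes=5)
    E2 = timedelta(hours=2)
    E3 = timedelta(hours=24)


def may_defer(tier: Elasticity, already_waited: timedelta,
              extreme_grid_stress: bool = False) -> bool:
    """Return True if the job may keep waiting without exceeding its class limit."""
    if tier is Elasticity.E0:
        return False                       # never deferred
    if tier is Elasticity.E1 and not extreme_grid_stress:
        return False                       # deferred only during extreme stress
    return already_waited < tier.value


if __name__ == "__main__":
    print(may_defer(Elasticity.E2, timedelta(hours=1)))
    print(may_defer(Elasticity.E0, timedelta(0), extreme_grid_stress=True))
```

Keeping the limits in one place makes them easy to audit, which matters because misclassification is one of the risks the paper identifies.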

Utility coordination models. Data centers participating in demand response programs can receive financial signals from grid operators — typically $50-300/MWh during peak events — for reducing load. E2 and E3 workloads are the natural candidates for demand response participation. The policy memo produced by this project should model the revenue potential of demand response participation alongside the carbon and peak-reduction benefits.

Renewable forecasting. GAS scheduling benefits from 4–6 hour renewable generation forecasts, a horizon that covers the full E2 deferral window and the near-term portion of the E3 window. NOAA provides free solar irradiance forecasts; wind generation forecasts are available from ERCOT and CAISO APIs. Integrating forecast data into the scheduler allows a preemptive shift toward upcoming renewable windows rather than a reactive response to current conditions.

IV.5 Economic Modeling

Industry adoption of grid-aware scheduling depends on demonstrating operational value in terms that data center operators understand. The following economic model provides a framework for the simulator to validate.

Energy cost savings. Shifting E2/E3 workloads from peak pricing periods (typically $0.12–0.18/kWh) to off-peak renewable windows (typically $0.04–0.08/kWh in markets with high renewable penetration) produces direct cost savings. For a 100 MW data center with 40% flexible workload: shifting 40 MW × 8 hours/day (320 MWh/day) from peak to off-peak at a $0.07/kWh differential produces approximately $22,400/day in energy cost savings, or approximately $8.2M/year. This is a first-order estimate requiring location-specific validation.

Demand response revenue. FERC Order 2222 and state-level demand response programs compensate large flexible loads for reducing consumption during grid stress events. Compensation rates range from $50–300/MWh depending on program and region. A data center participating in demand response for 200 hours/year at 50MW reduction could receive $500,000–3,000,000/year in demand response payments. The policy memo produced by this project should model this revenue stream as an incentive for voluntary adoption.
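The demand response range quoted above follows from multiplying the load reduction, the participation hours, and the compensation rate. A minimal check, with a hypothetical function name:

```python
def demand_response_revenue(reduction_mw: float, hours_per_year: float,
                            rate_per_mwh: float) -> float:
    """Annual payment in dollars: MW reduced x hours of participation x $/MWh."""
    return reduction_mw * hours_per_year * rate_per_mwh


if __name__ == "__main__":
    low = demand_response_revenue(50, 200, 50)
    high = demand_response_revenue(50, 200, 300)
    print(f"${low:,.0f} to ${high:,.0f} per year")  # $500,000 to $3,000,000 per year
```

The same function can be reused in the policy memo to test other program rates or participation hours.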

Carbon pricing exposure. As carbon pricing extends to data center operations in various jurisdictions, the carbon intensity of compute timing becomes a financial liability. Grid-aware scheduling that increases renewable alignment by 15 percentage points reduces carbon intensity proportionally — a hedge against future carbon pricing that can be valued in present terms under any reasonable carbon price forecast.

IV.6 Grid Instability Edge Cases

The GAS scheduling model assumes relatively stable grid operating conditions. Several edge cases challenge this assumption and must be addressed in the simulator design.

Rolling blackouts and emergency curtailment. During grid emergencies, operators may issue emergency curtailment orders requiring rapid load reduction across all categories including E0 (inelastic) workloads. The GAS system must include an emergency override that suspends all scheduling optimization and routes all available load reduction capability to grid support, including sacrificing in-progress non-safety-critical workloads.

Renewable generation collapse. Rapid loss of renewable generation (cloud cover over a solar-dominated grid, wind lull in a wind-dominated region) can produce sudden price spikes and carbon intensity jumps within minutes. The GAS system's renewable forecasting should incorporate weather API integration with short-horizon (1–2 hour) updates, not just day-ahead forecasts.

Inconsistent telemetry. Grid data APIs occasionally provide inconsistent or delayed signals — a region may report high renewable availability while actually drawing heavily from backup fossil generation. The scheduler should treat anomalous readings (renewable fraction > 95%, demand < 30% of typical baseline) as potentially erroneous and fall back to conservative scheduling until the signal normalizes.
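The anomaly test described above is a simple pair of bounds checks. The sketch below uses the two cutoffs given in the text; the `Reading` type and function name are illustrative, and what "conservative scheduling" means in practice is left to the caller because the paper does not define it.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """Hypothetical minimal grid reading."""
    renewable_fraction: float   # 0 to 1
    demand_mw: float


def is_suspect(reading: Reading, typical_baseline_mw: float) -> bool:
    """Flag readings with renewable fraction > 95% or demand < 30% of typical baseline."""
    return (reading.renewable_fraction > 0.95
            or reading.demand_mw < 0.30 * typical_baseline_mw)


if __name__ == "__main__":
    baseline = 1000.0
    for reading in (Reading(0.60, 900.0), Reading(0.97, 900.0), Reading(0.60, 250.0)):
        mode = "conservative" if is_suspect(reading, baseline) else "normal"
        print(reading, mode)
```

Some regions can legitimately exceed 95% renewable generation for short periods, so operators may want these cutoffs to be configurable per region.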

Correlated failure cascades. Grid-aware scheduling deployed across many data centers in a region could create correlated behavior — all systems simultaneously deferring workloads when the grid is stressed and simultaneously resuming when conditions improve, potentially creating new demand spikes at the recovery moment. The policy memo should address this coordination risk and propose diversity mechanisms (randomized start times, staggered resumption windows) to prevent correlated cascades.
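One of the diversity mechanisms mentioned above, randomized start times, can be sketched as uniform jitter over a resumption window. The function name and the 30-minute default window are assumptions chosen for illustration.

```python
import random
from typing import Hashable, Iterable, Optional


def staggered_resume_times(job_ids: Iterable[Hashable], recovery_time_s: float,
                           window_s: float = 1800.0,
                           seed: Optional[int] = None) -> dict[Hashable, float]:
    """Assign each deferred job a resume time spread uniformly over the window.

    Spreading resumption avoids every deferred job restarting at the
    instant grid conditions recover.
    """
    rng = random.Random(seed)
    return {job_id: recovery_time_s + rng.uniform(0.0, window_s)
            for job_id in job_ids}


if __name__ == "__main__":
    for job_id, resume_at in staggered_resume_times(["a", "b", "c"], 0.0, seed=1).items():
        print(job_id, round(resume_at, 1))
```

Jitter applied independently at each site needs no cross-site coordination, which makes it a low-cost first step before staggered windows are negotiated between operators.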

IV.7 Deployment Pathways

Grid-aware compute scheduling is most practically adopted through a staged deployment pathway that begins with the lowest-risk workloads and builds institutional familiarity before extending to more time-sensitive compute.

Phase 1 — Non-critical background workloads. Logging pipelines, analytics jobs, model evaluation runs, and other workloads with no user-facing latency requirements and multi-day tolerance windows. Zero SLA risk. Immediate renewable alignment benefit. This phase requires only the GAS scoring implementation and a basic job queue — no SLA monitoring, no utility coordination, no emergency overrides. Start here.

Phase 2 — Batch compute scheduling. Training jobs, large-scale inference batches, and indexing workloads with 4–24 hour tolerance windows. Requires elasticity classification, basic SLA monitoring, and integration with renewable forecasting APIs. Demand response participation becomes valuable at this phase.

Phase 3 — Geographically elastic AI training. Training workloads that can be distributed across data centers in different grid regions, routing compute toward regions with highest renewable availability within acceptable latency and data residency constraints. Requires multi-site orchestration, regulatory compliance review for data residency, and coordination agreements with other sites.

Phase 4 — Coordinated grid balancing. Formal demand response participation, utility coordination agreements, and coordinated scheduling across multiple data centers to provide grid stability services. Requires regulatory engagement, metering infrastructure, contractual frameworks, and governance structures for emergency override. This phase should not be attempted without the institutional infrastructure established in phases 1–3.

Known Limitations

This section follows the Foundation's institutional practice of explicitly stating known weaknesses, failure modes, and scope boundaries for every proposal. Its presence indicates analytical maturity, not weakness in the underlying proposal.

Grid signal latency. Public grid data APIs have update latencies of 5–60 minutes. Scheduling decisions based on stale grid signals may be suboptimal — deferring workloads to a window that has already deteriorated before the workload executes.

Correlated behavior risk. If grid-aware scheduling is adopted simultaneously by many large data centers in a region, correlated deferral and resumption behavior could create new demand spikes at the recovery moment, partially negating the grid stability benefit.

SLA degradation under aggressive deferral. Elasticity classifications are proposed defaults. Operators may misclassify workloads as more elastic than they are to maximize renewable alignment, creating latent SLA risk that surfaces when deferred workloads approach their tolerance limits simultaneously.

Renewable forecast uncertainty. Wind forecasting at sub-4-hour horizons has higher uncertainty than solar. Over-reliance on inaccurate forecasts may defer workloads to windows that do not materialize as renewable-rich.

What This Paper Does Not Claim

That the annual energy cost savings estimate in Section IV.5 applies to any specific data center; it is a first-order illustrative calculation requiring location-specific validation
That grid-aware scheduling eliminates carbon emissions — it reduces carbon intensity of flexible workloads; inelastic workloads remain unaffected
That the approach substitutes for physical infrastructure decarbonization — it is a scheduling optimization layered on existing infrastructure
That utility coordination and demand response participation are straightforward — they require regulatory engagement and contractual arrangements not described in this proposal

Non-Adoption Scenario

Without grid-aware scheduling, AI compute infrastructure operates as a passive and price-insensitive grid actor. As AI compute's share of total grid demand grows, this passivity becomes a stability liability: training clusters starting simultaneously produce sharp demand ramps; inference infrastructure scales with internet traffic peaks coinciding with residential demand; and data centers in aggregate contribute to peak-demand problems rather than serving as the flexible load resource their technical characteristics would allow.

Open Questions

What grid signal latency is acceptable before scheduling decisions become counterproductive? How should elasticity classifications be validated and audited to prevent systematic misclassification? What regulatory frameworks are required for demand response participation, and which jurisdictions currently have them? How should the correlated-behavior risk be managed when grid-aware scheduling is deployed at sector scale?

Governance Implications

Grid-aware scheduling participating in utility demand response programs creates contractual obligations, metering requirements, and performance verification needs. Governance frameworks must specify: who authorizes demand response participation; what workloads are eligible for deferral; how SLA commitments interact with demand response obligations; and how scheduling decisions are logged for regulatory compliance.

References and Related Work

US EIA (2024). Electric Power Monthly. eia.gov.
FERC Order 2222 (2020). Participation of Distributed Energy Resource Aggregations in Markets.
Qureshi, A. et al. (2009). Cutting the Electric Bill for Internet-Scale Systems. SIGCOMM.
Goiri, I. et al. (2012). GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks. EuroSys.

V. Falsifiability

Any of the following findings would count against the proposal:

✗ E2/E3 workload volume below 30% of total compute: the schedulable fraction is too small for grid-aware scheduling to produce meaningful grid-level impact.

✗ Renewable alignment improvement below 8 percentage points versus FIFO: insufficient to justify the scheduling complexity for most operators.

✗ SLA violation rate for E1/E2 workloads exceeding 2% under grid-aware scheduling: the reliability cost is unacceptable for production deployment.
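These three criteria can be evaluated mechanically against simulator output. The sketch below encodes the thresholds exactly as stated; the function name, argument names, and returned keys are illustrative.

```python
def falsification_flags(flexible_share: float, alignment_gain_pp: float,
                        sla_violation_rate: float) -> dict[str, bool]:
    """Return which falsification criteria are triggered.

    flexible_share      E2/E3 share of total compute, 0 to 1
    alignment_gain_pp   renewable alignment gain over FIFO, in percentage points
    sla_violation_rate  E1/E2 SLA violation rate under grid-aware scheduling, 0 to 1
    """
    return {
        "flexible_share_too_small": flexible_share < 0.30,
        "alignment_gain_too_small": alignment_gain_pp < 8.0,
        "sla_violations_too_high": sla_violation_rate > 0.02,
    }


if __name__ == "__main__":
    flags = falsification_flags(flexible_share=0.40, alignment_gain_pp=12.0,
                                sla_violation_rate=0.01)
    print(flags, "falsified" if any(flags.values()) else "not falsified")
```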

Open Source Contribution Invitation

Build the grid-load simulator using EIA and ElectricityMaps public APIs. Implement both scheduling policies with configurable job mix and latency tolerance parameters. Produce charts for all five outcome metrics. Write a 2-page policy memo suitable for submission to FERC, state PUCs, or major cloud providers, framing grid-aware AI scheduling as a demand-response participation model with grid stability benefits. Package as github.com/emfoundation/grid-aware-compute-scheduling.

Contact: research@emfoundation.net