| 1 | +--- |
| 2 | +name: golem-quota-moonbit |
| 3 | +description: "Adding resource quotas to a MoonBit Golem agent. Use when the user asks about rate limiting, resource quotas, quota tokens, QuotaToken, with_reservation, throttling API calls, limiting concurrency, capacity limits, or splitting tokens between agents." |
| 4 | +--- |
| 5 | + |
| 6 | +# Adding Resource Quotas to an Agent (MoonBit) |
| 7 | + |
| 8 | +Golem provides a distributed resource quota system via the `@quota` module. Quotas let you define limited resources (API call rates, storage capacity, connection concurrency) and enforce consumption limits across all agents in a deployment. |
| 9 | + |
| 10 | +## 1. Define Resources in the Application Manifest |
| 11 | + |
| 12 | +Add resource definitions under `resourceDefaults` in `golem.yaml`, scoped per environment: |
| 13 | + |
| 14 | +```yaml |
| 15 | +resourceDefaults: |
| 16 | + prod: |
| 17 | + api-calls: |
| 18 | + limit: |
| 19 | + type: Rate |
| 20 | + value: 100 |
| 21 | + period: minute |
| 22 | + max: 1000 |
| 23 | + enforcementAction: reject |
| 24 | + unit: request |
| 25 | + units: requests |
| 26 | + storage: |
| 27 | + limit: |
| 28 | + type: Capacity |
| 29 | + value: 1073741824 # 1 GB |
| 30 | + enforcementAction: reject |
| 31 | + unit: byte |
| 32 | + units: bytes |
| 33 | + connections: |
| 34 | + limit: |
| 35 | + type: Concurrency |
| 36 | + value: 50 |
| 37 | + enforcementAction: throttle |
| 38 | + unit: connection |
| 39 | + units: connections |
| 40 | +``` |
| 41 | + |
| 42 | +### Limit Types |
| 43 | + |
| 44 | +- **`Rate`** — refills `value` tokens every `period` (second/minute/hour/day), capped at `max`. Use for rate-limiting API calls. |
| 45 | +- **`Capacity`** — fixed pool of `value` tokens. Once consumed, never refilled. Use for storage budgets. |
| 46 | +- **`Concurrency`** — pool of `value` tokens returned when released. Use for limiting parallel connections. |
| 47 | + |
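The refill-and-cap behavior described for `Rate` can be pictured with a small local model. This is only a sketch for building intuition: `RateBucket`, `refill`, and `try_take` are hypothetical names that are not part of the `@quota` module, and it assumes a refill simply adds `value` tokens and clamps the total at `max`. Golem's actual accounting is distributed and may differ.

```moonbit
// Local model of a `Rate` limit. All names here are hypothetical.
struct RateBucket {
  value : UInt64 // tokens added at each period boundary
  max : UInt64 // upper bound on accumulated tokens
  mut available : UInt64 // tokens that can be consumed right now
}

// Add one period's worth of tokens, never exceeding `max`.
fn RateBucket::refill(self : RateBucket) -> Unit {
  let next = self.available + self.value
  self.available = if next > self.max { self.max } else { next }
}

// Consume `amount` tokens if enough are available and report success.
fn RateBucket::try_take(self : RateBucket, amount : UInt64) -> Bool {
  if amount > self.available {
    false
  } else {
    self.available = self.available - amount
    true
  }
}

fn main {
  // Mirrors the `api-calls` resource: 100 tokens per period, at most 1000.
  let bucket : RateBucket = { value: 100UL, max: 1000UL, available: 100UL }
  println(bucket.try_take(100UL)) // the first 100 requests fit
  println(bucket.try_take(1UL)) // the bucket is empty until the next refill
  bucket.refill()
  println(bucket.try_take(1UL)) // tokens are available again
}
```

In this model a `Capacity` limit would be the same bucket with `refill` never called, and a `Concurrency` limit would add the taken tokens back when the holder releases them.
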
| 48 | +### Enforcement Actions |
| 49 | + |
| 50 | +- **`reject`** — returns `Err(FailedReservation)`. The agent must handle the error. |
| 51 | +- **`throttle`** — Golem suspends the agent until capacity is available. Fully automatic, no code needed. |
| 52 | +- **`terminate`** — kills the agent with a failure message. |
| 53 | + |
| 54 | +## 2. Acquire a QuotaToken |
| 55 | + |
| 56 | +Acquire a `QuotaToken` once per resource, typically in the agent constructor: |
| 57 | + |
| 58 | +```moonbit |
| 59 | +let token = @quota.QuotaToken::new("api-calls", 1UL) |
| 60 | +``` |
| 61 | + |
| 62 | +The second parameter is the **expected amount per reservation** (`UInt64`), used for fair scheduling. For rate limiting where each call costs one token, use `1UL`. |
| 63 | + |
| 64 | +## 3. Simple Rate Limiting with `with_reservation` |
| 65 | + |
| 66 | +Use `@quota.with_reservation` to reserve tokens, run code, and commit actual usage: |
| 67 | + |
| 68 | +```moonbit |
| 69 | +let result = @quota.with_reservation(token, 1UL, fn(reservation) { |
| 70 | + let response = call_simple_api() |
| 71 | + (1UL, response) |
| 72 | +}) |
| 73 | +``` |
| 74 | + |
| 75 | +The callback returns `(UInt64, T)`, where the first element is the actual usage to commit. If actual usage is lower than the reserved amount, the unused capacity returns to the pool. |
| 76 | + |
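The reserve, run, and commit sequence can be sketched with a local model. This illustrates the bookkeeping only, under stated assumptions: `Pool` and its `with_reservation` method are hypothetical stand-ins rather than the `@quota` API. The model returns `None` where a `reject` resource would return `Err(FailedReservation)`, and it fixes the result type to `String` to stay short.

```moonbit
// Local model of reserve / run / commit. All names here are hypothetical.
struct Pool {
  mut available : UInt64 // tokens currently left in the pool
}

// Reserve `amount`, run `work`, then refund whatever was not actually used.
// Returns None when the pool cannot cover the reservation.
fn Pool::with_reservation(
  self : Pool,
  amount : UInt64,
  work : () -> (UInt64, String)
) -> String? {
  if amount > self.available {
    None
  } else {
    // Take the full reservation out of the pool before running the work.
    self.available = self.available - amount
    let (actual, result) = work()
    // Commit: only `actual` stays consumed, the remainder goes back.
    if actual < amount {
      self.available = self.available + (amount - actual)
    }
    Some(result)
  }
}

fn main {
  let pool : Pool = { available: 5000UL }
  // Reserve 4000 up front, but report that only 1200 were used.
  let outcome = pool.with_reservation(4000UL, fn() { (1200UL, "summary") })
  println(outcome)
  println(pool.available) // 5000 - 1200 = 3800 tokens remain
}
```

Reserving the worst case up front keeps concurrent callers from overshooting the limit, and committing the real figure afterwards avoids permanently charging for capacity that was never used.
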
| 77 | +## 4. Variable-Cost Reservations (e.g., LLM Tokens) |
| 78 | + |
| 79 | +Reserve the maximum expected cost, then commit actual usage: |
| 80 | + |
| 81 | +```moonbit |
| 82 | +let result = @quota.with_reservation(token, 4000UL, fn(reservation) { |
| 83 | + let response = call_llm(prompt, max_tokens=4000) |
| 84 | + (response.tokens_used, response) |
| 85 | +}) |
| 86 | +``` |
| 87 | + |
| 88 | +## 5. Manual Reserve / Commit |
| 89 | + |
| 90 | +For finer control, use `reserve` and `commit` directly: |
| 91 | + |
| 92 | +```moonbit |
| 93 | +match token.reserve(100UL) { |
| 94 | + Ok(reservation) => { |
| 95 | + let result = do_work() |
| 96 | + reservation.commit(result.actual_usage) |
| 97 | + } |
| 98 | + Err(failed) => @log.warn("Quota unavailable") |
| 99 | +} |
| 100 | +``` |
| 101 | + |
| 102 | +## 6. Splitting Tokens for Agent-to-Agent RPC |
| 103 | + |
| 104 | +Split a portion of your quota to pass to a child agent: |
| 105 | + |
| 106 | +```moonbit |
| 107 | +let child_token = self.token.split(200UL) |
| 108 | +let child_agent = SummarizerAgent::new_phantom() |
| 109 | +child_agent.summarize(text, child_token) |
| 110 | +``` |
| 111 | + |
| 112 | +The child agent receives the `QuotaToken` as a method parameter and uses it for its own reservations. Merge returned tokens back: |
| 113 | + |
| 114 | +```moonbit |
| 115 | +self.token.merge(returned_token) |
| 116 | +``` |
| 117 | + |
| 118 | +## 7. Dynamic Resource Updates via CLI |
| 119 | + |
| 120 | +Modify resource limits at runtime — changes affect running agents immediately: |
| 121 | + |
| 122 | +```shell |
| 123 | +golem resource update api-calls --limit '{"type":"rate","value":200,"period":"minute","max":2000}' --environment prod |
| 124 | +``` |
| 125 | + |
| 126 | +## Key Constraints |
| 127 | + |
| 128 | +- Acquire `QuotaToken` once and reuse — do not create a new one per call |
| 129 | +- All quota amounts are `UInt64` values (use `1UL`, `200UL`, etc.) |
| 130 | +- `split` traps if the requested amount (`child_expected_use`) exceeds the parent's current expected use |
| 131 | +- `merge` traps if the tokens refer to different resources |
| 132 | +- `with_reservation` returns `Result[T, FailedReservation]` — `Err` only for `reject` enforcement; `throttle` suspends transparently |
| 133 | +- Resource names in code must match the names in `golem.yaml` `resourceDefaults` |