Lesson 7 — Risk management, hedging & deployment

Final crypto lesson: implement hedging/green‑up, enforce risk limits and circuit breakers, make order flows idempotent and observable, and package/deploy your bot safely (Docker / VPS) with monitoring and runbooks.

Estimated time: 60–90 minutes • Skill level: Intermediate

Use this as a starting point. Don’t commit secrets.

Prerequisites

Lessons 1–6 completed (env, auth, market data, orders, webhooks, backtesting)
Basic Linux and Docker familiarity
Monitoring/alerting basics (Prometheus/Grafana, or cloud metrics)

Overview

This lesson covers:

Green‑up hedging strategies to limit directional exposure
Risk limits, circuit breakers and kill-switches
Idempotent order flows, tracing & logging
Packaging with Docker and simple VPS deployment
Monitoring, alerts and runbooks

1) Green‑up hedging patterns

Green‑up hedging aims to lock in P&L or limit downside by placing offsetting positions. In crypto this might mean placing an opposite order on the same market (if allowed) or hedging on a correlated instrument.

# simple green-up sketch
# position: {'symbol':'BTC/USDT','size': 0.5, 'avg_price': 40000.0}  (long exposure)
# goal: reduce exposure to target_pnl or neutralize delta via opposite orders

def compute_hedge_size(position, current_price, target_pnl=0.0, fee_rate=0.001):
    # Naive full hedge for a long position: offset the entire size so the
    # remaining exposure is zero.
    # current_price, target_pnl and fee_rate are accepted but unused here;
    # a partial hedge would solve for hedge_size from them.
    return position['size']

# place hedge (example uses market opposite order)
def place_hedge(exchange, position, hedge_size):
    # for a long position, place a market SELL for hedge_size
    return exchange.create_market_sell_order(position['symbol'], hedge_size)

Practical note: Real hedging needs precise exposure calculations (especially for derivatives) and must account for commissions, funding costs and latency. Test hedges thoroughly in a sandbox.
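The full hedge above ignores the target exposure and fees. As a minimal sketch, assuming a long spot position and a flat proportional taker fee, a partial hedge could be sized as follows. The helper names `partial_hedge_size` and `locked_in_pnl` and the `target_exposure` parameter are hypothetical and not part of the lesson code.

```python
def partial_hedge_size(position, current_price, target_exposure):
    """Size of the SELL needed to bring a long spot position down to
    target_exposure (expressed in quote currency). Hypothetical helper."""
    if current_price <= 0:
        raise ValueError("current_price must be positive")
    size = position["size"]
    target_size = max(target_exposure, 0.0) / current_price
    # Never sell more than we hold, and never return a negative size.
    return min(size, max(size - target_size, 0.0))


def locked_in_pnl(position, hedge_size, current_price, fee_rate=0.001):
    """P&L realized by selling hedge_size at current_price, net of an
    assumed flat proportional fee."""
    net_price = current_price * (1.0 - fee_rate)
    return hedge_size * (net_price - position["avg_price"])


position = {"symbol": "BTC/USDT", "size": 0.5, "avg_price": 40000.0}
hedge = partial_hedge_size(position, current_price=42000.0, target_exposure=10500.0)
print("hedge size:", hedge)
print("locked-in P&L:", locked_in_pnl(position, hedge, 42000.0))
```

This covers spot longs only. Derivatives, funding payments and slippage would each need their own terms.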

2) Risk limits, circuit breakers & kill-switch

Implement checks that run before placing any order (pre-trade), and global monitors (post-trade) that can pause or stop trading when limits are hit.

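# pre_trade_checks.py
class RiskLimitError(Exception):
    """Raised when a trade request violates a risk limit."""

def pre_trade_checks(account_state, trade_request):
    # account_state: {'balance', 'daily_pnl', 'open_exposure', 'max_trade_size', 'max_daily_loss', 'max_exposure'}
    if trade_request['size'] > account_state['max_trade_size']:
        raise RiskLimitError('Trade size exceeds limit')
    if account_state['daily_pnl'] <= -abs(account_state['max_daily_loss']):
        raise RiskLimitError('Daily loss limit reached')
    # check open exposure cap; a missing 'max_exposure' means no cap is configured
    max_exposure = account_state.get('max_exposure')
    # NOTE: requests without a 'price' (e.g. market orders) count as 0 notional; pass a reference price
    notional = trade_request['size'] * trade_request.get('price', 0)
    if max_exposure is not None and account_state['open_exposure'] + notional > max_exposure:
        raise RiskLimitError('Exposure limit exceeded')
    return True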

# emergency kill switch (operational)
import os

# NOTE: /tmp is often cleared on reboot; use a persistent path if the switch must survive restarts
KILL_SWITCH_FILE = '/tmp/trading_kill_switch'

def is_killed():
    return os.path.exists(KILL_SWITCH_FILE)

def set_kill_switch(on=True):
    if on:
        open(KILL_SWITCH_FILE, 'w').close()
    else:
        try:
            os.remove(KILL_SWITCH_FILE)
        except FileNotFoundError:
            pass  # already off

Ops tip: Expose a protected admin endpoint to toggle the kill-switch, and require MFA and audit logging for every use. The kill-switch should immediately stop workers and optionally cancel open orders.
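A circuit breaker differs from the kill-switch in that it trips automatically and can recover on its own. The sketch below is one possible shape, assuming the breaker opens after a number of consecutive failures and closes again after a cooldown. The `CircuitBreaker` class and its parameters are illustrative, not part of the lesson code.

```python
import time


class CircuitBreaker:
    """Opens after max_failures consecutive failures and stays open
    for cooldown_s seconds. The clock is injectable for testing."""

    def __init__(self, max_failures=3, cooldown_s=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        """Return True if trading may proceed."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and reset the counter.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A worker would call `allow()` (and check the kill-switch) before each order, then report the outcome with `record_success()` or `record_failure()`. For loss-based limits you may prefer a breaker that requires a manual reset rather than a timed one.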

3) Idempotency, tracing & logging

Use client-side idempotency tokens (client_ref) for alerts/orders, and correlate traces across services (webhook → queue → worker → orders). Store structured logs for reconciliation.

# logging structure (JSON)
{
  "timestamp": "2025-11-11T09:00:00Z",
  "service": "order-worker",
  "trace_id": "abcd-1234",
  "client_ref": "tv_1690000000_1",
  "action": "place_order",
  "request": { "symbol":"BTC/USDT","side":"BUY","size":0.001, "price":42000 },
  "response": { "order_id":"12345", "status":"FILLED", "filled":0.001 },
  "duration_ms": 412
}
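The following sketch shows one way to combine the `client_ref` idempotency token with a log record in the shape above. It assumes a `send_order` callable that takes the request dict and returns a response dict, and it keeps seen tokens in memory. Both are simplifications for illustration, and `IdempotentOrderPlacer` is a hypothetical name.

```python
import json
import time
import uuid
from datetime import datetime, timezone


class IdempotentOrderPlacer:
    """Submits each client_ref at most once and logs one JSON line per call."""

    def __init__(self, send_order, log=print):
        self.send_order = send_order  # assumed: callable(request) -> response dict
        self.log = log
        self._seen = {}  # client_ref -> response; use a database or Redis in production

    def place(self, client_ref, request, trace_id=None):
        trace_id = trace_id or str(uuid.uuid4())
        started = time.monotonic()
        duplicate = client_ref in self._seen
        if not duplicate:
            self._seen[client_ref] = self.send_order(request)
        response = self._seen[client_ref]
        self.log(json.dumps({
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "service": "order-worker",
            "trace_id": trace_id,
            "client_ref": client_ref,
            "action": "duplicate_skipped" if duplicate else "place_order",
            "request": request,
            "response": response,
            "duration_ms": int((time.monotonic() - started) * 1000),
        }))
        return response
```

If `send_order` raises, nothing is recorded and a retry will submit again. Where the exchange accepts a client order ID, passing `client_ref` through lets the exchange reject duplicates as well.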

Reconciliation: schedule a job that fetches remote order/trade history and matches by client_ref or order_id; flag mismatches for manual review.
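A reconciliation pass could look roughly like this. It assumes both sides are lists of dicts carrying `client_ref`, `order_id`, `status` and `filled`, as in the log record above. How you fetch the remote history depends on your exchange client and is left out, and `reconcile` is a hypothetical helper.

```python
def reconcile(local_orders, remote_orders):
    """Match local and remote order records by client_ref, falling back
    to order_id, and return a list of (kind, local, remote) mismatches."""

    def key(order):
        return order.get("client_ref") or order.get("order_id")

    remote_by_key = {key(o): o for o in remote_orders}
    mismatches = []
    for local in local_orders:
        remote = remote_by_key.pop(key(local), None)
        if remote is None:
            mismatches.append(("missing_remote", local, None))
        elif (local.get("status"), local.get("filled")) != (
            remote.get("status"), remote.get("filled")
        ):
            mismatches.append(("field_mismatch", local, remote))
    # Whatever is left exists on the exchange but not in our own records.
    mismatches.extend(("missing_local", None, r) for r in remote_by_key.values())
    return mismatches
```

The fallback only works when both sides carry the same kind of key. Filled amounts are compared exactly here, so a tolerance may be needed if the two sources round differently.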

4) Packaging & deployment (Docker + VPS)

Containerize your bot and deploy to a small VPS or managed container service. Keep secrets out of images (use env vars / vault).

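# Dockerfile (example)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir -r requirements.txt
# add a .dockerignore so .env files and other secrets are not copied into the image
COPY . /app
# run as an unprivileged user rather than root
RUN useradd --create-home appuser
USER appuser
ENV PYTHONUNBUFFERED=1
# assumes gunicorn is listed in requirements.txt and webhook.py defines `app`
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "webhook:app"]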

VPS quick deploy:

# on Ubuntu VPS
sudo apt update && sudo apt install -y docker.io docker-compose
# copy docker-compose.yml and env file, then
# (sudo is needed unless your user is in the docker group):
sudo docker-compose up -d --build

Use a secrets manager (HashiCorp Vault, AWS Secrets Manager) or Docker secrets for production. Do not store API keys in the repository or bake them into images.
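On the application side, a small loader can read a Docker secret file when one is mounted and fall back to an environment variable otherwise. This is a sketch: `load_secret` is a hypothetical helper, `EXCHANGE_API_KEY` is an example name, and `/run/secrets` is the path where Docker mounts secrets by default.

```python
import os
from pathlib import Path


def load_secret(name, secrets_dir="/run/secrets"):
    """Return a secret from a Docker secret file if present,
    else from the environment. Raises if neither is configured."""
    secret_file = Path(secrets_dir) / name.lower()
    if secret_file.is_file():
        return secret_file.read_text().strip()
    value = os.environ.get(name.upper())
    if not value:
        raise RuntimeError(f"secret {name!r} is not configured")
    return value
```

Calling `load_secret("EXCHANGE_API_KEY")` at startup fails fast when the key is missing, which is preferable to discovering it on the first order.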

5) Monitoring, alerts & runbooks

Monitor essential signals and have ready runbooks for incidents.

Metrics: order success/failure rate, worker queue length, latency percentiles for order-book and place-order calls, daily P&L
Alerts: high failure rate, kill-switch triggered, daily loss threshold breached, webhook error spike
Runbooks: steps to disable trading, cancel open orders, rotate API keys, and recover from partial reconciliation
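The alert conditions listed above can be expressed as simple threshold checks. In practice they would usually live in your alerting system's rules. The Python sketch below only illustrates the logic, and the snapshot and threshold field names are assumptions made for this example.

```python
def evaluate_alerts(snapshot, thresholds):
    """Return the names of alerts that should fire for one metrics snapshot."""
    alerts = []
    total = snapshot["orders_ok"] + snapshot["orders_failed"]
    failure_rate = snapshot["orders_failed"] / total if total else 0.0
    if failure_rate > thresholds["max_failure_rate"]:
        alerts.append("high_failure_rate")
    if snapshot["kill_switch_on"]:
        alerts.append("kill_switch_triggered")
    if snapshot["daily_pnl"] <= -abs(thresholds["max_daily_loss"]):
        alerts.append("daily_loss_breached")
    if snapshot["webhook_errors"] > thresholds["max_webhook_errors"]:
        alerts.append("webhook_error_spike")
    return alerts
```

Each alert name should map to a runbook entry so the person on call knows the first step to take.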

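# example Prometheus metrics exposition (Flask)
from flask import Flask
from prometheus_client import Counter, Histogram, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware

REQUESTS = Counter('requests_total', 'Total requests', ['endpoint','status'])
LATENCY = Histogram('request_latency_seconds', 'Latency', ['endpoint'])

# expose /metrics by mounting the Prometheus WSGI app next to the Flask app
app = Flask(__name__)  # or reuse the Flask app from your webhook service
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {'/metrics': make_wsgi_app()})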

6) Operational checklist before live

All tests pass in the sandbox, and a full end-to-end replay has been run over a sample period
Rate limiting and backoff in place and monitored
Kill-switch, circuit breakers and daily P&L protections implemented
Key rotation plan and least-privilege API keys
Logging/alerts configured and an on-call person notified
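For the rate-limit and backoff item, a retry wrapper with exponential backoff and jitter is a common pattern. The sketch below is generic: `call_with_backoff` and its parameters are hypothetical, and which exceptions are safe to retry depends on your exchange client.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff and full jitter.
    Re-raises the last error once max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Delay cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Only wrap order placement in a retry like this if the call is idempotent (for example, keyed by `client_ref`). Otherwise a timeout followed by a retry can produce a duplicate order.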

Wrap up and next steps

You’ve completed the Crypto track. Recommended next steps: run a staged deployment on a small VPS with testnet keys, add synthetic load tests, and schedule periodic reconciliation runs and chaos tests of kill-switch behavior.
