Health & Monitoring Endpoints

Audience: developers, DevOps engineers, monitoring systems
Last updated: 2025-11-17
Summary: Complete reference for the Health & Monitoring endpoints: comprehensive health checks, component-specific monitoring, Kubernetes probes, and resource usage tracking. A detailed integration guide for monitoring and observability systems.


Endpoints Overview

Method | Endpoint | Description | Auth Required | Status Codes
GET | /api/ping | Simple pong response | ❌ No | 200
GET | /api/health | Overall system health check | ❌ No | 200, 503
GET | /api/health/detailed | Detailed information for all components | ❌ No | 200, 503
GET | /api/version | System version and build info | ❌ No | 200
GET | /api/health/database | Database health check | ❌ No | 200, 503
GET | /api/health/blockchain | Blockchain health check | ❌ No | 200, 503
GET | /api/health/business | Business logic health check | ❌ No | 200, 503
GET | /api/health/resources | System resource usage | ❌ No | 200
GET | /api/health/ready | Kubernetes readiness probe | ❌ No | 200, 503
GET | /api/health/live | Kubernetes liveness probe | ❌ No | 200, 503

Architectural principles:

  • Public Endpoints: none of the health endpoints require authentication
  • Status Code Based: the HTTP status code reflects health status (200 = OK, 503 = Unhealthy)
  • Comprehensive Service: checks are coordinated through HealthService
  • Context Timeouts: every check runs under a configurable timeout (GetHealthCheckTimeout, GetQuickHealthTimeout, GetComprehensiveHealthTimeout)
  • Component Isolation: separate endpoints for database, blockchain, and business logic
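
To make the status-code convention concrete, here is a minimal TypeScript sketch of how a monitoring client might interpret a probe from the HTTP status alone, without parsing the body. The names `ProbeVerdict` and `classifyProbeStatus` are illustrative and not part of the API; treating every code other than 200 or 503 as "unexpected" is an assumption.

```typescript
// Hypothetical helper: names are illustrative, not part of the API.
type ProbeVerdict = 'ok' | 'unhealthy' | 'unexpected';

function classifyProbeStatus(httpStatus: number): ProbeVerdict {
  // 200: the check passed (some endpoints also return 200 for a degraded system)
  if (httpStatus === 200) return 'ok';
  // 503: the endpoint itself reports that the check failed
  if (httpStatus === 503) return 'unhealthy';
  // Assumption: anything else (404, a proxy's 502, ...) is not a health verdict
  return 'unexpected';
}
```

Keeping "unexpected" separate from "unhealthy" lets a caller distinguish a failing service from a misrouted request or a broken proxy.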

📡 GET /api/ping

A simple pong endpoint for a basic API availability check.

Request

GET /api/ping HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Response

Success (200 OK):

{
  "success": true,
  "data": {
    "status": "ok",
    "success": true,
    "message": "pong",
    "timestamp": "2025-10-06T12:00:00Z"
  },
  "message": "Service is responding",
  "timestamp": "2025-10-06T12:00:00Z",
  "traceId": "req-abc123"
}

Response fields:

Field | Type | Description
data.status | string | Always "ok"
data.success | boolean | Always true
data.message | string | Always "pong"
data.timestamp | string | Current server time (ISO 8601)

Status codes:

Status code | Description
200 | API is available

cURL example

# Simple ping
curl -X GET http://localhost:8080/api/ping

# With an Accept header
curl -X GET http://localhost:8080/api/ping \
  -H "Accept: application/json"

# Production endpoint
curl -X GET https://app.saga.surf/api/ping

TypeScript example

Basic function:

interface PingResponse {
  status: string;
  success: boolean;
  message: string;
  timestamp: string;
}

async function checkAPIAvailability(): Promise<boolean> {
  try {
    const response = await fetch('/api/ping', {
      method: 'GET',
      headers: { 'Accept': 'application/json' }
    });

    if (!response.ok) {
      return false;
    }

    const { data } = await response.json();
    return data.success && data.status === 'ok';
  } catch (error) {
    console.error('Ping failed:', error);
    return false;
  }
}

// Usage
const isAvailable = await checkAPIAvailability();
console.log(`API available: ${isAvailable}`);

React Component:

import { useState, useEffect } from 'react';

function APIPingIndicator() {
  const [available, setAvailable] = useState<boolean | null>(null);

  useEffect(() => {
    checkAvailability();
    const interval = setInterval(checkAvailability, 30000); // Every 30 seconds
    return () => clearInterval(interval);
  }, []);

  async function checkAvailability() {
    const isAvailable = await checkAPIAvailability();
    setAvailable(isAvailable);
  }

  if (available === null) return <div>Checking...</div>;

  return (
    <div className={`ping-indicator ${available ? 'online' : 'offline'}`}>
      <span className="status-dot"></span>
      <span className="status-text">
        {available ? '✅ API Online' : '❌ API Offline'}
      </span>
    </div>
  );
}

🏥 GET /api/health

General system health check that reports how many components are healthy.

Request

GET /api/health HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Context Timeout: GetHealthCheckTimeout() (configurable; falls back to a safe default)

Response

Success - Healthy (200 OK):

{
  "success": true,
  "data": {
    "status": "healthy",
    "message": "3/3 components healthy",
    "timestamp": "2025-10-06T12:00:00Z",
    "uptime": "2h15m30s",
    "version": "1.0.0",
    "healthy": true,
    "componentCount": 3
  },
  "message": "Health check completed",
  "timestamp": "2025-10-06T12:00:00Z",
  "systemHealthy": true
}

Degraded (200 OK):

{
  "success": false,
  "data": {
    "status": "degraded",
    "message": "2/3 components healthy",
    "timestamp": "2025-10-06T12:00:00Z",
    "uptime": "2h15m30s",
    "version": "1.0.0",
    "healthy": false,
    "componentCount": 3
  },
  "message": "Health check completed",
  "timestamp": "2025-10-06T12:00:00Z",
  "systemHealthy": false
}

Unhealthy (503 Service Unavailable):

{
  "success": false,
  "data": {
    "status": "unhealthy",
    "message": "0/3 components healthy",
    "timestamp": "2025-10-06T12:00:00Z",
    "uptime": "2h15m30s",
    "version": "1.0.0",
    "healthy": false,
    "componentCount": 3
  },
  "message": "Health check completed",
  "timestamp": "2025-10-06T12:00:00Z",
  "systemHealthy": false
}

Response fields:

Field | Type | Description
data.status | string | One of: healthy, degraded, unhealthy, unknown
data.message | string | Short summary (e.g., "3/3 components healthy")
data.timestamp | string | Time of the check (ISO 8601)
data.uptime | string | System uptime (duration string)
data.version | string | System version
data.healthy | boolean | true if status is healthy or degraded
data.componentCount | number | Number of components checked
systemHealthy | boolean | Duplicates data.healthy for convenience
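
Because data.uptime arrives as a duration string rather than a number, a client that wants to compare or chart uptime has to parse it first. The sketch below assumes the string follows Go's time.Duration text form (as in the "2h15m30s" example); that format is inferred from the examples, not stated by the API. `parseUptimeSeconds` and `UNIT_SECONDS` are hypothetical names.

```typescript
// Assumption: data.uptime uses Go's time.Duration text form, e.g. "2h15m30s".
const UNIT_SECONDS: Record<string, number> = {
  h: 3600,
  m: 60,
  s: 1,
  ms: 1e-3,
  us: 1e-6,
  'µs': 1e-6,
  ns: 1e-9,
};

// Hypothetical helper: returns the uptime in seconds, or null if the string is not recognized.
function parseUptimeSeconds(uptime: string): number | null {
  // "ms" is listed before "s" and "m" so that it is matched as one unit
  const token = /(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)/g;
  let total = 0;
  let consumed = 0;
  let match: RegExpExecArray | null;
  while ((match = token.exec(uptime)) !== null) {
    if (match.index !== consumed) return null; // unexpected characters between tokens
    total += parseFloat(match[1]) * UNIT_SECONDS[match[2]];
    consumed = token.lastIndex;
  }
  // Reject empty input and trailing garbage
  return consumed === uptime.length && consumed > 0 ? total : null;
}
```

A sudden drop in the parsed value between two polls is a cheap way to notice that the service restarted.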

Status codes:

Status code | Description
200 | System is healthy or degraded
503 | System is unhealthy

cURL example

# Basic health check
curl -X GET http://localhost:8080/api/health

# Check with the HTTP status code printed
curl -X GET http://localhost:8080/api/health \
  -w "\nHTTP Status: %{http_code}\n"

# Production endpoint
curl -X GET https://app.saga.surf/api/health | jq .

TypeScript example

Basic function:

interface HealthResponse {
  status: 'healthy' | 'degraded' | 'unhealthy' | 'unknown';
  message: string;
  timestamp: string;
  uptime: string;
  version: string;
  healthy: boolean;
  componentCount: number;
}

async function checkSystemHealth(): Promise<HealthResponse> {
  const response = await fetch('/api/health', {
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  });

  // Note: 503 is expected for unhealthy state
  const { data } = await response.json();
  return data;
}

// Usage
const health = await checkSystemHealth();
console.log(`System status: ${health.status}`);
console.log(`Components: ${health.message}`);
console.log(`Healthy: ${health.healthy ? 'Yes' : 'No'}`);

React Component:

import { useState, useEffect } from 'react';

function SystemHealthBadge() {
  const [health, setHealth] = useState<HealthResponse | null>(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    fetchHealth();
    const interval = setInterval(fetchHealth, 60000); // Every 60 seconds
    return () => clearInterval(interval);
  }, []);

  async function fetchHealth() {
    try {
      const healthData = await checkSystemHealth();
      setHealth(healthData);
    } catch (error) {
      console.error('Health check failed:', error);
    } finally {
      setLoading(false);
    }
  }

  if (loading) return <div>Loading...</div>;
  if (!health) return <div className="health-badge error">⚠️ Unknown</div>;

  const badgeClass = health.status === 'healthy' ? 'healthy' :
                     health.status === 'degraded' ? 'degraded' : 'unhealthy';

  const icon = health.status === 'healthy' ? '✅' :
               health.status === 'degraded' ? '⚠️' : '❌';

  return (
    <div className={`health-badge ${badgeClass}`}>
      <span className="icon">{icon}</span>
      <div className="details">
        <div className="status">{health.status.toUpperCase()}</div>
        <div className="message">{health.message}</div>
        <div className="uptime">Uptime: {health.uptime}</div>
      </div>
    </div>
  );
}

SWR Integration:

import useSWR from 'swr';

function useSystemHealth(refreshInterval = 60000) {
  const { data, error, mutate } = useSWR(
    '/api/health',
    checkSystemHealth,
    { refreshInterval }
  );

  return {
    health: data,
    isLoading: !error && !data,
    isError: error,
    refresh: mutate
  };
}

// Usage
function HealthDashboard() {
  const { health, isLoading, isError, refresh } = useSystemHealth();

  if (isLoading) return <div>Loading health status...</div>;
  if (isError) return <div>Error loading health status</div>;

  return (
    <div className="health-dashboard">
      <h2>System Health</h2>
      <SystemHealthBadge />
      <button onClick={() => refresh()}>Refresh Now</button>
    </div>
  );
}

GET /api/health/detailed

Detailed health information for every system component, including each component's latency and status.

Request

GET /api/health/detailed HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Context Timeout: GetComprehensiveHealthTimeout() (configurable; falls back to a safe default)

Response

Success (200 OK):

{
  "success": true,
  "data": {
    "overallStatus": "healthy",
    "components": [
      {
        "component": "database",
        "status": "healthy",
        "message": "Database connection successful",
        "latency": 5,
        "details": {
          "activeConnections": "3",
          "maxConnections": "20"
        },
        "lastChecked": "2025-10-06T12:00:00Z"
      },
      {
        "component": "blockchain",
        "status": "healthy",
        "message": "Blockchain node operational",
        "latency": 150,
        "details": {
          "blockNumber": "12345678",
          "networkID": "1337"
        },
        "lastChecked": "2025-10-06T12:00:00Z"
      }
    ],
    "checkedAt": "2025-10-06T12:00:00Z",
    "message": "System health: 3 healthy, 0 degraded, 0 unhealthy"
  },
  "message": "Detailed health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Degraded (200 OK):

{
  "success": false,
  "data": {
    "overallStatus": "degraded",
    "components": [
      {
        "component": "database",
        "status": "healthy",
        "latency": 5,
        "lastChecked": "2025-10-06T12:00:00Z"
      },
      {
        "component": "blockchain",
        "status": "degraded",
        "message": "High latency detected",
        "latency": 2500,
        "details": {
          "blockNumber": "12345678",
          "latencyThreshold": "2000ms"
        },
        "lastChecked": "2025-10-06T12:00:00Z"
      }
    ],
    "checkedAt": "2025-10-06T12:00:00Z",
    "message": "System health: 1 healthy, 1 degraded, 0 unhealthy"
  },
  "message": "Detailed health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Unhealthy (503 Service Unavailable):

{
  "success": false,
  "data": {
    "overallStatus": "unhealthy",
    "components": [
      {
        "component": "database",
        "status": "unhealthy",
        "message": "Database connection failed",
        "latency": 5000,
        "details": {
          "error": "connection timeout"
        },
        "lastChecked": "2025-10-06T12:00:00Z"
      }
    ],
    "checkedAt": "2025-10-06T12:00:00Z",
    "message": "System health: 0 healthy, 0 degraded, 1 unhealthy"
  },
  "message": "Detailed health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Response fields:

Field | Type | Description
data.overallStatus | string | Overall status: healthy, degraded, unhealthy
data.components | array | Array of component checks
data.components[].component | string | Component name (database, blockchain)
data.components[].status | string | Component status: healthy, degraded, unhealthy
data.components[].message | string | Status description
data.components[].latency | number | Check latency in milliseconds
data.components[].details | object | Additional information (key-value pairs)
data.components[].lastChecked | string | Time of the last check (ISO 8601)
data.checkedAt | string | Time of the current check (ISO 8601)
data.message | string | Status summary (e.g., "System health: 3 healthy, 0 degraded, 0 unhealthy")
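
The components array carries enough information to rebuild the summary on the client and to spot the slowest check. The following sketch tallies statuses and formats them in the same shape as the data.message examples; `ComponentCheck`, `summarizeComponents`, and `formatSummary` are hypothetical names, and the server's own aggregation logic may differ.

```typescript
type ComponentStatus = 'healthy' | 'degraded' | 'unhealthy';

interface ComponentCheck {
  component: string;
  status: ComponentStatus;
  latency: number; // milliseconds
}

interface ComponentSummary {
  counts: Record<ComponentStatus, number>;
  slowest: ComponentCheck | null;
}

// Hypothetical helper: tallies statuses and finds the check with the highest latency.
function summarizeComponents(components: ComponentCheck[]): ComponentSummary {
  const counts: Record<ComponentStatus, number> = { healthy: 0, degraded: 0, unhealthy: 0 };
  let slowest: ComponentCheck | null = null;
  for (const check of components) {
    counts[check.status] += 1;
    if (slowest === null || check.latency > slowest.latency) slowest = check;
  }
  return { counts, slowest };
}

// Formats the tally in the same shape as the data.message examples.
function formatSummary(counts: Record<ComponentStatus, number>): string {
  return `System health: ${counts.healthy} healthy, ${counts.degraded} degraded, ${counts.unhealthy} unhealthy`;
}
```

Comparing the locally formatted summary with data.message is a simple consistency check when debugging a dashboard.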

Status codes:

Status code | Description
200 | System is healthy or degraded
503 | System is unhealthy (critical components unavailable)

cURL example

# Detailed health check
curl -X GET http://localhost:8080/api/health/detailed

# Pretty-print with jq
curl -X GET http://localhost:8080/api/health/detailed | jq .

# Show only components that are not healthy (degraded or unhealthy)
curl -X GET http://localhost:8080/api/health/detailed | \
  jq '.data.components[] | select(.status != "healthy")'

TypeScript example

Basic function:

interface ComponentHealth {
  component: string;
  status: 'healthy' | 'degraded' | 'unhealthy';
  message?: string;
  latency: number;
  details?: Record<string, string>;
  lastChecked: string;
}

interface DetailedHealthResponse {
  overallStatus: 'healthy' | 'degraded' | 'unhealthy';
  components: ComponentHealth[];
  checkedAt: string;
  message: string;
}

async function checkDetailedHealth(): Promise<DetailedHealthResponse> {
  const response = await fetch('/api/health/detailed', {
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  });

  const { data } = await response.json();
  return data;
}

// Usage
const detailed = await checkDetailedHealth();
console.log(`Overall: ${detailed.overallStatus}`);
detailed.components.forEach(comp => {
  console.log(`  ${comp.component}: ${comp.status} (${comp.latency}ms)`);
});

React Component:

import { useState, useEffect } from 'react';

function DetailedHealthDashboard() {
  const [health, setHealth] = useState<DetailedHealthResponse | null>(null);

  useEffect(() => {
    fetchHealth();
    const interval = setInterval(fetchHealth, 30000); // Every 30 seconds
    return () => clearInterval(interval);
  }, []);

  async function fetchHealth() {
    try {
      const healthData = await checkDetailedHealth();
      setHealth(healthData);
    } catch (error) {
      console.error('Detailed health check failed:', error);
    }
  }

  if (!health) return <div>Loading...</div>;

  return (
    <div className="detailed-health-dashboard">
      <h2>System Health</h2>
      <div className={`overall-status ${health.overallStatus}`}>
        {health.overallStatus.toUpperCase()}
      </div>
      <p>{health.message}</p>

      <h3>Components</h3>
      <div className="components-grid">
        {health.components.map(comp => (
          <div key={comp.component} className={`component-card ${comp.status}`}>
            <h4>{comp.component}</h4>
            <div className="status">{comp.status}</div>
            {comp.message && <p className="message">{comp.message}</p>}
            <div className="latency">Latency: {comp.latency}ms</div>

            {comp.details && (
              <div className="details">
                {Object.entries(comp.details).map(([key, value]) => (
                  <div key={key} className="detail-item">
                    <span className="key">{key}:</span>
                    <span className="value">{value}</span>
                  </div>
                ))}
              </div>
            )}

            <div className="last-checked">
              Checked: {new Date(comp.lastChecked).toLocaleTimeString()}
            </div>
          </div>
        ))}
      </div>
    </div>
  );
}

Alerting Integration:

interface HealthAlert {
  component: string;
  severity: 'warning' | 'critical';
  message: string;
  latency: number;
}

function analyzeHealthIssues(health: DetailedHealthResponse): HealthAlert[] {
  const alerts: HealthAlert[] = [];

  health.components.forEach(comp => {
    // Unhealthy components
    if (comp.status === 'unhealthy') {
      alerts.push({
        component: comp.component,
        severity: 'critical',
        message: comp.message || 'Component unhealthy',
        latency: comp.latency
      });
    }

    // Degraded components
    if (comp.status === 'degraded') {
      alerts.push({
        component: comp.component,
        severity: 'warning',
        message: comp.message || 'Component degraded',
        latency: comp.latency
      });
    }

    // High latency (>1000ms)
    if (comp.latency > 1000) {
      alerts.push({
        component: comp.component,
        severity: 'warning',
        message: `High latency detected: ${comp.latency}ms`,
        latency: comp.latency
      });
    }
  });

  return alerts;
}

// Usage
const health = await checkDetailedHealth();
const alerts = analyzeHealthIssues(health);

if (alerts.length > 0) {
  console.warn(`⚠️ ${alerts.length} health issues detected:`);
  alerts.forEach(alert => {
    console.warn(`  ${alert.severity.toUpperCase()}: ${alert.component} - ${alert.message}`);
  });
}

GET /api/version

Returns the system version, build metadata, and environment.

Request

GET /api/version HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Response

Success (200 OK):

{
  "success": true,
  "data": {
    "version": "2.1.536",
    "service": "saga-backend",
    "timestamp": "2025-10-06T12:00:00Z",
    "buildInfo": {
      "version": "2.1.536",
      "environment": "development",
      "goVersion": "go1.21.5",
      "buildTime": "build-time-placeholder",
      "gitCommit": "git-commit-placeholder"
    }
  },
  "message": "Version information retrieved",
  "timestamp": "2025-10-06T12:00:00Z",
  "traceId": "req-abc123"
}

Unknown Version (200 OK):

{
  "success": true,
  "data": {
    "version": "unknown",
    "service": "saga-backend",
    "timestamp": "2025-10-06T12:00:00Z",
    "buildInfo": {
      "version": "unknown",
      "environment": "development",
      "goVersion": "go1.21.5",
      "buildTime": "build-time-placeholder",
      "gitCommit": "git-commit-placeholder"
    }
  },
  "message": "Version information retrieved",
  "timestamp": "2025-10-06T12:00:00Z"
}

Response fields:

Field | Type | Description
data.version | string | Semantic version (MAJOR.MINOR.PATCH) or "unknown"
data.service | string | Service name (from GetServiceName())
data.timestamp | string | Current time (ISO 8601)
data.buildInfo.version | string | Version (duplicates data.version)
data.buildInfo.environment | string | Environment (development, staging, production)
data.buildInfo.goVersion | string | Go runtime version (e.g., "go1.21.5")
data.buildInfo.buildTime | string | Build time (may be a placeholder)
data.buildInfo.gitCommit | string | Git commit hash (may be a placeholder)

Status codes:

Status code | Description
200 | Version retrieved successfully

Notes:

  • The version is read from the VERSION file in the project root
  • If the file is unavailable, "unknown" is returned
  • The environment comes from the GetEnvironment() configuration
  • Build metadata may contain placeholder values if build flags are not configured
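
Since build metadata can silently fall back to placeholders, a deployment check may want to flag responses that carry no real build information. This is a minimal sketch that uses the placeholder strings shown in the example responses; whether the server emits exactly these strings in every environment is an assumption, and `findUnsetBuildFields` is a hypothetical helper.

```typescript
interface BuildMetadata {
  version: string;
  buildTime: string;
  gitCommit: string;
}

// Placeholder strings taken from the example responses.
const PLACEHOLDERS = new Set(['build-time-placeholder', 'git-commit-placeholder', 'unknown']);

// Hypothetical helper: lists the buildInfo fields that carry no real value.
function findUnsetBuildFields(info: BuildMetadata): string[] {
  const fields: Array<keyof BuildMetadata> = ['version', 'buildTime', 'gitCommit'];
  return fields.filter((field) => PLACEHOLDERS.has(info[field]));
}
```

A non-empty result in a production environment usually means the build flags were not passed at compile time or the VERSION file was not shipped.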

cURL example

# Get the version
curl -X GET http://localhost:8080/api/version

# Extract a specific field
curl -X GET http://localhost:8080/api/version | jq '.data.version'

# Production endpoint
curl -X GET https://app.saga.surf/api/version | jq .

TypeScript example

Basic function:

interface BuildInfo {
  version: string;
  environment: string;
  goVersion: string;
  buildTime: string;
  gitCommit: string;
}

interface VersionResponse {
  version: string;
  service: string;
  timestamp: string;
  buildInfo: BuildInfo;
}

async function getVersionInfo(): Promise<VersionResponse> {
  const response = await fetch('/api/version', {
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  const { data } = await response.json();
  return data;
}

// Usage
const version = await getVersionInfo();
console.log(`Version: ${version.version}`);
console.log(`Service: ${version.service}`);
console.log(`Environment: ${version.buildInfo.environment}`);
console.log(`Go Version: ${version.buildInfo.goVersion}`);

React Component:

import { useState, useEffect } from 'react';

function VersionInfo() {
  const [version, setVersion] = useState<VersionResponse | null>(null);

  useEffect(() => {
    fetchVersion();
  }, []);

  async function fetchVersion() {
    try {
      const versionData = await getVersionInfo();
      setVersion(versionData);
    } catch (error) {
      console.error('Failed to fetch version:', error);
    }
  }

  if (!version) return <div>Loading version info...</div>;

  return (
    <div className="version-info">
      <h3>System Version</h3>
      <div className="version-details">
        <div className="detail-row">
          <span className="label">Version:</span>
          <span className="value">{version.version}</span>
        </div>
        <div className="detail-row">
          <span className="label">Service:</span>
          <span className="value">{version.service}</span>
        </div>
        <div className="detail-row">
          <span className="label">Environment:</span>
          <span className="value">{version.buildInfo.environment}</span>
        </div>
        <div className="detail-row">
          <span className="label">Go Version:</span>
          <span className="value">{version.buildInfo.goVersion}</span>
        </div>
        <div className="detail-row">
          <span className="label">Build Time:</span>
          <span className="value">{version.buildInfo.buildTime}</span>
        </div>
        <div className="detail-row">
          <span className="label">Git Commit:</span>
          <span className="value code">{version.buildInfo.gitCommit}</span>
        </div>
      </div>
    </div>
  );
}

Version Comparison:

function parseVersion(version: string): number[] {
  // "unknown" (VERSION file unavailable) sorts below every real version
  if (version === 'unknown') return [0, 0, 0];
  return version.split('.').map(Number);
}

function compareVersions(v1: string, v2: string): number {
  const parts1 = parseVersion(v1);
  const parts2 = parseVersion(v2);

  for (let i = 0; i < 3; i++) {
    // Treat missing or non-numeric parts as 0 so "2.1" compares like "2.1.0"
    const a = parts1[i] || 0;
    const b = parts2[i] || 0;
    if (a > b) return 1;
    if (a < b) return -1;
  }
  return 0;
}

// Usage
const currentVersion = await getVersionInfo();
const minRequiredVersion = "2.1.0";

if (compareVersions(currentVersion.version, minRequiredVersion) < 0) {
  console.warn(`⚠️ Version ${currentVersion.version} is below minimum required ${minRequiredVersion}`);
}

🗄️ GET /api/health/database

Database health check (PostgreSQL connection, connection pool status).

Request

GET /api/health/database HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Context Timeout: GetQuickHealthTimeout() (quick timeout for DB check)

Response

Success (200 OK):

{
  "success": true,
  "data": {
    "component": "database",
    "status": "healthy",
    "message": "Database connection successful",
    "latency": 5,
    "details": {
      "activeConnections": "3",
      "idleConnections": "7",
      "maxConnections": "20"
    },
    "lastChecked": "2025-10-06T12:00:00Z"
  },
  "message": "Database health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Unhealthy (503 Service Unavailable):

{
  "success": false,
  "data": {
    "component": "database",
    "status": "unhealthy",
    "message": "Database connection failed",
    "latency": 5000,
    "details": {
      "error": "connection timeout after 5s"
    },
    "lastChecked": "2025-10-06T12:00:00Z"
  },
  "message": "Database health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Response fields:

Field | Type | Description
data.component | string | Always "database"
data.status | string | One of: healthy, degraded, unhealthy
data.message | string | Status description
data.latency | number | Check latency in milliseconds
data.details | object | Additional information (connections, errors)
data.lastChecked | string | Time of the last check (ISO 8601)
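
In the example responses the connection counts inside data.details are strings, so they must be converted before any arithmetic. The sketch below derives pool utilization from activeConnections and maxConnections; `poolUtilization` is a hypothetical helper, and it assumes those two keys are present whenever the check succeeds.

```typescript
// Hypothetical helper: details values are strings in the examples, so parse before dividing.
function poolUtilization(details: Record<string, string> | undefined): number | null {
  if (!details) return null;
  const active = Number(details.activeConnections);
  const max = Number(details.maxConnections);
  // Missing keys parse to NaN; a zero or negative max would make the ratio meaningless
  if (!Number.isFinite(active) || !Number.isFinite(max) || max <= 0) return null;
  return active / max;
}
```

Returning null instead of throwing keeps the caller simple when the unhealthy response carries only an error key in details.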

Status codes:

Status code | Description
200 | Database is healthy
503 | Database is unhealthy or unavailable

cURL example

# Database health check
curl -X GET http://localhost:8080/api/health/database

# Monitor database health
watch -n 10 'curl -s http://localhost:8080/api/health/database | jq .'

TypeScript example

Basic function:

async function checkDatabaseHealth(): Promise<ComponentHealth> {
  const response = await fetch('/api/health/database', {
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  });

  const { data } = await response.json();
  return data;
}

// Usage
const dbHealth = await checkDatabaseHealth();
console.log(`Database: ${dbHealth.status} (${dbHealth.latency}ms)`);
if (dbHealth.details) {
  console.log(`Active connections: ${dbHealth.details.activeConnections}`);
}

React Component:

import { useState, useEffect } from 'react';

function DatabaseHealthMonitor() {
  const [health, setHealth] = useState<ComponentHealth | null>(null);

  useEffect(() => {
    fetchHealth();
    const interval = setInterval(fetchHealth, 30000);
    return () => clearInterval(interval);
  }, []);

  async function fetchHealth() {
    try {
      const dbHealth = await checkDatabaseHealth();
      setHealth(dbHealth);
    } catch (error) {
      console.error('Database health check failed:', error);
    }
  }

  if (!health) return <div>Loading...</div>;

  return (
    <div className={`db-health ${health.status}`}>
      <h4>Database Health</h4>
      <div className="status-badge">{health.status}</div>
      <p>{health.message}</p>
      <div className="latency">Response time: {health.latency}ms</div>
      {health.details && (
        <div className="connection-pool">
          <p>Active: {health.details.activeConnections}</p>
          <p>Idle: {health.details.idleConnections}</p>
          <p>Max: {health.details.maxConnections}</p>
        </div>
      )}
    </div>
  );
}

⛓️ GET /api/health/blockchain

Blockchain connection health check.

Request

GET /api/health/blockchain HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Context Timeout: GetHealthCheckTimeout() (configurable timeout)

Response

Success (200 OK):

{
  "success": true,
  "data": {
    "component": "blockchain",
    "status": "healthy",
    "message": "Blockchain node operational",
    "latency": 150,
    "details": {
      "blockNumber": "12345678",
      "networkID": "1337",
      "nodeVersion": "anvil"
    },
    "lastChecked": "2025-10-06T12:00:00Z"
  },
  "message": "Blockchain health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Unhealthy (503 Service Unavailable):

{
  "success": false,
  "data": {
    "component": "blockchain",
    "status": "unhealthy",
    "message": "Blockchain node unreachable",
    "latency": 5000,
    "details": {
      "error": "connection timeout",
      "rpcUrl": "http://188.42.218.164:8545"
    },
    "lastChecked": "2025-10-06T12:00:00Z"
  },
  "message": "Blockchain health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Response fields:

Field | Type | Description
data.component | string | Always "blockchain"
data.status | string | One of: healthy, degraded, unhealthy
data.message | string | Status description
data.latency | number | Check latency in milliseconds
data.details | object | Node information (blockNumber, networkID, errors)
data.lastChecked | string | Time of the last check (ISO 8601)
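
A node can answer RPC calls and still be stuck on the same block, so comparing details.blockNumber across two consecutive polls is a useful extra signal. This sketch assumes blockNumber is a decimal string, as in the example response; `parseBlockNumber` and `chainProgress` are hypothetical helpers, not part of the API.

```typescript
type ChainProgress = 'advancing' | 'stalled' | 'unknown';

// Assumption: blockNumber is a decimal string such as "12345678" (not hex).
function parseBlockNumber(value: string | undefined): number | null {
  if (value === undefined || !/^\d+$/.test(value)) return null;
  const parsed = Number(value);
  return Number.isSafeInteger(parsed) ? parsed : null;
}

// Hypothetical helper: compares blockNumber from two consecutive polls.
function chainProgress(previousBlock: string | undefined, currentBlock: string | undefined): ChainProgress {
  const previous = parseBlockNumber(previousBlock);
  const current = parseBlockNumber(currentBlock);
  if (previous === null || current === null) return 'unknown';
  return current > previous ? 'advancing' : 'stalled';
}
```

On a development node that only mines when transactions arrive, a "stalled" result is not necessarily a fault, so treat it as a hint rather than an alert.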

Status codes:

Status code | Description
200 | Blockchain is healthy
503 | Blockchain is unhealthy or unavailable

cURL example

# Blockchain health check
curl -X GET http://localhost:8080/api/health/blockchain

# Monitor blockchain health
watch -n 15 'curl -s http://localhost:8080/api/health/blockchain | jq .'

TypeScript example

Basic function:

async function checkBlockchainHealth(): Promise<ComponentHealth> {
  const response = await fetch('/api/health/blockchain', {
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  });

  const { data } = await response.json();
  return data;
}

// Usage
const bcHealth = await checkBlockchainHealth();
console.log(`Blockchain: ${bcHealth.status} (${bcHealth.latency}ms)`);
if (bcHealth.details) {
  console.log(`Block: ${bcHealth.details.blockNumber}`);
  console.log(`Network: ${bcHealth.details.networkID}`);
}

React Component:

import { useState, useEffect } from 'react';

function BlockchainHealthMonitor() {
  const [health, setHealth] = useState<ComponentHealth | null>(null);

  useEffect(() => {
    fetchHealth();
    const interval = setInterval(fetchHealth, 15000); // Every 15 seconds
    return () => clearInterval(interval);
  }, []);

  async function fetchHealth() {
    try {
      const bcHealth = await checkBlockchainHealth();
      setHealth(bcHealth);
    } catch (error) {
      console.error('Blockchain health check failed:', error);
    }
  }

  if (!health) return <div>Loading...</div>;

  return (
    <div className={`blockchain-health ${health.status}`}>
      <h4>Blockchain Health</h4>
      <div className="status-badge">{health.status}</div>
      <p>{health.message}</p>
      <div className="latency">Response time: {health.latency}ms</div>
      {health.details && (
        <div className="blockchain-details">
          <p>Block: #{health.details.blockNumber}</p>
          <p>Network ID: {health.details.networkID}</p>
          <p>Node: {health.details.nodeVersion}</p>
        </div>
      )}
    </div>
  );
}

💼 GET /api/health/business

Business logic health check.

Request

GET /api/health/business HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Context Timeout: GetHealthCheckTimeout() (configurable timeout)

Note: the current implementation uses database health as a proxy for business logic health.
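
One practical consequence of the proxy behaviour: a dashboard that polls both /api/health/database and /api/health/business receives two results whose component is "database", and a database outage would be counted twice. A possible client-side workaround, sketched here with hypothetical names (`CheckResult`, `mergeByComponent`), is to collapse results by component and keep the worst status.

```typescript
type Status = 'healthy' | 'degraded' | 'unhealthy';

interface CheckResult {
  component: string;
  status: Status;
}

// Higher rank means worse status
const SEVERITY: Record<Status, number> = { healthy: 0, degraded: 1, unhealthy: 2 };

// Hypothetical helper: collapses results that report the same component,
// keeping the worst status seen for each.
function mergeByComponent(results: CheckResult[]): CheckResult[] {
  const worst = new Map<string, CheckResult>();
  for (const result of results) {
    const seen = worst.get(result.component);
    if (!seen || SEVERITY[result.status] > SEVERITY[seen.status]) {
      worst.set(result.component, result);
    }
  }
  return Array.from(worst.values());
}
```

If the business endpoint later reports its own component name, the merge becomes a no-op and the calling code does not need to change.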

Response

Success (200 OK):

{
  "success": true,
  "data": {
    "component": "database",
    "status": "healthy",
    "message": "Database connection successful",
    "latency": 5,
    "details": {
      "activeConnections": "3",
      "businessLogicReady": "true"
    },
    "lastChecked": "2025-10-06T12:00:00Z"
  },
  "message": "Business logic health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Unhealthy (503 Service Unavailable):

{
  "success": false,
  "data": {
    "component": "database",
    "status": "unhealthy",
    "message": "Business logic unavailable",
    "latency": 5000,
    "details": {
      "error": "database connection failed"
    },
    "lastChecked": "2025-10-06T12:00:00Z"
  },
  "message": "Business logic health check completed",
  "timestamp": "2025-10-06T12:00:00Z"
}

Response fields:

Field | Type | Description
data.component | string | Component name (currently "database")
data.status | string | One of: healthy, degraded, unhealthy
data.message | string | Status description
data.latency | number | Check latency in milliseconds
data.details | object | Additional information
data.lastChecked | string | Time of the last check (ISO 8601)

Status codes:

Status code | Description
200 | Business logic is healthy
503 | Business logic is unhealthy or unavailable

cURL example

# Business logic health check
curl -X GET http://localhost:8080/api/health/business

# Monitor business health
watch -n 30 'curl -s http://localhost:8080/api/health/business | jq .'

TypeScript example

Basic function:

async function checkBusinessHealth(): Promise<ComponentHealth> {
  const response = await fetch('/api/health/business', {
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  });

  const { data } = await response.json();
  return data;
}

// Usage
const businessHealth = await checkBusinessHealth();
console.log(`Business Logic: ${businessHealth.status}`);

GET /api/health/resources

Returns system resource usage (memory, goroutines).

Request

GET /api/health/resources HTTP/1.1
Host: app.saga.surf
Accept: application/json

No authentication required.

Query parameters: none

Response

Success (200 OK):

{
  "success": true,
  "data": {
    "resourceUsage": {
      "memory": {
        "alloc": 12345678,
        "sys": 23456789
      },
      "goroutines": 42
    },
    "timestamp": "2025-10-06T12:00:00Z"
  },
  "message": "Resource usage retrieved",
  "timestamp": "2025-10-06T12:00:00Z",
  "traceId": "req-abc123"
}

Response fields:

| Field | Type | Description |
|---|---|---|
| data.resourceUsage.memory.alloc | number | Memory currently allocated, in bytes |
| data.resourceUsage.memory.sys | number | Total memory obtained from the OS, in bytes |
| data.resourceUsage.goroutines | number | Number of active goroutines |
| data.timestamp | string | Time the metrics were captured (ISO 8601) |

Status codes:

| Status code | Description |
|---|---|
| 200 | Resource usage retrieved successfully |

Notes:

  • alloc: memory that has been allocated and not yet freed
  • sys: all memory obtained from the OS (including memory that was freed but not yet returned)
  • goroutines: number of active goroutines (normal: 10-100, high: >1000)
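
The notes above can be turned into a small summary helper. This is a minimal sketch: `summarizeResources` and `classifyGoroutines` are hypothetical names, and the "elevated" band for 101 to 1000 goroutines is an assumption, since the notes only name the normal and high ranges.

```typescript
interface MemoryUsage {
  alloc: number; // bytes
  sys: number; // bytes
}

interface ResourceUsage {
  memory: MemoryUsage;
  goroutines: number;
}

type GoroutineLevel = 'normal' | 'elevated' | 'high';

// "high" follows the >1000 note above; "elevated" is an assumed middle band.
function classifyGoroutines(count: number): GoroutineLevel {
  if (count > 1000) return 'high';
  if (count > 100) return 'elevated';
  return 'normal';
}

interface ResourceSummary {
  allocMB: number;
  sysMB: number;
  reservedMB: number; // sys - alloc: obtained from the OS but not currently allocated
  goroutineLevel: GoroutineLevel;
}

function summarizeResources(usage: ResourceUsage): ResourceSummary {
  // Round to two decimal places after converting bytes to MB.
  const toMB = (bytes: number): number => Math.round((bytes / 1024 / 1024) * 100) / 100;
  return {
    allocMB: toMB(usage.memory.alloc),
    sysMB: toMB(usage.memory.sys),
    reservedMB: toMB(usage.memory.sys - usage.memory.alloc),
    goroutineLevel: classifyGoroutines(usage.goroutines)
  };
}
```

A steadily growing `reservedMB` with a flat `allocMB` usually just means the runtime is holding on to freed memory, which is why the two values are reported separately.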

cURL example

# Resource usage
curl -X GET http://localhost:8080/api/health/resources

# Monitor resource usage
watch -n 5 'curl -s http://localhost:8080/api/health/resources | jq .'

# Convert bytes to MB
curl -s http://localhost:8080/api/health/resources | \
  jq '.data.resourceUsage.memory | {allocMB: (.alloc / 1024 / 1024), sysMB: (.sys / 1024 / 1024)}'

TypeScript example

Basic function:

interface MemoryUsage {
  alloc: number;
  sys: number;
}

interface ResourceUsage {
  memory: MemoryUsage;
  goroutines: number;
}

interface ResourceUsageResponse {
  resourceUsage: ResourceUsage;
  timestamp: string;
}

async function getResourceUsage(): Promise<ResourceUsageResponse> {
  const response = await fetch('/api/health/resources', {
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  const { data } = await response.json();
  return data;
}

// Helper to convert bytes to MB (rounded to two decimal places)
function bytesToMB(bytes: number): number {
  return Math.round(bytes / 1024 / 1024 * 100) / 100;
}

// Usage
const resources = await getResourceUsage();
console.log(`Memory Allocated: ${bytesToMB(resources.resourceUsage.memory.alloc)} MB`);
console.log(`Memory System: ${bytesToMB(resources.resourceUsage.memory.sys)} MB`);
console.log(`Goroutines: ${resources.resourceUsage.goroutines}`);

React Component:

import { useState, useEffect } from 'react';

function ResourceUsageMonitor() {
  const [resources, setResources] = useState<ResourceUsageResponse | null>(null);
  const [history, setHistory] = useState<ResourceUsage[]>([]);

  useEffect(() => {
    fetchResources();
    const interval = setInterval(fetchResources, 5000); // Every 5 seconds
    return () => clearInterval(interval);
  }, []);

  async function fetchResources() {
    try {
      const resourceData = await getResourceUsage();
      setResources(resourceData);

      // Keep last 60 data points (5 minutes at 5-second intervals)
      setHistory(prev => [...prev, resourceData.resourceUsage].slice(-60));
    } catch (error) {
      console.error('Resource usage fetch failed:', error);
    }
  }

  if (!resources) return <div>Loading...</div>;

  const allocMB = bytesToMB(resources.resourceUsage.memory.alloc);
  const sysMB = bytesToMB(resources.resourceUsage.memory.sys);
  const goroutines = resources.resourceUsage.goroutines;

  // Calculate trends
  const avgGoroutines = history.length > 0
    ? history.reduce((sum, r) => sum + r.goroutines, 0) / history.length
    : 0;

  return (
    <div className="resource-usage-monitor">
      <h4>System Resources</h4>

      <div className="metric-grid">
        <div className="metric">
          <span className="label">Memory Allocated:</span>
          <span className="value">{allocMB} MB</span>
        </div>
        <div className="metric">
          <span className="label">Memory System:</span>
          <span className="value">{sysMB} MB</span>
        </div>
        <div className="metric">
          <span className="label">Goroutines:</span>
          <span className="value">{goroutines}</span>
        </div>
        {history.length > 10 && (
          <div className="metric">
            <span className="label">Avg Goroutines (5min):</span>
            <span className="value">{Math.round(avgGoroutines)}</span>
          </div>
        )}
      </div>

      {goroutines > 1000 && (
        <div className="warning">
          ⚠️ High goroutine count detected ({goroutines})
        </div>
      )}
    </div>
  );
}

Alerting:

interface ResourceAlert {
  severity: 'warning' | 'critical';
  message: string;
  value: number;
}

function analyzeResourceUsage(resources: ResourceUsage): ResourceAlert[] {
  const alerts: ResourceAlert[] = [];

  const allocMB = resources.memory.alloc / 1024 / 1024;

  // Memory: critical above 1 GB allocated, warning above 500 MB.
  // Checked highest-first so a single reading produces at most one memory alert.
  if (allocMB > 1024) {
    alerts.push({
      severity: 'critical',
      message: `Critical memory allocation: ${allocMB.toFixed(2)} MB`,
      value: allocMB
    });
  } else if (allocMB > 500) {
    alerts.push({
      severity: 'warning',
      message: `High memory allocation: ${allocMB.toFixed(2)} MB`,
      value: allocMB
    });
  }

  // Goroutines: critical above 5000, warning above 1000.
  if (resources.goroutines > 5000) {
    alerts.push({
      severity: 'critical',
      message: `Critical goroutine count: ${resources.goroutines}`,
      value: resources.goroutines
    });
  } else if (resources.goroutines > 1000) {
    alerts.push({
      severity: 'warning',
      message: `High goroutine count: ${resources.goroutines}`,
      value: resources.goroutines
    });
  }

  return alerts;
}

// Usage
const resources = await getResourceUsage();
const alerts = analyzeResourceUsage(resources.resourceUsage);

if (alerts.length > 0) {
  console.warn(`⚠️ ${alerts.length} resource alerts:`);
  alerts.forEach(alert => {
    console.warn(`  ${alert.severity.toUpperCase()}: ${alert.message}`);
  });
}

GET /api/health/ready

Kubernetes readiness probe: checks whether the system is ready to serve requests.

Request

GET /api/health/ready HTTP/1.1
Host: app.saga.surf

No authentication required.

Query parameters: none

Context Timeout: GetQuickHealthTimeout() (quick timeout for readiness check)

Note: checks critical components (database) to determine readiness.

Response

Ready (200 OK):

ready

Not Ready (503 Service Unavailable):

not ready

Response fields:

The readiness probe returns a plain-text response (not JSON):

  • "ready": the system is ready to serve requests
  • "not ready": the system is not ready (database unhealthy)

Status codes:

| Status code | Description |
|---|---|
| 200 | The system is ready to serve requests |
| 503 | The system is not ready (database unhealthy) |

Notes:

  • Checks database health (the critical component)
  • Ready if database status = healthy or degraded
  • Not ready if database status = unhealthy
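
The decision rule in the notes above fits in a few lines. The following is an illustrative sketch in TypeScript, not the server's actual handler; `readinessFor` and `ProbeResult` are hypothetical names, and only the three statuses named in the notes are modelled.

```typescript
type DatabaseStatus = 'healthy' | 'degraded' | 'unhealthy';

interface ProbeResult {
  statusCode: 200 | 503;
  body: 'ready' | 'not ready';
}

// Ready for healthy or degraded, not ready for unhealthy (per the notes above).
function readinessFor(databaseStatus: DatabaseStatus): ProbeResult {
  return databaseStatus === 'unhealthy'
    ? { statusCode: 503, body: 'not ready' }
    : { statusCode: 200, body: 'ready' };
}
```

Note that a degraded database still yields 200, so the pod keeps receiving traffic while the component-specific endpoints report the degradation.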

cURL example

# Readiness probe
curl -X GET http://localhost:8080/api/health/ready

# Check together with the status code
curl -X GET http://localhost:8080/api/health/ready \
  -w "\nHTTP Status: %{http_code}\n"

# Continuous monitoring
watch -n 5 'curl -s -w "\nStatus: %{http_code}\n" http://localhost:8080/api/health/ready'

Kubernetes Configuration

Readiness Probe:

readinessProbe:
  httpGet:
    path: /api/health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  successThreshold: 1
  failureThreshold: 3

How it works in Kubernetes:

  • initialDelaySeconds: 10 - wait 10 seconds after the container starts
  • periodSeconds: 5 - probe every 5 seconds
  • timeoutSeconds: 3 - timeout for the HTTP request
  • successThreshold: 1 - 1 successful check = ready
  • failureThreshold: 3 - 3 consecutive failed checks = not ready
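
To reason about how quickly Kubernetes reacts, it helps to multiply the probe period by the failure threshold. The sketch below does that arithmetic; `ProbeTiming` and `failureWindowSeconds` are hypothetical names, and the result is an approximation that ignores `timeoutSeconds` and scheduling jitter.

```typescript
interface ProbeTiming {
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate time of continuous failure before Kubernetes acts on the probe.
function failureWindowSeconds(probe: ProbeTiming): number {
  return probe.periodSeconds * probe.failureThreshold;
}

// Values taken from the readiness configuration above.
const readinessWindow = failureWindowSeconds({ periodSeconds: 5, failureThreshold: 3 });
console.log(`Marked not ready after roughly ${readinessWindow}s of failures`);
```

With the readiness settings above the window is 5 × 3 = 15 seconds.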

Kubernetes behavior:

  • Pods that are not ready receive no traffic from the Service
  • The load balancer does not send requests to pods that are not ready
  • A Deployment rollout waits until all pods become ready

TypeScript example

Basic function:

async function checkReadiness(): Promise<boolean> {
  try {
    const response = await fetch('/api/health/ready', {
      method: 'GET'
    });

    return response.ok; // true if 200, false if 503
  } catch (error) {
    console.error('Readiness check failed:', error);
    return false;
  }
}

// Usage
const isReady = await checkReadiness();
console.log(`System ready: ${isReady}`);

React Component:

function ReadinessIndicator() {
  const [ready, setReady] = useState<boolean | null>(null);

  useEffect(() => {
    checkReady();
    const interval = setInterval(checkReady, 5000); // Every 5 seconds
    return () => clearInterval(interval);
  }, []);

  async function checkReady() {
    const isReady = await checkReadiness();
    setReady(isReady);
  }

  if (ready === null) return <div>Checking readiness...</div>;

  return (
    <div className={`readiness-indicator ${ready ? 'ready' : 'not-ready'}`}>
      {ready ? '✅ System Ready' : '⏳ System Not Ready'}
    </div>
  );
}

💚 GET /api/health/live

Kubernetes liveness probe: checks that the application is alive and running.

Request

GET /api/health/live HTTP/1.1
Host: app.saga.surf

No authentication required.

Query parameters: none

Context Timeout: GetEmergencyHealthTimeout() (emergency quick timeout for liveness)

Note: checks the basic operability of the application.

Response

Alive (200 OK):

alive

Unhealthy (503 Service Unavailable):

unhealthy

Response fields:

The liveness probe returns a plain-text response (not JSON):

  • "alive": the application is running
  • "unhealthy": the application is not working (restart required)

Status codes:

| Status code | Description |
|---|---|
| 200 | The application is alive |
| 503 | The application reports itself unhealthy (restart required) |

Notes:

  • Checks basic operability through GetHealthStatus()
  • Alive if IsHealthy = true
  • Unhealthy if IsHealthy = false

cURL example

# Liveness probe
curl -X GET http://localhost:8080/api/health/live

# Check together with the status code
curl -X GET http://localhost:8080/api/health/live \
  -w "\nHTTP Status: %{http_code}\n"

# Continuous monitoring
watch -n 10 'curl -s -w "\nStatus: %{http_code}\n" http://localhost:8080/api/health/live'

Kubernetes Configuration

Liveness Probe:

livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

How it works in Kubernetes:

  • initialDelaySeconds: 30 - wait 30 seconds after start (longer than for readiness)
  • periodSeconds: 10 - probe every 10 seconds
  • timeoutSeconds: 5 - timeout for the HTTP request
  • successThreshold: 1 - 1 successful check = alive
  • failureThreshold: 3 - 3 consecutive failed checks = restart container

Kubernetes behavior:

  • Failed liveness probe → Kubernetes restarts the container
  • Used to recover from deadlocks and infinite loops
  • More drastic than readiness (restart vs. withholding traffic)
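
The restart rule depends on consecutive failures, so a single success resets the count. The class below is a minimal sketch of that bookkeeping for client-side dashboards; `ConsecutiveFailureTracker` is a hypothetical name and this is not how the kubelet itself is implemented.

```typescript
// Hypothetical helper that mirrors the failureThreshold rule described above.
class ConsecutiveFailureTracker {
  private failures = 0;
  private readonly failureThreshold: number;

  constructor(failureThreshold: number) {
    this.failureThreshold = failureThreshold;
  }

  // Records one probe result and reports whether the threshold has been reached.
  record(success: boolean): boolean {
    this.failures = success ? 0 : this.failures + 1;
    return this.failures >= this.failureThreshold;
  }
}
```

Feeding it the result of each liveness check shows when a restart would be expected under `failureThreshold: 3`.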

TypeScript example

Basic function:

async function checkLiveness(): Promise<boolean> {
  try {
    const response = await fetch('/api/health/live', {
      method: 'GET'
    });

    return response.ok; // true if 200, false if 503
  } catch (error) {
    console.error('Liveness check failed:', error);
    return false;
  }
}

// Usage
const isAlive = await checkLiveness();
console.log(`Application alive: ${isAlive}`);

React Component:

function LivenessIndicator() {
  const [alive, setAlive] = useState<boolean | null>(null);
  const [failureCount, setFailureCount] = useState(0);

  useEffect(() => {
    checkLive();
    const interval = setInterval(checkLive, 10000); // Every 10 seconds
    return () => clearInterval(interval);
  }, []);

  async function checkLive() {
    const isAlive = await checkLiveness();
    setAlive(isAlive);

    if (!isAlive) {
      setFailureCount(prev => prev + 1);
    } else {
      setFailureCount(0);
    }
  }

  if (alive === null) return <div>Checking liveness...</div>;

  return (
    <div className={`liveness-indicator ${alive ? 'alive' : 'dead'}`}>
      {alive ? '💚 Application Alive' : '💀 Application Unhealthy'}
      {failureCount > 0 && (
        <div className="failure-count">
          Failed checks: {failureCount}/3 (restart at 3)
        </div>
      )}
    </div>
  );
}

Use Cases

1. Comprehensive Health Dashboard

Combines all health endpoints into a single dashboard.

interface HealthDashboardData {
  general: HealthResponse;
  detailed: DetailedHealthResponse;
  version: VersionResponse;
  database: ComponentHealth;
  blockchain: ComponentHealth;
  business: ComponentHealth;
  resources: ResourceUsageResponse;
  readiness: boolean;
  liveness: boolean;
}

async function fetchHealthDashboard(): Promise<HealthDashboardData> {
  const [
    general,
    detailed,
    version,
    database,
    blockchain,
    business,
    resources,
    readiness,
    liveness
  ] = await Promise.all([
    checkSystemHealth(),
    checkDetailedHealth(),
    getVersionInfo(),
    checkDatabaseHealth(),
    checkBlockchainHealth(),
    checkBusinessHealth(),
    getResourceUsage(),
    checkReadiness(),
    checkLiveness()
  ]);

  return {
    general,
    detailed,
    version,
    database,
    blockchain,
    business,
    resources,
    readiness,
    liveness
  };
}

function HealthDashboard() {
  const [dashboard, setDashboard] = useState<HealthDashboardData | null>(null);

  useEffect(() => {
    fetchData();
    const interval = setInterval(fetchData, 30000);
    return () => clearInterval(interval);
  }, []);

  async function fetchData() {
    try {
      const data = await fetchHealthDashboard();
      setDashboard(data);
    } catch (error) {
      console.error('Failed to fetch dashboard data:', error);
    }
  }

  if (!dashboard) return <div>Loading dashboard...</div>;

  return (
    <div className="health-dashboard">
      <header>
        <h1>System Health Dashboard</h1>
        <div className="version">Version: {dashboard.version.version}</div>
      </header>

      <div className="status-grid">
        <div className="status-card overall">
          <h3>Overall Status</h3>
          <div className={`badge ${dashboard.general.status}`}>
            {dashboard.general.status}
          </div>
          <p>{dashboard.general.message}</p>
        </div>

        <div className="status-card">
          <h3>Database</h3>
          <StatusBadge component={dashboard.database} />
        </div>

        <div className="status-card">
          <h3>Blockchain</h3>
          <StatusBadge component={dashboard.blockchain} />
        </div>

        <div className="status-card">
          <h3>Business Logic</h3>
          <StatusBadge component={dashboard.business} />
        </div>
      </div>

      <div className="resources-section">
        <h3>System Resources</h3>
        <ResourceUsageMonitor />
      </div>

      <div className="probes-section">
        <h3>Kubernetes Probes</h3>
        <div className="probes-grid">
          <div className={`probe ${dashboard.readiness ? 'ready' : 'not-ready'}`}>
            <span>Readiness:</span>
            <span>{dashboard.readiness ? '✅ Ready' : '⏳ Not Ready'}</span>
          </div>
          <div className={`probe ${dashboard.liveness ? 'alive' : 'dead'}`}>
            <span>Liveness:</span>
            <span>{dashboard.liveness ? '💚 Alive' : '💀 Unhealthy'}</span>
          </div>
        </div>
      </div>

      <div className="detailed-components">
        <h3>Detailed Component Status</h3>
        <DetailedHealthDashboard />
      </div>
    </div>
  );
}

2. Automated Health Monitoring

Automated monitoring that raises alerts when problems are detected.

interface HealthAlert {
  timestamp: Date;
  severity: 'info' | 'warning' | 'critical';
  component: string;
  message: string;
  details?: any;
}

class HealthMonitor {
  private alerts: HealthAlert[] = [];
  // ReturnType<typeof setInterval> works in both Node.js and browser builds
  private checkInterval: ReturnType<typeof setInterval> | null = null;

  start(intervalMs = 30000) {
    this.checkInterval = setInterval(() => this.performChecks(), intervalMs);
    console.log('Health monitoring started');
  }

  stop() {
    if (this.checkInterval) {
      clearInterval(this.checkInterval);
      console.log('Health monitoring stopped');
    }
  }

  private async performChecks() {
    try {
      const [general, detailed, resources, readiness, liveness] = await Promise.all([
        checkSystemHealth(),
        checkDetailedHealth(),
        getResourceUsage(),
        checkReadiness(),
        checkLiveness()
      ]);

      // Check overall status
      if (general.status === 'unhealthy') {
        this.addAlert({
          severity: 'critical',
          component: 'system',
          message: `System unhealthy: ${general.message}`
        });
      }

      // Check component health
      detailed.components.forEach(comp => {
        if (comp.status === 'unhealthy') {
          this.addAlert({
            severity: 'critical',
            component: comp.component,
            message: comp.message || 'Component unhealthy',
            details: comp.details
          });
        } else if (comp.status === 'degraded') {
          this.addAlert({
            severity: 'warning',
            component: comp.component,
            message: comp.message || 'Component degraded',
            details: comp.details
          });
        }
      });

      // Check resource usage
      const allocMB = resources.resourceUsage.memory.alloc / 1024 / 1024;
      if (allocMB > 1024) {
        this.addAlert({
          severity: 'critical',
          component: 'resources',
          message: `Critical memory usage: ${allocMB.toFixed(2)} MB`
        });
      } else if (allocMB > 500) {
        this.addAlert({
          severity: 'warning',
          component: 'resources',
          message: `High memory usage: ${allocMB.toFixed(2)} MB`
        });
      }

      if (resources.resourceUsage.goroutines > 5000) {
        this.addAlert({
          severity: 'critical',
          component: 'resources',
          message: `Critical goroutine count: ${resources.resourceUsage.goroutines}`
        });
      } else if (resources.resourceUsage.goroutines > 1000) {
        this.addAlert({
          severity: 'warning',
          component: 'resources',
          message: `High goroutine count: ${resources.resourceUsage.goroutines}`
        });
      }

      // Check probes
      if (!readiness) {
        this.addAlert({
          severity: 'warning',
          component: 'readiness',
          message: 'System not ready'
        });
      }

      if (!liveness) {
        this.addAlert({
          severity: 'critical',
          component: 'liveness',
          message: 'System not alive (requires restart)'
        });
      }

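    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow it before reading .message
      const reason = error instanceof Error ? error.message : String(error);
      this.addAlert({
        severity: 'critical',
        component: 'monitoring',
        message: `Health check failed: ${reason}`
      });
    }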
  }

  private addAlert(alert: Omit<HealthAlert, 'timestamp'>) {
    const newAlert = {
      ...alert,
      timestamp: new Date()
    };

    this.alerts.push(newAlert);
    this.notifyAlert(newAlert);

    // Keep last 100 alerts
    if (this.alerts.length > 100) {
      this.alerts = this.alerts.slice(-100);
    }
  }

  private notifyAlert(alert: HealthAlert) {
    const icon = alert.severity === 'critical' ? '🚨' :
                 alert.severity === 'warning' ? '⚠️' : 'ℹ️';

    console.log(`${icon} [${alert.severity.toUpperCase()}] ${alert.component}: ${alert.message}`);

    // Could be integrated with PagerDuty or email
    if (alert.severity === 'critical') {
      this.sendCriticalAlert(alert);
    }
  }

  private async sendCriticalAlert(alert: HealthAlert) {
    // Integration point for alerting systems
    console.error('CRITICAL ALERT:', alert);

    // Example: sending an email
    // await sendEmailNotification({
    //   to: 'team@saga.surf',
    //   subject: `🚨 CRITICAL: ${alert.component}`,
    //   text: `${alert.message}`,
    //   details: alert.details
    // });
  }

  getAlerts(severity?: HealthAlert['severity']): HealthAlert[] {
    if (severity) {
      return this.alerts.filter(a => a.severity === severity);
    }
    return [...this.alerts];
  }

  clearAlerts() {
    this.alerts = [];
  }
}

// Usage
const monitor = new HealthMonitor();
monitor.start(30000); // Check every 30 seconds

// Get critical alerts
const criticalAlerts = monitor.getAlerts('critical');
console.log(`Critical alerts: ${criticalAlerts.length}`);

3. Kubernetes Integration

Full integration with Kubernetes probes and health checks.

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: saga-backend
  namespace: saga
spec:
  replicas: 3
  selector:
    matchLabels:
      app: saga-backend
  template:
    metadata:
      labels:
        app: saga-backend
    spec:
      containers:
      - name: saga-backend
        image: saga-backend:latest
        ports:
        - containerPort: 8080
          name: http

        # Readiness Probe
        readinessProbe:
          httpGet:
            path: /api/health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3

        # Liveness Probe
        livenessProbe:
          httpGet:
            path: /api/health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3

        # Startup Probe (for slow startup)
        startupProbe:
          httpGet:
            path: /api/health/live
            port: 8080
          initialDelaySeconds: 0
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 30  # 150 seconds max startup time

        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: saga-secrets
              key: db-password
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              name: saga-secrets
              key: jwt-secret

        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: saga-backend
  namespace: saga
  labels:
    app: saga-backend  # a ServiceMonitor selects Services by label
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: saga-backend

---
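# servicemonitor.yaml (Prometheus monitoring)
# Note: Prometheus scrapes the text exposition format, while /api/health/detailed
# returns JSON. Point `path` at an endpoint that serves Prometheus metrics
# (such as the /metrics exporter in section 4) for the scrape to succeed.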
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: saga-backend
  namespace: saga
spec:
  selector:
    matchLabels:
      app: saga-backend
  endpoints:
  - port: http
    path: /api/health/detailed
    interval: 30s

4. Prometheus Metrics Export

Converts health endpoint data into Prometheus metrics.

// prometheus-exporter.ts
class PrometheusHealthExporter {
  async exportMetrics(): Promise<string> {
    try {
      const [general, detailed, resources] = await Promise.all([
        checkSystemHealth(),
        checkDetailedHealth(),
        getResourceUsage()
      ]);

      const metrics: string[] = [];

      // Overall health status
      const statusValue = general.status === 'healthy' ? 2 :
                         general.status === 'degraded' ? 1 : 0;
      metrics.push(`saga_health_status{status="${general.status}"} ${statusValue}`);

      // Component health
      detailed.components.forEach(comp => {
        const compStatusValue = comp.status === 'healthy' ? 2 :
                               comp.status === 'degraded' ? 1 : 0;
        metrics.push(`saga_component_health{component="${comp.component}",status="${comp.status}"} ${compStatusValue}`);
        metrics.push(`saga_component_latency_ms{component="${comp.component}"} ${comp.latency}`);
      });

      // Resource usage
      metrics.push(`saga_memory_alloc_bytes ${resources.resourceUsage.memory.alloc}`);
      metrics.push(`saga_memory_sys_bytes ${resources.resourceUsage.memory.sys}`);
      metrics.push(`saga_goroutines ${resources.resourceUsage.goroutines}`);

      // Component count
      metrics.push(`saga_component_count ${general.componentCount}`);
      metrics.push(`saga_healthy_components ${detailed.components.filter(c => c.status === 'healthy').length}`);
      metrics.push(`saga_degraded_components ${detailed.components.filter(c => c.status === 'degraded').length}`);
      metrics.push(`saga_unhealthy_components ${detailed.components.filter(c => c.status === 'unhealthy').length}`);

      return metrics.join('\n') + '\n';
    } catch (error) {
      console.error('Failed to export metrics:', error);
      return '# Failed to export metrics\n';
    }
  }
}

// Express endpoint for Prometheus (assumes an existing Express `app`)
app.get('/metrics', async (req, res) => {
  const exporter = new PrometheusHealthExporter();
  const metrics = await exporter.exportMetrics();

  res.set('Content-Type', 'text/plain; version=0.0.4');
  res.send(metrics);
});
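
The exporter above interpolates component names straight into label values. The Prometheus text format requires backslash, double quote, and newline to be escaped inside label values, so a small helper is a sensible safeguard. This is a sketch: `escapeLabelValue` and `formatMetric` are hypothetical names, not part of the exporter shown above.

```typescript
// Escapes the three characters that are special inside a Prometheus label value.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Formats one sample line, for example: name{key="value"} 2
function formatMetric(name: string, labels: Record<string, string>, value: number): string {
  const pairs = Object.keys(labels).map(
    key => `${key}="${escapeLabelValue(labels[key])}"`
  );
  const labelPart = pairs.length > 0 ? `{${pairs.join(',')}}` : '';
  return `${name}${labelPart} ${value}`;
}
```

Inside `exportMetrics`, a call such as `formatMetric('saga_component_health', { component: comp.component, status: comp.status }, compStatusValue)` could replace the hand-built template strings.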

5. Health Status Page

A public status page for users.

function PublicStatusPage() {
  const [status, setStatus] = useState<'operational' | 'degraded' | 'major_outage'>('operational');
  const [components, setComponents] = useState<ComponentHealth[]>([]);
  const [lastUpdate, setLastUpdate] = useState<Date>(new Date());

  useEffect(() => {
    fetchStatus();
    const interval = setInterval(fetchStatus, 60000); // Every minute
    return () => clearInterval(interval);
  }, []);

  async function fetchStatus() {
    try {
      const detailed = await checkDetailedHealth();

      // Determine overall status
      const unhealthyCount = detailed.components.filter(c => c.status === 'unhealthy').length;
      const degradedCount = detailed.components.filter(c => c.status === 'degraded').length;

      if (unhealthyCount > 0) {
        setStatus('major_outage');
      } else if (degradedCount > 0) {
        setStatus('degraded');
      } else {
        setStatus('operational');
      }

      setComponents(detailed.components);
      setLastUpdate(new Date());
    } catch (error) {
      setStatus('major_outage');
    }
  }

  const statusConfig = {
    operational: { icon: '✅', color: 'green', text: 'All Systems Operational' },
    degraded: { icon: '⚠️', color: 'yellow', text: 'Partial System Outage' },
    major_outage: { icon: '❌', color: 'red', text: 'Major Service Outage' }
  };

  const config = statusConfig[status];

  return (
    <div className="status-page">
      <header>
        <h1>Saga System Status</h1>
        <div className={`overall-status ${config.color}`}>
          <span className="icon">{config.icon}</span>
          <span className="text">{config.text}</span>
        </div>
        <p className="last-update">
          Last updated: {lastUpdate.toLocaleString()}
        </p>
      </header>

      <section className="components-status">
        <h2>System Components</h2>
        <div className="components-list">
          {components.map(comp => (
            <div key={comp.component} className={`component-item ${comp.status}`}>
              <div className="component-name">
                {comp.component.charAt(0).toUpperCase() + comp.component.slice(1)}
              </div>
              <div className="component-status">
                {comp.status === 'healthy' && <span className="badge green">✅ Operational</span>}
                {comp.status === 'degraded' && <span className="badge yellow">⚠️ Degraded Performance</span>}
                {comp.status === 'unhealthy' && <span className="badge red">❌ Outage</span>}
              </div>
              {comp.message && (
                <div className="component-message">{comp.message}</div>
              )}
            </div>
          ))}
        </div>
      </section>

      <footer>
        <p>For support, contact support@saga.surf</p>
      </footer>
    </div>
  );
}

Technical Details

Health Checking Architecture

Comprehensive HealthService:

  • Coordinates all health checks through a single service
  • Uses context.Context for timeout management
  • Runs component checks in parallel to minimize latency
  • Caches results to reduce load
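
The parallel-checks-with-timeout pattern described above can be illustrated on the client side. The sketch below is written in TypeScript for consistency with the other examples and is not the Go HealthService itself; `NamedCheck`, `runWithTimeout`, and `runAllChecks` are hypothetical names, and treating a timeout or a thrown error as "unhealthy" is an assumption.

```typescript
type HealthStatus = 'healthy' | 'degraded' | 'unhealthy';

interface NamedCheck {
  component: string;
  run: () => Promise<HealthStatus>;
}

interface CheckResult {
  component: string;
  status: HealthStatus;
  latency: number; // milliseconds
}

// Runs one check and reports "unhealthy" if it throws or exceeds the timeout.
async function runWithTimeout(check: NamedCheck, timeoutMs: number): Promise<CheckResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<HealthStatus>(resolve => {
    timer = setTimeout(() => resolve('unhealthy'), timeoutMs);
  });

  let status: HealthStatus;
  try {
    status = await Promise.race([check.run(), deadline]);
  } catch {
    status = 'unhealthy';
  } finally {
    clearTimeout(timer); // avoid leaving the deadline timer pending
  }
  return { component: check.component, status, latency: Date.now() - started };
}

// Starts all checks at once, so total latency is close to the slowest check.
function runAllChecks(checks: NamedCheck[], timeoutMs: number): Promise<CheckResult[]> {
  return Promise.all(checks.map(check => runWithTimeout(check, timeoutMs)));
}
```

Because every check is bounded by the same deadline, the combined call finishes in roughly `timeoutMs` at worst instead of the sum of all check durations.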

Timeout Configuration:

// Configurable timeouts in config.yaml
GetHealthCheckTimeout()          // Standard health check (5s)
GetQuickHealthTimeout()         // Quick checks (database, readiness) (2s)
GetComprehensiveHealthTimeout() // Detailed health check (10s)
GetEmergencyHealthTimeout()     // Emergency liveness (1s)
GetDefaultTimeout()             // Fallback safe value (5s)

Health Status Determination:

type HealthStatusCode string

const (
    HealthStatusHealthy   HealthStatusCode = "healthy"   // All components OK
    HealthStatusDegraded  HealthStatusCode = "degraded"  // Some components degraded, but the system works
    HealthStatusUnhealthy HealthStatusCode = "unhealthy" // Critical components failed
    HealthStatusUnknown   HealthStatusCode = "unknown"   // Status cannot be determined
)

Component Health Checks:

  • Database: PostgreSQL ping + connection pool status
  • Blockchain: VPS node connectivity + latest block check
  • Business Logic: proxied through database health (in the current implementation)
  • Resources: Go runtime metrics (memory, goroutines)

HTTP Status Code Mapping:

switch systemHealth.Status {
case HealthStatusHealthy, HealthStatusDegraded:
    statusCode = http.StatusOK              // 200
case HealthStatusUnhealthy, HealthStatusUnknown:
    statusCode = http.StatusServiceUnavailable // 503
}
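
A monitoring client can mirror this mapping to detect responses where the HTTP code and the body disagree, for example when a proxy rewrites the status. The following is a sketch under that assumption; `expectedHttpStatus` and `isConsistent` are hypothetical helper names.

```typescript
type HealthStatusCode = 'healthy' | 'degraded' | 'unhealthy' | 'unknown';

// Client-side mirror of the Go switch above: healthy/degraded -> 200, otherwise 503.
function expectedHttpStatus(status: HealthStatusCode): 200 | 503 {
  return status === 'healthy' || status === 'degraded' ? 200 : 503;
}

// True when the HTTP status code matches what the body status implies.
function isConsistent(httpStatus: number, bodyStatus: HealthStatusCode): boolean {
  return httpStatus === expectedHttpStatus(bodyStatus);
}
```

An inconsistent pair is worth logging separately, since it usually points at something between the client and the service rather than at the service itself.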

Monitoring Best Practices

Polling Intervals:

  • Ping: 30 seconds (frequent availability check)
  • Health: 60 seconds (general system status)
  • Detailed Health: 30 seconds (comprehensive monitoring)
  • Version: Once on app start (static data)
  • Database Health: 30 seconds (critical component)
  • Blockchain Health: 15 seconds (blockchain specific)
  • Resources: 5 seconds (fast-changing metrics)
  • Readiness: 5 seconds (matches the readinessProbe periodSeconds above; the Kubernetes default is 10)
  • Liveness: 10 seconds (Kubernetes default)

Alerting Thresholds:

  • Memory Usage: Warning at 500MB, Critical at 1GB
  • Goroutines: Warning at 1000, Critical at 5000
  • Component Latency: Warning at 1000ms, Critical at 5000ms
  • Unhealthy Duration: Critical if unhealthy >5 minutes
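
The latency and duration thresholds above are not covered by the earlier alerting examples, so here is a minimal sketch of both. `latencySeverity` and `UnhealthyDurationTracker` are hypothetical names; treating "at 1000ms" as inclusive is an assumption, and timestamps are passed in explicitly to keep the logic easy to test.

```typescript
type Severity = 'ok' | 'warning' | 'critical';

// Warning at 1000 ms, critical at 5000 ms (thresholds from the list above).
function latencySeverity(latencyMs: number): Severity {
  if (latencyMs >= 5000) return 'critical';
  if (latencyMs >= 1000) return 'warning';
  return 'ok';
}

const UNHEALTHY_LIMIT_MS = 5 * 60 * 1000; // 5 minutes

class UnhealthyDurationTracker {
  private unhealthySince: number | null = null;

  // Feed each observed status with its timestamp in ms. Returns true once the
  // component has been continuously unhealthy for more than five minutes.
  update(status: string, nowMs: number): boolean {
    if (status !== 'unhealthy') {
      this.unhealthySince = null;
      return false;
    }
    if (this.unhealthySince === null) this.unhealthySince = nowMs;
    return nowMs - this.unhealthySince > UNHEALTHY_LIMIT_MS;
  }
}
```

In a polling loop, `update(comp.status, Date.now())` would be called once per component per poll.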

Kubernetes Probe Configuration:

# Startup Probe (for slow-starting applications)
startupProbe:
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30  # 150s max startup time

# Readiness Probe (for traffic routing)
readinessProbe:
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3   # 15s to mark as not ready

# Liveness Probe (for restart decisions)
livenessProbe:
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3   # 30s to restart container

Rate Limiting Considerations:

  • Health endpoints are not rate limited (monitoring is critical)
  • Public endpoints (/api/health, /api/ping) are accessible without auth
  • Avoid exceeding 1 request/second for detailed checks
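
Since the server does not enforce a limit, the 1 request/second recommendation has to be respected on the client. One simple way is a minimum-interval gate, sketched below; `createMinIntervalGate` is a hypothetical helper, and the injectable clock exists only to make the behavior testable.

```typescript
// Returns a function that answers "may I send a request now?" and allows
// at most one request per minIntervalMs.
function createMinIntervalGate(
  minIntervalMs: number,
  now: () => number = Date.now
): () => boolean {
  let lastAllowed = -Infinity;
  return function tryAcquire(): boolean {
    const current = now();
    if (current - lastAllowed < minIntervalMs) return false;
    lastAllowed = current;
    return true;
  };
}
```

A caller would create one gate per endpoint, for example `createMinIntervalGate(1000)` for detailed checks, and skip the request when the gate returns false.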

Error Handling

Connection Timeouts:

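async function checkWithTimeout<T>(
  checkFunc: () => Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Typed as Promise<never> so that Promise.race resolves to T
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Health check timeout')), timeoutMs);
  });

  try {
    return await Promise.race([checkFunc(), timeout]);
  } finally {
    // Clear the timer so it does not keep running after the check settles
    clearTimeout(timer);
  }
}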

// Usage
try {
  const health = await checkWithTimeout(
    () => checkSystemHealth(),
    5000 // 5 second timeout
  );
} catch (error) {
  console.error('Health check failed:', error);
}

Graceful Degradation:

async function getHealthWithFallback(): Promise<HealthResponse> {
  try {
    return await checkSystemHealth();
  } catch (error) {
    // Return fallback data if health check fails
    return {
      status: 'unknown',
      message: 'Health check unavailable',
      timestamp: new Date().toISOString(),
      uptime: '0s',
      version: 'unknown',
      healthy: false,
      componentCount: 0
    };
  }
}

Retry Logic:

async function checkHealthWithRetry(maxRetries = 3): Promise<HealthResponse> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await checkSystemHealth();
    } catch (error) {
      if (i === maxRetries - 1) throw error;

      // Exponential backoff
      await new Promise(resolve => setTimeout(resolve, 1000 * Math.pow(2, i)));
    }
  }

  throw new Error('Max retries exceeded');
}

Version Management

Semantic Versioning:

  • The VERSION file in the project root is the single source of truth
  • Synced automatically by the master config generator system
  • The version is read at application start
  • Falls back to "unknown" if the file is unavailable

Build Metadata:

  • buildTime: can be set through Go build flags
  • gitCommit: can be set through Git hooks
  • environment: read from the UnifiedConfig system

Version Comparison:

// Semantic version comparison
function isVersionNewer(current: string, compare: string): boolean {
  if (current === 'unknown' || compare === 'unknown') return false;

  const [cMajor, cMinor, cPatch] = current.split('.').map(Number);
  const [rMajor, rMinor, rPatch] = compare.split('.').map(Number);

  if (cMajor !== rMajor) return cMajor > rMajor;
  if (cMinor !== rMinor) return cMinor > rMinor;
  return cPatch > rPatch;
}

📋 Metadata

Version: 2.6.268

Updated: 2025-10-21

Status: Published