Health & Monitoring Endpoints
Audience: developers, DevOps engineers, monitoring systems
Last updated: 2025-11-17
Summary: Complete documentation for the Health & Monitoring endpoints: comprehensive health checks, component-specific monitoring, Kubernetes probes, and resource usage tracking, plus a detailed guide to integrating them with monitoring and observability systems.
Endpoints Overview
| Method | Endpoint | Description | Auth Required | Status Codes |
|---|---|---|---|---|
| GET | /api/ping | Simple pong response | ❌ No | 200 |
| GET | /api/health | Overall system health check | ❌ No | 200, 503 |
| GET | /api/health/detailed | Detailed information for all components | ❌ No | 200, 503 |
| GET | /api/version | System version and build info | ❌ No | 200 |
| GET | /api/health/database | Database health check | ❌ No | 200, 503 |
| GET | /api/health/blockchain | Blockchain health check | ❌ No | 200, 503 |
| GET | /api/health/business | Business logic health check | ❌ No | 200, 503 |
| GET | /api/health/resources | System resource usage | ❌ No | 200 |
| GET | /api/health/ready | Kubernetes readiness probe | ❌ No | 200, 503 |
| GET | /api/health/live | Kubernetes liveness probe | ❌ No | 200, 503 |
Architectural principles:
- Public Endpoints: none of the health endpoints require authentication
- Status Code Based: the HTTP status code reflects health status (200 = OK or degraded, 503 = unhealthy)
- Comprehensive Service: uses HealthService to coordinate the checks
- Context Timeouts: every check has a configurable timeout (GetHealthCheckTimeout, GetQuickHealthTimeout, GetComprehensiveHealthTimeout)
- Component Isolation: separate endpoints for database, blockchain, and business logic
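The status code carries the verdict, so a client can act on it before parsing the body. It only needs the body to tell healthy apart from degraded. Below is a minimal sketch that assumes only the conventions listed above; the function name interpretHealth is hypothetical. The examples in this document return "success": false for degraded responses, so the envelope's success flag is not a reliable up/down signal on its own.

```typescript
// Coarse health verdicts a client can derive from a health endpoint response.
type HealthStatus = 'healthy' | 'degraded' | 'unhealthy' | 'unknown';

// Hypothetical helper: combines the HTTP status code with the optional
// data.status field from the body, following the convention
// 200 = healthy or degraded, 503 = unhealthy.
function interpretHealth(httpStatus: number, bodyStatus?: string): HealthStatus {
  if (httpStatus === 503) return 'unhealthy';
  if (httpStatus !== 200) return 'unknown'; // 404, 500, proxy errors, ...
  // 200 covers both healthy and degraded; only the body can tell them apart
  if (bodyStatus === 'healthy') return 'healthy';
  if (bodyStatus === 'degraded') return 'degraded';
  return 'unknown';
}

// Usage
console.log(interpretHealth(200, 'degraded')); // "degraded"
console.log(interpretHealth(503));             // "unhealthy"
console.log(interpretHealth(500, 'healthy'));  // "unknown"
```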
📡 GET /api/ping
A simple pong endpoint for a basic check that the API is reachable.
Request
No authentication required.
Query parameters: none
Response
Success (200 OK):
{
"success": true,
"data": {
"status": "ok",
"success": true,
"message": "pong",
"timestamp": "2025-10-06T12:00:00Z"
},
"message": "Service is responding",
"timestamp": "2025-10-06T12:00:00Z",
"traceId": "req-abc123"
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.status | string | Always "ok" |
| data.success | boolean | Always true |
| data.message | string | Always "pong" |
| data.timestamp | string | Current server time (ISO 8601) |
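Every endpoint in this document wraps its payload in the same envelope: success, data, message, timestamp, and in some examples traceId. Below is a sketch of a typed envelope with a minimal runtime check. The names ApiEnvelope, isApiEnvelope, and unwrapEnvelope are hypothetical. traceId is optional because several examples omit it. The unwrap deliberately ignores success, because degraded health responses set it to false but still carry useful data.

```typescript
// Hypothetical generic type for the response envelope shown above.
// traceId is optional: some examples in this document omit it.
interface ApiEnvelope<T> {
  success: boolean;
  data: T;
  message: string;
  timestamp: string;
  traceId?: string;
}

// Minimal structural check before trusting a parsed JSON body.
function isApiEnvelope(value: unknown): value is ApiEnvelope<unknown> {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.success === 'boolean' &&
    'data' in v &&
    typeof v.message === 'string' &&
    typeof v.timestamp === 'string';
}

// Returns `data` regardless of `success`, since degraded responses use success: false.
function unwrapEnvelope<T>(value: unknown): T {
  if (!isApiEnvelope(value)) {
    throw new Error('Response is not a valid API envelope');
  }
  return value.data as T;
}

// Usage with the /api/ping example body
const body: unknown = {
  success: true,
  data: { status: 'ok', success: true, message: 'pong', timestamp: '2025-10-06T12:00:00Z' },
  message: 'Service is responding',
  timestamp: '2025-10-06T12:00:00Z',
  traceId: 'req-abc123',
};
const ping = unwrapEnvelope<{ status: string; message: string }>(body);
console.log(ping.message); // "pong"
```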
Status codes:
| Status code | Description |
|---|---|
| 200 | API is available |
cURL Example
# Simple ping
curl -X GET http://localhost:8080/api/ping
# With an Accept header
curl -X GET http://localhost:8080/api/ping \
-H "Accept: application/json"
# Production endpoint
curl -X GET https://app.saga.surf/api/ping
TypeScript Example
Basic function:
interface PingResponse {
status: string;
success: boolean;
message: string;
timestamp: string;
}
async function checkAPIAvailability(): Promise<boolean> {
try {
const response = await fetch('/api/ping', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
if (!response.ok) {
return false;
}
const { data } = await response.json();
return data.success && data.status === 'ok';
} catch (error) {
console.error('Ping failed:', error);
return false;
}
}
// Usage
const isAvailable = await checkAPIAvailability();
console.log(`API available: ${isAvailable}`);
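checkAPIAvailability waits for as long as fetch hangs. A status indicator usually needs a bounded wait and a latency figure. The sketch below works under these assumptions: the 3-second default timeout and the names pingWithTimeout and withTimeout are hypothetical, and the fetcher is injected so the logic can be tested without a server.

```typescript
// Minimal fetch-like signature, so the sketch does not depend on DOM typings.
// The browser / Node 18+ global fetch satisfies it.
type Fetcher = (url: string) => Promise<{ ok: boolean }>;

interface PingResult {
  available: boolean;
  latencyMs: number;
}

// Rejects if `promise` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying request.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

// Hypothetical helper: pings the API, measures round-trip time and reports
// "unavailable" on non-2xx responses, network errors or timeouts.
async function pingWithTimeout(fetcher: Fetcher, url = '/api/ping', timeoutMs = 3000): Promise<PingResult> {
  const started = Date.now();
  try {
    const response = await withTimeout(fetcher(url), timeoutMs);
    return { available: response.ok, latencyMs: Date.now() - started };
  } catch {
    return { available: false, latencyMs: Date.now() - started };
  }
}

// Usage (browser or Node 18+):
// const { available, latencyMs } = await pingWithTimeout((url) => fetch(url), '/api/ping', 2000);
```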
React Component:
import { useState, useEffect } from 'react';
function APIPingIndicator() {
const [available, setAvailable] = useState<boolean | null>(null);
useEffect(() => {
checkAvailability();
const interval = setInterval(checkAvailability, 30000); // Every 30 seconds
return () => clearInterval(interval);
}, []);
async function checkAvailability() {
const isAvailable = await checkAPIAvailability();
setAvailable(isAvailable);
}
if (available === null) return <div>Checking...</div>;
return (
<div className={`ping-indicator ${available ? 'online' : 'offline'}`}>
<span className="status-dot"></span>
<span className="status-text">
{available ? '✅ API Online' : '❌ API Offline'}
</span>
</div>
);
}
🏥 GET /api/health
Overall system health check, including how many components are healthy.
Request
No authentication required.
Query parameters: none
Context Timeout: GetHealthCheckTimeout() (configurable, with a safe default)
Response
Success - Healthy (200 OK):
{
"success": true,
"data": {
"status": "healthy",
"message": "3/3 components healthy",
"timestamp": "2025-10-06T12:00:00Z",
"uptime": "2h15m30s",
"version": "1.0.0",
"healthy": true,
"componentCount": 3
},
"message": "Health check completed",
"timestamp": "2025-10-06T12:00:00Z",
"systemHealthy": true
}
Success - Degraded (200 OK):
{
"success": false,
"data": {
"status": "degraded",
"message": "2/3 components healthy",
"timestamp": "2025-10-06T12:00:00Z",
"uptime": "2h15m30s",
"version": "1.0.0",
"healthy": false,
"componentCount": 3
},
"message": "Health check completed",
"timestamp": "2025-10-06T12:00:00Z",
"systemHealthy": false
}
Unhealthy (503 Service Unavailable):
{
"success": false,
"data": {
"status": "unhealthy",
"message": "0/3 components healthy",
"timestamp": "2025-10-06T12:00:00Z",
"uptime": "2h15m30s",
"version": "1.0.0",
"healthy": false,
"componentCount": 3
},
"message": "Health check completed",
"timestamp": "2025-10-06T12:00:00Z",
"systemHealthy": false
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.status | string | Status: healthy, degraded, unhealthy, unknown |
| data.message | string | Short summary (e.g., "3/3 components healthy") |
| data.timestamp | string | Check time (ISO 8601) |
| data.uptime | string | System uptime (duration string, e.g. "2h15m30s") |
| data.version | string | System version |
| data.healthy | boolean | true if status is healthy or degraded |
| data.componentCount | number | Number of components checked |
| systemHealthy | boolean | Duplicates data.healthy for convenience |
Status codes:
| Status code | Description |
|---|---|
| 200 | System is healthy or degraded |
| 503 | System is unhealthy |
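This endpoint answers 200 for both healthy and degraded. A script that feeds an external monitoring system therefore needs data.status to choose a severity. The sketch below assumes a Nagios-style plugin convention with exit codes 0 to 3. Mapping degraded to WARNING is an assumption, and so are the function names; neither is part of this API.

```typescript
type HealthStatus = 'healthy' | 'degraded' | 'unhealthy' | 'unknown';
type ExitCode = 0 | 1 | 2 | 3;

// Nagios-style plugin convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
const LABELS = ['OK', 'WARNING', 'CRITICAL', 'UNKNOWN'] as const;

function toExitCode(status: HealthStatus): ExitCode {
  switch (status) {
    case 'healthy': return 0;
    case 'degraded': return 1;
    case 'unhealthy': return 2;
    default: return 3;
  }
}

// Hypothetical formatter: builds a one-line plugin result from data.status and data.message.
function toMonitoringResult(status: HealthStatus, message: string): { exitCode: ExitCode; line: string } {
  const exitCode = toExitCode(status);
  return { exitCode, line: `${LABELS[exitCode]} - ${message}` };
}

// Usage with the degraded example response above
const result = toMonitoringResult('degraded', '2/3 components healthy');
console.log(result.line);     // "WARNING - 2/3 components healthy"
console.log(result.exitCode); // 1
```

In a Node script, result.line would be printed and result.exitCode passed to process.exit.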
cURL Example
# Basic health check
curl -X GET http://localhost:8080/api/health
# Check and print the HTTP status code
curl -X GET http://localhost:8080/api/health \
-w "\nHTTP Status: %{http_code}\n"
# Production endpoint
curl -X GET https://app.saga.surf/api/health | jq .
TypeScript Example
Basic function:
interface HealthResponse {
status: 'healthy' | 'degraded' | 'unhealthy' | 'unknown';
message: string;
timestamp: string;
uptime: string;
version: string;
healthy: boolean;
componentCount: number;
}
async function checkSystemHealth(): Promise<HealthResponse> {
const response = await fetch('/api/health', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
// Note: 503 is expected for unhealthy state
const { data } = await response.json();
return data;
}
// Usage
const health = await checkSystemHealth();
console.log(`System status: ${health.status}`);
console.log(`Components: ${health.message}`);
console.log(`Healthy: ${health.healthy ? 'Yes' : 'No'}`);
React Component:
import { useState, useEffect } from 'react';
function SystemHealthBadge() {
const [health, setHealth] = useState<HealthResponse | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
fetchHealth();
const interval = setInterval(fetchHealth, 60000); // Every 60 seconds
return () => clearInterval(interval);
}, []);
async function fetchHealth() {
try {
const healthData = await checkSystemHealth();
setHealth(healthData);
} catch (error) {
console.error('Health check failed:', error);
} finally {
setLoading(false);
}
}
if (loading) return <div>Loading...</div>;
if (!health) return <div className="health-badge error">⚠️ Unknown</div>;
const badgeClass = health.status === 'healthy' ? 'healthy' :
health.status === 'degraded' ? 'degraded' : 'unhealthy';
const icon = health.status === 'healthy' ? '✅' :
health.status === 'degraded' ? '⚠️' : '❌';
return (
<div className={`health-badge ${badgeClass}`}>
<span className="icon">{icon}</span>
<div className="details">
<div className="status">{health.status.toUpperCase()}</div>
<div className="message">{health.message}</div>
<div className="uptime">Uptime: {health.uptime}</div>
</div>
</div>
);
}
SWR Integration:
import useSWR from 'swr';
function useSystemHealth(refreshInterval = 60000) {
const { data, error, mutate } = useSWR(
'/api/health',
checkSystemHealth,
{ refreshInterval }
);
return {
health: data,
isLoading: !error && !data,
isError: error,
refresh: mutate
};
}
// Usage
function HealthDashboard() {
const { health, isLoading, isError, refresh } = useSystemHealth();
if (isLoading) return <div>Loading health status...</div>;
if (isError) return <div>Error loading health status</div>;
return (
<div className="health-dashboard">
<h2>System Health</h2>
<SystemHealthBadge />
<button onClick={() => refresh()}>Refresh Now</button>
</div>
);
}
GET /api/health/detailed
Detailed health information for every system component, including each component's status and check latency.
Request
No authentication required.
Query parameters: none
Context Timeout: GetComprehensiveHealthTimeout() (configurable, with a safe default)
Response
Success (200 OK):
{
"success": true,
"data": {
"overallStatus": "healthy",
"components": [
{
"component": "database",
"status": "healthy",
"message": "Database connection successful",
"latency": 5,
"details": {
"activeConnections": "3",
"maxConnections": "20"
},
"lastChecked": "2025-10-06T12:00:00Z"
},
{
"component": "blockchain",
"status": "healthy",
"message": "Blockchain node operational",
"latency": 150,
"details": {
"blockNumber": "12345678",
"networkID": "1337"
},
"lastChecked": "2025-10-06T12:00:00Z"
}
],
"checkedAt": "2025-10-06T12:00:00Z",
"message": "System health: 3 healthy, 0 degraded, 0 unhealthy"
},
"message": "Detailed health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Degraded (200 OK):
{
"success": false,
"data": {
"overallStatus": "degraded",
"components": [
{
"component": "database",
"status": "healthy",
"latency": 5,
"lastChecked": "2025-10-06T12:00:00Z"
},
{
"component": "blockchain",
"status": "degraded",
"message": "High latency detected",
"latency": 2500,
"details": {
"blockNumber": "12345678",
"latencyThreshold": "2000ms"
},
"lastChecked": "2025-10-06T12:00:00Z"
}
],
"checkedAt": "2025-10-06T12:00:00Z",
"message": "System health: 1 healthy, 1 degraded, 0 unhealthy"
},
"message": "Detailed health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Unhealthy (503 Service Unavailable):
{
"success": false,
"data": {
"overallStatus": "unhealthy",
"components": [
{
"component": "database",
"status": "unhealthy",
"message": "Database connection failed",
"latency": 5000,
"details": {
"error": "connection timeout"
},
"lastChecked": "2025-10-06T12:00:00Z"
}
],
"checkedAt": "2025-10-06T12:00:00Z",
"message": "System health: 0 healthy, 0 degraded, 1 unhealthy"
},
"message": "Detailed health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.overallStatus | string | Overall status: healthy, degraded, unhealthy |
| data.components | array | Array of component checks |
| data.components[].component | string | Component name (database, blockchain) |
| data.components[].status | string | Component status: healthy, degraded, unhealthy |
| data.components[].message | string | Status description |
| data.components[].latency | number | Check latency in milliseconds |
| data.components[].details | object | Additional information (key-value pairs) |
| data.components[].lastChecked | string | Time of the last check (ISO 8601) |
| data.checkedAt | string | Time of the current check (ISO 8601) |
| data.message | string | Status summary (e.g., "System health: 3 healthy, 0 degraded, 0 unhealthy") |
Status codes:
| Status code | Description |
|---|---|
| 200 | System is healthy or degraded |
| 503 | System is unhealthy (critical components unavailable) |
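The data.message summary counts components per status. A client that assembles component results itself, for example from the per-component endpoints, can rebuild the same summary. Below is a minimal sketch that assumes worst-of aggregation: any unhealthy component makes the system unhealthy, and otherwise any degraded one makes it degraded. The server's actual rule is not specified here and may treat non-critical components differently. The name aggregateComponentHealth is hypothetical.

```typescript
type ComponentStatus = 'healthy' | 'degraded' | 'unhealthy';

interface ComponentCheck {
  component: string;
  status: ComponentStatus;
}

// Hypothetical client-side aggregation (worst-of rule is an assumption).
function aggregateComponentHealth(components: ComponentCheck[]): { overallStatus: ComponentStatus; message: string } {
  const counts: Record<ComponentStatus, number> = { healthy: 0, degraded: 0, unhealthy: 0 };
  for (const c of components) counts[c.status]++;
  const overallStatus: ComponentStatus =
    counts.unhealthy > 0 ? 'unhealthy' : counts.degraded > 0 ? 'degraded' : 'healthy';
  // Same wording as data.message in the examples above
  const message =
    `System health: ${counts.healthy} healthy, ${counts.degraded} degraded, ${counts.unhealthy} unhealthy`;
  return { overallStatus, message };
}

// Usage with the degraded example above
const summary = aggregateComponentHealth([
  { component: 'database', status: 'healthy' },
  { component: 'blockchain', status: 'degraded' },
]);
console.log(summary.overallStatus); // "degraded"
console.log(summary.message);       // "System health: 1 healthy, 1 degraded, 0 unhealthy"
```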
cURL Example
# Detailed health check
curl -X GET http://localhost:8080/api/health/detailed
# Pretty-print with jq
curl -X GET http://localhost:8080/api/health/detailed | jq .
# Show only components that are not healthy (degraded or unhealthy)
curl -X GET http://localhost:8080/api/health/detailed | \
jq '.data.components[] | select(.status != "healthy")'
TypeScript Example
Basic function:
interface ComponentHealth {
component: string;
status: 'healthy' | 'degraded' | 'unhealthy';
message?: string;
latency: number;
details?: Record<string, string>;
lastChecked: string;
}
interface DetailedHealthResponse {
overallStatus: 'healthy' | 'degraded' | 'unhealthy';
components: ComponentHealth[];
checkedAt: string;
message: string;
}
async function checkDetailedHealth(): Promise<DetailedHealthResponse> {
const response = await fetch('/api/health/detailed', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
const { data } = await response.json();
return data;
}
// Usage
const detailed = await checkDetailedHealth();
console.log(`Overall: ${detailed.overallStatus}`);
detailed.components.forEach(comp => {
console.log(` ${comp.component}: ${comp.status} (${comp.latency}ms)`);
});
React Component:
import { useState, useEffect } from 'react';
function DetailedHealthDashboard() {
const [health, setHealth] = useState<DetailedHealthResponse | null>(null);
useEffect(() => {
fetchHealth();
const interval = setInterval(fetchHealth, 30000); // Every 30 seconds
return () => clearInterval(interval);
}, []);
async function fetchHealth() {
try {
const healthData = await checkDetailedHealth();
setHealth(healthData);
} catch (error) {
console.error('Detailed health check failed:', error);
}
}
if (!health) return <div>Loading...</div>;
return (
<div className="detailed-health-dashboard">
<h2>System Health</h2>
<div className={`overall-status ${health.overallStatus}`}>
{health.overallStatus.toUpperCase()}
</div>
<p>{health.message}</p>
<h3>Components</h3>
<div className="components-grid">
{health.components.map(comp => (
<div key={comp.component} className={`component-card ${comp.status}`}>
<h4>{comp.component}</h4>
<div className="status">{comp.status}</div>
{comp.message && <p className="message">{comp.message}</p>}
<div className="latency">Latency: {comp.latency}ms</div>
{comp.details && (
<div className="details">
{Object.entries(comp.details).map(([key, value]) => (
<div key={key} className="detail-item">
<span className="key">{key}:</span>
<span className="value">{value}</span>
</div>
))}
</div>
)}
<div className="last-checked">
Checked: {new Date(comp.lastChecked).toLocaleTimeString()}
</div>
</div>
))}
</div>
</div>
);
}
Alerting Integration:
interface HealthAlert {
component: string;
severity: 'warning' | 'critical';
message: string;
latency: number;
}
function analyzeHealthIssues(health: DetailedHealthResponse): HealthAlert[] {
const alerts: HealthAlert[] = [];
health.components.forEach(comp => {
// Unhealthy components
if (comp.status === 'unhealthy') {
alerts.push({
component: comp.component,
severity: 'critical',
message: comp.message || 'Component unhealthy',
latency: comp.latency
});
}
// Degraded components
if (comp.status === 'degraded') {
alerts.push({
component: comp.component,
severity: 'warning',
message: comp.message || 'Component degraded',
latency: comp.latency
});
}
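// High latency (>1000ms) on a component that still reports healthy;
// degraded and unhealthy components have already produced an alert above
if (comp.status === 'healthy' && comp.latency > 1000) {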
alerts.push({
component: comp.component,
severity: 'warning',
message: `High latency detected: ${comp.latency}ms`,
latency: comp.latency
});
}
});
return alerts;
}
// Usage
const health = await checkDetailedHealth();
const alerts = analyzeHealthIssues(health);
if (alerts.length > 0) {
console.warn(`⚠️ ${alerts.length} health issues detected:`);
alerts.forEach(alert => {
console.warn(` ${alert.severity.toUpperCase()}: ${alert.component} - ${alert.message}`);
});
}
GET /api/version
Returns system version information, build metadata, and the environment.
Request
No authentication required.
Query parameters: none
Response
Success (200 OK):
{
"success": true,
"data": {
"version": "2.1.536",
"service": "saga-backend",
"timestamp": "2025-10-06T12:00:00Z",
"buildInfo": {
"version": "2.1.536",
"environment": "development",
"goVersion": "go1.21.5",
"buildTime": "build-time-placeholder",
"gitCommit": "git-commit-placeholder"
}
},
"message": "Version information retrieved",
"timestamp": "2025-10-06T12:00:00Z",
"traceId": "req-abc123"
}
Unknown Version (200 OK):
{
"success": true,
"data": {
"version": "unknown",
"service": "saga-backend",
"timestamp": "2025-10-06T12:00:00Z",
"buildInfo": {
"version": "unknown",
"environment": "development",
"goVersion": "go1.21.5",
"buildTime": "build-time-placeholder",
"gitCommit": "git-commit-placeholder"
}
},
"message": "Version information retrieved",
"timestamp": "2025-10-06T12:00:00Z"
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.version | string | Semantic version (MAJOR.MINOR.PATCH) or "unknown" |
| data.service | string | Service name (from GetServiceName()) |
| data.timestamp | string | Current time (ISO 8601) |
| data.buildInfo.version | string | Version (duplicates data.version) |
| data.buildInfo.environment | string | Environment (development, staging, production) |
| data.buildInfo.goVersion | string | Go runtime version (e.g., "go1.21.5") |
| data.buildInfo.buildTime | string | Build time (may be a placeholder) |
| data.buildInfo.gitCommit | string | Git commit hash (may be a placeholder) |
Status codes:
| Status code | Description |
|---|---|
| 200 | Version retrieved successfully |
Notes:
- The version is read from the VERSION file in the project root
- If the file is unavailable, "unknown" is returned
- The environment is taken from the GetEnvironment() configuration
- Build metadata may contain placeholder values if build flags are not configured
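A missing VERSION file or unset build flags still produce a 200 response. A deployment check therefore has to inspect the values themselves. The sketch below assumes that the placeholder strings shown in the examples above are the only fallbacks the server uses. The function names are hypothetical.

```typescript
interface BuildInfo {
  version: string;
  environment: string;
  goVersion: string;
  buildTime: string;
  gitCommit: string;
}

// Fallback and placeholder values seen in the examples in this section (assumed complete).
const PLACEHOLDER_VALUES = new Set(['unknown', 'build-time-placeholder', 'git-commit-placeholder']);

// Strict MAJOR.MINOR.PATCH, as described in the data.version row above.
const SEMVER_PATTERN = /^\d+\.\d+\.\d+$/;

// Hypothetical check: lists build metadata fields that still hold fallback values.
function missingBuildMetadata(info: BuildInfo): Array<keyof BuildInfo> {
  const fields: Array<keyof BuildInfo> = ['version', 'buildTime', 'gitCommit'];
  return fields.filter((field) => PLACEHOLDER_VALUES.has(info[field]));
}

function isReleaseVersion(version: string): boolean {
  return SEMVER_PATTERN.test(version);
}

// Usage with the "Unknown Version" example above
const info: BuildInfo = {
  version: 'unknown',
  environment: 'development',
  goVersion: 'go1.21.5',
  buildTime: 'build-time-placeholder',
  gitCommit: 'git-commit-placeholder',
};
console.log(missingBuildMetadata(info).join(', ')); // "version, buildTime, gitCommit"
console.log(isReleaseVersion('2.1.536'));           // true
```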
cURL Example
# Get the version
curl -X GET http://localhost:8080/api/version
# Extract a single field
curl -X GET http://localhost:8080/api/version | jq '.data.version'
# Production endpoint
curl -X GET https://app.saga.surf/api/version | jq .
TypeScript Example
Basic function:
interface BuildInfo {
version: string;
environment: string;
goVersion: string;
buildTime: string;
gitCommit: string;
}
interface VersionResponse {
version: string;
service: string;
timestamp: string;
buildInfo: BuildInfo;
}
async function getVersionInfo(): Promise<VersionResponse> {
const response = await fetch('/api/version', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const { data } = await response.json();
return data;
}
// Usage
const version = await getVersionInfo();
console.log(`Version: ${version.version}`);
console.log(`Service: ${version.service}`);
console.log(`Environment: ${version.buildInfo.environment}`);
console.log(`Go Version: ${version.buildInfo.goVersion}`);
React Component:
import { useState, useEffect } from 'react';
function VersionInfo() {
const [version, setVersion] = useState<VersionResponse | null>(null);
useEffect(() => {
fetchVersion();
}, []);
async function fetchVersion() {
try {
const versionData = await getVersionInfo();
setVersion(versionData);
} catch (error) {
console.error('Failed to fetch version:', error);
}
}
if (!version) return <div>Loading version info...</div>;
return (
<div className="version-info">
<h3>System Version</h3>
<div className="version-details">
<div className="detail-row">
<span className="label">Version:</span>
<span className="value">{version.version}</span>
</div>
<div className="detail-row">
<span className="label">Service:</span>
<span className="value">{version.service}</span>
</div>
<div className="detail-row">
<span className="label">Environment:</span>
<span className="value">{version.buildInfo.environment}</span>
</div>
<div className="detail-row">
<span className="label">Go Version:</span>
<span className="value">{version.buildInfo.goVersion}</span>
</div>
<div className="detail-row">
<span className="label">Build Time:</span>
<span className="value">{version.buildInfo.buildTime}</span>
</div>
<div className="detail-row">
<span className="label">Git Commit:</span>
<span className="value code">{version.buildInfo.gitCommit}</span>
</div>
</div>
</div>
);
}
Version Comparison:
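function parseVersion(version: string): number[] {
if (version === 'unknown') return [0, 0, 0];
// Ignore any pre-release/build suffix (e.g. "2.1.536-rc1") and pad missing parts with 0
const parts = version.split(/[-+]/)[0].split('.').map(part => parseInt(part, 10) || 0);
while (parts.length < 3) parts.push(0);
return parts;
}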
function compareVersions(v1: string, v2: string): number {
const parts1 = parseVersion(v1);
const parts2 = parseVersion(v2);
for (let i = 0; i < 3; i++) {
if (parts1[i] > parts2[i]) return 1;
if (parts1[i] < parts2[i]) return -1;
}
return 0;
}
// Usage
const currentVersion = await getVersionInfo();
const minRequiredVersion = "2.1.0";
if (compareVersions(currentVersion.version, minRequiredVersion) < 0) {
console.warn(`⚠️ Version ${currentVersion.version} is below minimum required ${minRequiredVersion}`);
}
🗄️ GET /api/health/database
Database health check (PostgreSQL connection and connection pool status).
Request
No authentication required.
Query parameters: none
Context Timeout: GetQuickHealthTimeout() (a short timeout for the DB check)
Response
Success (200 OK):
{
"success": true,
"data": {
"component": "database",
"status": "healthy",
"message": "Database connection successful",
"latency": 5,
"details": {
"activeConnections": "3",
"idleConnections": "7",
"maxConnections": "20"
},
"lastChecked": "2025-10-06T12:00:00Z"
},
"message": "Database health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Unhealthy (503 Service Unavailable):
{
"success": false,
"data": {
"component": "database",
"status": "unhealthy",
"message": "Database connection failed",
"latency": 5000,
"details": {
"error": "connection timeout after 5s"
},
"lastChecked": "2025-10-06T12:00:00Z"
},
"message": "Database health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.component | string | Always "database" |
| data.status | string | Status: healthy, degraded, unhealthy |
| data.message | string | Status description |
| data.latency | number | Check latency in milliseconds |
| data.details | object | Additional information (connections, errors) |
| data.lastChecked | string | Time of the last check (ISO 8601) |
Status codes:
| Status code | Description |
|---|---|
| 200 | Database is healthy |
| 503 | Database is unhealthy or unreachable |
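A database can report healthy while its connection pool is close to exhaustion. Using the activeConnections and maxConnections keys from the example above, a client can compute pool utilization. In this sketch the key names come from the examples; the 80% threshold and the function names are assumptions.

```typescript
// Connection pool figures parsed from data.details. The examples above send
// the numbers as strings ("3", "20"), so they must be converted first.
interface PoolStats {
  active: number;
  idle: number;
  max: number;
  utilization: number; // active / max
}

// Hypothetical parser: returns null when the details do not contain usable
// pool figures (for example, the unhealthy response only carries "error").
function parsePoolStats(details?: Record<string, string>): PoolStats | null {
  if (!details) return null;
  const active = Number(details.activeConnections);
  const max = Number(details.maxConnections);
  const idle = Number(details.idleConnections);
  if (!Number.isFinite(active) || !Number.isFinite(max) || max <= 0) return null;
  return { active, idle: Number.isFinite(idle) ? idle : 0, max, utilization: active / max };
}

// Hypothetical threshold: warn when 80% or more of the pool is in use.
function isPoolNearlyExhausted(stats: PoolStats, threshold = 0.8): boolean {
  return stats.utilization >= threshold;
}

// Usage with the healthy example above
const stats = parsePoolStats({ activeConnections: '3', idleConnections: '7', maxConnections: '20' });
if (stats) {
  console.log(`${Math.round(stats.utilization * 100)}% of the pool in use`); // "15% of the pool in use"
  console.log(isPoolNearlyExhausted(stats)); // false
}
```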
cURL Example
# Database health check
curl -X GET http://localhost:8080/api/health/database
# Monitor database health
watch -n 10 'curl -s http://localhost:8080/api/health/database | jq .'
TypeScript Example
Basic function:
async function checkDatabaseHealth(): Promise<ComponentHealth> {
const response = await fetch('/api/health/database', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
const { data } = await response.json();
return data;
}
// Usage
const dbHealth = await checkDatabaseHealth();
console.log(`Database: ${dbHealth.status} (${dbHealth.latency}ms)`);
if (dbHealth.details) {
console.log(`Active connections: ${dbHealth.details.activeConnections}`);
}
React Component:
import { useState, useEffect } from 'react';
function DatabaseHealthMonitor() {
const [health, setHealth] = useState<ComponentHealth | null>(null);
useEffect(() => {
fetchHealth();
const interval = setInterval(fetchHealth, 30000);
return () => clearInterval(interval);
}, []);
async function fetchHealth() {
try {
const dbHealth = await checkDatabaseHealth();
setHealth(dbHealth);
} catch (error) {
console.error('Database health check failed:', error);
}
}
if (!health) return <div>Loading...</div>;
return (
<div className={`db-health ${health.status}`}>
<h4>Database Health</h4>
<div className="status-badge">{health.status}</div>
<p>{health.message}</p>
<div className="latency">Response time: {health.latency}ms</div>
{health.details && (
<div className="connection-pool">
<p>Active: {health.details.activeConnections}</p>
<p>Idle: {health.details.idleConnections}</p>
<p>Max: {health.details.maxConnections}</p>
</div>
)}
</div>
);
}
⛓️ GET /api/health/blockchain
Blockchain connection health check.
Request
No authentication required.
Query parameters: none
Context Timeout: GetHealthCheckTimeout() (configurable timeout)
Response
Success (200 OK):
{
"success": true,
"data": {
"component": "blockchain",
"status": "healthy",
"message": "Blockchain node operational",
"latency": 150,
"details": {
"blockNumber": "12345678",
"networkID": "1337",
"nodeVersion": "anvil"
},
"lastChecked": "2025-10-06T12:00:00Z"
},
"message": "Blockchain health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Unhealthy (503 Service Unavailable):
{
"success": false,
"data": {
"component": "blockchain",
"status": "unhealthy",
"message": "Blockchain node unreachable",
"latency": 5000,
"details": {
"error": "connection timeout",
"rpcUrl": "http://188.42.218.164:8545"
},
"lastChecked": "2025-10-06T12:00:00Z"
},
"message": "Blockchain health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.component | string | Always "blockchain" |
| data.status | string | Status: healthy, degraded, unhealthy |
| data.message | string | Status description |
| data.latency | number | Check latency in milliseconds |
| data.details | object | Node information (blockNumber, networkID, errors) |
| data.lastChecked | string | Time of the last check (ISO 8601) |
Status codes:
| Status code | Description |
|---|---|
| 200 | Blockchain is healthy |
| 503 | Blockchain is unhealthy or unreachable |
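A node can answer RPC calls quickly and still have stopped producing blocks. Tracking blockNumber across successive polls catches that case. This sketch rests on assumptions: the class name and the four-poll window are hypothetical. A local development node that only mines when transactions arrive may also leave the block number unchanged while working normally.

```typescript
// Hypothetical stall detector: feed it data.details.blockNumber from each poll.
// It reports a stall once the block number has not increased for
// `maxUnchangedPolls` consecutive polls.
class BlockProgressTracker {
  private lastBlock: number | null = null;
  private unchangedPolls = 0;
  private readonly maxUnchangedPolls: number;

  constructor(maxUnchangedPolls = 4) {
    this.maxUnchangedPolls = maxUnchangedPolls;
  }

  record(blockNumber: string | undefined): boolean {
    const current = Number(blockNumber);
    // No usable value (e.g. an unhealthy response without blockNumber): keep state unchanged
    if (blockNumber === undefined || !Number.isFinite(current)) return false;
    if (this.lastBlock !== null && current <= this.lastBlock) {
      this.unchangedPolls++;
    } else {
      this.unchangedPolls = 0;
    }
    this.lastBlock = current;
    return this.unchangedPolls >= this.maxUnchangedPolls;
  }
}

// Usage: with the 15-second polling used by BlockchainHealthMonitor in this section,
// 4 unchanged polls is about one minute without a new block
const tracker = new BlockProgressTracker(4);
console.log(tracker.record('12345678')); // false
```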
cURL Example
# Blockchain health check
curl -X GET http://localhost:8080/api/health/blockchain
# Monitor blockchain health
watch -n 15 'curl -s http://localhost:8080/api/health/blockchain | jq .'
TypeScript Example
Basic function:
async function checkBlockchainHealth(): Promise<ComponentHealth> {
const response = await fetch('/api/health/blockchain', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
const { data } = await response.json();
return data;
}
// Usage
const bcHealth = await checkBlockchainHealth();
console.log(`Blockchain: ${bcHealth.status} (${bcHealth.latency}ms)`);
if (bcHealth.details) {
console.log(`Block: ${bcHealth.details.blockNumber}`);
console.log(`Network: ${bcHealth.details.networkID}`);
}
React Component:
import { useState, useEffect } from 'react';
function BlockchainHealthMonitor() {
const [health, setHealth] = useState<ComponentHealth | null>(null);
useEffect(() => {
fetchHealth();
const interval = setInterval(fetchHealth, 15000); // Every 15 seconds
return () => clearInterval(interval);
}, []);
async function fetchHealth() {
try {
const bcHealth = await checkBlockchainHealth();
setHealth(bcHealth);
} catch (error) {
console.error('Blockchain health check failed:', error);
}
}
if (!health) return <div>Loading...</div>;
return (
<div className={`blockchain-health ${health.status}`}>
<h4>Blockchain Health</h4>
<div className="status-badge">{health.status}</div>
<p>{health.message}</p>
<div className="latency">Response time: {health.latency}ms</div>
{health.details && (
<div className="blockchain-details">
<p>Block: #{health.details.blockNumber}</p>
<p>Network ID: {health.details.networkID}</p>
<p>Node: {health.details.nodeVersion}</p>
</div>
)}
</div>
);
}
💼 GET /api/health/business
Health check for the system's business logic.
Request
No authentication required.
Query parameters: none
Context Timeout: GetHealthCheckTimeout() (configurable timeout)
Note: the current implementation uses database health as a proxy for business logic health, which is why data.component is "database" in the responses below.
Response
Success (200 OK):
{
"success": true,
"data": {
"component": "database",
"status": "healthy",
"message": "Database connection successful",
"latency": 5,
"details": {
"activeConnections": "3",
"businessLogicReady": "true"
},
"lastChecked": "2025-10-06T12:00:00Z"
},
"message": "Business logic health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Unhealthy (503 Service Unavailable):
{
"success": false,
"data": {
"component": "database",
"status": "unhealthy",
"message": "Business logic unavailable",
"latency": 5000,
"details": {
"error": "database connection failed"
},
"lastChecked": "2025-10-06T12:00:00Z"
},
"message": "Business logic health check completed",
"timestamp": "2025-10-06T12:00:00Z"
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.component | string | Component name (currently "database") |
| data.status | string | Status: healthy, degraded, unhealthy |
| data.message | string | Status description |
| data.latency | number | Check latency in milliseconds |
| data.details | object | Additional information |
| data.lastChecked | string | Time of the last check (ISO 8601) |
Status codes:
| Status code | Description |
|---|---|
| 200 | Business logic is healthy |
| 503 | Business logic is unhealthy or unavailable |
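The three component endpoints are independent. A dashboard can query them in parallel and treat a failed request as an unhealthy component. This sketch assumes a fetcher that returns the parsed data object, like checkDatabaseHealth above. COMPONENT_PATHS and checkAllComponents are hypothetical names. Results are keyed by endpoint rather than by data.component, because /api/health/business currently reports "database".

```typescript
type ComponentStatus = 'healthy' | 'degraded' | 'unhealthy';

interface ComponentHealth {
  component: string;
  status: ComponentStatus;
  latency: number;
}

// Returns the parsed `data` object of a component health endpoint.
type ComponentFetcher = (path: string) => Promise<ComponentHealth>;

const COMPONENT_PATHS = {
  database: '/api/health/database',
  blockchain: '/api/health/blockchain',
  business: '/api/health/business',
} as const;

type ComponentKey = keyof typeof COMPONENT_PATHS;

// Queries all component endpoints in parallel and keys the result by endpoint.
async function checkAllComponents(fetcher: ComponentFetcher): Promise<Record<ComponentKey, ComponentStatus>> {
  const keys = Object.keys(COMPONENT_PATHS) as ComponentKey[];
  const entries = await Promise.all(
    keys.map(async (key): Promise<[ComponentKey, ComponentStatus]> => {
      try {
        const health = await fetcher(COMPONENT_PATHS[key]);
        return [key, health.status];
      } catch {
        // Network or parse error: the component could not be verified
        return [key, 'unhealthy'];
      }
    })
  );
  const statuses = {} as Record<ComponentKey, ComponentStatus>;
  for (const [key, status] of entries) statuses[key] = status;
  return statuses;
}

// Usage (browser or Node 18+):
// const statuses = await checkAllComponents(async (path) => (await (await fetch(path)).json()).data);
```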
cURL Example
# Business logic health check
curl -X GET http://localhost:8080/api/health/business
# Monitor business health
watch -n 30 'curl -s http://localhost:8080/api/health/business | jq .'
TypeScript Example
Basic function:
async function checkBusinessHealth(): Promise<ComponentHealth> {
const response = await fetch('/api/health/business', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
const { data } = await response.json();
return data;
}
// Usage
const businessHealth = await checkBusinessHealth();
console.log(`Business Logic: ${businessHealth.status}`);
GET /api/health/resources
Returns system resource usage (memory, goroutines).
Request
No authentication required.
Query parameters: none
Response
Success (200 OK):
{
"success": true,
"data": {
"resourceUsage": {
"memory": {
"alloc": 12345678,
"sys": 23456789
},
"goroutines": 42
},
"timestamp": "2025-10-06T12:00:00Z"
},
"message": "Resource usage retrieved",
"timestamp": "2025-10-06T12:00:00Z",
"traceId": "req-abc123"
}
Response fields:
| Field | Type | Description |
|---|---|---|
| data.resourceUsage.memory.alloc | number | Allocated memory in bytes (currently in use) |
| data.resourceUsage.memory.sys | number | Total memory obtained from the OS, in bytes |
| data.resourceUsage.goroutines | number | Number of active goroutines |
| data.timestamp | string | Time the metrics were collected (ISO 8601) |
Status codes:
| Status code | Description |
|---|---|
| 200 | Resource usage retrieved successfully |
Notes:
- alloc: memory that has been allocated and not yet freed
- sys: all memory obtained from the OS (including memory that has been freed but not yet returned to it)
- goroutines: number of active goroutines (normal: 10-100, high: >1000)
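The goroutine guidance above can be turned into a simple classifier. Comparing alloc with sys shows how much of the memory taken from the OS is currently in use. In this sketch the 10-100 and >1000 bands come from the notes above; the "elevated" band between them and the function names are assumptions.

```typescript
interface ResourceUsage {
  memory: { alloc: number; sys: number };
  goroutines: number;
}

type GoroutineLevel = 'normal' | 'elevated' | 'high';

// Bands from the notes above: up to 100 is normal, above 1000 is high.
// The "elevated" band in between is an assumption of this sketch.
function classifyGoroutines(count: number): GoroutineLevel {
  if (count > 1000) return 'high';
  if (count > 100) return 'elevated';
  return 'normal';
}

// Fraction of OS-obtained memory (sys) held by allocated objects (alloc).
// A low value suggests the runtime is keeping freed memory it has not yet returned to the OS.
function allocToSysRatio(usage: ResourceUsage): number {
  return usage.memory.sys > 0 ? usage.memory.alloc / usage.memory.sys : 0;
}

// Usage with the example response above
const usage: ResourceUsage = { memory: { alloc: 12345678, sys: 23456789 }, goroutines: 42 };
console.log(classifyGoroutines(usage.goroutines)); // "normal"
console.log(allocToSysRatio(usage).toFixed(2));    // "0.53"
```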
cURL Example
# Resource usage
curl -X GET http://localhost:8080/api/health/resources
# Monitor resource usage
watch -n 5 'curl -s http://localhost:8080/api/health/resources | jq .'
# Convert bytes to MB
curl -s http://localhost:8080/api/health/resources | \
jq '.data.resourceUsage.memory | {allocMB: (.alloc / 1024 / 1024), sysMB: (.sys / 1024 / 1024)}'
TypeScript Example
Basic function:
interface MemoryUsage {
alloc: number;
sys: number;
}
interface ResourceUsage {
memory: MemoryUsage;
goroutines: number;
}
interface ResourceUsageResponse {
resourceUsage: ResourceUsage;
timestamp: string;
}
async function getResourceUsage(): Promise<ResourceUsageResponse> {
const response = await fetch('/api/health/resources', {
method: 'GET',
headers: { 'Accept': 'application/json' }
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const { data } = await response.json();
return data;
}
// Helper to convert bytes to MB (rounded to 2 decimal places)
function bytesToMB(bytes: number): number {
return Math.round(bytes / 1024 / 1024 * 100) / 100;
}
// Usage
const resources = await getResourceUsage();
console.log(`Memory Allocated: ${bytesToMB(resources.resourceUsage.memory.alloc)} MB`);
console.log(`Memory System: ${bytesToMB(resources.resourceUsage.memory.sys)} MB`);
console.log(`Goroutines: ${resources.resourceUsage.goroutines}`);
React Component:
import { useState, useEffect } from 'react';
function ResourceUsageMonitor() {
const [resources, setResources] = useState<ResourceUsageResponse | null>(null);
const [history, setHistory] = useState<ResourceUsage[]>([]);
useEffect(() => {
fetchResources();
const interval = setInterval(fetchResources, 5000); // Every 5 seconds
return () => clearInterval(interval);
}, []);
async function fetchResources() {
try {
const resourceData = await getResourceUsage();
setResources(resourceData);
// Keep last 60 data points (5 minutes at 5-second intervals)
setHistory(prev => [...prev, resourceData.resourceUsage].slice(-60));
} catch (error) {
console.error('Resource usage fetch failed:', error);
}
}
if (!resources) return <div>Loading...</div>;
const allocMB = bytesToMB(resources.resourceUsage.memory.alloc);
const sysMB = bytesToMB(resources.resourceUsage.memory.sys);
const goroutines = resources.resourceUsage.goroutines;
// Calculate trends
const avgGoroutines = history.length > 0
? history.reduce((sum, r) => sum + r.goroutines, 0) / history.length
: 0;
return (
<div className="resource-usage-monitor">
<h4>System Resources</h4>
<div className="metric-grid">
<div className="metric">
<span className="label">Memory Allocated:</span>
<span className="value">{allocMB} MB</span>
</div>
<div className="metric">
<span className="label">Memory System:</span>
<span className="value">{sysMB} MB</span>
</div>
<div className="metric">
<span className="label">Goroutines:</span>
<span className="value">{goroutines}</span>
</div>
{history.length > 10 && (
<div className="metric">
<span className="label">Avg Goroutines (5min):</span>
<span className="value">{Math.round(avgGoroutines)}</span>
</div>
)}
</div>
{goroutines > 1000 && (
<div className="warning">
⚠️ High goroutine count detected ({goroutines})
</div>
)}
</div>
);
}
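The component above only averages the goroutine count. A count that keeps rising is often a better leak signal than any fixed threshold. The sketch below fits a least-squares slope over the sampled history. `goroutineSlope` is a hypothetical helper, not part of the API, and it assumes evenly spaced samples (5 seconds apart, as in the component).

```typescript
// Hypothetical helper: least-squares slope of goroutine counts over evenly
// spaced samples. Returns goroutines gained per sample interval.
function goroutineSlope(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return 0; // not enough data to estimate a trend

  // x = sample index 0..n-1, y = goroutine count
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;

  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * (samples[i] - meanY);
    den += (i - meanX) ** 2;
  }
  return num / den;
}

// Usage: with 5-second samples, slope * 12 is the growth per minute
const perSample = goroutineSlope([100, 110, 120, 130]);
console.log(`Goroutine growth: ${(perSample * 12).toFixed(1)} per minute`); // 120.0
```

A sustained positive slope over the 60-sample history is worth investigating even while the absolute count is still below the 1000 warning threshold.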
Alerting:
interface ResourceAlert {
severity: 'warning' | 'critical';
message: string;
value: number;
}
function analyzeResourceUsage(resources: ResourceUsage): ResourceAlert[] {
  const alerts: ResourceAlert[] = [];
  const allocMB = resources.memory.alloc / 1024 / 1024;

  // Memory: critical above 1 GB allocated, otherwise warning above 500 MB
  // (else-if, so one condition never produces both a warning and a critical alert)
  if (allocMB > 1024) {
    alerts.push({
      severity: 'critical',
      message: `Critical memory allocation: ${allocMB.toFixed(2)} MB`,
      value: allocMB
    });
  } else if (allocMB > 500) {
    alerts.push({
      severity: 'warning',
      message: `High memory allocation: ${allocMB.toFixed(2)} MB`,
      value: allocMB
    });
  }

  // Goroutines: critical above 5000, otherwise warning above 1000
  if (resources.goroutines > 5000) {
    alerts.push({
      severity: 'critical',
      message: `Critical goroutine count: ${resources.goroutines}`,
      value: resources.goroutines
    });
  } else if (resources.goroutines > 1000) {
    alerts.push({
      severity: 'warning',
      message: `High goroutine count: ${resources.goroutines}`,
      value: resources.goroutines
    });
  }

  return alerts;
}
// Usage
const resources = await getResourceUsage();
const alerts = analyzeResourceUsage(resources.resourceUsage);
if (alerts.length > 0) {
console.warn(`⚠️ ${alerts.length} resource alerts:`);
alerts.forEach(alert => {
console.warn(` ${alert.severity.toUpperCase()}: ${alert.message}`);
});
}
GET /api/health/ready¶
Kubernetes readiness probe: checks whether the system is ready to serve requests.
Request¶
No authentication required.
Query parameters: None
Context Timeout: GetQuickHealthTimeout() (quick timeout for the readiness check)
Note: Checks critical components (database) to determine readiness.
Response¶
Ready (200 OK):
Not Ready (503 Service Unavailable):
Response fields:
The readiness probe returns a plain-text response (not JSON):
- "ready": the system is ready to serve requests
- "not ready": the system is not ready (database unhealthy)
Status codes:
| Status Code | Description |
|---|---|
| 200 | System is ready to serve requests (database healthy or degraded) |
| 503 | System is not ready (database unhealthy) |
Notes:
- Checks database health (a critical component)
- Ready if the database status is healthy or degraded
- Not ready if the database status is unhealthy
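The readiness rule in the notes above can be mirrored on the client, for example to explain a 503 in a dashboard. This is a sketch, not the server's Go code. `DatabaseStatus`, `ReadinessResult` and `readinessFor` are hypothetical names, and the mapping is taken only from the documented behavior.

```typescript
// Hypothetical client-side mirror of the documented readiness rule.
type DatabaseStatus = 'healthy' | 'degraded' | 'unhealthy';

interface ReadinessResult {
  httpStatus: 200 | 503;
  body: 'ready' | 'not ready';
}

function readinessFor(db: DatabaseStatus): ReadinessResult {
  // healthy and degraded both keep the pod in rotation; only unhealthy removes it
  return db === 'unhealthy'
    ? { httpStatus: 503, body: 'not ready' }
    : { httpStatus: 200, body: 'ready' };
}

console.log(readinessFor('degraded')); // { httpStatus: 200, body: 'ready' }
```

Because degraded still maps to 200, a client cannot tell healthy from degraded by the readiness status code alone. Use /api/health/database for that distinction.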
cURL Example¶
# Readiness probe
curl -X GET http://localhost:8080/api/health/ready
# Check with status code
curl -X GET http://localhost:8080/api/health/ready \
-w "\nHTTP Status: %{http_code}\n"
# Continuous monitoring
watch -n 5 'curl -s -w "\nStatus: %{http_code}\n" http://localhost:8080/api/health/ready'
Kubernetes Configuration¶
Readiness Probe:
readinessProbe:
  httpGet:
    path: /api/health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  successThreshold: 1
  failureThreshold: 3
How it works in Kubernetes:
- initialDelaySeconds: 10 (wait 10 seconds after the container starts)
- periodSeconds: 5 (probe every 5 seconds)
- timeoutSeconds: 3 (HTTP request timeout)
- successThreshold: 1 (one successful probe marks the pod ready)
- failureThreshold: 3 (three consecutive failures mark the pod not ready)
Kubernetes behavior:
- Pods that are not ready are removed from the Service endpoints and receive no traffic
- The load balancer does not route requests to pods that are not ready
- A Deployment rollout waits for the new pods to become ready before proceeding
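The probe fields combine into an approximate detection time. A pod is marked not ready after roughly failureThreshold × periodSeconds of continuous failure. The exact figure also depends on when the first failing probe fires and on timeoutSeconds. The helpers below are an illustrative calculation, not a Kubernetes API. Only the field names match the probe spec.

```typescript
// Illustrative subset of probe fields (names match the Kubernetes probe spec).
interface ProbeTiming {
  periodSeconds: number;
  failureThreshold: number;
  timeoutSeconds: number;
}

// Approximate seconds of continuous failure before Kubernetes acts:
// failureThreshold consecutive failed probes, one every periodSeconds.
function approxDetectionSeconds(p: ProbeTiming): number {
  return p.failureThreshold * p.periodSeconds;
}

// Rough upper bound: the failure may begin just after a passing probe, and the
// last failing probe may wait the full timeout before it is counted as failed.
function worstCaseDetectionSeconds(p: ProbeTiming): number {
  return approxDetectionSeconds(p) + p.timeoutSeconds;
}

const readinessTiming = { periodSeconds: 5, failureThreshold: 3, timeoutSeconds: 3 };
console.log(approxDetectionSeconds(readinessTiming));    // 15
console.log(worstCaseDetectionSeconds(readinessTiming)); // 18
```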
TypeScript Example¶
Basic function:
async function checkReadiness(): Promise<boolean> {
try {
const response = await fetch('/api/health/ready', {
method: 'GET'
});
return response.ok; // true if 200, false if 503
} catch (error) {
console.error('Readiness check failed:', error);
return false;
}
}
// Usage
const isReady = await checkReadiness();
console.log(`System ready: ${isReady}`);
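Deploy scripts and end-to-end tests often need to block until the service is ready. Here is a minimal sketch, assuming a check function such as checkReadiness above is passed in. `waitForReady` and its options are hypothetical.

```typescript
// Hypothetical helper: poll a readiness check until it succeeds or time runs out.
async function waitForReady(
  check: () => Promise<boolean>,
  { timeoutMs = 60_000, intervalMs = 2_000 } = {}
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    // Pause before the next attempt
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  return false; // still not ready when the deadline passed
}

// Usage
// const ready = await waitForReady(checkReadiness, { timeoutMs: 120_000 });
// if (!ready) process.exit(1);
```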
React Component:
function ReadinessIndicator() {
const [ready, setReady] = useState<boolean | null>(null);
useEffect(() => {
checkReady();
const interval = setInterval(checkReady, 5000); // Every 5 seconds
return () => clearInterval(interval);
}, []);
async function checkReady() {
const isReady = await checkReadiness();
setReady(isReady);
}
if (ready === null) return <div>Checking readiness...</div>;
return (
<div className={`readiness-indicator ${ready ? 'ready' : 'not-ready'}`}>
{ready ? '✅ System Ready' : '⏳ System Not Ready'}
</div>
);
}
💚 GET /api/health/live¶
Kubernetes liveness probe: checks that the application is alive and running.
Request¶
No authentication required.
Query parameters: None
Context Timeout: GetEmergencyHealthTimeout() (emergency quick timeout for liveness)
Note: Checks basic application health.
Response¶
Alive (200 OK):
Unhealthy (503 Service Unavailable):
Response fields:
The liveness probe returns a plain-text response (not JSON):
- "alive": the application is running
- "unhealthy": the application is not working (a restart is required)
Status codes:
| Status Code | Description |
|---|---|
| 200 | Application is alive |
| 503 | Application is not responding (a restart is required) |
Notes:
- Checks basic health via GetHealthStatus()
- Alive if IsHealthy = true
- Unhealthy if IsHealthy = false
cURL Example¶
# Liveness probe
curl -X GET http://localhost:8080/api/health/live
# Check with status code
curl -X GET http://localhost:8080/api/health/live \
-w "\nHTTP Status: %{http_code}\n"
# Continuous monitoring
watch -n 10 'curl -s -w "\nStatus: %{http_code}\n" http://localhost:8080/api/health/live'
Kubernetes Configuration¶
Liveness Probe:
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3
How it works in Kubernetes:
- initialDelaySeconds: 30 (wait 30 seconds after startup, longer than readiness)
- periodSeconds: 10 (probe every 10 seconds)
- timeoutSeconds: 5 (HTTP request timeout)
- successThreshold: 1 (one successful probe = alive)
- failureThreshold: 3 (three consecutive failures = container restart)
Kubernetes behavior:
- A failed liveness probe makes Kubernetes restart the container
- Used to recover from deadlocks and infinite loops
- More aggressive than readiness (a restart rather than just withholding traffic)
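The threshold semantics described above can be modeled in a few lines. This helps when reasoning about the LivenessIndicator counter. This is a simplified sketch, not kubelet source. `ProbeStateTracker` is a hypothetical name. The state flips only after failureThreshold consecutive failures, or after successThreshold consecutive successes. Liveness is treated as passing until enough failures accumulate.

```typescript
// Simplified model of probe threshold semantics (not kubelet source).
class ProbeStateTracker {
  private consecutiveFailures = 0;
  private consecutiveSuccesses = 0;
  private passing = true; // liveness counts as passing until it fails enough times
  private readonly failureThreshold: number;
  private readonly successThreshold: number;

  constructor(failureThreshold = 3, successThreshold = 1) {
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
  }

  /** Record one probe result; returns whether the probe currently counts as passing. */
  record(success: boolean): boolean {
    if (success) {
      this.consecutiveFailures = 0;
      this.consecutiveSuccesses++;
      if (this.consecutiveSuccesses >= this.successThreshold) this.passing = true;
    } else {
      this.consecutiveSuccesses = 0;
      this.consecutiveFailures++;
      if (this.consecutiveFailures >= this.failureThreshold) this.passing = false;
    }
    return this.passing;
  }
}

// Usage: a single success in between resets the failure streak
const liveness = new ProbeStateTracker(3, 1);
console.log([false, false, true, false, false, false].map(r => liveness.record(r)));
// [ true, true, true, true, true, false ]: only the third consecutive failure triggers a restart
```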
TypeScript Example¶
Basic function:
async function checkLiveness(): Promise<boolean> {
try {
const response = await fetch('/api/health/live', {
method: 'GET'
});
return response.ok; // true if 200, false if 503
} catch (error) {
console.error('Liveness check failed:', error);
return false;
}
}
// Usage
const isAlive = await checkLiveness();
console.log(`Application alive: ${isAlive}`);
React Component:
function LivenessIndicator() {
const [alive, setAlive] = useState<boolean | null>(null);
const [failureCount, setFailureCount] = useState(0);
useEffect(() => {
checkLive();
const interval = setInterval(checkLive, 10000); // Every 10 seconds
return () => clearInterval(interval);
}, []);
async function checkLive() {
const isAlive = await checkLiveness();
setAlive(isAlive);
if (!isAlive) {
setFailureCount(prev => prev + 1);
} else {
setFailureCount(0);
}
}
if (alive === null) return <div>Checking liveness...</div>;
return (
<div className={`liveness-indicator ${alive ? 'alive' : 'dead'}`}>
{alive ? '💚 Application Alive' : '💀 Application Unhealthy'}
{failureCount > 0 && (
<div className="failure-count">
Failed checks: {failureCount}/3 (restart at 3)
</div>
)}
</div>
);
}
Use Cases¶
1. Comprehensive Health Dashboard¶
Combining all health endpoints into a complete dashboard.
interface HealthDashboardData {
general: HealthResponse;
detailed: DetailedHealthResponse;
version: VersionResponse;
database: ComponentHealth;
blockchain: ComponentHealth;
business: ComponentHealth;
resources: ResourceUsageResponse;
readiness: boolean;
liveness: boolean;
}
async function fetchHealthDashboard(): Promise<HealthDashboardData> {
const [
general,
detailed,
version,
database,
blockchain,
business,
resources,
readiness,
liveness
] = await Promise.all([
checkSystemHealth(),
checkDetailedHealth(),
getVersionInfo(),
checkDatabaseHealth(),
checkBlockchainHealth(),
checkBusinessHealth(),
getResourceUsage(),
checkReadiness(),
checkLiveness()
]);
return {
general,
detailed,
version,
database,
blockchain,
business,
resources,
readiness,
liveness
};
}
function HealthDashboard() {
const [dashboard, setDashboard] = useState<HealthDashboardData | null>(null);
useEffect(() => {
fetchData();
const interval = setInterval(fetchData, 30000);
return () => clearInterval(interval);
}, []);
async function fetchData() {
try {
const data = await fetchHealthDashboard();
setDashboard(data);
} catch (error) {
console.error('Failed to fetch dashboard data:', error);
}
}
if (!dashboard) return <div>Loading dashboard...</div>;
return (
<div className="health-dashboard">
<header>
<h1>System Health Dashboard</h1>
<div className="version">Version: {dashboard.version.version}</div>
</header>
<div className="status-grid">
<div className="status-card overall">
<h3>Overall Status</h3>
<div className={`badge ${dashboard.general.status}`}>
{dashboard.general.status}
</div>
<p>{dashboard.general.message}</p>
</div>
<div className="status-card">
<h3>Database</h3>
<StatusBadge component={dashboard.database} />
</div>
<div className="status-card">
<h3>Blockchain</h3>
<StatusBadge component={dashboard.blockchain} />
</div>
<div className="status-card">
<h3>Business Logic</h3>
<StatusBadge component={dashboard.business} />
</div>
</div>
<div className="resources-section">
<h3>System Resources</h3>
<ResourceUsageMonitor />
</div>
<div className="probes-section">
<h3>Kubernetes Probes</h3>
<div className="probes-grid">
<div className={`probe ${dashboard.readiness ? 'ready' : 'not-ready'}`}>
<span>Readiness:</span>
<span>{dashboard.readiness ? '✅ Ready' : '⏳ Not Ready'}</span>
</div>
<div className={`probe ${dashboard.liveness ? 'alive' : 'dead'}`}>
<span>Liveness:</span>
<span>{dashboard.liveness ? '💚 Alive' : '💀 Unhealthy'}</span>
</div>
</div>
</div>
<div className="detailed-components">
<h3>Detailed Component Status</h3>
<DetailedHealthDashboard />
</div>
</div>
);
}
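fetchHealthDashboard above uses Promise.all, which rejects as soon as any single call fails. As a result, one unreachable endpoint blanks the whole dashboard. A variant built on Promise.allSettled keeps the sections that succeeded. `settleAll` is a hypothetical helper, and the convention of returning `null` for a failed call is an assumption.

```typescript
// Hypothetical helper: run named health calls in parallel; a failed call
// yields null for its key instead of rejecting the whole batch.
async function settleAll<T extends Record<string, unknown>>(
  calls: { [K in keyof T]: () => Promise<T[K]> }
): Promise<{ [K in keyof T]: T[K] | null }> {
  const keys = Object.keys(calls) as (keyof T)[];
  const results = await Promise.allSettled(keys.map(key => calls[key]()));
  const out: Partial<Record<keyof T, unknown>> = {};
  keys.forEach((key, i) => {
    const result = results[i];
    // A fulfilled call keeps its value; a rejected one maps to null
    out[key] = result.status === 'fulfilled' ? result.value : null;
  });
  return out as { [K in keyof T]: T[K] | null };
}

// Usage
// const partial = await settleAll({
//   general: checkSystemHealth,
//   database: checkDatabaseHealth,
//   resources: getResourceUsage,
// });
// if (partial.database === null) { /* render the Database card as "unavailable" */ }
```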
2. Automated Health Monitoring¶
Automated monitoring that raises alerts when problems are detected.
interface HealthAlert {
timestamp: Date;
severity: 'info' | 'warning' | 'critical';
component: string;
message: string;
details?: any;
}
class HealthMonitor {
private alerts: HealthAlert[] = [];
private checkInterval: NodeJS.Timeout | null = null;
start(intervalMs = 30000) {
this.checkInterval = setInterval(() => this.performChecks(), intervalMs);
console.log('Health monitoring started');
}
stop() {
if (this.checkInterval) {
clearInterval(this.checkInterval);
console.log('Health monitoring stopped');
}
}
private async performChecks() {
try {
const [general, detailed, resources, readiness, liveness] = await Promise.all([
checkSystemHealth(),
checkDetailedHealth(),
getResourceUsage(),
checkReadiness(),
checkLiveness()
]);
// Check overall status
if (general.status === 'unhealthy') {
this.addAlert({
severity: 'critical',
component: 'system',
message: `System unhealthy: ${general.message}`
});
}
// Check component health
detailed.components.forEach(comp => {
if (comp.status === 'unhealthy') {
this.addAlert({
severity: 'critical',
component: comp.component,
message: comp.message || 'Component unhealthy',
details: comp.details
});
} else if (comp.status === 'degraded') {
this.addAlert({
severity: 'warning',
component: comp.component,
message: comp.message || 'Component degraded',
details: comp.details
});
}
});
// Check resource usage
const allocMB = resources.resourceUsage.memory.alloc / 1024 / 1024;
if (allocMB > 1024) {
this.addAlert({
severity: 'critical',
component: 'resources',
message: `Critical memory usage: ${allocMB.toFixed(2)} MB`
});
} else if (allocMB > 500) {
this.addAlert({
severity: 'warning',
component: 'resources',
message: `High memory usage: ${allocMB.toFixed(2)} MB`
});
}
if (resources.resourceUsage.goroutines > 5000) {
this.addAlert({
severity: 'critical',
component: 'resources',
message: `Critical goroutine count: ${resources.resourceUsage.goroutines}`
});
} else if (resources.resourceUsage.goroutines > 1000) {
this.addAlert({
severity: 'warning',
component: 'resources',
message: `High goroutine count: ${resources.resourceUsage.goroutines}`
});
}
// Check probes
if (!readiness) {
this.addAlert({
severity: 'warning',
component: 'readiness',
message: 'System not ready'
});
}
if (!liveness) {
this.addAlert({
severity: 'critical',
component: 'liveness',
message: 'System not alive (requires restart)'
});
}
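} catch (error) {
  // error is `unknown` under strict TypeScript, so narrow it before reading .message
  const reason = error instanceof Error ? error.message : String(error);
  this.addAlert({
    severity: 'critical',
    component: 'monitoring',
    message: `Health check failed: ${reason}`
  });
}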
}
private addAlert(alert: Omit<HealthAlert, 'timestamp'>) {
const newAlert = {
...alert,
timestamp: new Date()
};
this.alerts.push(newAlert);
this.notifyAlert(newAlert);
// Keep last 100 alerts
if (this.alerts.length > 100) {
this.alerts = this.alerts.slice(-100);
}
}
private notifyAlert(alert: HealthAlert) {
const icon = alert.severity === 'critical' ? '🚨' :
alert.severity === 'warning' ? '⚠️' : 'ℹ️';
console.log(`${icon} [${alert.severity.toUpperCase()}] ${alert.component}: ${alert.message}`);
// Can be integrated with PagerDuty or email
if (alert.severity === 'critical') {
this.sendCriticalAlert(alert);
}
}
private async sendCriticalAlert(alert: HealthAlert) {
// Integration with alerting systems
console.error('CRITICAL ALERT:', alert);
// Example: send an email
// await sendEmailNotification({
// to: 'team@saga.surf',
// subject: `🚨 CRITICAL: ${alert.component}`,
// text: `${alert.message}`,
// details: alert.details
// });
}
getAlerts(severity?: HealthAlert['severity']): HealthAlert[] {
if (severity) {
return this.alerts.filter(a => a.severity === severity);
}
return [...this.alerts];
}
clearAlerts() {
this.alerts = [];
}
}
// Usage
const monitor = new HealthMonitor();
monitor.start(30000); // Check every 30 seconds
// Get critical alerts
const criticalAlerts = monitor.getAlerts('critical');
console.log(`Critical alerts: ${criticalAlerts.length}`);
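As written, HealthMonitor raises the same alert again on every 30-second cycle while a problem persists, which quickly floods notification channels. A common remedy is a cooldown per alert key. The sketch below is an illustrative addition. `AlertCooldown`, its component+severity key, and the 15-minute default are assumptions. The clock is injected so the behavior can be tested. The key deliberately omits the message, because messages such as "High memory usage: 523.12 MB" change on every poll.

```typescript
// Hypothetical helper: suppress repeats of the same alert within a cooldown window.
class AlertCooldown {
  private lastSent = new Map<string, number>();
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(cooldownMs = 15 * 60_000, now: () => number = Date.now) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  /** Returns true if an alert for this component and severity should be sent now. */
  shouldNotify(component: string, severity: string): boolean {
    const key = `${component}|${severity}`;
    const t = this.now();
    const last = this.lastSent.get(key);
    if (last !== undefined && t - last < this.cooldownMs) return false;
    this.lastSent.set(key, t);
    return true;
  }
}
```

In HealthMonitor.addAlert, notifyAlert would then be called only when `cooldown.shouldNotify(alert.component, alert.severity)` returns true. The alert itself is still recorded in the history.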
3. Kubernetes Integration¶
Full integration with Kubernetes probes and health checks.
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: saga-backend
  namespace: saga
spec:
  replicas: 3
  selector:
    matchLabels:
      app: saga-backend
  template:
    metadata:
      labels:
        app: saga-backend
    spec:
      containers:
        - name: saga-backend
          image: saga-backend:latest
          ports:
            - containerPort: 8080
              name: http
          # Readiness Probe
          readinessProbe:
            httpGet:
              path: /api/health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 3
          # Liveness Probe
          livenessProbe:
            httpGet:
              path: /api/health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          # Startup Probe (for slow starts)
          startupProbe:
            httpGet:
              path: /api/health/live
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 30 # 150 seconds max startup time
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: saga-secrets
                  key: db-password
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: saga-secrets
                  key: jwt-secret
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: saga-backend
  namespace: saga
  labels:
    app: saga-backend # the ServiceMonitor below selects Services by these labels
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
  selector:
    app: saga-backend
---
# servicemonitor.yaml (Prometheus monitoring)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: saga-backend
  namespace: saga
spec:
  selector:
    matchLabels:
      app: saga-backend
  endpoints:
    - port: http
      # Prometheus expects the text exposition format, but /api/health/detailed returns JSON;
      # point this at an endpoint that serves Prometheus metrics (e.g. the /metrics exporter in use case 4)
      path: /api/health/detailed
      interval: 30s
4. Prometheus Metrics Export¶
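Converting health endpoint data into Prometheus metrics.
// prometheus-exporter.ts (runs in Node.js, where fetch rejects relative URLs: the check*/get* helpers must call absolute URLs such as http://localhost:8080/api/health)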
class PrometheusHealthExporter {
async exportMetrics(): Promise<string> {
try {
const [general, detailed, resources] = await Promise.all([
checkSystemHealth(),
checkDetailedHealth(),
getResourceUsage()
]);
const metrics: string[] = [];
// Overall health status
const statusValue = general.status === 'healthy' ? 2 :
general.status === 'degraded' ? 1 : 0;
metrics.push(`saga_health_status{status="${general.status}"} ${statusValue}`);
// Component health
detailed.components.forEach(comp => {
const compStatusValue = comp.status === 'healthy' ? 2 :
comp.status === 'degraded' ? 1 : 0;
metrics.push(`saga_component_health{component="${comp.component}",status="${comp.status}"} ${compStatusValue}`);
metrics.push(`saga_component_latency_ms{component="${comp.component}"} ${comp.latency}`);
});
// Resource usage
metrics.push(`saga_memory_alloc_bytes ${resources.resourceUsage.memory.alloc}`);
metrics.push(`saga_memory_sys_bytes ${resources.resourceUsage.memory.sys}`);
metrics.push(`saga_goroutines ${resources.resourceUsage.goroutines}`);
// Component count
metrics.push(`saga_component_count ${general.componentCount}`);
metrics.push(`saga_healthy_components ${detailed.components.filter(c => c.status === 'healthy').length}`);
metrics.push(`saga_degraded_components ${detailed.components.filter(c => c.status === 'degraded').length}`);
metrics.push(`saga_unhealthy_components ${detailed.components.filter(c => c.status === 'unhealthy').length}`);
return metrics.join('\n') + '\n';
} catch (error) {
console.error('Failed to export metrics:', error);
return '# Failed to export metrics\n';
}
}
}
// Express endpoint for Prometheus
import express from 'express';

const app = express();
const exporter = new PrometheusHealthExporter();

app.get('/metrics', async (_req, res) => {
  const metrics = await exporter.exportMetrics();
  res.set('Content-Type', 'text/plain; version=0.0.4');
  res.send(metrics);
});
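The exporter above interpolates component names directly into label values and emits no HELP/TYPE lines. In the Prometheus text exposition format, label values must escape backslashes, double quotes and newlines, and each metric family may carry `# HELP` and `# TYPE` lines. Below is a small formatting sketch that does not depend on any client library. `escapeLabelValue`, `Sample` and `formatGauge` are hypothetical names.

```typescript
// Escape a label value per the Prometheus text exposition format (backslash first).
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// HELP text escapes backslashes and newlines (quotes are allowed as-is).
function escapeHelp(text: string): string {
  return text.replace(/\\/g, '\\\\').replace(/\n/g, '\\n');
}

interface Sample {
  labels?: Record<string, string>;
  value: number;
}

// Hypothetical helper: render one gauge family with HELP/TYPE headers.
function formatGauge(name: string, help: string, samples: Sample[]): string {
  const lines = [`# HELP ${name} ${escapeHelp(help)}`, `# TYPE ${name} gauge`];
  for (const { labels = {}, value } of samples) {
    const pairs = Object.entries(labels).map(([k, v]) => `${k}="${escapeLabelValue(v)}"`);
    const labelPart = pairs.length > 0 ? `{${pairs.join(',')}}` : '';
    lines.push(`${name}${labelPart} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// Usage
console.log(formatGauge('saga_goroutines', 'Number of goroutines', [{ value: 42 }]));
```

Emitting one `formatGauge` call per metric family keeps the HELP/TYPE header adjacent to its samples, as the format expects.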
5. Health Status Page¶
A public status page for users.
function PublicStatusPage() {
const [status, setStatus] = useState<'operational' | 'degraded' | 'major_outage'>('operational');
const [components, setComponents] = useState<ComponentHealth[]>([]);
const [lastUpdate, setLastUpdate] = useState<Date>(new Date());
useEffect(() => {
fetchStatus();
const interval = setInterval(fetchStatus, 60000); // Every minute
return () => clearInterval(interval);
}, []);
async function fetchStatus() {
try {
const detailed = await checkDetailedHealth();
// Determine overall status
const unhealthyCount = detailed.components.filter(c => c.status === 'unhealthy').length;
const degradedCount = detailed.components.filter(c => c.status === 'degraded').length;
if (unhealthyCount > 0) {
setStatus('major_outage');
} else if (degradedCount > 0) {
setStatus('degraded');
} else {
setStatus('operational');
}
setComponents(detailed.components);
setLastUpdate(new Date());
} catch (error) {
setStatus('major_outage');
}
}
const statusConfig = {
operational: { icon: '✅', color: 'green', text: 'All Systems Operational' },
degraded: { icon: '⚠️', color: 'yellow', text: 'Partial System Outage' },
major_outage: { icon: '❌', color: 'red', text: 'Major Service Outage' }
};
const config = statusConfig[status];
return (
<div className="status-page">
<header>
<h1>Saga System Status</h1>
<div className={`overall-status ${config.color}`}>
<span className="icon">{config.icon}</span>
<span className="text">{config.text}</span>
</div>
<p className="last-update">
Last updated: {lastUpdate.toLocaleString()}
</p>
</header>
<section className="components-status">
<h2>System Components</h2>
<div className="components-list">
{components.map(comp => (
<div key={comp.component} className={`component-item ${comp.status}`}>
<div className="component-name">
{comp.component.charAt(0).toUpperCase() + comp.component.slice(1)}
</div>
<div className="component-status">
{comp.status === 'healthy' && <span className="badge green">✅ Operational</span>}
{comp.status === 'degraded' && <span className="badge yellow">⚠️ Degraded Performance</span>}
{comp.status === 'unhealthy' && <span className="badge red">❌ Outage</span>}
</div>
{comp.message && (
<div className="component-message">{comp.message}</div>
)}
</div>
))}
</div>
</section>
<footer>
<p>For support, contact support@saga.surf</p>
</footer>
</div>
);
}
Technical Details¶
Health Checking Architecture¶
Comprehensive HealthService:
- Coordinates all health checks through a single service
- Uses context.Context for timeout management
- Checks components in parallel to minimize latency
- Caches results to reduce load
Timeout Configuration:
// Configurable timeouts in config.yaml
GetHealthCheckTimeout() // Standard health check (5s)
GetQuickHealthTimeout() // Quick checks (database, readiness) (2s)
GetComprehensiveHealthTimeout() // Detailed health check (10s)
GetEmergencyHealthTimeout() // Emergency liveness (1s)
GetDefaultTimeout() // Fallback safe value (5s)
Health Status Determination:
type HealthStatusCode string
const (
HealthStatusHealthy HealthStatusCode = "healthy" // All components OK
HealthStatusDegraded HealthStatusCode = "degraded" // Some components degraded, but operational
HealthStatusUnhealthy HealthStatusCode = "unhealthy" // Critical components failed
HealthStatusUnknown HealthStatusCode = "unknown" // Status cannot be determined
)
Component Health Checks:
- Database: PostgreSQL ping + connection pool status
- Blockchain: VPS node connectivity + latest block check
- Business Logic: proxied through the database health check (in the current implementation)
- Resources: Go runtime metrics (memory, goroutines)
HTTP Status Code Mapping:
switch systemHealth.Status {
case HealthStatusHealthy, HealthStatusDegraded:
statusCode = http.StatusOK // 200
case HealthStatusUnhealthy, HealthStatusUnknown:
statusCode = http.StatusServiceUnavailable // 503
}
Monitoring Best Practices¶
Polling Intervals:
- Ping: 30 seconds (frequent availability check)
- Health: 60 seconds (general system status)
- Detailed Health: 30 seconds (comprehensive monitoring)
- Version: Once on app start (static data)
- Database Health: 30 seconds (critical component)
- Blockchain Health: 15 seconds (blockchain specific)
- Resources: 5 seconds (fast-changing metrics)
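- Readiness: 5 seconds (matches periodSeconds in the readinessProbe above; the Kubernetes default is 10 seconds)
- Liveness: 10 seconds (matches the livenessProbe above, which is also the Kubernetes default)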
Alerting Thresholds:
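- Memory Usage: Warning at 500MB, Critical at 1GB (with the 512Mi memory limit in the example Deployment, the container is OOM-killed before allocation can reach 1GB, so scale these thresholds to the actual limit)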
- Goroutines: Warning at 1000, Critical at 5000
- Component Latency: Warning at 1000ms, Critical at 5000ms
- Unhealthy Duration: Critical if unhealthy >5 minutes
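The "Unhealthy Duration" rule needs state across polls: the moment a component first turned unhealthy. Below is a sketch, assuming status samples come from any of the polling helpers and that the 5-minute limit is configurable. `UnhealthyDurationTracker` is a hypothetical name. Timestamps are passed in explicitly so the logic is deterministic.

```typescript
type ComponentStatus = 'healthy' | 'degraded' | 'unhealthy' | 'unknown';

// Hypothetical helper: escalate once a component stays unhealthy longer than a limit.
class UnhealthyDurationTracker {
  private unhealthySince = new Map<string, number>();
  private readonly limitMs: number;

  constructor(limitMs = 5 * 60_000) {
    this.limitMs = limitMs;
  }

  /** Record a status sample taken at atMs; returns true once the limit is exceeded. */
  update(component: string, status: ComponentStatus, atMs: number): boolean {
    if (status !== 'unhealthy') {
      this.unhealthySince.delete(component); // any non-unhealthy sample resets the clock
      return false;
    }
    const since = this.unhealthySince.get(component) ?? atMs;
    this.unhealthySince.set(component, since);
    return atMs - since > this.limitMs;
  }
}

// Usage
// const tracker = new UnhealthyDurationTracker();
// detailed.components.forEach(c => {
//   if (tracker.update(c.component, c.status, Date.now())) { /* raise a critical alert */ }
// });
```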
Kubernetes Probe Configuration:
# Startup Probe (for slow-starting applications)
startupProbe:
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30 # 150s max startup time
# Readiness Probe (for traffic routing)
readinessProbe:
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3 # 15s to mark as not ready
# Liveness Probe (for restart decisions)
livenessProbe:
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3 # 30s to restart container
Rate Limiting Considerations:
- Health endpoints are not rate limited (monitoring is critical)
- All health endpoints, including /api/health and /api/ping, are accessible without authentication
- Avoid exceeding 1 request/second for detailed checks
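Several widgets may poll the same detailed check. One way to stay within the 1 request/second guidance is to share one in-flight request and reuse its result for a short window. This is a sketch only. `throttleCheck` is hypothetical, and the clock is injectable for testing.

```typescript
// Hypothetical helper: reuse a cached (or in-flight) result for minIntervalMs.
function throttleCheck<T>(
  check: () => Promise<T>,
  minIntervalMs = 1000,
  now: () => number = Date.now
): () => Promise<T> {
  let last: { at: number; promise: Promise<T> } | null = null;
  return () => {
    const t = now();
    if (last && t - last.at < minIntervalMs) return last.promise;
    const promise = check();
    last = { at: t, promise };
    // Do not keep serving a failed result: forget it so the next call retries
    promise.catch(() => {
      if (last?.promise === promise) last = null;
    });
    return promise;
  };
}

// Usage
// const detailedThrottled = throttleCheck(checkDetailedHealth, 1000);
// const [a, b] = await Promise.all([detailedThrottled(), detailedThrottled()]); // one request
```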
Error Handling¶
Connection Timeouts:
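async function checkWithTimeout<T>(
  checkFunc: () => Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Promise<never> so that Promise.race resolves to T rather than unknown
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Health check timeout')), timeoutMs);
  });
  try {
    return await Promise.race([checkFunc(), timeout]);
  } finally {
    clearTimeout(timer); // do not leave the timer pending once the check settles
  }
}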
// Usage
try {
const health = await checkWithTimeout(
() => checkSystemHealth(),
5000 // 5 second timeout
);
} catch (error) {
console.error('Health check failed:', error);
}
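Promise.race only stops waiting; the underlying fetch keeps running. When the check is a plain HTTP call, an AbortController cancels the request itself. Below is a sketch using the standard fetch and AbortController APIs (available in browsers and Node 18+). `fetchHealthWithTimeout` is a hypothetical name. The fetch implementation is injectable only to make the example testable.

```typescript
// Cancel a health request that exceeds timeoutMs (standard AbortController API).
async function fetchHealthWithTimeout(
  url: string,
  timeoutMs: number,
  fetchImpl: typeof fetch = fetch
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Rejects (with an AbortError from real fetch) if the timer fires first
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

// Usage
// const res = await fetchHealthWithTimeout('http://localhost:8080/api/health', 5000);
```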
Graceful Degradation:
async function getHealthWithFallback(): Promise<HealthResponse> {
try {
return await checkSystemHealth();
} catch (error) {
// Return fallback data if health check fails
return {
status: 'unknown',
message: 'Health check unavailable',
timestamp: new Date().toISOString(),
uptime: '0s',
version: 'unknown',
healthy: false,
componentCount: 0
};
}
}
Retry Logic:
async function checkHealthWithRetry(maxRetries = 3): Promise<HealthResponse> {
for (let i = 0; i < maxRetries; i++) {
try {
return await checkSystemHealth();
} catch (error) {
if (i === maxRetries - 1) throw error;
// Exponential backoff
await new Promise(resolve => setTimeout(resolve, 1000 * Math.pow(2, i)));
}
}
throw new Error('Max retries exceeded');
}
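The retry loop above waits 1 s, then 2 s, and so on, so every client retries in lockstep. A cap plus random jitter ("full jitter") is a widely used refinement. The schedule function below is a hypothetical sketch, not part of the API. The random source is injectable so results are deterministic in tests.

```typescript
// Hypothetical helper: exponential backoff delays with a cap and "full jitter".
// The delay before retry i is a random value in [0, min(capMs, baseMs * 2^i)).
function backoffDelays(
  retries: number,
  baseMs = 1000,
  capMs = 30_000,
  random: () => number = Math.random
): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    const ceiling = Math.min(capMs, baseMs * 2 ** i);
    delays.push(Math.floor(random() * ceiling));
  }
  return delays;
}

// Usage: with maxRetries = 3, the loop above sleeps twice
// const waits = backoffDelays(2); // e.g. [ 412, 1530 ]
```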
Version Management¶
Semantic Versioning:
- The VERSION file in the project root is the single source of truth
- Automatically synced by the master config generator system
- The version is read when the application starts
- Falls back to "unknown" if the file is unavailable
Build Metadata:
- buildTime: can be set via Go build flags
- gitCommit: can be set via Git hooks
- environment: read from the UnifiedConfig system
Version Comparison:
// Semantic version comparison
function isVersionNewer(current: string, compare: string): boolean {
if (current === 'unknown' || compare === 'unknown') return false;
const [cMajor, cMinor, cPatch] = current.split('.').map(Number);
const [rMajor, rMinor, rPatch] = compare.split('.').map(Number);
if (cMajor !== rMajor) return cMajor > rMajor;
if (cMinor !== rMinor) return cMinor > rMinor;
return cPatch > rPatch;
}
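isVersionNewer assumes exactly three numeric parts. A leading "v" produces NaN, and a missing patch part compares as undefined, so such inputs can silently return false. Below is a slightly more defensive sketch. It assumes versions look like MAJOR.MINOR.PATCH with an optional "v" prefix. Pre-release tags are treated as malformed rather than ordered. `parseVersion` and `isVersionNewerSafe` are hypothetical names.

```typescript
// Hypothetical, more defensive variant: tolerates a "v" prefix and missing parts.
function parseVersion(v: string): number[] | null {
  const parts = v.trim().replace(/^v/i, '').split('.');
  if (parts.length > 3) return null;
  // Only plain digit runs are accepted; anything else (e.g. "1-rc1") is malformed
  const nums = parts.map(p => (/^\d+$/.test(p) ? Number(p) : NaN));
  if (nums.some(Number.isNaN)) return null;
  while (nums.length < 3) nums.push(0); // "2.7" is treated as "2.7.0"
  return nums;
}

function isVersionNewerSafe(current: string, compare: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(compare);
  if (!a || !b) return false; // "unknown" or malformed: never claim newer
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return false; // equal versions
}

// Usage
console.log(isVersionNewerSafe('v2.7', '2.6.268')); // true
```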
Related Documentation¶
Monitoring and Observability:
- SLA Monitoring - SLA metrics and compliance
- Blockchain Status - Blockchain node monitoring
Architecture:
- System Architecture - Overall architecture
DevOps and Deployment:
- Blue-Green Deployment - Production deployment
Configuration:
- Unified Config System - Configuration system
📋 Metadata¶
Version: 2.6.268
Updated: 2025-10-21
Status: Published