SLA Monitoring Endpoints
Endpoint Overview
| Method | Endpoint | Description | Auth required |
|---|---|---|---|
| GET | /api/sla/status | Get the current SLA status | ❌ No |
| GET | /api/sla/report | Get the full SLA report | ❌ No |
| GET | /api/sla/metrics | Get the core SLA metrics | ❌ No |
| GET | /api/sla/availability | Get availability metrics | ❌ No |
| GET | /api/sla/response-times | Get response time metrics | ❌ No |
| GET | /api/sla/error-rate | Get error rate metrics | ❌ No |
GET /api/sla/status
Returns the current SLA status and information about any breaches.
Request
Headers:
| Header | Value | Required |
|---|---|---|
| Accept | application/json | ✅ Yes |
Query parameters: none
Body parameters: none (GET request)
Response
Success (200 OK):
{
"success": true,
"data": {
"status": "healthy",
"breaches": [],
"timestamp": "2025-10-06T12:00:00Z",
"healthy": true
},
"message": "SLA status retrieved successfully"
}
Possible statuses:
- healthy - All SLA metrics are within their targets
- warning - A metric is approaching an SLA breach
- critical - An SLA breach has been detected
Example with breaches:
{
"success": true,
"data": {
"status": "critical",
"breaches": [
"Availability below 99.9% (current: 99.5%)",
"P95 response time exceeds 2000ms (current: 2500ms)"
],
"timestamp": "2025-10-06T12:00:00Z",
"healthy": false
}
}
cURL example
TypeScript example
interface SLAStatus {
status: 'healthy' | 'warning' | 'critical';
breaches: string[];
timestamp: string;
healthy: boolean;
}
async function getSLAStatus(): Promise<SLAStatus> {
const response = await fetch('/api/sla/status');
const { data } = await response.json();
return data;
}
// Usage with real-time monitoring
async function monitorSLAStatus() {
const status = await getSLAStatus();
if (!status.healthy) {
console.error(`⚠️ SLA Status: ${status.status}`);
status.breaches.forEach(breach => {
console.error(` - ${breach}`);
});
// Send an alert (sendAlert is an application-specific helper that is not shown here)
await sendAlert({
severity: status.status === 'critical' ? 'high' : 'medium',
message: `SLA breaches detected: ${status.breaches.length}`,
details: status.breaches
});
} else {
console.log('✅ SLA Status: Healthy');
}
}
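The `getSLAStatus` example above returns `data` without checking whether the request succeeded. A slightly more defensive variant could verify the HTTP status and the `success` flag first. The sketch below assumes every endpoint uses the envelope shape shown in the sample responses (`success`, `data`, `message`). The names `SLAEnvelope`, `FetchLike`, `unwrapEnvelope`, and `fetchSLA` are hypothetical and not part of the API.

```typescript
// Envelope shape taken from the sample responses (success, data, message).
interface SLAEnvelope<T> {
  success: boolean;
  data?: T;
  message?: string;
}

// Minimal structural type for a fetch-like function, so the helper can be
// used with the global fetch or with a test double (hypothetical type).
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Returns `data` from a parsed envelope, or throws if the call was not successful.
function unwrapEnvelope<T>(body: SLAEnvelope<T>): T {
  if (!body.success || body.data === undefined) {
    throw new Error(body.message ?? 'SLA request was not successful');
  }
  return body.data;
}

// Fetches any /api/sla/* endpoint and unwraps the envelope.
async function fetchSLA<T>(path: string, fetchImpl: FetchLike): Promise<T> {
  const response = await fetchImpl(path, {
    headers: { Accept: 'application/json' }
  });
  // Check the HTTP status before parsing, since error bodies may not be JSON.
  if (!response.ok) {
    throw new Error(`SLA request failed: HTTP ${response.status}`);
  }
  return unwrapEnvelope((await response.json()) as SLAEnvelope<T>);
}
```

With such a helper, `getSLAStatus` could be reduced to `fetchSLA<SLAStatus>('/api/sla/status', fetch)`, and the same call shape would work for the other endpoints.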
React component example
import { useState, useEffect } from 'react';
function SLAStatusBadge() {
const [status, setStatus] = useState<SLAStatus | null>(null);
useEffect(() => {
fetchStatus();
const interval = setInterval(fetchStatus, 60000); // Every 60 seconds
return () => clearInterval(interval);
}, []);
async function fetchStatus() {
try {
const data = await getSLAStatus();
setStatus(data);
} catch (error) {
console.error('Failed to fetch SLA status:', error);
}
}
if (!status) return null;
const badgeClass = status.healthy ? 'badge-success' : 'badge-danger';
const icon = status.healthy ? '✅' : '⚠️';
return (
<div className={`sla-badge ${badgeClass}`}>
<span className="icon">{icon}</span>
<span className="status">{status.status.toUpperCase()}</span>
{status.breaches.length > 0 && (
<span className="breach-count">{status.breaches.length}</span>
)}
</div>
);
}
GET /api/sla/report
Returns the full, detailed SLA report with historical data.
Request
Headers:
| Header | Value | Required |
|---|---|---|
| Accept | application/json | ✅ Yes |
Query parameters: none
Body parameters: none (GET request)
Response
Success (200 OK):
{
"success": true,
"data": {
"period": {
"start": "2025-09-06T00:00:00Z",
"end": "2025-10-06T23:59:59Z",
"durationDays": 30
},
"summary": {
"availability": 99.95,
"uptime": "29d 23h 36m",
"downtime": "24m",
"errorRate": 0.08,
"totalRequests": 15000000,
"failedRequests": 12000
},
"targets": {
"availability": 99.9,
"maxErrorRate": 0.1,
"p95ResponseTime": 2000,
"p99ResponseTime": 5000
},
"compliance": {
"availability": true,
"errorRate": true,
"responseTimeP95": true,
"responseTimeP99": true,
"overallCompliance": true
},
"incidents": [
{
"timestamp": "2025-09-15T14:30:00Z",
"duration": "15 minutes",
"impact": "Database connection timeout",
"affected": ["user-api", "admin-api"],
"resolved": true
}
],
"timestamp": "2025-10-06T12:00:00Z"
},
"message": "SLA report generated successfully"
}
cURL example
TypeScript example
interface SLAReport {
period: {
start: string;
end: string;
durationDays: number;
};
summary: {
availability: number;
uptime: string;
downtime: string;
errorRate: number;
totalRequests: number;
failedRequests: number;
};
targets: {
availability: number;
maxErrorRate: number;
p95ResponseTime: number;
p99ResponseTime: number;
};
compliance: {
availability: boolean;
errorRate: boolean;
responseTimeP95: boolean;
responseTimeP99: boolean;
overallCompliance: boolean;
};
incidents: Array<{
timestamp: string;
duration: string;
impact: string;
affected: string[];
resolved: boolean;
}>;
timestamp: string;
}
async function getSLAReport(): Promise<SLAReport> {
const response = await fetch('/api/sla/report');
const { data } = await response.json();
return data;
}
// Generate a PDF report (generatePDF is an application-specific helper that is not shown here)
async function generatePDFReport() {
const report = await getSLAReport();
const pdfData = {
title: 'SLA Compliance Report',
period: `${report.period.start} - ${report.period.end}`,
sections: [
{
title: 'Summary',
data: report.summary
},
{
title: 'Compliance Status',
data: report.compliance
},
{
title: 'Incidents',
data: report.incidents
}
]
};
// Send to PDF generation service
await generatePDF(pdfData);
}
GET /api/sla/metrics
Returns the core SLA metrics in real time.
Request
Headers:
| Header | Value | Required |
|---|---|---|
| Accept | application/json | ✅ Yes |
Query parameters: none
Body parameters: none (GET request)
Response
Success (200 OK):
{
"success": true,
"data": {
"availability": 99.95,
"uptime": "29d 23h 36m",
"errorRate": 0.08,
"responseTimeP50": "35ms",
"responseTimeP95": "150ms",
"responseTimeP99": "450ms",
"timestamp": "2025-10-06T12:00:00Z"
},
"message": "SLA metrics retrieved successfully"
}
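This endpoint returns response times and uptime as formatted strings ("150ms", "29d 23h 36m"), which are awkward to chart or compare against thresholds. The sketch below converts them to numbers. It assumes the string formats are exactly those in the sample response. The helper names `parseMilliseconds` and `parseUptimeSeconds` are hypothetical.

```typescript
// Parses a response-time string such as "150ms" into milliseconds.
// Assumes the "<number>ms" format seen in the sample response.
function parseMilliseconds(value: string): number {
  const match = /^(\d+(?:\.\d+)?)ms$/.exec(value.trim());
  if (!match) {
    throw new Error(`Unexpected response time format: ${value}`);
  }
  return Number(match[1]);
}

// Parses an uptime string such as "29d 23h 36m" into seconds.
// Assumes space-separated parts using the units d, h, m, and s.
function parseUptimeSeconds(value: string): number {
  const unitSeconds: Record<string, number> = { d: 86400, h: 3600, m: 60, s: 1 };
  let total = 0;
  for (const part of value.trim().split(/\s+/)) {
    const match = /^(\d+)([dhms])$/.exec(part);
    if (!match) {
      throw new Error(`Unexpected uptime format: ${value}`);
    }
    total += Number(match[1]) * unitSeconds[match[2]];
  }
  return total;
}
```

Where an endpoint already exposes numeric fields (for example `responseTimeP95Ms` on `/api/sla/response-times`), using those directly is simpler than parsing.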
cURL example
TypeScript example
interface SLAMetrics {
availability: number;
uptime: string;
errorRate: number;
responseTimeP50: string;
responseTimeP95: string;
responseTimeP99: string;
timestamp: string;
}
async function getSLAMetrics(): Promise<SLAMetrics> {
const response = await fetch('/api/sla/metrics');
const { data } = await response.json();
return data;
}
// Dashboard widget with auto-refresh
function SLAMetricsWidget() {
const [metrics, setMetrics] = useState<SLAMetrics | null>(null);
useEffect(() => {
fetchMetrics();
const interval = setInterval(fetchMetrics, 30000); // Every 30 seconds
return () => clearInterval(interval);
}, []);
async function fetchMetrics() {
try {
const data = await getSLAMetrics();
setMetrics(data);
} catch (error) {
console.error('Failed to fetch metrics:', error);
}
}
if (!metrics) return <div>Loading metrics...</div>;
return (
<div className="sla-metrics-widget">
<h3>SLA Metrics</h3>
<div className="metric">
<span className="label">Availability:</span>
<span className="value">{metrics.availability}%</span>
</div>
<div className="metric">
<span className="label">Uptime:</span>
<span className="value">{metrics.uptime}</span>
</div>
<div className="metric">
<span className="label">Error Rate:</span>
<span className="value">{metrics.errorRate}%</span>
</div>
<div className="response-times">
<h4>Response Times</h4>
<div>P50: {metrics.responseTimeP50}</div>
<div>P95: {metrics.responseTimeP95}</div>
<div>P99: {metrics.responseTimeP99}</div>
</div>
<small>Last updated: {new Date(metrics.timestamp).toLocaleString()}</small>
</div>
);
}
GET /api/sla/availability
Returns detailed system availability metrics.
Request
Headers:
| Header | Value | Required |
|---|---|---|
| Accept | application/json | ✅ Yes |
Query parameters: none
Body parameters: none (GET request)
Response
Success (200 OK):
{
"success": true,
"data": {
"availability": 99.95,
"uptime": "29d 23h 36m",
"uptimeSeconds": 2590560,
"timestamp": "2025-10-06T12:00:00Z",
"slaTarget": 99.9,
"meetsSLA": true
},
"message": "Availability metrics retrieved successfully"
}
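An availability target can also be read as a downtime budget. As a rough illustration, a 99.9% target over a 30-day window allows about 43.2 minutes of downtime. The sketch below does that arithmetic. The function names are hypothetical, and the 30-day window is taken from the sample report rather than from any documented configuration.

```typescript
// Downtime (in minutes) that an availability target allows over a period.
function allowedDowntimeMinutes(slaTargetPercent: number, periodDays: number): number {
  if (slaTargetPercent < 0 || slaTargetPercent > 100) {
    throw new Error('slaTargetPercent must be between 0 and 100');
  }
  const periodMinutes = periodDays * 24 * 60;
  return ((100 - slaTargetPercent) / 100) * periodMinutes;
}

// Remaining budget after the downtime observed so far; negative means the
// target has already been missed for this period.
function remainingDowntimeMinutes(
  slaTargetPercent: number,
  periodDays: number,
  observedDowntimeMinutes: number
): number {
  return allowedDowntimeMinutes(slaTargetPercent, periodDays) - observedDowntimeMinutes;
}
```

For the sample data (24 minutes of downtime in 30 days against a 99.9% target), roughly 19 minutes of budget would remain.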
cURL example
TypeScript example
interface AvailabilityMetrics {
availability: number;
uptime: string;
uptimeSeconds: number;
timestamp: string;
slaTarget: number;
meetsSLA: boolean;
}
async function getAvailability(): Promise<AvailabilityMetrics> {
const response = await fetch('/api/sla/availability');
const { data } = await response.json();
return data;
}
// Availability chart component
function AvailabilityChart() {
const [metrics, setMetrics] = useState<AvailabilityMetrics | null>(null);
useEffect(() => {
fetchMetrics();
const interval = setInterval(fetchMetrics, 60000);
return () => clearInterval(interval);
}, []);
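async function fetchMetrics() {
try {
const data = await getAvailability();
setMetrics(data);
} catch (error) {
// Keep the last known value and log the failure instead of leaving the rejection unhandled
console.error('Failed to fetch availability:', error);
}
}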
if (!metrics) return null;
const percentage = metrics.availability;
const target = metrics.slaTarget;
const meetsSLA = metrics.meetsSLA;
return (
<div className="availability-chart">
<h3>System Availability</h3>
<div className="chart-container">
<div className="progress-bar">
<div
className={`progress ${meetsSLA ? 'success' : 'warning'}`}
style={{ width: `${percentage}%` }}
>
{percentage}%
</div>
<div className="target-line" style={{ left: `${target}%` }}>
SLA Target: {target}%
</div>
</div>
</div>
<div className="details">
<div>Uptime: {metrics.uptime}</div>
<div>Status: {meetsSLA ? '✅ Meeting SLA' : '⚠️ Below SLA'}</div>
</div>
</div>
);
}
GET /api/sla/response-times
Returns response time metrics (response time percentiles).
Request
Headers:
| Header | Value | Required |
|---|---|---|
| Accept | application/json | ✅ Yes |
Query parameters: none
Body parameters: none (GET request)
Response
Success (200 OK):
{
"success": true,
"data": {
"responseTimeP50": "35ms",
"responseTimeP95": "150ms",
"responseTimeP99": "450ms",
"responseTimeP50Ms": 35,
"responseTimeP95Ms": 150,
"responseTimeP99Ms": 450,
"timestamp": "2025-10-06T12:00:00Z",
"slaTargetP95Ms": 2000,
"slaTargetP99Ms": 5000,
"meetsP95SLA": true,
"meetsP99SLA": true
},
"message": "Response time metrics retrieved successfully"
}
SLA Targets:
- P95: at most 2000 ms (2 seconds)
- P99: at most 5000 ms (5 seconds)
cURL example
TypeScript example
interface ResponseTimeMetrics {
responseTimeP50: string;
responseTimeP95: string;
responseTimeP99: string;
responseTimeP50Ms: number;
responseTimeP95Ms: number;
responseTimeP99Ms: number;
timestamp: string;
slaTargetP95Ms: number;
slaTargetP99Ms: number;
meetsP95SLA: boolean;
meetsP99SLA: boolean;
}
async function getResponseTimes(): Promise<ResponseTimeMetrics> {
const response = await fetch('/api/sla/response-times');
const { data } = await response.json();
return data;
}
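// Response time histogram
// Assumption: `Bar` is the chart component from react-chartjs-2; the original example does not show its import
import { Bar } from 'react-chartjs-2';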
function ResponseTimeHistogram() {
const [metrics, setMetrics] = useState<ResponseTimeMetrics | null>(null);
useEffect(() => {
fetchMetrics();
const interval = setInterval(fetchMetrics, 30000);
return () => clearInterval(interval);
}, []);
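async function fetchMetrics() {
try {
const data = await getResponseTimes();
setMetrics(data);
} catch (error) {
// Keep the last known value and log the failure instead of leaving the rejection unhandled
console.error('Failed to fetch response times:', error);
}
}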
if (!metrics) return null;
return (
<div className="response-time-histogram">
<h3>Response Time Distribution</h3>
<div className="percentiles">
<div className="percentile">
<span className="label">P50 (Median):</span>
<span className="value">{metrics.responseTimeP50}</span>
</div>
<div className={`percentile ${metrics.meetsP95SLA ? 'success' : 'warning'}`}>
<span className="label">P95:</span>
<span className="value">{metrics.responseTimeP95}</span>
<span className="target">Target: {metrics.slaTargetP95Ms}ms</span>
{metrics.meetsP95SLA ? '✅' : '⚠️'}
</div>
<div className={`percentile ${metrics.meetsP99SLA ? 'success' : 'warning'}`}>
<span className="label">P99:</span>
<span className="value">{metrics.responseTimeP99}</span>
<span className="target">Target: {metrics.slaTargetP99Ms}ms</span>
{metrics.meetsP99SLA ? '✅' : '⚠️'}
</div>
</div>
<div className="chart">
<Bar
data={{
labels: ['P50', 'P95', 'P99'],
datasets: [{
label: 'Response Time (ms)',
data: [
metrics.responseTimeP50Ms,
metrics.responseTimeP95Ms,
metrics.responseTimeP99Ms
],
backgroundColor: [
'rgba(75, 192, 192, 0.6)',
metrics.meetsP95SLA ? 'rgba(75, 192, 192, 0.6)' : 'rgba(255, 99, 132, 0.6)',
metrics.meetsP99SLA ? 'rgba(75, 192, 192, 0.6)' : 'rgba(255, 99, 132, 0.6)'
]
}]
}}
/>
</div>
</div>
);
}
GET /api/sla/error-rate
Returns error rate metrics.
Request
Headers:
| Header | Value | Required |
|---|---|---|
| Accept | application/json | ✅ Yes |
Query parameters: none
Body parameters: none (GET request)
Response
Success (200 OK):
{
"success": true,
"data": {
"errorRate": "0.08",
"timestamp": "2025-10-06T12:00:00Z",
"slaTarget": "0.1",
"meetsSLA": true
},
"message": "Error rate metrics retrieved successfully"
}
SLA Target:
- Max error rate: 0.1% (at most 1 failed request per 1000 requests)
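Because this endpoint returns `errorRate` and `slaTarget` as decimal strings, a client has to convert them before comparing. The sketch below shows one way to do that and to turn the percentage target into a count of allowed failed requests. The function names are hypothetical. Converting with `Number` is an assumption that suits display and rough checks, not exact decimal arithmetic.

```typescript
// Largest number of failed requests that still satisfies the error-rate target.
// `targetPercent` is a decimal string such as "0.1", as returned by the endpoint.
function maxAllowedFailures(totalRequests: number, targetPercent: string): number {
  const target = Number(targetPercent);
  if (!isFinite(target) || target < 0) {
    throw new Error(`Invalid error-rate target: ${targetPercent}`);
  }
  // The small epsilon guards against floating-point results like 14999.999999.
  return Math.floor((totalRequests * target) / 100 + 1e-9);
}

// True when the observed error rate does not exceed the target.
function withinErrorRateTarget(errorRate: string, slaTarget: string): boolean {
  return Number(errorRate) <= Number(slaTarget);
}
```

With the sample report's 15,000,000 requests, a 0.1% target corresponds to at most 15,000 failed requests, and the 12,000 failures in that report fall within it.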
cURL example
TypeScript example
interface ErrorRateMetrics {
errorRate: string; // SafeDecimal format
timestamp: string;
slaTarget: string; // SafeDecimal format
meetsSLA: boolean;
}
async function getErrorRate(): Promise<ErrorRateMetrics> {
const response = await fetch('/api/sla/error-rate');
const { data } = await response.json();
return data;
}
// Error rate gauge
function ErrorRateGauge() {
const [metrics, setMetrics] = useState<ErrorRateMetrics | null>(null);
useEffect(() => {
fetchMetrics();
const interval = setInterval(fetchMetrics, 30000);
return () => clearInterval(interval);
}, []);
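async function fetchMetrics() {
try {
const data = await getErrorRate();
setMetrics(data);
} catch (error) {
// Keep the last known value and log the failure instead of leaving the rejection unhandled
console.error('Failed to fetch error rate:', error);
}
}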
if (!metrics) return null;
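const errorRate = parseFloat(metrics.errorRate);
const target = parseFloat(metrics.slaTarget);
// Clamp to 100 so the needle stays on the 180° dial when the error rate exceeds the target
const percentage = target > 0 ? Math.min((errorRate / target) * 100, 100) : 100;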
return (
<div className="error-rate-gauge">
<h3>Error Rate</h3>
<div className="gauge-container">
<div className={`gauge ${metrics.meetsSLA ? 'success' : 'danger'}`}>
<div className="needle" style={{ transform: `rotate(${percentage * 1.8}deg)` }}></div>
<div className="gauge-value">
{errorRate}%
</div>
</div>
</div>
<div className="gauge-labels">
<span className="current">Current: {metrics.errorRate}%</span>
<span className="target">Target: ≤ {metrics.slaTarget}%</span>
</div>
<div className={`status ${metrics.meetsSLA ? 'success' : 'warning'}`}>
{metrics.meetsSLA ? '✅ Within SLA' : '⚠️ Exceeds SLA'}
</div>
</div>
);
}
Common use cases
1. Comprehensive SLA Dashboard
async function createSLADashboard() {
const [status, report, metrics, availability, responseTimes, errorRate] = await Promise.all([
getSLAStatus(),
getSLAReport(),
getSLAMetrics(),
getAvailability(),
getResponseTimes(),
getErrorRate()
]);
return {
overview: {
status: status.status,
healthy: status.healthy,
breaches: status.breaches.length
},
compliance: report.compliance,
realtime: {
availability: availability.availability,
p95ResponseTime: responseTimes.responseTimeP95Ms,
errorRate: parseFloat(errorRate.errorRate)
},
targets: {
availability: availability.slaTarget,
p95ResponseTime: responseTimes.slaTargetP95Ms,
errorRate: parseFloat(errorRate.slaTarget)
}
};
}
2. Automated SLA Alerting
async function monitorSLAAndAlert() {
const status = await getSLAStatus();
if (!status.healthy) {
// Fetch detailed metrics
const [availability, responseTimes, errorRate] = await Promise.all([
getAvailability(),
getResponseTimes(),
getErrorRate()
]);
// Determine alert severity
let severity: 'low' | 'medium' | 'high' = 'medium';
if (status.status === 'critical') {
severity = 'high';
}
// Send alert with details
await sendAlert({
severity,
title: 'SLA Breach Detected',
breaches: status.breaches,
metrics: {
availability: !availability.meetsSLA,
p95ResponseTime: !responseTimes.meetsP95SLA,
p99ResponseTime: !responseTimes.meetsP99SLA,
errorRate: !errorRate.meetsSLA
}
});
}
}
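// Run monitoring every minute; log failures so a rejected promise is not left unhandled
setInterval(() => {
monitorSLAAndAlert().catch(error => console.error('SLA monitoring failed:', error));
}, 60000);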
3. Historical SLA Trend Analysis
interface SLATrend {
timestamp: string;
availability: number;
errorRate: number;
p95ResponseTime: number;
}
const slaHistory: SLATrend[] = [];
async function trackSLATrends() {
const [availability, responseTimes, errorRate] = await Promise.all([
getAvailability(),
getResponseTimes(),
getErrorRate()
]);
const dataPoint: SLATrend = {
timestamp: new Date().toISOString(),
availability: availability.availability,
errorRate: parseFloat(errorRate.errorRate),
p95ResponseTime: responseTimes.responseTimeP95Ms
};
slaHistory.push(dataPoint);
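// Keep only the last 24 hours. Points are appended in time order, so prune
// the oldest entries in place to stop the shared array from growing without bound.
const cutoff = Date.now() - 24 * 60 * 60 * 1000;
while (slaHistory.length > 0 && new Date(slaHistory[0].timestamp).getTime() <= cutoff) {
slaHistory.shift();
}
return slaHistory.slice();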
}
4. SLA Compliance Report Generation
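async function generateComplianceReport(period: '7d' | '30d' | '90d') {
// /api/sla/report takes no query parameters, so `period` is only a label here;
// the window the data actually covers is given in report.period.
const report = await getSLAReport();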
const complianceReport = {
period: period,
generatedAt: new Date().toISOString(),
summary: {
overallCompliance: report.compliance.overallCompliance,
availability: {
actual: report.summary.availability,
target: report.targets.availability,
compliant: report.compliance.availability
},
responseTime: {
p95: {
compliant: report.compliance.responseTimeP95,
target: report.targets.p95ResponseTime
},
p99: {
compliant: report.compliance.responseTimeP99,
target: report.targets.p99ResponseTime
}
},
errorRate: {
actual: report.summary.errorRate,
target: report.targets.maxErrorRate,
compliant: report.compliance.errorRate
}
},
incidents: report.incidents,
recommendations: generateRecommendations(report)
};
return complianceReport;
}
function generateRecommendations(report: SLAReport): string[] {
const recommendations: string[] = [];
if (!report.compliance.availability) {
recommendations.push('Improve system availability through redundancy');
}
if (!report.compliance.responseTimeP95) {
recommendations.push('Optimize database queries and add caching');
}
if (!report.compliance.errorRate) {
recommendations.push('Review error logs and fix recurring issues');
}
return recommendations;
}
5. Real-time SLA Monitoring Widget
function SLAMonitoringWidget() {
const [status, setStatus] = useState<SLAStatus | null>(null);
const [metrics, setMetrics] = useState<SLAMetrics | null>(null);
useEffect(() => {
fetchData();
const interval = setInterval(fetchData, 30000); // Every 30 seconds
return () => clearInterval(interval);
}, []);
async function fetchData() {
try {
const [statusData, metricsData] = await Promise.all([
getSLAStatus(),
getSLAMetrics()
]);
setStatus(statusData);
setMetrics(metricsData);
} catch (error) {
console.error('Failed to fetch SLA data:', error);
}
}
if (!status || !metrics) return <div>Loading...</div>;
return (
<div className={`sla-widget ${status.healthy ? 'healthy' : 'unhealthy'}`}>
<div className="status-header">
<h3>SLA Status</h3>
<span className={`badge ${status.status}`}>
{status.status.toUpperCase()}
</span>
</div>
<div className="metrics-grid">
<div className="metric">
<span className="label">Availability</span>
<span className="value">{metrics.availability}%</span>
</div>
<div className="metric">
<span className="label">Error Rate</span>
<span className="value">{metrics.errorRate}%</span>
</div>
<div className="metric">
<span className="label">P95 Response</span>
<span className="value">{metrics.responseTimeP95}</span>
</div>
</div>
{status.breaches.length > 0 && (
<div className="breaches">
<h4>⚠️ SLA Breaches</h4>
<ul>
{status.breaches.map((breach, idx) => (
<li key={idx}>{breach}</li>
))}
</ul>
</div>
)}
<small>Last updated: {new Date(status.timestamp).toLocaleString()}</small>
</div>
);
}
Related documentation
- Health Endpoints - System health checks
- Blockchain Status - Blockchain monitoring
- Monitoring Guide - Comprehensive monitoring setup
Technical details
SLA Targets
The Saga project uses the following SLA targets:
| Metric | Target | Severity |
|---|---|---|
| Availability | ≥ 99.9% | 🔴 Critical |
| Error Rate | ≤ 0.1% | 🔴 Critical |
| P95 Response Time | ≤ 2000ms | 🟡 High |
| P99 Response Time | ≤ 5000ms | 🟡 High |
Monitoring Architecture
Real-time Metrics Collection:
- Metrics are collected every 10 seconds
- Samples are aggregated into 1-minute, 5-minute, and hourly buckets
- Retention: 7 days of detailed metrics, 90 days of aggregated metrics
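The roll-up of 10-second samples into larger buckets can be pictured with the sketch below. It is only an illustration of the idea. The document does not describe the actual pipeline, so the `MetricSample` and `MetricBucket` shapes and the `aggregateIntoBuckets` function are all hypothetical.

```typescript
// One collected sample (hypothetical shape): counts observed in a 10-second window.
interface MetricSample {
  timestampMs: number;
  requests: number;
  failures: number;
}

// One aggregated bucket (hypothetical shape).
interface MetricBucket {
  startMs: number;
  requests: number;
  failures: number;
  errorRatePercent: number;
}

// Groups samples into fixed-size buckets (e.g. 60000 ms for 1-minute buckets)
// and sums the counters within each bucket.
function aggregateIntoBuckets(samples: MetricSample[], bucketMs: number): MetricBucket[] {
  if (bucketMs <= 0) {
    throw new Error('bucketMs must be positive');
  }
  const totals: Record<string, { requests: number; failures: number }> = {};
  for (const sample of samples) {
    // Align each sample to the start of the bucket that contains it.
    const start = Math.floor(sample.timestampMs / bucketMs) * bucketMs;
    const key = String(start);
    const entry = totals[key] ?? (totals[key] = { requests: 0, failures: 0 });
    entry.requests += sample.requests;
    entry.failures += sample.failures;
  }
  return Object.keys(totals)
    .map(Number)
    .sort((a, b) => a - b)
    .map(start => {
      const entry = totals[String(start)];
      return {
        startMs: start,
        requests: entry.requests,
        failures: entry.failures,
        errorRatePercent: entry.requests === 0 ? 0 : (entry.failures * 100) / entry.requests
      };
    });
}
```

Summing raw counters and deriving the error rate per bucket avoids averaging percentages across samples that carry different request volumes.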
Alerting Thresholds:
- Warning: 95% of SLA target (for example, 99.855% availability)
- Critical: Below SLA target (for example, < 99.9% availability)
Calculation Methods
Availability Calculation:
Error Rate Calculation:
Response Time Percentiles:
- P50 (Median): 50% of requests faster than this value
- P95: 95% of requests faster than this value
- P99: 99% of requests faster than this value
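The formulas for the availability and error rate calculations are not reproduced in this section. The sketch below uses the conventional definitions: uptime divided by total time, and failed requests divided by total requests. It also uses the nearest-rank method for percentiles. These are assumptions, the function names are hypothetical, and the service's actual computation may differ.

```typescript
// Availability in percent: share of the period during which the system was up.
function availabilityPercent(uptimeSeconds: number, periodSeconds: number): number {
  if (periodSeconds <= 0) {
    throw new Error('periodSeconds must be positive');
  }
  return (uptimeSeconds * 100) / periodSeconds;
}

// Error rate in percent: share of requests that failed.
function errorRatePercent(failedRequests: number, totalRequests: number): number {
  if (totalRequests <= 0) {
    return 0; // No traffic means no observed errors.
  }
  return (failedRequests * 100) / totalRequests;
}

// Nearest-rank percentile: the smallest observed value such that at least
// p percent of the observations are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error('values must not be empty');
  }
  if (p <= 0 || p > 100) {
    throw new Error('p must be in the range (0, 100]');
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

Applied to the sample report (12,000 failed requests out of 15,000,000), `errorRatePercent` reproduces the reported error rate of 0.08%.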
Best Practices
Monitoring Best Practices:
- ✅ Monitor SLA metrics every 30-60 seconds
- ✅ Set up automated alerts for SLA breaches
- ✅ Review SLA reports weekly
- ✅ Investigate a warning or critical status immediately
- ✅ Maintain historical trends for capacity planning
Response to SLA Breaches:
- Immediate: Check /api/sla/status for breach details
- Investigation: Review /api/sla/report for incident context
- Mitigation: Address the root cause based on the specific metric breach
- Post-mortem: Document the incident and prevention measures
Document version: 2.0.0
Last updated: 2025-10-06
Related modules: sla, monitoring, health, performance