Introduction: Lessons from a Late-Night Incident

At 3 a.m. the alerts fired: the success rate of the order-payment API had plunged to 65%. The team responded quickly, yet it still took nearly two hours to trace the root cause: an expired certificate on a third-party payment gateway. The experience drove home a lesson: for a project with separated frontend and backend, a complete, well-structured logging system matters no less than the code itself.

This article walks through how to build an end-to-end log trail for a deployed frontend-backend separated project, from the browser all the way to the database, and how to use those logs to locate and resolve problems quickly.

1. Frontend-Backend Separation and Its Logging Challenges

1.1 Architecture Overview

A typical modern frontend-backend separated project consists of the following components:

  • Frontend: a React/Vue/Angular app running in the browser or on a Node.js server
  • Backend: an API service built with Spring Boot, Express, Django, or similar
  • Gateway: Nginx/Traefik/API Gateway
  • Infrastructure: Docker/Kubernetes, databases, caches, message queues

1.2 Logging Challenges

  1. Scattered request chains: a single user request may traverse several services
  2. Clock skew: the clocks of different servers may drift apart
  3. Inconsistent formats: each component logs in its own format
  4. Sheer volume: high concurrency produces enormous amounts of log data
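The first challenge, scattered request chains, is what trace IDs (covered in section 6) exist to solve: once every hop logs a shared ID, the chain can be reassembled mechanically. A minimal sketch in Python, with illustrative field names:

```python
from collections import defaultdict

def correlate_by_trace(log_entries):
    """Group parsed log entries from different services by their trace ID."""
    chains = defaultdict(list)
    for entry in log_entries:
        chains[entry["trace_id"]].append((entry["service"], entry["message"]))
    return dict(chains)

logs = [
    {"trace_id": "t1", "service": "nginx", "message": "POST /api/v1/orders 201"},
    {"trace_id": "t1", "service": "api", "message": "order created"},
    {"trace_id": "t2", "service": "api", "message": "health check"},
]
# Every hop of one request becomes visible in arrival order.
chain = correlate_by_trace(logs)["t1"]
```

Without a shared ID, the same reconstruction requires fuzzy matching on timestamps, which clock skew (challenge 2) makes unreliable.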

2. The Frontend Logging Layer

2.1 Browser Console Logging

// Example logger for production use
class Logger {
  constructor() {
    this.environment = process.env.NODE_ENV;
  }
  
  info(message, metadata = {}) {
    if (this.environment === 'development') {
      console.log(`[INFO] ${new Date().toISOString()} ${message}`, metadata);
    }
    // In production this still ships the entry to the log server
    this.sendToServer('INFO', { message, metadata });
  }
  
  error(error, context = {}) {
    console.error(`[ERROR] ${new Date().toISOString()}`, error, context);
    
    // Capture browser context alongside the error
    const browserInfo = {
      userAgent: navigator.userAgent,
      url: window.location.href,
      screenResolution: `${screen.width}x${screen.height}`,
      timestamp: new Date().toISOString()
    };
    
    // Ship the error entry
    this.sendToServer('ERROR', {
      message: error.message,
      stack: error.stack,
      context,
      browserInfo
    });
  }
  
  sendToServer(level, data) {
    // navigator.sendBeacon delivers reliably even while the page unloads
    const blob = new Blob([JSON.stringify({
      level,
      data,
      timestamp: new Date().toISOString(),
      appVersion: '1.0.0'
    })], { type: 'application/json' });
    
    navigator.sendBeacon('/api/logs', blob);
  }
}
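On the receiving side, the `/api/logs` endpoint only needs to validate and persist what `sendBeacon` posts. A server-side sketch in Python, framework-agnostic; the payload shape mirrors `sendToServer` above, and the `store` list is an illustrative stand-in for a real sink:

```python
import json

VALID_LEVELS = {"INFO", "WARN", "ERROR"}

def ingest_log(raw_body, store):
    """Validate one beacon payload and append it to a store.

    Returns True on success, False for malformed or incomplete entries.
    """
    try:
        entry = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    if entry.get("level") not in VALID_LEVELS or "timestamp" not in entry:
        return False
    store.append(entry)
    return True

store = []
ok = ingest_log('{"level": "ERROR", "data": {"message": "boom"}, '
                '"timestamp": "2024-01-15T08:30:45Z", "appVersion": "1.0.0"}', store)
bad = ingest_log('not json', store)
```

Rejecting malformed entries at the door keeps downstream parsers (Filebeat, Elasticsearch) from choking on garbage clients may send.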

2.2 Performance Monitoring Logs

// Collecting page performance data
// (performance.timing is deprecated in favor of PerformanceNavigationTiming
// but remains widely supported; performance.memory is Chromium-only)
const performanceLogger = {
  logPageLoad() {
    window.addEventListener('load', () => {
      setTimeout(() => {
        const timing = performance.timing;
        const metrics = {
          dnsLookup: timing.domainLookupEnd - timing.domainLookupStart,
          tcpConnect: timing.connectEnd - timing.connectStart,
          ttfb: timing.responseStart - timing.requestStart,
          domReady: timing.domContentLoadedEventEnd - timing.navigationStart,
          pageLoad: timing.loadEventEnd - timing.navigationStart,
          frontendMemory: performance.memory ? {
            usedJSHeapSize: performance.memory.usedJSHeapSize,
            totalJSHeapSize: performance.memory.totalJSHeapSize
          } : null
        };
        
        // Assumes a shared Logger instance from section 2.1
        logger.info('Page performance metrics', metrics);
      }, 0);
    });
  }
};

3. Nginx Gateway Logging

3.1 Tuning the Access Log Format

http {
    log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" '
                      '"$host" "sn=$server_name" '
                      'rt=$request_time uct="$upstream_connect_time" '
                      'uht="$upstream_header_time" urt="$upstream_response_time" '
                      'app_trace_id=$http_x_trace_id';
    
    # JSON counterpart for machine consumption (escape=json needs nginx 1.11.8+)
    log_format json_format escape=json
        '{"timestamp":"$time_iso8601","client_ip":"$remote_addr",'
        '"method":"$request_method","uri":"$request_uri","status":$status,'
        '"response_time":$request_time,"upstream_addr":"$upstream_addr",'
        '"upstream_response_time":"$upstream_response_time",'
        '"trace_id":"$http_x_trace_id"}';
    
    # Skip logging for health checks
    map $request_uri $loggable {
        ~^/health 0;
        default 1;
    }
    
    access_log /var/log/nginx/access.log main_ext if=$loggable;
    access_log /var/log/nginx/access_json.log json_format if=$loggable;
    
    # Error log with its own severity threshold
    error_log /var/log/nginx/error.log warn;
}

3.2 Sample JSON Log Entry

{
  "timestamp": "2024-01-15T08:30:45+08:00",
  "client_ip": "203.0.113.12",
  "method": "POST",
  "uri": "/api/v1/orders",
  "status": 201,
  "response_time": 0.245,
  "request_size": 1245,
  "response_size": 456,
  "user_agent": "Mozilla/5.0 (Macintosh) AppleWebKit/537.36",
  "referrer": "https://example.com/checkout",
  "upstream_addr": "10.0.1.15:8080",
  "upstream_response_time": 0.234,
  "trace_id": "trace-abc123-def456",
  "user_id": "user_789"
}
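Once access logs are JSON-per-line, latency percentiles can be computed with a few lines of script instead of eyeballing the raw file. A Python sketch using the nearest-rank method on the `response_time` field from entries shaped like the one above:

```python
import json

def p95_response_time(json_lines):
    """95th-percentile response_time (seconds) from JSON-per-line access logs."""
    times = sorted(json.loads(line)["response_time"] for line in json_lines)
    if not times:
        return None
    # Nearest-rank percentile: index ceil(0.95 * n) - 1
    idx = max(0, -(-95 * len(times) // 100) - 1)
    return times[idx]

# 100 synthetic entries with latencies 0.001s .. 0.100s
lines = [json.dumps({"response_time": t / 1000}) for t in range(1, 101)]
p95 = p95_response_time(lines)
```

The same one-pass-sort approach scales to the `jq`-based ad hoc queries shown in section 9.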

4. Backend Application Logging

4.1 Structured Logging Configuration (Spring Boot Example)

# application-prod.yml
logging:
  level:
    com.example.api: INFO
    org.springframework.web: WARN
    org.hibernate.SQL: ERROR
  file:
    name: /var/log/app/application.log
  logback:
    rollingpolicy:
      max-file-size: 100MB
      max-history: 30
      file-name-pattern: /var/log/app/application.%d{yyyy-MM-dd}.%i.log
  # Log patterns; the file pattern carries MDC fields for trace correlation
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} - %logger{36} - %msg%n"
    file: "%d{ISO8601} [%thread] %-5level %logger{36} - trace_id=%X{traceId} user_id=%X{userId} - %msg%n"

4.2 An Enhanced Logging Aspect

@Aspect
@Component
@Slf4j
public class ApiLogAspect {
    
    // Note: this pointcut matches only methods annotated with @RequestMapping itself;
    // add @GetMapping/@PostMapping etc. to the expression as needed.
    @Around("@annotation(org.springframework.web.bind.annotation.RequestMapping)")
    public Object logApiRequest(ProceedingJoinPoint joinPoint) throws Throwable {
        String traceId = MDC.get("traceId") != null ? MDC.get("traceId") : generateTraceId();
        MDC.put("traceId", traceId);
        
        HttpServletRequest request = 
            ((ServletRequestAttributes) RequestContextHolder.currentRequestAttributes()).getRequest();
        
        long startTime = System.currentTimeMillis();
        ApiLog apiLog = ApiLog.builder()
            .traceId(traceId)
            .uri(request.getRequestURI())
            .method(request.getMethod())
            .clientIp(getClientIp(request))
            .userAgent(request.getHeader("User-Agent"))
            .params(getRequestParams(request))
            .userId(getCurrentUserId())
            .build();
        
        log.info("API Request Started: {}", apiLog);
        
        try {
            Object result = joinPoint.proceed();
            long duration = System.currentTimeMillis() - startTime;
            
            ApiResponseLog responseLog = ApiResponseLog.builder()
                .traceId(traceId)
                .status(HttpStatus.OK.value())
                .duration(duration)
                .build();
            
            log.info("API Request Completed: {}", responseLog);
            
            return result;
        } catch (Exception e) {
            long duration = System.currentTimeMillis() - startTime;
            
            ApiErrorLog errorLog = ApiErrorLog.builder()
                .traceId(traceId)
                .errorCode("INTERNAL_ERROR")
                .errorMessage(e.getMessage())
                .stackTrace(getStackTrace(e))
                .duration(duration)
                .build();
            
            log.error("API Request Failed: {}", errorLog, e);
            throw e;
        } finally {
            MDC.clear();
        }
    }
}

5. Database and Cache Logging

5.1 Slow Query Logging

-- MySQL (SET GLOBAL takes effect immediately; persist the settings in my.cnf too)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-queries.log';
SET GLOBAL long_query_time = 1; -- queries slower than 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';
SET GLOBAL log_output = 'FILE,TABLE'; -- TABLE also populates mysql.slow_log for SQL queries

-- PostgreSQL (run SELECT pg_reload_conf(); afterwards to apply)
ALTER SYSTEM SET log_min_duration_statement = '1000ms';
ALTER SYSTEM SET log_statement = 'none';
ALTER SYSTEM SET log_duration = 'off';

5.2 Redis Monitoring Logs

# redis.conf
# Enable the slowlog (threshold is in microseconds)
slowlog-log-slower-than 10000  # 10 milliseconds
slowlog-max-len 128

# Server log verbosity and destination ('verbose' is chatty; 'notice' is the production default)
loglevel verbose
logfile /var/log/redis/redis-server.log

6. End-to-End Trace Logging

6.1 Implementing Distributed Tracing

@Component
public class TraceFilter implements Filter {
    
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) 
            throws IOException, ServletException {
        
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        String traceId = httpRequest.getHeader("X-Trace-Id");
        
        if (StringUtils.isEmpty(traceId)) {
            traceId = UUID.randomUUID().toString().replace("-", "");
        }
        
        MDC.put("traceId", traceId);
        
        // Propagate the traceId to downstream services
        TraceContext.setTraceId(traceId);
        
        // Echo the traceId back to the caller in a response header
        HttpServletResponse httpResponse = (HttpServletResponse) response;
        httpResponse.setHeader("X-Trace-Id", traceId);
        
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.remove("traceId");
            TraceContext.clear();
        }
    }
}
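The filter's contract — reuse the incoming `X-Trace-Id` or mint a fresh one, then propagate it downstream and echo it back — is language-independent. A Python sketch of that contract; `call_downstream` is an illustrative stand-in for an outbound HTTP client:

```python
import uuid

TRACE_HEADER = "X-Trace-Id"

def handle_request(headers, call_downstream):
    """Reuse the caller's trace ID if present, otherwise generate one,
    and attach it to every downstream call and to the response."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    downstream_headers = {TRACE_HEADER: trace_id}
    call_downstream(downstream_headers)   # the next hop sees the same ID
    return {TRACE_HEADER: trace_id}       # echoed back to the client

seen = []
resp = handle_request({TRACE_HEADER: "abc123"}, lambda h: seen.append(h))
minted = handle_request({}, lambda h: None)  # no incoming ID: a 32-char hex ID is minted
```

Returning the ID to the client matters: a user-facing error page can display it, turning "it was slow around 3 a.m." support tickets into an exact grep target.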

7. Log Collection and Analysis

7.1 ELK Stack Configuration Example

# filebeat.yml (newer Filebeat versions prefer type: filestream)
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/*.log
  json.keys_under_root: true
  json.add_error_key: true
  fields:
    type: "nginx"

- type: log
  enabled: true
  paths:
    - /var/log/app/*.log
  fields:
    type: "application"
    environment: "production"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  indices:
    - index: "nginx-%{+yyyy.MM.dd}"
      when.equals:
        fields.type: "nginx"
    - index: "app-%{+yyyy.MM.dd}"
      when.equals:
        fields.type: "application"

7.2 Key Monitoring Metrics

{
  "monitoring_metrics": {
    "application": {
      "error_rate": "alert when the error rate exceeds 1%",
      "p95_response_time": "alert when P95 latency exceeds 500ms",
      "throughput": "requests per minute",
      "active_users": "number of active users"
    },
    "infrastructure": {
      "cpu_usage": "alert above 80% CPU",
      "memory_usage": "alert above 85% memory",
      "disk_usage": "alert above 90% disk",
      "jvm_gc": "full GC frequency"
    }
  }
}
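The application-level thresholds above can be evaluated mechanically over a window of requests. A Python sketch with the thresholds from the table; the request-record shape is illustrative:

```python
def evaluate_alerts(requests, error_rate_limit=0.01, p95_limit_ms=500):
    """Return the names of the alerts that fire for one window of requests."""
    alerts = []
    n = len(requests)
    errors = sum(1 for r in requests if r["status"] >= 500)
    if n and errors / n > error_rate_limit:
        alerts.append("error_rate")
    latencies = sorted(r["latency_ms"] for r in requests)
    if latencies:
        # Nearest-rank 95th percentile
        p95 = latencies[max(0, -(-95 * n // 100) - 1)]
        if p95 > p95_limit_ms:
            alerts.append("p95_response_time")
    return alerts

# 3% of requests fail fast while the bulk stay quick: only error_rate fires
window = [{"status": 200, "latency_ms": 100}] * 97 + [{"status": 500, "latency_ms": 900}] * 3
fired = evaluate_alerts(window)
```

In practice this evaluation lives in Kibana alerting or Prometheus rules; the sketch just makes the arithmetic behind the thresholds explicit.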

8. Logging Best Practices

8.1 Log Level Conventions

  • ERROR: system failures and business exceptions that demand immediate action
  • WARN: potential problems worth attention but not urgent
  • INFO: significant business events, used for auditing and tracing
  • DEBUG: development diagnostics; use sparingly in production
  • TRACE: fine-grained call traces, usually disabled in production
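These levels form an ordered scale, which is what makes a configured threshold meaningful: a logger set to INFO suppresses DEBUG and TRACE while letting WARN and ERROR through. A minimal sketch of that comparison:

```python
# Numeric ranks make the "at or above threshold" rule a single comparison
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}

def should_log(record_level, configured_level):
    """A record is emitted only if its level is at or above the threshold."""
    return LEVELS[record_level] >= LEVELS[configured_level]

emitted = [lvl for lvl in LEVELS if should_log(lvl, "INFO")]
```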

8.2 Handling Sensitive Data

import java.io.IOException;
import java.util.Set;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

// Masks sensitive String fields when objects are serialized for logging.
// Register on the logging ObjectMapper via a SimpleModule:
//   module.addSerializer(String.class, new SensitiveDataSerializer());
public class SensitiveDataSerializer extends StdSerializer<String> {
    
    // Entries are lowercase so the lookup below matches camelCase field names
    private static final Set<String> SENSITIVE_FIELDS = Set.of(
        "password", "token", "secret", "creditcard", "ssn", "phone"
    );
    
    public SensitiveDataSerializer() {
        super(String.class);
    }
    
    @Override
    public void serialize(String value, JsonGenerator gen, SerializerProvider serializers) 
            throws IOException {
        String fieldName = gen.getOutputContext().getCurrentName();
        
        if (fieldName != null && SENSITIVE_FIELDS.contains(fieldName.toLowerCase())) {
            gen.writeString("***MASKED***");
        } else {
            gen.writeString(value);
        }
    }
}
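Masking can also happen one step earlier, on the payload itself, before it reaches any serializer or log call. A Python sketch that walks nested structures; the field list mirrors the Java example:

```python
SENSITIVE_FIELDS = {"password", "token", "secret", "creditcard", "ssn", "phone"}

def mask(value):
    """Recursively replace the values of sensitive keys with a fixed marker."""
    if isinstance(value, dict):
        return {
            # Lowercasing the key makes the match case-insensitive (creditCard etc.)
            k: "***MASKED***" if k.lower() in SENSITIVE_FIELDS else mask(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [mask(v) for v in value]
    return value

payload = {"user": {"name": "alice", "password": "hunter2"},
           "cards": [{"creditCard": "4111-1111"}]}
safe = mask(payload)
```

Masking at the payload boundary protects every log statement at once, instead of relying on each call site to remember to do it.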

9. A Troubleshooting Walkthrough

9.1 Scenario: Slow API Responses

Troubleshooting steps:

  1. Check the Nginx access log

    grep "POST /api/v1/orders" /var/log/nginx/access_json.log | jq '.response_time' | sort -nr | head -5
    
  2. Follow the traceId across the whole chain

    # Pull every log line for this trace
    grep "trace-abc123-def456" /var/log/app/application.log
    grep "trace-abc123-def456" /var/log/nginx/access.log
    
  3. Inspect database slow queries

    -- mysql.slow_log is populated only when log_output includes 'TABLE'
    SELECT * FROM mysql.slow_log 
    WHERE start_time > NOW() - INTERVAL 10 MINUTE
    ORDER BY query_time DESC LIMIT 10;
    
  4. Check application runtime health

    # JVM GC behavior
    jstat -gcutil <pid> 1000 10
    
    # Thread states
    jstack <pid> | grep -A 5 "BLOCKED"
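Steps 1 and 2 can be automated: once every component logs the trace ID with a timestamp, a short script reconstructs the request timeline and exposes the slow hop. A Python sketch with an illustrative record shape:

```python
def request_timeline(entries, trace_id):
    """Return (service, ms since first entry) for one trace, in time order."""
    hops = sorted((e for e in entries if e["trace_id"] == trace_id),
                  key=lambda e: e["ts_ms"])
    if not hops:
        return []
    t0 = hops[0]["ts_ms"]
    return [(e["service"], e["ts_ms"] - t0) for e in hops]

entries = [
    {"trace_id": "t1", "service": "nginx", "ts_ms": 1000},
    {"trace_id": "t1", "service": "api", "ts_ms": 1010},
    {"trace_id": "t1", "service": "mysql", "ts_ms": 1200},
]
timeline = request_timeline(entries, "t1")
# The 190 ms gap between the api and mysql entries points at the database hop.
```

This is exactly what Jaeger or SkyWalking render as a span waterfall; the script version is handy when all you have is raw log files.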
    

10. Conclusion

A well-built logging system is the "black box" of a frontend-backend separated project: it records every moment that matters in the system's life. With standardized log conventions, a unified collection pipeline, and a capable analysis platform, we can:

  1. Locate problems faster: mean time to recovery (MTTR) cut by 70%
  2. Optimize on evidence: tune performance from data rather than intuition
  3. Understand the business: refine product features from user-behavior logs
  4. Audit for security: meet compliance requirements and trace anomalous operations

Remember: a good logging system is not an accessory bolted on after launch, but infrastructure that must be planned from day one. With microservices and cloud-native architectures now mainstream, distributed tracing and structured logging have become baseline requirements for high-quality software systems.

Recommended Stack

  • Log collection: Filebeat/Fluentd
  • Storage and analysis: Elasticsearch + Kibana
  • Real-time monitoring: Prometheus + Grafana
  • Distributed tracing: Jaeger/Zipkin/SkyWalking
  • Log management: Loki + Grafana (a lightweight alternative)

Invest in a solid logging system early in a project and you lay the groundwork for long-term stability, ultimately reaching the engineering goal of observability: knowing not just whether the system is running, but why it behaves the way it does.