Logging & Monitoring

At Authentify It, we place strong emphasis on observability, error tracking, and operational awareness, so that both our engineering team and our business stakeholders have visibility into the health of the platform. Our monitoring stack combines established tools (Sentry, PingPing, DigitalOcean Insights, and Fireblocks notifications in Slack) with lightweight operational processes to provide actionable insights in real time.

1. Error Tracking & Debugging

We leverage Sentry as our primary error monitoring and debugging solution.

- Error Tracking: Automatic capture of backend and frontend exceptions with rich context (stack traces, environment, release).
- Session Replays: Enables engineers to replay real user sessions to understand the exact sequence of interactions leading to an error.
- Performance Monitoring: Identifies bottlenecks in API response times and frontend performance.
- Alerting: Critical errors are directly routed into our Slack engineering channels, ensuring immediate visibility and quick response times.

This allows the team to proactively detect, investigate, and resolve issues before they affect user experience at scale.
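As an illustration of how the environment and release context could be attached to captured errors, here is a minimal sketch of building backend Sentry options. The field names (`dsn`, `environment`, `release`, `tracesSampleRate`) are options accepted by `Sentry.init` in `@sentry/node`. The environment variable names, the helper name, and the sample rates are assumptions for this example and do not describe the actual configuration.

```typescript
// Subset of Sentry options used in this sketch (local type for illustration).
interface SentryOptions {
  dsn: string;
  environment: string;
  release: string;
  tracesSampleRate: number;
}

// Hypothetical helper: the environment variable names are assumptions.
function buildSentryOptions(
  env: Record<string, string | undefined>,
): SentryOptions {
  const dsn = env.SENTRY_DSN;
  if (!dsn) {
    throw new Error("SENTRY_DSN is not set");
  }
  const environment = env.APP_ENV ?? "development";
  return {
    dsn,
    environment,
    // Tagging events with a release ties each error to a deployment.
    release: env.APP_RELEASE ?? "unknown",
    // Assumed rates: trace 10% of requests in production, all elsewhere.
    tracesSampleRate: environment === "production" ? 0.1 : 1.0,
  };
}

// In the application entry point the result would be handed to the SDK:
//   import * as Sentry from "@sentry/node";
//   Sentry.init(buildSentryOptions(process.env));
```

Failing fast on a missing DSN is a design choice in this sketch: it makes a misconfigured deployment visible at startup rather than silently dropping errors.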

2. Uptime & Availability Monitoring

To ensure continuous platform reliability, Authentify It employs a multi-layered monitoring strategy combining both external uptime checks and infrastructure-level insights:

PingPing (External Uptime Monitoring):
- Continuous health checks on critical API endpoints.
- Instant downtime alerts delivered to Slack for rapid response.
- Historical availability reports are retained and can be shared for SLA tracking and investor transparency. (The history is available from this document's menu.)
- Proactive detection of endpoint degradations reduces downtime risk and helps maintain high availability.

DigitalOcean Insights (Infrastructure Monitoring):
- Continuous monitoring of CPU, RAM, disk, and network usage at the container and application level.
- Configurable alerts fire when resource usage exceeds set thresholds, allowing the engineering team to address scaling or performance bottlenecks before they impact users.
- These insights provide a granular, real-time view of system health, complementing external uptime checks with infrastructure-level visibility.

This dual-layer approach ensures that both end-user availability and infrastructure health are continuously monitored, enabling Authentify It to maintain a resilient, investor-grade operational posture.
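The two layers can be pictured with a small sketch: one function classifies the result of an external probe, the other flags infrastructure metrics that exceed their thresholds. This is not how PingPing or DigitalOcean Insights are implemented. All names, the latency budget, and the threshold values are hypothetical, and network usage is left out because it is not a simple percentage.

```typescript
// Layer 1: result of an external uptime probe against a health endpoint.
type ProbeStatus = "up" | "degraded" | "down";

interface ProbeResult {
  statusCode: number | null; // null when the request failed or timed out
  latencyMs: number;
}

// slowAfterMs is an assumed latency budget, not a documented value.
function classifyProbe(
  result: ProbeResult,
  slowAfterMs: number = 2000,
): ProbeStatus {
  const code = result.statusCode;
  if (code === null || code < 200 || code >= 300) return "down";
  return result.latencyMs > slowAfterMs ? "degraded" : "up";
}

// Layer 2: infrastructure metrics as utilisation percentages (0-100).
type Metric = "cpu" | "memory" | "disk";
type Sample = Record<Metric, number>;

// Assumed alert thresholds; the document does not state the real values.
const defaultLimits: Sample = { cpu: 80, memory: 85, disk: 90 };

// Return the metrics whose sampled value is strictly above its limit.
function breachedMetrics(
  sample: Sample,
  limits: Sample = defaultLimits,
): Metric[] {
  const metrics: Metric[] = ["cpu", "memory", "disk"];
  return metrics.filter((m) => sample[m] > limits[m]);
}
```

A slow but successful probe is reported as "degraded" rather than "down", which is what lets degradations be caught before they become outages.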

3. Business-Critical Notifications

Operational monitoring goes beyond technical errors. Authentify It integrates Fireblocks notifications directly into Slack, ensuring business-critical events are surfaced in real time:

- Certificate claims by end users trigger instant notifications.
- Brand-issued certificate creations are pushed to Slack, keeping the team aligned with on-chain and brand-facing activities.
- Blockchain transaction retries (“try” monitoring): Every blockchain transaction attempt is monitored, with a maximum of 5 retries allowed. Notifications in Slack clearly indicate the attempt count. The goal is to always complete transactions on try 1, which serves as a real-time health indicator of the blockchain layer.

This integration ensures that both technical and non-technical stakeholders remain aware of mission-critical flows happening across the platform.
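The retry monitoring described above can be sketched as a wrapper that reports the attempt number before each try. This is a minimal illustration, not the production code: it reads the limit as one initial attempt plus up to 5 retries (if the limit means five attempts in total, the bound changes accordingly), and the function names, message format, and `notify` callback are hypothetical.

```typescript
// Assumption: 1 initial attempt + up to 5 retries = 6 tries at most.
const MAX_RETRIES = 5;

interface TxOutcome<T> {
  value: T;
  attempts: number; // 1 means the transaction succeeded on the first try
}

// Hypothetical message format for the Slack notification.
function formatAttempt(txLabel: string, attempt: number): string {
  return `${txLabel}: try ${attempt}/${MAX_RETRIES + 1}`;
}

// Run `submit` until it succeeds or the retry budget is exhausted,
// calling `notify` with the attempt count before every try.
async function sendWithRetries<T>(
  txLabel: string,
  submit: () => Promise<T>,
  notify: (message: string) => void,
): Promise<TxOutcome<T>> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
    notify(formatAttempt(txLabel, attempt));
    try {
      return { value: await submit(), attempts: attempt };
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Because every try is announced, a healthy blockchain layer shows up in Slack as a steady stream of "try 1" messages, and any higher number stands out immediately.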

4. Application Logs & Infrastructure Logs

For lower-level visibility, Authentify It makes use of DigitalOcean App Platform logs combined with NestJS integrated logging.

- Each service outputs structured logs using NestJS’ Logger service.
- Engineers can filter and trace application events directly within the DigitalOcean console.

This provides an additional layer of observability when debugging application-specific behaviors or monitoring deployments.
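To show why structured output makes filtering and tracing easier, here is a minimal sketch of a log entry serialized as one JSON line, plus a filter over such lines. The entry shape, field names, and helper names are assumptions for illustration. In a NestJS service, the resulting string would typically be emitted through an instance of `Logger` from `@nestjs/common`.

```typescript
type Level = "log" | "warn" | "error";

// Hypothetical entry shape; `fields` carries searchable key/value context.
interface LogEntry {
  level: Level;
  context: string; // usually the name of the service class
  message: string;
  fields: Record<string, string | number | boolean>;
}

// Serialize an entry as a single JSON line with an ISO timestamp.
function toLogLine(entry: LogEntry, now: Date = new Date()): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level: entry.level,
    context: entry.context,
    message: entry.message,
    ...entry.fields,
  });
}

// Keep only the lines whose parsed JSON has `key` equal to `value`.
function filterByField(lines: string[], key: string, value: string): string[] {
  return lines.filter((line) => JSON.parse(line)[key] === value);
}
```

Keeping each event on one line with stable keys means a log console's text search on, say, a certificate identifier returns every event for that certificate across services.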

5. Slack as the Central Operations Hub

By consolidating error alerts (Sentry), uptime checks (PingPing), and operational notifications (Fireblocks) into Slack, we give the engineering team a single place to watch for events. This helps reduce alert fatigue and makes actionable events harder to miss.
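One way to picture the consolidation is a common alert shape that every source is mapped into before it reaches Slack. The sketch below is illustrative only: the `Alert` type, the labels, and the function name are hypothetical. The one external fact it relies on is that Slack incoming webhooks accept a JSON body with a `text` field.

```typescript
type Source = "sentry" | "pingping" | "fireblocks";

// Hypothetical normalized alert shared by all three sources.
interface Alert {
  source: Source;
  title: string;
  detail: string;
}

// Assumed labels so that messages from different tools read consistently.
const labels: Record<Source, string> = {
  sentry: "Error",
  pingping: "Uptime",
  fireblocks: "Business event",
};

// Build the JSON body that would be POSTed to a Slack incoming webhook.
function toSlackPayload(alert: Alert): { text: string } {
  return { text: `[${labels[alert.source]}] ${alert.title}: ${alert.detail}` };
}
```

A shared prefix per source keeps the channel scannable, so an engineer can tell at a glance whether a message is an exception, a downtime alert, or a business event.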


