Monitoring ERP Systems in Production


Most teams running Odoo in production know when the system is slow. What they rarely know is when it became slow, which requests caused it, and whether the problem affected business operations or just irritated a few users. That gap between feeling and evidence is what monitoring is supposed to close.

ERP observability is not the same as generic application monitoring. The signals that matter are tied to business operations, and the thresholds that matter are tied to business consequences.

Operational Monitoring Versus Business Monitoring

Operational monitoring answers whether the system is running. Business monitoring answers whether the system is working. Both matter, but teams usually invest heavily in the first and neglect the second.

Operational signals worth tracking:

HTTP error rates and response status codes across Odoo endpoints.
Odoo worker count, active versus idle, to detect pool saturation.
PostgreSQL connection utilization.
Memory, CPU, and disk usage per container.

Business signals worth tracking:

Queue job failure rate and average processing time by channel.
Cron job errors and time since last successful run.
Order confirmation latency from creation to confirmed state.
External integration failure rates for EDI, payments, and shipping APIs.

Operational signals tell you the system is degraded. Business signals tell you whether that degradation is affecting what the company cares about.
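As a rough illustration of a business signal, the sketch below computes a queue job failure rate per channel from a list of job records. The channel names and the state strings ("done", "failed", "pending") are assumptions made for the example; adapt them to whatever your job queue actually records.

```python
from collections import Counter

def failure_rate_by_channel(jobs):
    """Return {channel: failed / finished} for jobs that reached a final state.

    `jobs` is an iterable of (channel, state) pairs. Jobs that are still
    waiting or running are ignored so they do not dilute the rate.
    """
    finished = Counter()
    failed = Counter()
    for channel, state in jobs:
        if state in ("done", "failed"):  # assumed final states
            finished[channel] += 1
            if state == "failed":
                failed[channel] += 1
    return {channel: failed[channel] / total for channel, total in finished.items()}

# Hypothetical sample: invoices are failing, EDI is healthy.
jobs = (
    [("root.invoice", "done")] * 17
    + [("root.invoice", "failed")] * 3
    + [("root.edi", "done")] * 10
    + [("root.edi", "pending")] * 2
)
rates = failure_rate_by_channel(jobs)
for channel, rate in sorted(rates.items()):
    print(f"{channel}: {rate:.0%} failed")
```

Splitting by channel is the point: a single global failure rate would blend the failing invoice channel with the healthy one.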

The Performance Signals That Actually Matter

Not every metric justifies attention. Averaging response time across all routes hides the worst offenders. A sale order confirmation that takes eight seconds is invisible when averaged alongside dozens of fast requests. Track the 95th and 99th percentile response times for the five to ten endpoints your business depends on most, and track them separately.
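A small sketch makes the averaging problem concrete. The route names and latency values below are invented for illustration, and the percentile uses the simple nearest-rank definition; a metrics backend may interpolate differently.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in seconds, grouped by route.
latencies_by_route = {
    "product_list": [0.2] * 94,
    "sale_order_confirm": [8.0] * 6,
}

# One average across every route hides the slow confirmations.
all_latencies = [v for values in latencies_by_route.values() for v in values]
overall_mean = sum(all_latencies) / len(all_latencies)

# Percentiles tracked per route expose them.
p95_by_route = {route: percentile(v, 95) for route, v in latencies_by_route.items()}

print(f"overall mean: {overall_mean:.2f}s")
for route, p95 in p95_by_route.items():
    print(f"{route} p95: {p95:.1f}s")
```

With these numbers the overall mean stays well under one second while the confirmation route's 95th percentile is eight seconds.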

Worker saturation is the other high-value signal. In multi-worker mode, Odoo serves requests from a fixed pool of worker processes. When all workers are occupied with slow requests, new requests queue or fail. A saturated pool usually appears before users start complaining, and it is almost always caused by a small number of slow endpoints or a long-running scheduled action holding a connection.

PostgreSQL slow query logs are the most underused diagnostic tool in Odoo operations. Most ERP performance problems at the database layer are not caused by complex queries but by the same moderately expensive query running thousands of times per hour.
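One way to surface that pattern is to group logged statements by shape and rank them by total time rather than by single-execution time. The sketch below assumes the log has already been parsed into (duration in milliseconds, SQL text) pairs; the parsing step, the sample queries, and the crude regex-based normalization are all illustrative assumptions, not a full SQL normalizer.

```python
import re
from collections import defaultdict

_STRING = re.compile(r"'(?:[^']|'')*'")  # single-quoted SQL string literals
_NUMBER = re.compile(r"\b\d+\b")         # standalone integer literals

def fingerprint(query):
    """Collapse literals so executions of the same statement group together."""
    q = _STRING.sub("?", query)
    q = _NUMBER.sub("?", q)
    return " ".join(q.split()).lower()

def rank_by_total_time(entries):
    """Return (fingerprint, count, total_ms) tuples, most expensive first."""
    totals = defaultdict(lambda: [0, 0.0])
    for duration_ms, query in entries:
        bucket = totals[fingerprint(query)]
        bucket[0] += 1
        bucket[1] += duration_ms
    rows = [(fp, count, total) for fp, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical entries: one heavy report query and one cheap-looking
# query that runs 300 times with different ids.
entries = [(4000.0, "SELECT * FROM account_move WHERE state = 'posted'")]
entries += [
    (150.0, f"SELECT id FROM sale_order_line WHERE order_id = {i}")
    for i in range(300)
]

ranked = rank_by_total_time(entries)
for fp, count, total_ms in ranked:
    print(f"{total_ms:>9.0f} ms  x{count:<4} {fp}")
```

In this sample the 150 ms query tops the list with 45 seconds of total time, well ahead of the single four-second query. If the `pg_stat_statements` extension is available, it provides a similar aggregated view directly in the database.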

Building Grafana Dashboards That Tell a Story

A useful dashboard answers a question rather than just displaying data. The most effective Odoo dashboards are organized around operational workflows, not system components.

Three dashboards cover most production needs. A system health overview with worker status, error rate, and queue job summary is what an on-call engineer opens first. A performance dashboard with endpoint response times, slow query counts, and worker queue depth is what an engineer uses during incident investigation. A business health dashboard with queue job failure rate by channel, cron job status, and integration success rate is what a technical lead reviews daily.

The most common mistake is combining all of these into a single panel-dense dashboard that nobody reads.

Alert Thresholds Grounded in Business Impact

Generic thresholds generate noise. An alert that fires because CPU reached sixty percent during an accounting close is not useful. An alert that fires because the queue job failure rate for outgoing invoices exceeded ten percent for fifteen consecutive minutes points to a real problem.

Thresholds worth setting:

Queue job failure rate above a defined percentage sustained for several minutes, by channel.
Any cron job that has not completed successfully within twice its expected interval.
95th percentile response time for sale order confirmation exceeding a business-defined threshold.
PostgreSQL slow query count per minute rising above a normal-operations baseline.
Worker pool occupancy above ninety percent for more than five minutes.
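Two of these rules, the sustained-threshold check and the cron staleness check, can be sketched in a few lines. The function names, the sample data, and the `min_samples` guard are assumptions for illustration; in practice an alerting system would usually evaluate equivalent rules for you.

```python
from datetime import datetime, timedelta

def sustained_above(samples, threshold, window, now, min_samples=3):
    """True if every sample in the last `window` exceeds `threshold`.

    `samples` is a list of (timestamp, value) pairs. `min_samples` guards
    against firing on a window that contains too little data.
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return len(recent) >= min_samples and all(v > threshold for v in recent)

def cron_is_stale(last_success, expected_interval, now):
    """True if the job has not succeeded within twice its expected interval."""
    return now - last_success > 2 * expected_interval

now = datetime(2024, 1, 1, 12, 0)

# Hypothetical worker occupancy, sampled once a minute for five minutes.
occupancy = [(now - timedelta(minutes=m), 0.95) for m in range(5, -1, -1)]
saturated = sustained_above(occupancy, 0.90, timedelta(minutes=5), now)

# The same series with a single dip does not count as sustained.
dipped = occupancy[:3] + [(now - timedelta(minutes=2), 0.60)] + occupancy[4:]
saturated_with_dip = sustained_above(dipped, 0.90, timedelta(minutes=5), now)

# A cron expected every 20 minutes that last succeeded an hour ago.
stale = cron_is_stale(now - timedelta(hours=1), timedelta(minutes=20), now)

print(saturated, saturated_with_dip, stale)
```

Requiring the condition to hold across the whole window is what keeps a brief spike from paging anyone.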

Closing the Gap Between Feeling and Evidence

The real cost of poor ERP observability is not time spent investigating incidents. It is decisions made without data. Teams add servers without knowing whether the problem is concurrency or query volume. Engineers optimize the wrong endpoints because averages hide outliers. Business stakeholders escalate based on anecdote because there are no dashboards to show them.

Good monitoring does not prevent all problems. It shortens the time between a problem appearing and the team understanding it well enough to act.

Written By

Hector Villarreal Ortega

Backend Engineer and Odoo Expert with 8+ years of experience specializing in Python, scalable system architecture, and high-availability applications. Passionate about building robust backend systems, optimizing performance, and contributing to …
