Real-time metrics

The Analytics tab in a bot’s dashboard contains a section called Real-time Metrics. Here you’ll find information that gives you insight into how hard a bot is working. If a bot is not responding quickly, these metrics will help you troubleshoot what might be causing the delay.

Historical context

In the past, all events from all bots were processed in a single queue. As the number of events increased, more Meya servers would be spun up to maintain a consistent throughput of events. These new servers required the events to be re-enqueued, resulting in high end-to-end latency. One side effect of this architecture was that a single bot could slow down the platform for all customers by generating a massive number of events through a broadcast or /start event.

A new paradigm

In the new architecture, all bots are isolated from each other. Each bot is allocated a certain number of “cores”, which can be thought of as additional instances of the bot, to process incoming events for that bot. This has two important implications:

  1. No bot can consume more than its allotted cores.
  2. Any increased latency generated by a particular bot will be limited in scope to that bot. In other words, no bot can increase the latency of another bot.
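
To make these implications concrete, here is a minimal sketch, in Python, of the isolation model. The CORES_PER_BOT cap and the simulated event handling are made-up assumptions, not the Meya implementation; the point is only that a flood of events for one bot cannot spill into another bot’s queue.

    import asyncio

    CORES_PER_BOT = 2  # hypothetical per-bot core cap, for illustration only

    async def bot_worker(queue: asyncio.Queue) -> None:
        # One "core": an extra instance of the bot draining that bot's own queue.
        while True:
            event = await queue.get()
            await asyncio.sleep(0.01)  # stand-in for real event processing
            queue.task_done()

    async def run_bot(bot_id: str, events: list) -> None:
        queue = asyncio.Queue()
        for event in events:
            queue.put_nowait(event)
        # The bot can never use more than CORES_PER_BOT workers, so a flood of
        # events here cannot slow down any other bot's queue.
        workers = [asyncio.create_task(bot_worker(queue)) for _ in range(CORES_PER_BOT)]
        await queue.join()
        for worker in workers:
            worker.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
        print(f"{bot_id}: processed {len(events)} events")

    async def main() -> None:
        # Two isolated bots: "bot_a" is flooded, "bot_b" is unaffected.
        await asyncio.gather(
            run_bot("bot_a", [f"event_{i}" for i in range(100)]),
            run_bot("bot_b", ["hello"]),
        )

    asyncio.run(main())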

The Real-time Metrics section on the Analytics tab provides insight into how many cores a bot is consuming, as well as how many events are enqueued.

Important concepts

In order to interpret the metrics correctly, it is necessary to understand the following concepts.

Push versus pull events

Push events are those triggered by the Meya system, while pull events are triggered by the user.

Examples of push events:

  • Broadcast messages (i.e. triggering a flow for a group of users).
  • Start flow events (i.e. using the Meya API to start a flow for one or more users).

Examples of pull events:

  • User submits text.
  • User clicks a button.
  • User opens the Meya Web chat widget, triggering a start_chat event.
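
As a rough illustration of the split, the sketch below routes an event to the push or pull queue based on its origin. The event names are hypothetical placeholders, not the actual Meya event types.

    # Hypothetical event names used only to illustrate the push/pull split;
    # they are not the actual Meya event types.
    PUSH_EVENTS = {"broadcast", "flow_start"}             # triggered by the system
    PULL_EVENTS = {"text", "button_click", "start_chat"}  # triggered by the user

    def queue_for(event_type: str) -> str:
        """Route an event to the push or pull queue based on its origin."""
        if event_type in PULL_EVENTS:
            return "pull"
        if event_type in PUSH_EVENTS:
            return "push"
        raise ValueError(f"unknown event type: {event_type}")

    print(queue_for("button_click"))  # -> pull
    print(queue_for("broadcast"))     # -> push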

Active cores

Active cores represent additional instances of your bot. A single bot instance can process many events very quickly, so normally this number will be 0, meaning no additional bot instances are required to process the incoming stream of events.

If many events are generated at the same time (often due to a large broadcast or /start event), you may see the number of active cores increase as more bot instances are automatically spun up to handle the extra load.

As each bot instance completes the events it’s handling, the Meya platform looks at the number of queued events to see if that bot instance is still necessary. If not, the instance will gracefully shut down. Eventually, all additional bot instances will shut down and active cores will return to 0.
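
The scaling algorithm itself is not documented here, but the decision it makes can be sketched as a simple function of queue depth. The threshold and core limit below are made-up numbers; the real algorithm and its tuning will differ.

    # A rough sketch of the scale-up / scale-down decision, using made-up
    # numbers; the real Meya algorithm will differ.
    MAX_CORES = 4           # hypothetical per-bot core limit
    EVENTS_PER_CORE = 100   # hypothetical backlog one core is expected to absorb

    def target_cores(queued_events: int) -> int:
        """How many extra bot instances should be running right now?"""
        if queued_events == 0:
            return 0                                   # idle: spin everything down
        needed = -(-queued_events // EVENTS_PER_CORE)  # ceiling division
        return min(needed, MAX_CORES)                  # never exceed the allotment

    print(target_cores(0))       # 0 -> all extra instances shut down
    print(target_cores(250))     # 3 -> scale up to work through the backlog
    print(target_cores(10_000))  # 4 -> capped at the bot's core limit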

Number of cores

A finely tuned algorithm determines the exact number of cores that should be allocated to each bot in order to maintain low latency on the Meya platform. All bots on the platform are allocated the same number of cores to ensure optimal platform performance for all customers. “Optimal performance” is defined as:

  • Low latency: latency is the time, in seconds, required to process an event.
  • High throughput: throughput is the number of events processed per second.
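
As a worked example of these two definitions, using made-up numbers:

    # Made-up numbers, purely to illustrate the two definitions above.
    events_processed = 500           # events completed in the measurement window
    window_seconds = 10.0            # length of the measurement window
    total_processing_seconds = 12.5  # summed per-event processing time

    throughput = events_processed / window_seconds         # events per second
    latency = total_processing_seconds / events_processed  # seconds per event

    print(f"throughput: {throughput:.1f} events/s")  # 50.0 events/s
    print(f"latency:    {latency:.3f} s per event")  # 0.025 s per event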

Buffered events

If enough events are generated within a short period of time (often due to a large broadcast or /start event), the rate of incoming events may still exceed the rate at which the additional bot instances can process them, even when active cores are maximized. When this happens, the incoming events are enqueued.

You can see the number of enqueued events in the Buffered Events fields in both the Pull Cores and Push Cores sections.
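
To see why buffering happens, here is a back-of-the-envelope calculation with made-up rates; the real numbers depend on your bot and its core allocation.

    # Made-up rates, purely to show how a backlog builds up during a spike.
    incoming_rate = 400   # events arriving per second (e.g. during a broadcast)
    rate_per_core = 50    # events one bot instance processes per second
    max_cores = 4         # the bot's allotted core limit

    capacity = rate_per_core * max_cores       # 200 events/s at full scale
    backlog_growth = incoming_rate - capacity  # 200 events/s become buffered

    print(f"processing capacity: {capacity} events/s")
    print(f"buffered events grow by {backlog_growth}/s until the spike ends")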

Why are there separate cores and queues for push and pull events?

Recall that a pull event is generated when a user is actually interacting with the bot. In order to provide a good user experience, the bot should respond to input as quickly as possible.

Push events, on the other hand, such as broadcasts and /starts, typically occur when the user is not using the bot.

In other words, you can think of the pull cores and queues as a separate priority event queue for users who are currently interacting with the bot.
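
The sketch below illustrates that idea with a single worker and two in-memory queues: pull events are always drained first, so a live user never waits behind a broadcast. In practice the platform allocates separate cores to each queue; this is only an analogy.

    from collections import deque

    # Two in-memory queues standing in for the pull and push event queues.
    pull_queue = deque(["user: hi", "user: clicked a button"])
    push_queue = deque(["broadcast to user 1", "broadcast to user 2"])

    def next_event():
        """Pull events always win, so live users never wait behind a broadcast."""
        if pull_queue:
            return pull_queue.popleft()
        if push_queue:
            return push_queue.popleft()
        return None

    while (event := next_event()) is not None:
        print("processing:", event)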