/* * Copyright 2013 LinkedIn Corp. All rights reserved * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, * software distributed under the License is distributed on an * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY * KIND, either express or implied. See the License for the * specific language governing permissions and limitations * under the License. */ package com.linkedin.databus.client.pub.mbean; /** * Simplified/unified metrics for Databus consumers. Both inbound (from the relay or * bootstrap server) and outbound (to the consumer via callbacks) statistics are included. * * This interface is used both for per-partition (single source), non-exposed statistics * and for two kinds of exposed aggregates: single-source (single connection) and multi- * source (multiple connections). */ public interface UnifiedClientStatsMBean { /** GETTERS */ // TODO: track changes in canonical config fingerprint (e.g., checksum of key/value // pairs in sorted order)? (ideally would persist previous value, but could // also just expose CRC-32 as [unsigned?] int metric) /** * Number of partitions or tables currently bootstrapping. At the per-partition or * per-table (lowest) level, this is equivalent to a boolean: 0 if pulling from the * relay, 1 if pulling from the bootstrap server. For Espresso ("v3") bootstraps, which * limit concurrent bootstraps to one, the aggregate level will show the total number * of partitions/tables either bootstrapping or waiting to bootstrap; only one will * be making progress at any given time. (Oracle bootstraps may run in parallel.) */ // TODO: expand description to describe v2, v3, and client-load-balancing nuances public int getCurBootstrappingPartitions(); /** * Number of connections (per-table or per-partition) currently suspended, i.e., dead. * Though it is possible to manually resurrect a suspended connection via the JMX * console, by default they remain dead. The normal fix is to bounce the application. */ public int getCurDeadConnections(); /** * Number of errors returned by consumer callbacks (i.e., by application code). * All callbacks are considered, not just onDataEvent() ones. */ public long getNumConsumerErrors(); /** * Number of data events received by the client library from the relay and/or * bootstrap server. */ // [NOTE: consumers can trivially do their own callback counts if they care] public long getNumDataEvents(); /** * Median time interval, in milliseconds, between the time recently received * events were committed at the source database and the time they were received * by the Databus client library. This is a sampled approximation, weighted most * heavily toward events received in the past five minutes. When aggregated, this * is the median interval across all subscribed partitions. */ public double getTimeLagSourceToReceiptMs_HistPct_50(); /** * 90th-percentile time interval, in milliseconds, between the time recently received * events were committed at the source database and the time they were received * by the Databus client library. This is a sampled approximation, weighted most * heavily toward events received in the past five minutes. When aggregated, this * is the 90th-percentile interval across all subscribed partitions. */ public double getTimeLagSourceToReceiptMs_HistPct_90(); /** * 95th-percentile time interval, in milliseconds, between the time recently received * events were committed at the source database and the time they were received * by the Databus client library. This is a sampled approximation, weighted most * heavily toward events received in the past five minutes. When aggregated, this * is the 95th-percentile interval across all subscribed partitions. */ public double getTimeLagSourceToReceiptMs_HistPct_95(); /** * 99th-percentile time interval, in milliseconds, between the time recently received * events were committed at the source database and the time they were received * by the Databus client library. This is a sampled approximation, weighted most * heavily toward events received in the past five minutes. When aggregated, this * is the 99th-percentile interval across all subscribed partitions. */ public double getTimeLagSourceToReceiptMs_HistPct_99(); /** * Time interval, in milliseconds, between the receipt of the most recent event * and the current time. When aggregated, this is the max interval across all * subscribed partitions. */ public long getTimeLagLastReceivedToNowMs(); /** * Max time interval, in milliseconds, for the consumer application to process * callbacks. This is a sampled approximation, weighted most heavily toward events * received in the past five minutes. When aggregated, this is the corresponding * max across all subscribed partitions. Note that all callbacks are considered, * not just onDataEvent() ones. */ public double getTimeLagConsumerCallbacksMs_Max(); /** * Median time interval, in milliseconds, for the consumer application to process * callbacks. This is a sampled approximation, weighted most heavily toward events * received in the past five minutes. When aggregated, this is the corresponding * median across all subscribed partitions. Note that all callbacks are considered, * not just onDataEvent() ones. */ public double getTimeLagConsumerCallbacksMs_HistPct_50(); /** * 90th-percentile time interval, in milliseconds, for the consumer application to * process callbacks. This is a sampled approximation, weighted most heavily toward * events received in the past five minutes. When aggregated, this is the corresponding * 90th-percentile interval across all subscribed partitions. Note that all callbacks * are considered, not just onDataEvent() ones. */ public double getTimeLagConsumerCallbacksMs_HistPct_90(); /** * 95th-percentile time interval, in milliseconds, for the consumer application to * process callbacks. This is a sampled approximation, weighted most heavily toward * events received in the past five minutes. When aggregated, this is the corresponding * 95th-percentile interval across all subscribed partitions. Note that all callbacks * are considered, not just onDataEvent() ones. */ public double getTimeLagConsumerCallbacksMs_HistPct_95(); /** * 99th-percentile time interval, in milliseconds, for the consumer application to * process callbacks. This is a sampled approximation, weighted most heavily toward * events received in the past five minutes. When aggregated, this is the corresponding * 99th-percentile interval across all subscribed partitions. Note that all callbacks * are considered, not just onDataEvent() ones. */ public double getTimeLagConsumerCallbacksMs_HistPct_99(); /** MUTATORS */ void reset(); }