``metrix`` ========== ``metrix`` is a Python library for tracking metrics through streams, with configurable tagging, batching, aggregating, and outputting of individual elements. It's designed for handling multiple metrics collected individually at high rates with shared output destinations, such as a log file and/or database, especially in the case that outputting metrics is only required or desired at lower rates. installation ------------ This package hasn't yet been published to PyPi (working on it...), but is still readily installable via ``pip``: .. code-block:: bash $ python -m pip install git+https://github.com/bdewilde/metrix.git#egg=metrix As usual, this will also install direct dependencies -- ``streamz`` and ``toolz`` -- as well as a few other packages needed for them to work. overview -------- Users' entry into ``metrix`` is primarily through the :class:`MCoordinator ` class. It coordinates the flow of metric elements through one or multiple streams into one or multiple output destinations (called "sinks"), with an optional rate limit imposed before each sink to avoid any problematic pileups. Configuration is left up to users, since it's entirely dependent on specific contexts and use cases. There are a few key considerations: - the metrics to track: their names; any tags to apply to elements by default; how often to batch elements, either in time or number; and the aggregation(s) to be performed on batches of values - what to do with the aggregated metrics: where results should go, and if they should be submitted at throttled rates It's important to note that a single metric stream may output *multiple* aggregated values per batch, since each unique (name, aggregation, tags) combination is grouped together before sending them on to their destinations. example ------- Let's say we want to track the total number of articles published to a news site in 5-minute tumbling windows, as well as the total and average word counts per batch. We also want to tag articles by the section in which they're published, but this is only needed for the total published counts. Lastly, we'd like to log the aggregated results for diagnostic purposes. How can we do this with ``metrix``? .. code-block:: pycon >>> import logging, statistics >>> from metrix import MCoordinator, MStream, MSinkLogger >>> # configure MC with metric streams and sinks >>> mc = MCoordinator( ... mstreams=[ ... MStream( ... "n_articles", # metric name ... agg=sum, # function to get the total number per batch ... default_tags={"section": "NA"}, # default article tag, to avoid null values ... window_size=300, # aggregate every 300 seconds (5 minutes) ... ), ... MStream( ... "wc", # metric name (shorthand for "word count") ... agg=[sum, statistics.mean], # functions to get total and average values ... window_size=300, # same as n_articles, tho this isn't required ... ), ... ], ... msinks=[ ... MSinkLogger(level=logging.INFO), # log agg'd metrics at "info" level ... ], ... rate_limit=0.5, # impose a half-second rate limit for [technical reason] ... ) >>> # fake articles data (but let's pretend) >>> articles = [ ... {"text": "...", "section": "politics"}, ... {"text": "...", "section": "science"}, ... {"text": "..."}, # missing section! ... ... ... ] >>> # send metric elements for each article into corresponding streams via the MC >>> for article in articles: ... mc.send( ... "n_articles", # metric name ... 1, # metric value, to be aggregated with other values ... tags={"section": article["section"]} if article.get("section") else None, # optional tags ... ) ... mc.send("wc", len(article["text"].split())) >>> # five minutes later... (fake aggregated metrics data, but again, let's pretend) INFO:metrix.sinks:MElement(name='n_articles.sum', value=625, tags={'section': 'politics'}) INFO:metrix.sinks:MElement(name='n_articles.sum', value=290, tags={'section': 'science'}) INFO:metrix.sinks:MElement(name='n_articles.sum', value=35, tags={'section': 'NA'}) INFO:metrix.sinks:MElement(name='wc.sum', value=161690, tags=None) INFO:metrix.sinks:MElement(name='wc.mean', value=170.2, tags=None) With the optional ``networkx`` and ``graphviz`` dependencies installed, we can easily visualize what the corresponding collection of metric streams and sinks looks like: .. code-block:: pycon >>> mc.stream.visualize(filename=None) .. image:: _static/overview-stream.png :scale: 75 % :align: center :alt: Network diagram of metric streams and sinks. performance ----------- ``metrix`` provides features and an API tailored to a particular use case -- "metrics tracking through streams" -- but under the hood, ``streamz`` and ``toolz`` do the heavy lifting. As such, this package's performance is largely dependent on theirs. According to `the docs `_, ``streamz`` adds microsecond overhead to typical Python operations. In practical terms, ``metrix`` can process up to about 7500 metric elements per second (not including any I/O costs associated with sinks, such as network connections or disk writes). This throughput doesn't vary much by the complexity of the input or output streams; it's dominated by the total number of messages sent in. .. toctree:: :maxdepth: 2 :caption: API Reference: api_reference/coordinator api_reference/stream api_reference/sinks api_reference/element indices and tables ------------------ * :ref:`genindex` * :ref:`modindex` * :ref:`search`