Time Series Database (Python Rewrite)

    • Hi everyone,

      I've gotten the Bloonix server far enough along in Python 3 that I've now reached the point where I need to think about an alternative to Elasticsearch.

      The data that currently ends up in Elasticsearch is time-based, i.e. the metrics from checks, events, and the debug output of checks (mtr, HTTP headers, etc.).

      So far I have found a few interesting alternatives (but haven't looked at them more closely yet):

      • Riak TS
      • Cassandra
      • DynamoDB
      • HBase / OpenTSDB
      • InfluxDB

      Is there anyone who has already gained experience with one or another of these and would like to share it here?

      If you have any other ideas beyond these, let's hear them. Important points are:
      • NoSQL
      • Sharding
      • Replication
      • Well suited to storing time series data as a JSON string, for example: {"time":1483733028000, "host_id": 10, "service_id": 20, "plugin_id": 30, "data": {"cpu_user": .....}}
      Best regards
      Jonny
    • Jonny,
      Why not consider using a relational database (MySQL/Postgres) and moving to something else only if it proves to be unsuitable?
      I am sure you'd find it is good enough for Bloonix's needs.

      VividCortex, which ingests more data at a finer granularity (1 s) than Bloonix does, went with a MySQL solution.

      Here are some links to their reasoning about the requirements of a time series database and the solution they settled on:

      xaprb.com/blog/2014/06/08/time-series-database-requirements/

      cdn2.hubspot.net/hubfs/498921/…-Series_Data_In_MySQL.pdf
    • Then VividCortex is doing it wrong. :)

      A setup with 1000 hosts, each with 40 services, a 60-second check interval, and a history of 365 days = 21 billion rows.
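
      For reference, here is the arithmetic behind that number as a quick Python sketch:

        # Rough row-count estimate for the setup described above.
        hosts = 1000
        services_per_host = 40
        interval_seconds = 60
        days = 365

        checks_per_day = 86400 // interval_seconds      # 1440 per service per day
        rows = hosts * services_per_host * checks_per_day * days
        print(f"{rows:,} rows")                         # 21,024,000,000 -> ~21 billion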

      Believe me, MySQL/PostgreSQL is not an option. I ran tests with billions of rows and it didn't work... a locking horror, and replication lagged thousands of seconds behind the master.

      Many NoSQL databases are much faster and have very good sharding and replication functionality.
    • 21 billion rows representing 365 days is not a problem for either MySQL or PostgreSQL when you take into account how the data is used and design the solution accordingly.

      For example, the default chart displays 3 hours of results. If you list how the data is used and the characteristics a solution must have, you can design a relational database-backed solution tuned for those needs. For instance, all 365 days and 21 billion rows of data would not live in a single table; there is no good reason for the schema to keep all of it in one table. You would partition the data by time range so that each query accesses only a small amount of data (see the sketch below). Depending on the usage characteristics, there are many other design choices that would make it even more performant.
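
      To make the partitioning idea concrete, here is a minimal sketch, assuming PostgreSQL's declarative range partitioning (10+) and psycopg2; the table and column names are illustrative, not an actual Bloonix schema:

        # Minimal sketch: a metrics table range-partitioned by time
        # (PostgreSQL 10+ declarative partitioning). Names are illustrative.
        import psycopg2

        DDL = """
        CREATE TABLE metrics (
            time       bigint  NOT NULL,   -- epoch milliseconds
            host_id    integer NOT NULL,
            service_id integer NOT NULL,
            plugin_id  integer NOT NULL,
            data       jsonb   NOT NULL
        ) PARTITION BY RANGE (time);

        -- one partition per day; a query with a time predicate is pruned
        -- down to the few partitions that actually hold the rows
        CREATE TABLE metrics_2017_01_06 PARTITION OF metrics
            FOR VALUES FROM (1483660800000) TO (1483747200000);

        CREATE INDEX ON metrics_2017_01_06 (host_id, service_id, time);
        """

        with psycopg2.connect("dbname=bloonix") as conn:
            with conn.cursor() as cur:
                cur.execute(DDL)

      With daily partitions like these, the default 3-hour chart only ever reads from a single day's partition.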

      It would be useful to know the schema and design of the solution you tested that didn't work with billions of rows. It would be even more useful to know the characteristics you've determined are useful for a time-series database for bloonix. If you do that, I can come up with a relational database-backed solution that we can test and compare with others. Listing the properties the solution must have (independent of technology) allows us to evaluate how potential solutions meet those characteristics.


      Baron Schwartz - the CEO of VividCortex and the architect of their solution - is a leading MySQL (and database) expert. He was the main author of High Performance MySQL, and you are probably familiar with Maatkit (the highly useful set of Perl tools for MySQL that he wrote: admin-magazine.com/Archive/201…r-database-administrators ), later renamed Percona Toolkit (percona.com/software/database-tools/percona-toolkit ).
    • Hey dr3ad, I think I was a bit too hasty with my statement :)

      In the default setup, metrics are indeed stored in MariaDB/PostgreSQL. The tables for events and metrics are managed by the Bloonix server via partitions, i.e. new range partitions are automatically created and older partitions are automatically deleted. In addition, the metrics can be offloaded to a separate database, i.e. you can physically separate the Bloonix schema and the metrics if you wish.
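
      The idea behind the automatic partition handling looks roughly like this (a simplified sketch, not the actual Bloonix server code; the table names are hypothetical):

        # Sketch: create the next day's range partition ahead of time and
        # drop partitions that fall out of the retention window.
        import datetime
        import psycopg2

        RETENTION_DAYS = 365

        def day_bounds_ms(day):
            # [start, end) of the given day as epoch milliseconds (UTC)
            start = datetime.datetime(day.year, day.month, day.day,
                                      tzinfo=datetime.timezone.utc)
            end = start + datetime.timedelta(days=1)
            return int(start.timestamp() * 1000), int(end.timestamp() * 1000)

        def maintain_partitions(cur, today):
            # 1. create tomorrow's partition if it does not exist yet
            tomorrow = today + datetime.timedelta(days=1)
            lo, hi = day_bounds_ms(tomorrow)
            cur.execute(
                f"CREATE TABLE IF NOT EXISTS metrics_{tomorrow:%Y_%m_%d} "
                f"PARTITION OF metrics FOR VALUES FROM ({lo}) TO ({hi})"
            )
            # 2. drop the partition that just left the retention window
            expired = today - datetime.timedelta(days=RETENTION_DAYS)
            cur.execute(f"DROP TABLE IF EXISTS metrics_{expired:%Y_%m_%d}")

        with psycopg2.connect("dbname=bloonix") as conn:
            with conn.cursor() as cur:
                maintain_partitions(cur, datetime.date.today())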

      TimescaleDB will probably be another alternative on offer.
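
      For completeness, a sketch of what the TimescaleDB route could look like, assuming a plain (non-partitioned) version of the illustrative metrics table from above; TimescaleDB chunks the hypertable by time itself, and old data can be dropped per chunk with drop_chunks():

        # Sketch: the illustrative metrics table as a TimescaleDB hypertable.
        # With a bigint time column (epoch ms), the chunk interval has to be
        # given explicitly.
        import psycopg2

        with psycopg2.connect("dbname=bloonix") as conn:
            with conn.cursor() as cur:
                cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb")
                cur.execute(
                    "SELECT create_hypertable('metrics', 'time', "
                    "chunk_time_interval => 86400000)"  # one day in ms
                )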

      Regards
      Jonny