LogicMonitor healthcheck, Logicmonitor portal, LogicMonitor Professional Services, Monitoring

ReportMagic’s LogicMonitor HealthCheck is live

Panoramic Data are pleased to announce the General Availability of our LogicMonitor HealthCheck!

Six months in the making, with 150+ individual checks using 100,000+ automated actions, this comprehensive report presents a series of recommendations measured against best practices. See statistics, issues, errors and compliance information, along with prioritized recommendations for change.

Our HealthCheck report:

  • Reviews your LogicMonitor portal settings, focusing on security, reliability and best practice compliance
  • Includes information about Device and Collector health, as well as entity grouping standards
  • Assesses your usage of LogicMonitor functionality
  • Provides you with wide-ranging feedback identifying issues and suggested remedies
  • Benefits you by giving you a more successful and stress-free experience of the LogicMonitor product
  • Includes a comprehensive, interactive Device and Alert Analytics spreadsheet
  • Uses “Heatmaps” to draw the eye to each actionable recommendation, allowing the reader to skip over purely informational items

See here for more details and to request a report.

CPU Usage, Monitoring

Monitoring EC2 T2.Small and T2.Micro instances

We recently ran into an issue where a T2.Small instance in Amazon ran at 100% CPU for a period and then dropped down to a consistent 20%.

Because the CPU usage alert had a long trigger interval, we were not alerted until the instance had been at 100% for a fair while. Usage then dropped to 20% and the alert quickly cleared. Looking at the graphs, everything seemed fine: a nice, even 20%. What we did not realise at the time was that usage was at 20% only because Amazon was throttling the instance to its 20% baseline. When we ran top on the machine, we saw that it was in fact running at 100% as far as the OS was concerned. Amazon throttles an instance when it runs out of CPU credits, and if you are only monitoring CPU usage you are not going to see the issue.

What is a CPU credit?

So what is a CPU Credit? From Amazon’s help pages: “A CPU Credit provides the performance of a full CPU core for one minute. Traditional Amazon EC2 instance types provide fixed performance, while T2 instances provide a baseline level of CPU performance with the ability to burst above that baseline level. The baseline performance and ability to burst are governed by CPU credits.

One CPU credit is equal to one vCPU running at 100% utilization for one minute. Other combinations of vCPUs, utilization, and time are also equal to one CPU credit; for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes.”
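The arithmetic in Amazon's definition can be checked with a few lines of Python. This is only an illustration of the quoted rule; the function name `cpu_credits_used` is ours, not anything AWS provides.

```python
def cpu_credits_used(vcpus: int, utilization: float, minutes: float) -> float:
    """Credits consumed = vCPUs x utilization (0.0 to 1.0) x minutes."""
    return vcpus * utilization * minutes


# The three combinations from Amazon's description each cost one credit.
examples = [
    (1, 1.00, 1),  # one vCPU at 100% for one minute
    (1, 0.50, 2),  # one vCPU at 50% for two minutes
    (2, 0.25, 2),  # two vCPUs at 25% for two minutes
]

for vcpus, utilization, minutes in examples:
    credits = cpu_credits_used(vcpus, utilization, minutes)
    print(f"{vcpus} vCPU(s) at {utilization:.0%} for {minutes} min = {credits} credit(s)")
```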

So every minute you spend with one vCPU running flat out takes a credit from your bank. When the balance hits zero, which happens very quickly on instances with two vCPUs, you find yourself throttled to 10, 15 or 20% (depending on which instance type you have). To monitor CPU usage properly, you therefore also need to monitor your CPUCreditBalance, which luckily AWS reports through CloudWatch.
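If you want to look at the same metric outside LogicMonitor, a rough sketch of reading CPUCreditBalance from CloudWatch is shown here. It assumes the third-party boto3 library is installed and AWS credentials are configured; the helper names and the 30-minute lookback are our own choices, and any instance ID you pass in is yours to supply.

```python
from datetime import datetime, timedelta, timezone


def credit_balance_query(instance_id: str, end: datetime, lookback_minutes: int = 30) -> dict:
    """Build the arguments for CloudWatch's GetMetricStatistics call."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=lookback_minutes),
        "EndTime": end,
        "Period": 300,  # CPU credit metrics are published every five minutes
        "Statistics": ["Average"],
    }


def latest_credit_balance(instance_id: str):
    """Return the most recent CPUCreditBalance, or None if no data came back."""
    import boto3  # third-party; imported here so the query builder works without it

    cloudwatch = boto3.client("cloudwatch")
    query = credit_balance_query(instance_id, datetime.now(timezone.utc))
    response = cloudwatch.get_metric_statistics(**query)
    datapoints = sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None
```

Calling `latest_credit_balance` with one of your own T2 instance IDs should return the current balance as a number of credits.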

The solution

We added two new datapoints to our existing EC2 DataSource: CPUCreditBalance and CPUCreditUsage. The second is more interesting than useful, as it simply shows the rate at which you are using CPU credits. Alerting on CPUCreditBalance, however, tells us that Amazon is going to throttle us before it actually does.
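To give a feel for how much warning a CPUCreditBalance alert can provide, here is a small sketch that estimates the time until the balance reaches zero and maps a balance to a severity. The thresholds (20 and 5 credits) are hypothetical, not the values from our DataSource, and the earn rate is an input you should take from Amazon's documentation for your instance type.

```python
from typing import Optional


def minutes_until_empty(balance: float, vcpus: int, utilization: float,
                        earn_per_hour: float) -> Optional[float]:
    """Estimate minutes until the credit balance reaches zero.

    Returns None when credits are earned at least as fast as they are spent.
    """
    spend_per_minute = vcpus * utilization  # one credit = one vCPU-minute at 100%
    earn_per_minute = earn_per_hour / 60
    drain_per_minute = spend_per_minute - earn_per_minute
    if drain_per_minute <= 0:
        return None
    return balance / drain_per_minute


def credit_alert_level(balance: float, warn_below: float = 20.0,
                       error_below: float = 5.0) -> str:
    """Map a credit balance to a severity (thresholds are illustrative only)."""
    if balance < error_below:
        return "error"
    if balance < warn_below:
        return "warn"
    return "ok"


# Example: 30 credits banked, one vCPU at 100%, earning 12 credits per hour.
remaining = minutes_until_empty(balance=30, vcpus=1, utilization=1.0, earn_per_hour=12)
print(f"Roughly {remaining:.1f} minutes until throttling; level: {credit_alert_level(30)}")
```

An instance running at or below the rate at which it earns credits never drains its balance, which is why the first function returns None in that case.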

The first image below shows the misleading CPU Usage and the second shows clearly that we ran out of CPU credit.

[Image: CPU usage graph] [Image: CPU credit graph]

Logicmonitor DataSources, Monitoring

SSL errors and alerting using a LogicMonitor DataSource

SSL/TLS is a deceptively simple technology. It is simple to deploy, and it just works. Except the truth is – it does not really work, and it is not easy to deploy correctly! To ensure that SSL provides the necessary security, you have to put effort into properly configuring your servers.

For example, consider the POODLE attack (Padding Oracle On Downgraded Legacy Encryption), a man-in-the-middle exploit that takes advantage of the way browsers and other client software fall back to SSL 3.0. An attacker needs, on average, only 256 SSL 3.0 requests to reveal one byte of an encrypted message. But the time taken to check all the sites under your control can quickly mount up, and the job becomes one that you leave for another day, which, in IT, means someday….

So we created a LogicModule, SSL Test, which checks your sites for certain vulnerabilities and alerts you by email, text or voice using LogicMonitor. At the time of publication, it checks for the BEAST, Logjam, FREAK, Heartbleed, Lucky Minus 20, Debian flaw, OpenSSL CCS, DROWN, known DH primes and POODLE vulnerabilities. It also checks that the SSL certificate matches the address.
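To illustrate the last of those checks, here is a minimal sketch of testing a site's certificate using only the Python standard library. It is not how the SSL Test LogicModule is implemented, and the function names are ours. Note that it returns False for any certificate verification failure (expired or untrusted certificates as well as a hostname mismatch), and that connection errors are left to propagate.

```python
import socket
import ssl


def verifying_context() -> ssl.SSLContext:
    """Default client context: verifies the chain and checks the hostname."""
    return ssl.create_default_context()


def certificate_is_valid_for(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if the TLS handshake succeeds with full verification."""
    context = verifying_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname is what the certificate is matched against.
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

For example, `certificate_is_valid_for("www.yourwebsite.co.uk")` would report whether that site presents a certificate your machine trusts for that name.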

An example alert is shown here:

  • ID: LMD12345
  • This server, www.yourwebsite.co.uk, is vulnerable to the POODLE attack. If possible, disable SSL 3 to mitigate.

By alerting you to the vulnerability and telling you how to deal with it, the DataSource saves you the time you would otherwise spend trawling through RSS feeds and security updates. You need to manually add each website you want to check as an instance in LogicMonitor.

Using SSLTest

To do this:

  1. Select a host in LogicMonitor (it doesn’t matter which one, as it is just a placeholder for the DataSource; the actual check is done from the Collector).
  2. Click the down arrow shown here:
    [Image: the down arrow]
  3. Select Add monitored instance, then fill out the various required values:
    [Image: the required values]

And that is it!