VMware data to Elasticsearch

r a v · 3 min read · Mar 11, 2019

In the previous post (link below) I mentioned that we had started ingesting data from our VMware and storage environments into the Elastic Stack.

Here’s a brief outline of how we did it:

The team had been using RVTools to gather VMware data for quite some time. The data was exported as spreadsheets and then compared with the previous run to create slides.

Big shoutout to Rob de Veij for the excellent app. If you're interested, there's also another excellent companion app, RVTools analyzer.

So, to cut down the manual work and get a historical view as well, I formulated the approach outlined below.

RVTools is run from a batch script every day, and the CSV files are copied to the central log server via WinSCP. RVTools provides a way to export a separate spreadsheet for each module, such as vInfo, vTools and so on.

RVTools.exe -u rvuser -p _RVToolsPWDVfhfRyZINLPDMSnKYkt92c= -s vcenter1 -c ExportvInfo2csv -d D:\rvreports -f vinfo.csv

In the above command we used the password encryption utility provided by RVTools to encrypt our password, so it's safe to automate the run with the (encrypted) password sitting in the batch script.

Once the files are copied over to the central log server, a cron job kicks in and runs a Python script on the collected data. It removes some columns and fills in blanks, and then a wrapper shell script sends the files to Logstash for ingestion into Elasticsearch.
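The Python step is nothing fancy. Here's a rough sketch of the idea, assuming pandas is available; the column names, fill value and path below are placeholders rather than our actual script:

import glob
import pandas as pd

# Hypothetical cleanup pass over the exported RVTools CSVs
for path in glob.glob("/data/rvtools/*.csv"):
    df = pd.read_csv(path)
    # Drop columns we never chart on (placeholder column names)
    df = df.drop(columns=["Annotation", "VI SDK Server"], errors="ignore")
    # Fill blanks so the Logstash csv filter doesn't trip over empty numeric fields
    df = df.fillna(0)
    df.to_csv(path, index=False)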

The Logstash configuration consists of a series of csv and mutate filters to parse and convert the data. For example, the snippet below handles the VM cluster data.

<SNIP>
} else if [program] == "VM_CLUS" {
  csv {
    separator => ","
    columns => ["DC","Cluster","Num_ESX","Num_CPUs","Num_Cores","Cpu_usg_pct","avg_cpu_usage_pct","max_cpu_usage_pct","esx_mem_avail_Gb","Mem_usg_pct","avg_mem_usage_pct","max_mem_usage_pct","Num_VMs","VMs_per_core","vCPUs_per_core","Num_VCPUs","mem_vm_alloc_gb","prov_disk_gb","util_disk_gb"]
  }
  mutate {
    convert => {
      "Num_ESX" => "integer"
      "Num_CPUs" => "integer"
      "Num_Cores" => "integer"
      "Cpu_usg_pct" => "float"
<SNIP>

Once the data is in Elasticsearch, we created dashboards for:

  • VMware inventory segregated by DC, ESX host, cluster and OS details
  • VM and ESX host counts as trends and data tables
  • Datastore entitled, provisioned and used capacity, as details and trends
  • A data table, built with scripted fields, that projects how many more VMs can be created with the remaining memory and cores; we assumed an average of 8 GB per VM (a rough sketch of the calculation follows this list)
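For the curious, the projection behind that last data table boils down to simple arithmetic. Below is a rough Python sketch of the idea, using the cluster field names from the filter above; the 8 GB average comes from our assumption in the post, while the per-VM vCPU count and the vCPU-per-core ceiling are made-up placeholders:

AVG_VM_MEM_GB = 8        # assumed average memory footprint per VM
AVG_VM_VCPUS = 2         # placeholder: average vCPUs per VM
MAX_VCPU_PER_CORE = 4    # placeholder: vCPU overcommit ceiling per core

def projected_new_vms(cluster):
    # How many VMs the spare memory could hold
    by_mem = (cluster["esx_mem_avail_Gb"] - cluster["mem_vm_alloc_gb"]) // AVG_VM_MEM_GB
    # How many VMs the spare vCPU headroom could hold
    spare_vcpus = cluster["Num_Cores"] * MAX_VCPU_PER_CORE - cluster["Num_VCPUs"]
    by_cpu = spare_vcpus // AVG_VM_VCPUS
    # Whichever resource runs out first sets the limit
    return max(0, int(min(by_mem, by_cpu)))

print(projected_new_vms({"esx_mem_avail_Gb": 1024, "mem_vm_alloc_gb": 700,
                         "Num_Cores": 96, "Num_VCPUs": 250}))   # 40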

While we're on the subject of VMware, let's also talk about how we integrated the vCenter and ESX logs.

We forwarded the vCenter logs via syslog to our central log server, which also hosts Logstash, and used a vCenter filter to work on the events.

We did something similar for the ESX logs, except that here I forked the filter from SexiLog, another excellent open-source project, and tweaked it a bit to suit our needs. Here's the link to our filter if anyone's interested.

Do feel free to let me know your views in the comments below.
