VMware Performance Community
rajanyadav
Contributor

VMmark 1.1.1 issue while running the fileserver test

Hi Everyone,

I am trying to run VMmark 1.1.1 (I know it's retired, but it serves the purpose). On top of that, I am running just the fileserver test. I have the fileserver VM and the client set up, and everything works fine.

My question is:

Can we rerun the test right after it finishes?

Here is what I see:

I made changes to the CONFIG file to run just the fileserver test, and only for a few minutes.

The test runs fine, and I can see result files under c:\vmmar\results on my client0.

When the test is running, I see the following on fileserver0:

fileserver:~ # ps -eaf|grep -i dbench
root      3564  1819  0 22:20 ?        00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000  > /home/fileserver/dbench/src/lock_mem.out
root      3565  3564  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/lock_mem 25000
root      3567     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3568     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3569     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3570     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3571     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3572     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3573     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3574     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3575     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3576     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3577     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3578     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3579     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3580     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3581     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3582     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3583     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3584     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3585     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3586     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3587     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3588     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3589     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3590     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3591     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3592     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3593     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3594     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3595     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3596     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3597     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3598     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3599     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3600     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3601     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3602     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3603     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3604     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3605     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3606     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3607     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3608     1  0 22:20 ?        00:00:00 /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0
root      3692  3623  0 22:31 pts/2    00:00:00 grep -i dbench

But even after STAXMon shows the test has finished (the logs show up properly in results), there are still some dbench-related processes running on the fileserver0 VM:

fileserver:/home/fileserver/dbench/src # ps -eaf| grep -i dbench

root      5984  3967  0 17:25 ?        00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000  > /home/fileserver/dbench/src/lock_mem.out

root      5985  5984  0 17:25 ?        00:00:00 /home/fileserver/dbench/src/lock_mem 25000

root      6116  5926  0 18:10 pts/1    00:00:00 grep -i dbench

The question is: why are these two processes still running, and what do they mean?

Is it normal/expected behavior?

The reason I ask is that right after my first successful run, I re-ran the same test case, and although the test runs fine, I see much lower numbers in my results.

I also see that after the second test has finished, I now have:

fileserver:/home/fileserver/dbench/src # ps -eaf| grep -i dbench

root      5984  3967  0 17:25 ?        00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000  > /home/fileserver/dbench/src/lock_mem.out

root      5985  5984  0 17:25 ?        00:00:00 /home/fileserver/dbench/src/lock_mem 25000

root      6132  3967  0 18:10 ?        00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000  > /home/fileserver/dbench/src/lock_mem.out

root      6133  6132  0 18:10 ?        00:00:00 /home/fileserver/dbench/src/lock_mem 25000

root      6256  5926  0 18:28 pts/1    00:00:00 grep -i dbench

My fileserver VM's performance goes down, the performance chart shows high CPU usage, and a third run of the test fails with a timeout error.

Any insight?

jpschnee
VMware Employee

Hi,

Are you using the harness to initiate each run?  During the restore process it should kill the lockmem processes left over from the previous run.  If you're not seeing that then you might want to do a run with DEBUGFLAG=1 set in your VMmark.config file.  Afterwards, zip up and attach the results folders (for your 1st and 2nd runs) and I'll take a look and see if anything stands out.
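
For reference, that is a one-line addition on the prime client (the flag is exactly as named above; where it sits in the file shouldn't matter):

<begin snip>

DEBUGFLAG=1

<end snip>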

-Joshua
rajanyadav
Contributor

Hi jpschnee,

Yes, I am using the harness. I also looked in fileserver_functions.xml and added some debug statements; the script seems to call /home/fileserver/kill.sh twice, once before the run to make sure it starts with a cleaned-up environment and once after it is done, but lock_mem never gets killed. If I issue that same command (/home/fileserver/kill.sh) from the command line of the fileserver VM after my test finishes, it works fine.

I will send the zipped results files soon, but I have another question: since I am running the subset anyway, for experimental purposes for now, is it OK to say that if I do some manual cleanup, the test run is still valid?

jpschnee
VMware Employee

If the same command is not working from the harness but is working from the command line, it's likely you have a permissions issue.  That said, since this is for your own testing on a retired benchmark, I see no reason why you can't just work around the issue by doing a manual cleanup of those processes if you don't feel like debugging further.
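
One quick way to test the permissions theory is to check which user the harness's commands run as; the id command via staf (same syntax as the other commands in this thread) will tell you:

<begin snip>

staf fileserver0 process start shell command "id" wait stderrtostdout returnstdout workdir "/home/"

<end snip>

If that reports a non-root user, a kill -9 against the root-owned lock_mem processes would fail with "Operation not permitted".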

-Joshua
rajanyadav
Contributor

Sorry,

Forgot to mention that I did run with the DEBUG flag set to 1 in the CONFIG file.

rajanyadav
Contributor

Quick question, Josh:

Is it OK to run /home/fileserver/kill.sh manually if the harness is not able to kill lock_mem? Are my test runs still valid?

jpschnee
VMware Employee

Yes, that should be fine. If all you're doing is running the kill.sh script between runs, you should be getting valid data.

-Joshua
rajanyadav
Contributor

C:\Documents and Settings\Administrator.>staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
  Return Code: 0
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       :
    }
  ]
}

But the lock_mem process does not get killed.

      <script>
        pnKillLockmem = "Tile %u: KillLockmem" % tilenumber
      </script>
      <process name='pnKillLockmem'>
        <location>'%s' % tileserver </location>
        <command mode="'shell'">'/home/fileserver/kill.sh'</command>
      </process>
      <call-with-map function="'CheckProcess'">
       <call-map-arg name="'processId'">"%s" % pnKillLockmem</call-map-arg>
      </call-with-map>
      <!-- Rajan's Debug Code-->
      <script>
        returnCode = RC
        Result = STAXResult
      </script>
      <call function="'debugSTAXUtilLogAndMsg'">
        'Info: Result after clean any locked memory = %s on %s, RC = %d' % (Result, tileserver, returnCode)
      </call>

20111208-10:59:13    Info: Result after clean any locked memory = None on fileserver0, RC = 0
20111208-11:07:36    Info: Cleanup ClientWorkdir = C:\\vclient0\fileserver
20111208-11:07:37    Info: FS Copy file: fileserver0.wrf Returned: RC = 0, STAFResult =
20111208-11:07:37    Info: Stop Timer: client0 : stop workload mstmr0 using SIGTERM
20111208-11:07:37    Info: Cleaning up STAF System Vari

jpschnee
VMware Employee

What happens if you just run:

#staf fileserver0 process start shell command "ps -aef | grep -i lock_mem" wait stderrtostdout returnstdout workdir "/home/"

-Joshua
rajanyadav
Contributor

C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "ps -eaf|grep -i lock_mem" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
  Return Code: 0
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : root      6364  3967  0 12:26 ?        00:00:00 sh -c ps -eaf|grep -i lock_mem
root      6366  6364  0 12:26 ?        00:00:00 grep -i lock_mem

    }
  ]
}

jpschnee
VMware Employee

Well that's not showing any active lock_mem processes... 
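
As an aside, piping ps through grep will always match the grep itself, which adds noise to these checks; filtering that out (or using pgrep, assuming it's available in your guest) makes the output easier to read:

<begin snip>

ps -eaf | grep -i lock_mem | grep -v grep
# or, if pgrep is available:
pgrep -fl lock_mem

<end snip>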

-Joshua
rajanyadav
Contributor

It's still there, though:

fileserver:~ # ps -eaf | grep -i dbench
root      6248  3967  0 12:20 ?        00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000  > /home/fileserver/dbench/src/lock_mem.out
root      6249  6248  0 12:20 ?        00:00:00 /home/fileserver/dbench/src/lock_mem 25000
root      6369  5878  0 12:31 pts/0    00:00:00 grep -i dbench
rajanyadav
Contributor

Also, when I run VMmark-report-ESX.sh:

/reportingtools # ls
Readme.txt              VM-name-map.txt         VMmark-report-ESX.sh    VMmarkConfigChecker.pl  convert2html.pl         disclosure.html         run_report_script.sh    tilescore2html.sh
/reportingtools # ./VMmark-report-ESX.sh
VMmark Reporting Script 1.1.1
Preparing files: \ Error running service --status-all or writing to /var/tmp/service.356219.txt (err 127).
Continuing...
Preparing files: | Error running dmidecode or writing to /var/tmp/dmidecode.356219.txt (err 127).
Continuing...
Waiting for background commands: -
Creating tar archive ...
tar: removing leading '/' from member names
File: /var/tmp/ESX-report.tgz
Please attach this file when submitting VMmark results.
To see the files collected, run: tar -tzf /var/tmp/ESX-report.tgz
Done.

I see some errors. What do they mean?

How do I read the results?

Thanks a lot for your help.

jpschnee
VMware Employee

Well, that doesn't make sense, because the grep command I had you try should have picked up the same entries as the dbench grep you ran locally.

What happens if you just run (from the prime client):

#staf fileserver0 process start shell command "ps -aef | grep -i dbench" wait stderrtostdout returnstdout workdir "/home/"

Are you sure that there is only one VM up and running (i.e., not two fileserver VMs)?

Finally, VMmark-report-ESX.sh isn't meant to be run from the prime client or a VM; it's meant to be run by the harness on the ESX host and is used to generate the data needed for a submission.

-Joshua
rajanyadav
Contributor

C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "ps -aef | grep -i dbench" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
  Return Code: 0
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : root      6801  3967  0 13:29 ?        00:00:00 sh -c /home/fileserver/dbench/sr
root      6802  6801  0 13:29 ?        00:00:00 /home/fileserver/dbench/src/lock

root      6919  3967  0 13:35 ?        00:00:00 sh -c ps -aef | grep -i dbench
root      6921  6919  0 13:35 ?        00:00:00 grep -i dbench

    }
  ]
}

This is again after a successful run.

Yes, there is only one fileserver0. I am using static IPs in my STAF.cfg and hosts file.

I ran VMmark-report-ESX.sh on the ESXi host only. It produced the ESX-report.tgz, but with some errors. I wanted to know whether a few errors are OK, and also how to read the report (I am interested in the performance numbers).

jpschnee
VMware Employee

Why don't you try adding an extra line to the top of kill.sh that outputs something to stdout, and see if that shows up when you run the staf command?

Maybe something like:

<begin snip>

#!/bin/sh
#
# Run script on FileServer VM's to release shared memory held by lock_mem
# and halt dbench processes
#

uptime

<end snip>

Then from the prime client run the command from before:

staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"

As far as the report script goes, I expect those errors are a result of that script being out of date (err 127 is the shell's "command not found", so service and dmidecode are likely just not present on that host). There's a newer one that is used for VMmark 2 that you could probably use, but it doesn't generate a report in the sense of producing a text file for you to read. It grabs a bunch of data off the system for a VMmark submission, so you have to go through it manually (or write a script) to look at the data.
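
If you want to poke through what was collected, plain tar is enough (the archive path is from your output; the extraction directory below is arbitrary):

<begin snip>

tar -tzf /var/tmp/ESX-report.tgz
mkdir -p /tmp/esx-report
tar -xzf /var/tmp/ESX-report.tgz -C /tmp/esx-report

<end snip>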

-Joshua
rajanyadav
Contributor

C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
  Return Code: 0
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       :   2:00pm  up   2:16,  2 users,  load average: 0.00, 0.02, 0.85

    }
  ]
}

rajanyadav
Contributor

fileserver:~ # cat /home/fileserver/kill.sh
#!/bin/sh
#
# Run script on FileServer VM's to release shared memory held by lock_mem
# and halt dbench processes
#
uptime
#kill -9 `ps -aef | grep -i run_dbench | awk '{print $2}'`
for i in `ps -aef | grep -i tbench_srv | awk '{print $2}'`
do
  kill -9 $i
done
for i in `ps -aef | grep -i lock_mem | grep -v grep | awk '{print $2}'`
do
  kill -9 $i
done
for i in `ps -aef | grep -i client_plain | grep -v grep | awk '{print $2}'`
do
  kill -9 $i
done
fileserver:~ # ps -eaf | grep -i dbench
root      6801  3967  0 13:29 ?        00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000  > /home/fileserver/dbench/src/lock_mem.out
root      6802  6801  0 13:29 ?        00:00:00 /home/fileserver/dbench/src/lock_mem 25000
root      6947  5878  0 14:02 pts/0    00:00:00 grep -i dbench
jpschnee
VMware Employee

OK, good, so the script is being run. I would now add echo statements to the script so that it prints a line for each loop iteration, e.g. echo "killing $i" (use double quotes so $i actually expands).
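
Something like this, reusing the lock_mem loop from the kill.sh you posted (the other loops would get the same treatment):

<begin snip>

for i in `ps -aef | grep -i lock_mem | grep -v grep | awk '{print $2}'`
do
  echo "killing lock_mem pid $i"
  kill -9 $i
done

<end snip>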

-Joshua
rajanyadav
Contributor

C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
  Return Code: 0
  Key        : <None>
  Files      : [
    {
      Return Code: 0
      Data       : Hello there
Hello again
Goodbye

    }
  ]
}
