Hi Everyone,
I am trying to run VMmark 1.1.1 (I know it's retired, but it serves the purpose). On top of that, I am running only the fileserver test. I have the fileserver VM and the client set up, and everything works fine.
My question is:
Can we rerun the test right after it finishes?
Here is what I see
I changed the CONFIG file to run only the fileserver test, and only for a few minutes.
The test runs fine and I can see results files under c:\vmmar\results on my client0.
When the test is running, I see the following on fileserver0
But even after STAXMon shows the test has finished (logs show up properly in results), there are some dbench-related processes running on the fileserver0 VM:
fileserver:/home/fileserver/dbench/src # ps -eaf| grep -i dbench
root 5984 3967 0 17:25 ? 00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000 > /home/fileserver/dbench/src/lock_mem.out
root 5985 5984 0 17:25 ? 00:00:00 /home/fileserver/dbench/src/lock_mem 25000
root 6116 5926 0 18:10 pts/1 00:00:00 grep -i dbench
My question is: why are these two processes still running, and what do they mean?
Is this normal/expected behavior?
The reason I ask is that right after my first successful run I re-ran the same test case, and although the test ran fine, I saw much lower numbers in my results.
I also see that after the second test has finished, I now have:
fileserver:/home/fileserver/dbench/src # ps -eaf| grep -i dbench
root 5984 3967 0 17:25 ? 00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000 > /home/fileserver/dbench/src/lock_mem.out
root 5985 5984 0 17:25 ? 00:00:00 /home/fileserver/dbench/src/lock_mem 25000
root 6132 3967 0 18:10 ? 00:00:00 sh -c /home/fileserver/dbench/src/lock_mem 25000 > /home/fileserver/dbench/src/lock_mem.out
root 6133 6132 0 18:10 ? 00:00:00 /home/fileserver/dbench/src/lock_mem 25000
root 6256 5926 0 18:28 pts/1 00:00:00 grep -i dbench
My fileserver VM's performance goes down, the performance chart shows high CPU usage, and a third run of the test fails with a timeout error.
Any insight?
Hi,
Are you using the harness to initiate each run? During the restore process it should kill the lockmem processes left over from the previous run. If you're not seeing that then you might want to do a run with DEBUGFLAG=1 set in your VMmark.config file. Afterwards, zip up and attach the results folders (for your 1st and 2nd runs) and I'll take a look and see if anything stands out.
Hi jpschnee,
Yes, I am using the harness. I also looked in fileserver_functions.xml and added some debug statements. The script seems to call /home/fileserver/kill.sh twice to make sure it starts with a clean environment, and also after it is done, but lock_mem never gets killed. If I issue the same command (/home/fileserver/kill.sh) from the command line of the fileserver VM after my test finishes, it works well.
I will send the zipped results files soon, but I have another question. Since I am only running a subset anyway, for experimental purposes for now, is it OK to say that the test run is still valid if I do some manual cleanup?
If the same command is not working from the harness but is working from the command line, it's likely you have a permissions issue. That said, since this is for your own testing on a retired benchmark, I see no reason why you can't just work around the issue by doing a manual cleanup of those processes if you don't feel like debugging further.
Sorry,
I forgot to mention that I did run with the DEBUG flag set to 1 in the CONFIG file.
Quick questions, Josh:
Is it OK to run /home/fileserver/kill.sh manually if the harness is not able to kill lock_mem? Are my test runs still valid?
Yes, that should be fine. If all you're doing is running the kill.sh script between runs, you should be getting valid data.
C:\Documents and Settings\Administrator.>staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
Return Code: 0
Key : <None>
Files : [
{
Return Code: 0
Data :
}
]
}
But the lock_mem process does not get killed.
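A return code of 0 here presumably only says that the shell command itself exited cleanly, not that any lock_mem process was actually removed. A minimal sketch of a follow-up check, assuming the leftover processes can be recognized by the string lock_mem in `ps -eaf` output. The function name remaining_lock_mem is made up for illustration and is not part of VMmark or kill.sh:

```shell
#!/bin/sh
# Hypothetical helper, not part of VMmark: reads `ps -eaf` output on stdin,
# prints the PID (column 2) of every line that mentions lock_mem, and returns
# non-zero if any were found, so the caller's return code reflects leftovers.
remaining_lock_mem() {
    awk '/lock_mem/ && !/grep/ { print $2; found = 1 } END { exit (found + 0) }'
}

# Example (on the fileserver VM):
#   ps -eafww | remaining_lock_mem || echo "lock_mem still running"
```

Run after kill.sh, a check like this would make a failed cleanup show up as a non-zero return code instead of an empty, successful-looking result.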
<script>
pnKillLockmem = "Tile %u: KillLockmem" % tilenumber
</script>
<process name='pnKillLockmem'>
<location>'%s' % tileserver </location>
<command mode="'shell'">'/home/fileserver/kill.sh'</command>
</process>
<call-with-map function="'CheckProcess'">
<call-map-arg name="'processId'">"%s" % pnKillLockmem</call-map-arg>
</call-with-map>
<!-- Rajan's Debug Code-->
<script>
returnCode = RC
Result = STAXResult
</script>
<call function="'debugSTAXUtilLogAndMsg'">
'Info: Result after clean any locked memory = %s on %s, RC = %d' % (Result, tileserver, returnCode)
</call>
20111208-10:59:13 Info: Result after clean any locked memory = None on fileserver0, RC = 0
20111208-11:07:36 Info: Cleanup ClientWorkdir = C:\\vclient0\fileserver
20111208-11:07:37 Info: FS Copy file: fileserver0.wrf Returned: RC = 0, STAFResult =
20111208-11:07:37 Info: Stop Timer: client0 : stop workload mstmr0 using SIGTERM
20111208-11:07:37 Info: Cleaning up STAF System Vari
What happens if you just run:
#staf fileserver0 process start shell command "ps -aef | grep -i lock_mem" wait stderrtostdout returnstdout workdir "/home/"
C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "ps -eaf|grep -i lock_mem" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
Return Code: 0
Key : <None>
Files : [
{
Return Code: 0
Data : root 6364 3967 0 12:26 ? 00:00:00 sh -c ps -eaf|grep -i lock_mem
root 6366 6364 0 12:26 ? 00:00:00 grep -i lock_mem
}
]
}
Well that's not showing any active lock_mem processes...
It's still there, though.
Also, when I run VMmark-report-ESX.sh I see some errors. What do they mean?
How do I read the results?
Thanks a lot for your help.
Well that doesn't make sense because the grep command I had you try should have picked up the same entry as the dbench grep you tried locally.
What happens if you just run (from the prime client):
#staf fileserver0 process start shell command "ps -aef | grep -i dbench" wait stderrtostdout returnstdout workdir "/home/"
Are you sure that there is only one VM up and running (i.e. not two fileserver VMs)?
Finally, VMmark-report-ESX.sh isn't meant to be run from the prime client or a VM. It's meant to be run by the harness on the ESX host and is used for generating the data needed for a submission.
C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "ps -aef | grep -i dbench" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
Return Code: 0
Key : <None>
Files : [
{
Return Code: 0
Data : root 6801 3967 0 13:29 ? 00:00:00 sh -c /home/fileserver/dbench/sr
root 6802 6801 0 13:29 ? 00:00:00 /home/fileserver/dbench/src/lock
root 6919 3967 0 13:35 ? 00:00:00 sh -c ps -aef | grep -i dbench
root 6921 6919 0 13:35 ? 00:00:00 grep -i dbench
}
]
}
This is again after a successful run.
Yes, there is only one fileserver0. I am using static IPs in my STAF.cfg and hosts file.
I ran VMmark-report-ESX.sh on the ESXi host only. It produced ESX-report.tgz, but with some errors. I wanted to know whether a few errors are OK, and also how to read the report (I am interested in the performance numbers).
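One detail in the output above may be relevant: the process lines end at ".../dbench/sr" and ".../src/lock", which is what you would expect if ps limited each line to 80 columns when run through STAF. The ps man page describes the output width as undefined when output is piped, and says -w (given twice for unlimited width) or the COLUMNS variable can be used to control it. If that is what happens here, a grep for lock_mem never sees the full name, while a grep for dbench still matches. This is an assumption, not something confirmed in this thread. The sketch below reproduces the effect with one made-up line laid out roughly like `ps -ef` output (the column widths are assumed), using `cut -c1-80` as a stand-in for the truncation:

```shell
#!/bin/sh
# Build one line laid out roughly like `ps -ef` output (column widths assumed).
full=$(printf '%-8s %5s %5s %2s %5s %-8s %8s %s' \
    root 6802 6801 0 13:29 '?' 00:00:00 \
    '/home/fileserver/dbench/src/lock_mem 25000')

# Stand-in for ps limiting its output to 80 columns when it is piped.
narrow=$(printf '%s\n' "$full" | cut -c1-80)

printf '%s\n' "$narrow"
# Count matching lines in the truncated text for each pattern.
printf 'lock_mem matches: %s\n' "$(printf '%s\n' "$narrow" | grep -c lock_mem)"
printf 'dbench matches:   %s\n' "$(printf '%s\n' "$narrow" | grep -c dbench)"
```

If kill.sh finds its targets with something like a ps piped into a grep for lock_mem, asking ps for wide output (for example `ps -eafww`) might make the processes visible again when the script is started through STAF. That would need to be checked against the actual kill.sh, which is not shown here.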
Why don't you try adding an extra line to the top of the kill.sh that outputs something to stdout and see if that shows up when you run the staf command locally?
Maybe something like:
<begin snip>
uptime
<end snip>
Then from the prime client run the command from before:
staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"
As for the report script, I expect those errors are a result of the script being out of date. There's a newer one used for VMmark 2 that you could probably use, but it doesn't generate a report in the sense of producing a text file for you to read. It grabs a bunch of data off the system for a VMmark submission. You have to go through it manually (or write a script) to look at the data.
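For a first look at what the report script collected, listing the archive may be enough. A small sketch, assuming the archive is a gzip-compressed tar file named ESX-report.tgz as mentioned earlier in the thread. The function names and the target directory are placeholders, and nothing here knows about the layout of the files inside:

```shell
#!/bin/sh
# List the files inside a report archive without extracting it.
list_report() {
    tar -tzf "$1"
}

# Extract the archive into a separate directory for manual inspection.
extract_report() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Example:
#   list_report ESX-report.tgz
#   extract_report ESX-report.tgz ./esx-report
```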
C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
Return Code: 0
Key : <None>
Files : [
{
Return Code: 0
Data : 2:00pm up 2:16, 2 users, load average: 0.00, 0.02, 0.85
}
]
}
OK, good, so the script is being run. I would now add echo statements to the script so that it prints a line on each loop iteration, e.g. echo "killing $i".
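A sketch of what that tracing could look like. The real kill.sh is not shown in this thread, so the loop below is hypothetical, and kill_with_trace and KILL_CMD are names made up here. The echo argument is in double quotes on purpose, since single quotes would print the literal text $i instead of the PID:

```shell
#!/bin/sh
# Hypothetical cleanup loop; the real kill.sh is not shown in this thread.
# KILL_CMD defaults to "kill -9"; set it to "true" for a dry run that only
# prints the trace lines.
kill_with_trace() {
    for i in "$@"; do
        echo "killing $i"            # double quotes so that $i is expanded
        ${KILL_CMD:-kill -9} "$i"
    done
}

# Example dry run:
#   KILL_CMD=true
#   kill_with_trace 5984 5985
```

With stderrtostdout and returnstdout on the staf command line, these trace lines should come back in the Data field of the response, which shows whether the loop body is ever reached.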
C:\Documents and Settings\Administrator.GEN-VCS170>staf fileserver0 process start shell command "/home/fileserver/kill.sh" wait stderrtostdout returnstdout workdir "/home/"
Response
--------
{
Return Code: 0
Key : <None>
Files : [
{
Return Code: 0
Data : Hello there
Hello again
Goodbye
}
]
}