VMware Performance Community
kfranks
Contributor

Mailserver workload startup issue - 'unpack sequence too long'

All,

I'm experiencing a problem trying to get my first tile running. In the starting workload setup sequence, while starting the mailserver I see:

Screen shot 2012-01-09 at 10.39.04 AM.png

Any ideas?

Thanks,

-Kirk

jpschnee
VMware Employee

Hi Kirk,

Can you attach your VMMARK2.config file?  Also, what versions of STAF and STAX are you using?  Finally, can you confirm that on your client0 c:\vclient0\mailserver\vmmark2template.xml exists and that it is in that exact path?

Thanks,

-Joshua

kfranks
Contributor

Hi,

My VMMARK2.config is attached and I have:

  STAF @ 3.4.7

  STAX @ 3.4.5

Screen shot 2012-01-09 at 2.32.33 PM.png

Regards,

-Kirk

jpschnee
VMware Employee

At first glance it appears your VMmark2.config file has the following:

MailServer/MailQualifier="V7MX1.VTG.VCE.COM"

This needs to have only three dot-separated parts, for example "VTG.VCE.COM".

Can you try changing that and see what happens on your next run?

-Joshua
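
For anyone hitting the same message: 'unpack sequence too long' reads like a Python-style tuple-unpacking failure, which would fit a qualifier with four dot-separated parts being assigned to three names. That is a guess about the harness, not something confirmed from its source. A minimal pre-flight check, assuming the qualifier must have exactly three non-empty parts (the function name `check_mail_qualifier` is made up for this sketch):

```python
from typing import List


def check_mail_qualifier(qualifier: str, expected_parts: int = 3) -> List[str]:
    """Split a dotted qualifier and verify it has the expected number of parts.

    The three-part requirement comes from the reply above; everything else
    here is an illustrative assumption, not VMmark code.
    """
    parts = qualifier.split(".")
    # Reject wrong part counts and empty parts such as "VTG..COM".
    if len(parts) != expected_parts or not all(parts):
        raise ValueError(
            "expected %d dot-separated parts, got %d in %r"
            % (expected_parts, len(parts), qualifier)
        )
    return parts
```

With the values from this thread, `check_mail_qualifier("V7MX1.VTG.VCE.COM")` raises `ValueError`, while `check_mail_qualifier("VTG.VCE.COM")` returns the three parts.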

kfranks
Contributor

Thanks! The process did indeed get much further, but sadly it did not complete.

In the messages log, I see some trust level issues. It shows several operations needing level 4 or 5, and it seems to be set to 3. I could swear I set that to 5, so I'll have to check on that. There's also an 'OS customization NOT found' Info Error that it seemed to pass through. But it stopped on 'OlioDB Tile0 failed setup : Error Copying ConfigOlioJSPdb.txt ...', with '??? Tile0 failed setup' errors for the other VMs.

Does this seem indicative of something I failed to configure properly?

Regards,

-Kirk

jpschnee
VMware Employee

Glad to hear it is getting past that issue.

You should probably go back through and confirm your STAF trust settings on both the VMs and the clients. For the OS customization not found error, verify that your vCServer actually has the OS customization file you created and that its name exactly matches the one in your VMmark2.config. The error copying ConfigOlioJSPdb.txt is likely another symptom of the STAF permission issues.

-Joshua

kfranks
Contributor

I fixed the staf trust problems and I was able to get the (4) tests to run and show passes. I did see about four 'ds2webdriver.exe has stopped working' errors and a couple more that didn't name what stopped working. I also didn't get a result after it finished. I think I'm pretty close now to having it work correctly.

Any thoughts on the webdriver problem?

Thanks,

-Kirk

jpschnee
VMware Employee

It's hard for me to say without additional information. 

Do you see WRF files in your results directory?  You might try zipping up your results folder and attaching it here for me to review. 

-Joshua

kfranks
Contributor

I did see a lot of WRF files in the results directory (zip file attached).

Thanks,

-Kirk

jpschnee
VMware Employee

Kirk,

Based on what I see within your results directory, you don't seem to have set up the DS2 database. I'd recommend going back through the benchmarking guide and making sure the individual workloads have been configured as defined.

-Joshua

kfranks
Contributor

Yes, you are correct. In my haste to make sure the Windows VMs were complete I had neglected to see the database creation instructions for DS2. I am using the OVFs and when I saw the note saying I could skip the Olio section, I assumed (wrongly) that it applied to the DS2 machines as well.

Thanks,

-Kirk

kfranks
Contributor

Thanks for your help!

I figured out the issue with the ds2webdriver.exe processes stopping. Since I renamed my Olio and DS2 database machines, I had to add references to their default names (i.e. DS2-DB and Olio-DB) to the hosts files of the corresponding web machines.

The test execution phase is completing without errors now. Now I'm looking at an 'initial SSH connection timed out' issue in the report phase, when the harness connects to the tile's hosts. I had enabled ssh on the hosts, but for some reason VMmark is not getting in. I'm searching for the place where the password is specified.

Regards,

-Kirk

jpschnee
VMware Employee

The harness expects that passwordless ssh has been configured from the prime client to all hosts under test. The benchmarking guide should have instructions to help you set this up.
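
One way to confirm that key-based login really works before a run is to try a non-interactive connection from the prime client to each host. The sketch below is only an illustration: it assumes an OpenSSH-compatible `ssh` is on the PATH of the prime client, and the `root` user, the host name in the usage note, and both function names are placeholders, not anything taken from the benchmarking guide. `BatchMode=yes` makes ssh fail instead of prompting for a password.

```python
import subprocess
from typing import List


def build_ssh_check(host: str, user: str = "root", timeout_s: int = 10) -> List[str]:
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                      # never fall back to a password prompt
        "-o", "ConnectTimeout=%d" % timeout_s,      # give up on unreachable hosts
        "%s@%s" % (user, host),
        "echo", "ok",
    ]


def passwordless_ssh_ok(host: str, user: str = "root", timeout_s: int = 10) -> bool:
    """Return True if the non-interactive login succeeds, False otherwise."""
    try:
        result = subprocess.run(
            build_ssh_check(host, user, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing from PATH, or the connection hung past the limit.
        return False
    return result.returncode == 0
```

Calling `passwordless_ssh_ok("esx01")` (a made-up host name) for each host under test gives a quick yes/no before starting a full run.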

Thanks,

-Joshua

kfranks
Contributor

I've enabled the passwordless ssh to the hosts. I'm debugging reporting now, running short (15 min) tests with reporting turned on. I'm seeing an error in the output file saying "Error: could not resolve start-time. Results data missing from one or more *.wrf files or time on clients not synchronized". The result of a full-length run indicates the same error.

I don't know what a complete wrf file set comprises; there's no list of what's expected in the Guide. The VMs and the hosts are synced; I checked them all before this run. I see records in the host message logs with UTC timestamps. According to what I've read about 4.1, it is supposed to be that way. I've attached my latest run's fileset, less the two host tgz files because they're too large to send.

What exactly is the harness looking at to determine the start time? The output file shows this to be something's epoch time (Start_time 0 : Wed Dec 31 16:00:00 1969), which doesn't seem right.

Thanks,

-Kirk

jpschnee
VMware Employee

Kirk,

Your client seems to be having trouble making the ds2web connections (see the threads aborting in the DS2Web*wrf files).  You might try reviewing the section on setting up your hosts file, proxy settings and applying the ProxySettingsPerUser.reg file as a first pass at debugging.

"Error: could not resolve start-time. Results data missing from one or more *.wrf files or time on clients not synchronized"

You'll see this message whenever there isn't actual data in the wrf files, it's not only start time.
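
As a side note on the 'Start_time 0 : Wed Dec 31 16:00:00 1969' line quoted above: that is what a start time of zero (the Unix epoch) looks like when printed in a UTC-8 time zone, which fits a start time that was never filled in. The sketch below only demonstrates that arithmetic. The UTC-8 offset is an assumption about the client's time zone, and `render_epoch` is a made-up helper, not part of the harness.

```python
from datetime import datetime, timedelta, timezone


def render_epoch(seconds: int, utc_offset_hours: int) -> str:
    """Format a Unix timestamp the way a C-style ctime string would look."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    moment = (epoch + timedelta(seconds=seconds)).astimezone(local_zone)
    return moment.strftime("%a %b %d %H:%M:%S %Y")


# Assumption: the client clock is set to a UTC-8 zone.
print(render_epoch(0, -8))  # Wed Dec 31 16:00:00 1969
```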

-Joshua

kfranks
Contributor

Joshua,

I modified all of the hosts files to essentially contain:

192.168.201.30 Client0
192.168.201.31 Mailserver0
192.168.201.32 Standby0
192.168.201.33 OlioDB0 Olio-DB
192.168.201.34 OlioWeb0
192.168.201.35 DS2DB0 DS2-DB
192.168.201.36 DS2WebA0
192.168.201.37 DS2WebB0
192.168.201.38 DS2WebC0
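
A mapping like the one above can be sanity-checked with a few lines of script, for instance to confirm that the default names `Olio-DB` and `DS2-DB` resolve to the renamed database VMs. This is a rough sketch that parses hosts-file text directly rather than asking the OS resolver; `parse_hosts` and `HOSTS_TEXT` are names invented for the example.

```python
from typing import Dict

HOSTS_TEXT = """
192.168.201.30 Client0
192.168.201.31 Mailserver0
192.168.201.32 Standby0
192.168.201.33 OlioDB0 Olio-DB
192.168.201.34 OlioWeb0
192.168.201.35 DS2DB0 DS2-DB
192.168.201.36 DS2WebA0
192.168.201.37 DS2WebB0
192.168.201.38 DS2WebC0
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every host name and alias (lower-cased) to its IP address."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.lower()] = ip  # host names are case-insensitive
    return mapping


hosts = parse_hosts(HOSTS_TEXT)
for required in ("olio-db", "ds2-db"):
    print(required, "->", hosts.get(required, "MISSING"))
```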

Next I ran a full-length test to see if the behavior was different from that of a 15-minute test, and it is. I see results in the webB and webC .wrf files. In each, the results are preceded by several iterations in which the threads time out (B had success on the 9th iteration; C on the 5th). Not all iterations from that point on are successful, and there is no apparent pattern to the successes and failures; they appear random. webA does not contain any results, but I suspect that since it is a short burst of activity run periodically, there is not enough time in the activity stream for it to ever connect its threads successfully.

I've looked at the DB's messages log but don't see anything in it to correlate to. I was wondering if there's a way to extend the thread timeout value (temporarily) to see what effect that may have. Is there anything that can inhibit the DB from accepting the connections promptly? These VMs were stood up from the templates.

My tile runs in a cluster of two blades (B200 M2s with 32GB RAM each) and the client is on another blade outside of the workload cluster.

Any thoughts?

Thanks,

-Kirk

Message was edited by: kfranks

RebeccaG
Expert

Hi Kirk,

When the client is timing out connecting to DS2Web, it's usually indicative of a configuration issue in the hosts files or proxy settings as Josh has mentioned.

When this problem crops up, we will usually see some intermittent connection to the DS2Web VMs anyway. Because a successful connection seems to occur randomly with a small probability, you've only seen it happen on the long run so far. I don't believe increasing the timeout would help here. However, once this root issue has been fixed, the client should connect very reliably. Have you also checked that the client is not using a proxy server, and applied ProxySettingsPerUser.reg?

Rebecca

kfranks
Contributor

Rebecca,

The client's hosts file looks solid, and the VMmark VM entries are identical across all VMmark VM hosts files. I can ping from the client to the DS2Web VMs by name, with low latency. I can STAF ping the DS2Web VMs from the client, and they respond quickly. And there's no proxy configured in my client.

Here's a screenshot of the client's hosts file (all VM hosts files contain copies of this mapping),

Screen shot 2012-01-19 at 1.56.29 PM.png

conventional pings from the client to the DS2 VMs,

Screen shot 2012-01-19 at 2.31.40 PM.png

STAF pings from the client to the DS2 VMs

Screen shot 2012-01-19 at 2.38.49 PM.png

and the proxy setting on the client.

Screen shot 2012-01-19 at 2.35.31 PM.png

Is there a way to script one of these connect commands so I can capture latency information?

And are there other things I should be looking at?

Thanks,

-Kirk
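
On the question of scripting a connection to capture latency: a plain TCP connect can be timed from the client with a short script. This is only a sketch of the idea, not what ds2webdriver.exe does internally. Port 80 in the usage note is an assumption about where the DS2 web tier listens, and `connect_latencies` is a name made up for the example. Note that a raw socket bypasses the Windows proxy settings, so a fast result here would not rule out a proxy-related delay in the driver itself.

```python
import socket
import time
from typing import List, Optional


def connect_latencies(host: str, port: int, attempts: int = 5,
                      timeout_s: float = 3.0) -> List[Optional[float]]:
    """Time repeated TCP connects; each entry is milliseconds, or None on failure."""
    results = []  # type: List[Optional[float]]
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # The timer covers name resolution plus the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout_s):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Covers timeouts, refused connections, and name-resolution errors.
            results.append(None)
    return results
```

For example, `connect_latencies("DS2WebA0", 80)` run from the client returns five timings, with `None` marking attempts that timed out or were refused.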

jamesz08
VMware Employee

Uncheck "Automatically Detect Settings" for the proxy on your client. That can cause intermittent issues. Also make sure that the .NET installation on the client is up to date; you can just run Windows Update if your client has internet access.

kfranks
Contributor

Unchecking the automatic detection box solved the problem. My tile seems to be producing results in the wrf files, at least for a short run.

Thanks,

-Kirk
