VMware Cloud Community
exponent
Contributor

PXE Fails TFTP Download

Hello, we are having an issue with PXE booting in VMware when RamDiskTFTPBlockSize is larger than 1432. We have tried the following block sizes: 2048, 4096, 8192, and 16384, and they all fail with error code 0xC0000001. With the E1000 NIC, the error info states: "A required device isn't connected or can't be accessed". With VMXNet3, the error info states: "File: \Windows\System32\boot\winload.efi; Info: The application or operating system couldn't be loaded because a required file is missing or contains errors". This specific test was done with UEFI, but we get the same results with BIOS (except that we always get the first error, never the winload.efi one).

We need to increase the TFTP Block Size because some of our other NICs can't perform adequately with the default setting and require an increased block size (and many other systems get a benefit from it as well).  We are using System Center Configuration Manager 2012 R2 as our PXE server (running on Windows Server 2008 R2, but we have also tried Windows Server 2012 R2 with the same result).

I found a similar thread, but I'm not sure whether it is related: https://communities.vmware.com/thread/461323?z=TvieFa

Any thoughts?  Thanks!

12 Replies
patpat
Contributor

Your problem is not related at all to the quoted link; you should probably erase your post on that thread.

If RamDiskTFTPBlockSize produces a packet bigger than the MTU, that implies IP fragmentation, and many PXE clients out there do not like that.

If I were you, I'd check that MTU=1500. I would not touch RamDiskTFTPBlockSize, but I would set RamDiskTFTPWindowSize to 4, 8, or 16; you will surely speed up WDS/MDT/SCCM TFTP transfers.
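
To make the MTU point concrete, here is a minimal sketch of the arithmetic. It assumes plain IPv4 without IP options (20-byte header), an 8-byte UDP header, and the 4-byte TFTP DATA header; the function names are made up for illustration.

```python
# Header sizes assumed: IPv4 without options, UDP, TFTP DATA (opcode + block number).
IP_HEADER = 20
UDP_HEADER = 8
TFTP_DATA_HEADER = 4


def max_unfragmented_blksize(mtu: int = 1500) -> int:
    """Largest TFTP block size whose DATA packet still fits in one IP packet."""
    return mtu - IP_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def needs_fragmentation(blksize: int, mtu: int = 1500) -> bool:
    """True if a full DATA packet with this block size exceeds the MTU."""
    return blksize > max_unfragmented_blksize(mtu)


if __name__ == "__main__":
    print("largest block size without fragmentation:", max_unfragmented_blksize())
    for size in (1432, 2048, 4096):
        print(size, "fragments" if needs_fragmentation(size) else "fits")
```

Under these assumptions, 1432 fits within a 1500-byte MTU, while every larger value tried in the first post has to be fragmented.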

Best,

Patrick

exponent
Contributor

Hello, sorry for the slow replies; for some reason the communities site did not email me that the post had been updated.

Patrick, I have adjusted the post. I also tried setting RamDiskTFTPWindowSize to 4, 8, and 16 and saw no improvement on my physical machines trying to PXE boot (it still takes 10+ minutes or times out). The physical clients only seem to work when RamDiskTFTPBlockSize is increased (I can't find any other way to fix them, and multiple software vendors' support teams tell me I have to increase the block size to resolve the problem), but this of course breaks VMware. I have read other posts from people using VMXNet3 adapters who changed RamDiskTFTPBlockSize without issue in VMware, but I'm not sure what is different for them vs. my very default environment, or whether something has changed in newer versions of VMware. I also tried putting my TFTP server and my test VM on an isolated, separate virtual switch so that I could safely increase the MTU, and that made no difference either. I've used about 10 different PXE clients (Hyper-V, Dell, HP, Wyse, ASUS, 10ZiG, etc.) and so far VMware is the only one to have problems with this. Any other thoughts/ideas? I've had a case open with VMware support for a month now, with no progress...

Rachelsg, thanks for the reply, but unfortunately those links have no relevance to this issue.

Thanks!

patpat
Contributor

Try running Wireshark on your server to capture the traffic, then analyze the TFTP transfers.
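
When reading such a capture, the interesting part is which options the client requests and which ones the server acknowledges. The sketch below decodes a raw TFTP UDP payload (the bytes after the UDP header) following the RFC 1350 packet layout and the RFC 2347 option extension. Getting the payload bytes out of the capture is not shown, and `parse_tftp` is a hypothetical helper name.

```python
import struct

OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def _split_fields(body: bytes) -> list[str]:
    """Split NUL-terminated strings, dropping the empty tail after the last NUL."""
    parts = body.split(b"\x00")
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode("ascii", "replace") for p in parts]


def _options(fields: list[str]) -> dict[str, str]:
    """Pair up name/value strings; option names are case-insensitive."""
    return {fields[i].lower(): fields[i + 1] for i in range(0, len(fields) - 1, 2)}


def parse_tftp(payload: bytes) -> dict:
    """Decode one TFTP packet into a small dict for inspection."""
    if len(payload) < 2:
        raise ValueError("payload too short for a TFTP opcode")
    (opcode,) = struct.unpack("!H", payload[:2])
    info: dict = {"opcode": OPCODES.get(opcode, f"UNKNOWN({opcode})")}
    body = payload[2:]
    if opcode in (1, 2):      # RRQ/WRQ: filename, mode, then option pairs
        fields = _split_fields(body)
        info["filename"] = fields[0] if fields else ""
        info["mode"] = fields[1] if len(fields) > 1 else ""
        info["options"] = _options(fields[2:])
    elif opcode == 6:         # OACK: option pairs accepted by the server
        info["options"] = _options(_split_fields(body))
    elif opcode == 3:         # DATA: block number plus payload
        info["block"] = struct.unpack("!H", body[:2])[0]
        info["length"] = len(body) - 2
    elif opcode == 4:         # ACK: block number only
        info["block"] = struct.unpack("!H", body[:2])[0]
    elif opcode == 5:         # ERROR: code plus message
        info["code"] = struct.unpack("!H", body[:2])[0]
        info["message"] = body[2:].split(b"\x00")[0].decode("ascii", "replace")
    return info
```

Comparing the `blksize` and `windowsize` values in the client's RRQ with those in the server's OACK shows what was actually negotiated, and the `length` of the DATA packets shows whether the server honors it.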

Best,

Patrick

exponent
Contributor

Patrick, thanks for the reply. We did as you suggested with VMware support. Unfortunately, it took a month to get an answer back from a very unresponsive VMware Support...

Their statement was that IP reassembly is not implemented in the PXE code and that ESX PXE does not support jumbo frames. This basically says that it is impossible to increase the TFTP block size above the default with VMware. This sounds like a significant limitation in VMware's implementation of PXE, especially since no other PXE implementation I have come across from other vendors has this issue (Microsoft, HP, Dell, IBM, etc.).
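
For reference, a rough sketch of how many IP fragments each full TFTP DATA packet turns into at a 1500-byte MTU, which is what a client without IP reassembly would have to cope with. It assumes IPv4 without options and the standard rule that every fragment except the last carries a multiple of 8 payload bytes; the function name is invented for this example.

```python
import math

IP_HEADER = 20          # IPv4 header without options (assumption)
UDP_HEADER = 8
TFTP_DATA_HEADER = 4    # opcode + block number


def fragments_per_block(blksize: int, mtu: int = 1500) -> int:
    """Number of IP packets needed to carry one full TFTP DATA block."""
    # Payload available per fragment, rounded down to a multiple of 8 bytes.
    per_fragment = ((mtu - IP_HEADER) // 8) * 8
    ip_payload = UDP_HEADER + TFTP_DATA_HEADER + blksize
    return math.ceil(ip_payload / per_fragment)


if __name__ == "__main__":
    for size in (1432, 2048, 4096, 8192, 16384):
        print(f"blksize {size:>5}: {fragments_per_block(size)} IP packet(s)")
```

Anything above one packet per block depends on the receiver reassembling fragments, which is exactly what the support statement says the PXE code does not do.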

Do you have any suggestions for a good workaround? Or am I stuck waiting for when/if VMware finally fixes this? Thanks!

jabsolutions
Contributor

I know this is a few months later, but if you are still having this issue there is a workaround I found in another forum.

VMware PXE Limitations | bcTechNet


The article linked above provides in-depth details, but essentially you need to:

1) Modify SMSPXE.DLL as mentioned in Patrick’s reply (change the hex sequence BA07000035 to BA08000035)

2) Create the following DWORD registry value: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SMS\DP\RamdiskTFTPBlockSize and set it to 4.
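
For anyone attempting step 1, a cautious way to apply a byte patch is to refuse to touch anything unless the sequence occurs exactly once. The following is only a sketch of that idea on an in-memory byte string. `patch_unique` is a hypothetical helper, and whether a given SMSPXE.DLL build contains exactly one match for this sequence is an assumption that has to be checked per version.

```python
def patch_unique(data: bytes, old: bytes, new: bytes) -> bytes:
    """Return a patched copy of data, but only if old occurs exactly once."""
    if len(old) != len(new):
        raise ValueError("replacement must keep the size unchanged")
    count = data.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    offset = data.find(old)
    return data[:offset] + new + data[offset + len(old):]


# Sequences quoted in this thread; other DLL versions may need different ones.
OLD = bytes.fromhex("BA07000035")
NEW = bytes.fromhex("BA08000035")

if __name__ == "__main__":
    # Synthetic bytes for demonstration only, not real DLL content.
    sample = b"\x90\x90" + OLD + b"\xc3"
    print(patch_unique(sample, OLD, NEW).hex())
```

Working on a copy of the file and keeping the original untouched makes it easy to roll back if the service misbehaves afterwards.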


Hope this helps others.

patpat
Contributor

Let me add that what I've described on that link is really an SCCM hack.

Depending on the version of SMSPXE.DLL (32-bit or 64-bit), the sequence to modify can be different.
Hopefully MS will fix this soon.


The bottom line of this problem is that blocksize is not a good strategy for improving TFTP speed; windowsize is.
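
A back-of-the-envelope way to compare the two knobs is to count how many acknowledgement round trips a transfer needs. This is a deliberately simplified model: it assumes one round trip per acknowledged window and ignores packet loss, retransmissions, and link bandwidth. The function name and the 200 MiB file size are illustrative only.

```python
import math


def round_trips(file_size: int, blksize: int, windowsize: int = 1) -> int:
    """Estimate ACK round trips for a TFTP download (no loss assumed)."""
    # A transfer ends with a block shorter than blksize (possibly empty),
    # so there is always one block beyond the full-sized ones.
    blocks = file_size // blksize + 1
    return math.ceil(blocks / windowsize)


if __name__ == "__main__":
    size = 200 * 1024 * 1024  # hypothetical 200 MiB boot image
    for blksize, window in ((1432, 1), (8192, 1), (1432, 8), (1432, 16)):
        print(f"blksize={blksize:<5} windowsize={window:<2} "
              f"round trips={round_trips(size, blksize, window)}")
```

In this model, a window of 8 blocks at 1432 bytes needs fewer round trips than a single 8192-byte block per ACK, while every packet still fits inside a 1500-byte MTU.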

Best,

Patrick

jlaprade
Contributor

After modifying this DLL and adding the registry key, WDS won't start. No errors, it just sits at "Starting". It's been 30 minutes now and I've rebooted multiple times. 😕
Looks like I'll have to go back to the E1000 NIC and the original DLL.

patpat
Contributor

If WDS does not start, that tells me you didn't perform the mod correctly.

Editing binaries correctly requires a lot of attention...

jlaprade
Contributor

I'm fairly familiar with hex editors and I am sure I performed it correctly. The problem seems larger than the modified DLL: I get the same issue after putting the old DLL back. There is something seriously wrong with my DP, and it looks like I may need to reinstall WDS. Looking into this now.
I think halting the WDS service to put in the new DLL was just the catalyst for the problem.

patpat
Contributor

Please, let's avoid confusion; your issue has nothing to do with stopping/restarting the WDS service,
nor with "properly" performing the mentioned mod either.
I think you have to look somewhere else, e.g. some antivirus layer quietly preventing the execution of an "altered" DLL component.

jlaprade
Contributor

You are correct. My issue is not related to this particular change. It came from a corruption of WDS and my DP that did not become apparent until I restarted the service. If I could delete the post, I would. If an admin or mod would like to delete my post so as not to cause confusion, please feel free.
