Hello,
I've successfully installed BDE 2.2 in my vSphere 6 environment, but I'm running into a bootstrap error while deploying a MapR 4.1 cluster. The error occurs on all nodes except the mysql node. The vSphere Web Client shows the following error message:
[2015-07-29T13:33:59.093+0000] Cannot bootstrap node test-Master-0.
Can't find any nodes which provide mapr_historyserver. Did any node provide mapr_historyserver? Or is the Chef Solr Server down?
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
PS: The Serengeti server has no Internet access. I configured a local YUM repository on a physical CentOS server on the same VLAN as the cluster nodes.
I have attached the related log files for analysis.
Thanks for your help !
Are you using the CLI or the GUI to create the MapR 4 cluster? If you are using the CLI, you need to add the role 'mapr_historyserver' to the roles of the master node group in the cluster spec, then create a new cluster. If you are using the GUI, I will check whether it's a bug that mapr_historyserver is not added to the GUI cluster spec.
BTW, did you use 'config-distro.rb' to add the MapR distro, and what was the full command? I want to know whether the distro version is something like 4, 4.1 or 4.1.0.
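For the CLI route, a master node group with the extra role might look roughly like the sketch below. The role list is the one the BDE CLI prints for the master group in the output further down this thread. The file path and the `instanceNum` value are assumptions for illustration, and a real spec would also need the other node groups (zookeeper, mysql, worker, client).

```shell
# Hypothetical location for the spec file; any writable path works.
spec_file="${TMPDIR:-/tmp}/mapr4-master-spec.json"

# Write only the master node group; a complete spec also lists the
# zookeeper, mysql, worker and client groups.
cat > "$spec_file" <<'EOF'
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": [
        "mapr_cldb",
        "mapr_resourcemanager",
        "mapr_nfs",
        "mapr_webserver",
        "mapr_fileserver",
        "mapr_historyserver",
        "mapr_metrics"
      ],
      "instanceNum": 1
    }
  ]
}
EOF

# Quick check that the role made it into the file.
grep -n '"mapr_historyserver"' "$spec_file"
```

In the serengeti CLI the file would then be passed along the lines of `cluster create --name mapr4 --distro mapr --specFile <path to spec>`. The `--specFile` usage is assumed here, so check the BDE CLI guide for your version.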
Hi jessehuvmw,
Thanks for your reply.
I'm using the GUI to create a new MapR cluster.
Yes, I used 'config-distro.rb' to add the MapR distro. The full command is as follows:
config-distro.rb --name mapr --vendor MAPR --version 4.1 --repos http://local_yum_repo_server_ip/mapr/4/mapr.repo
What is the difference between version 4.1 and 4.1.0? In the command, I used "4.1" as the version, as described in the BDE documentation, but my local repo is built from MapR 4.1.0:
[maprtech]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/v4.1.0/redhat/
enabled=1
gpgcheck=0
protect=1
[maprecosystem]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/ecosystem/redhat
enabled=1
gpgcheck=0
protect=1
Could this be the cause?
Thanks
Hi ztwy,
I confirm this is a BDE bug. You can log in to the BDE Server as user serengeti, then run this command to fix it:
find /opt/serengeti/www/specs/Ironfan/mapr -name '*.json' | xargs sed -i '/mapr_resourcemanager/ a "mapr_historyserver",'
As for the MapR 4 distro, 4.1 and 4.1.0 are both OK as the version when adding the distro, but 3 is not.
Please file a bug and track it in v2.2 and master branch. Thanks,
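To see what the sed one-liner above does before touching the real spec files, it can be tried on a throwaway copy. This is only a sketch: the sample JSON below is a made-up fragment, not the actual content of the files under /opt/serengeti/www/specs/Ironfan/mapr, and it assumes GNU sed (the one-line form of the `a` command and `-i` are GNU behaviour). The glob is quoted and `-print0`/`-0` are used so that odd file names cannot break the pipeline.

```shell
# Work in a temporary directory so nothing on the BDE server is modified.
workdir=$(mktemp -d)

# Hypothetical sample spec fragment containing the role the pattern matches on.
cat > "$workdir/sample.json" <<'EOF'
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": [
        "mapr_cldb",
        "mapr_resourcemanager",
        "mapr_nfs"
      ]
    }
  ]
}
EOF

# Same edit as the fix: append a "mapr_historyserver", line after every
# line that mentions mapr_resourcemanager.
find "$workdir" -name '*.json' -print0 |
  xargs -0 sed -i '/mapr_resourcemanager/ a "mapr_historyserver",'

# Show the edited file.
cat "$workdir/sample.json"
```

The appended line is not indented, which JSON does not care about. Running the command a second time on the same file would append the role again, so it should be applied only once per installation.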
Hi jessehuvmw,
Thanks to your fix, I got past the "mapr_historyserver" issue, but I am still stuck on the bootstrap failure with the following error:
[2015-07-30T11:08:11.360+0000] Cannot bootstrap node test-Master-0.
yum_package[mapr-core] (mapr::prereqs line 105) had an error: Timeout::Error: execution expired
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
Here are the output files :
Thanks for your help.
The mapr-core package is 246M in size, which might cause the 'yum install mapr-core' execution to time out on the cluster nodes. Could you try resuming the cluster creation by clicking 'Resume Deployment'? If this doesn't help, I will send a patch to increase the default timeout.
mapr-core-4.1.0.31175.GA-1.x86_64.rpm 26-Mar-2015 19:21 2.4K
mapr-core-internal-4.1.0.31175.GA-1.x86_64.rpm 26-Mar-2015 19:21 246M
After the failure, I noticed that on each node only the mapr-core-internal package is installed:
[root@bde-809447-test-mysql-0-mapred ~]# rpm -qa | grep mapr
mapr-core-internal-4.1.0.31175.GA-1.x86_64
I tried to resume the cluster creation. On the zookeeper nodes, I got :
[2015-07-30T14:28:32.196+0000] Unable to run command 'execute[config MapR]' on node test-zookeeper-0. SSH to this node and run the command 'sudo chef-client' to view error messages.
On the other nodes I got :
[2015-07-30T14:28:39.891+0000] Cannot bootstrap node test-Master-0.
ruby_block[wait_for_mysql_server] (mapr::config_metrics line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-Master-0.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
regards
The mysql node should not install mapr-core. Could you create the MapR cluster with the BDE CLI? SSH to the BDE server as user serengeti, run 'serengeti', then 'connect' (enter the vCenter user and password), then 'cluster create --name mapr4 --distro mapr'.
On the mysql node, the following packages were installed:
[root@bde-809447-test-mysql-0-mapred ~]# rpm -qa | grep mapr
mapr-core-internal-4.1.0.31175.GA-1.x86_64
mapr-core-4.1.0.31175.GA-1.x86_64
mapr-hadoop-core-2.5.1.31175.GA-1.x86_64
mapr-mapreduce1-0.20.2.31175.GA-1.x86_64
mapr-mapreduce2-2.5.1.31175.GA-1.x86_64
I will try the CLI and let you know the result.
Hi Jesse Hu,
I just tried the CLI cluster creation. I created a MapR cluster with 3 zookeeper, 1 master, 1 mysql, 2 worker and 2 client nodes. During the creation, 5 of the 9 VMs ended up with a bootstrap error (1/1 master, 1/1 mysql, 1/2 worker, 2/2 client). The other 4 VMs ended up with "Service Ready" status. Here is the full output of the cluster creation:
FAILED 80%
node group: mysql, instance number: 1
roles:[mapr_mysql_server]
NAME IP STATUS TASK
----------------------------------------------------
test-mysql-0 10.192.200.159 Bootstrap Failed
node group: zookeeper, instance number: 3
roles:[mapr_zookeeper]
NAME IP STATUS TASK
-----------------------------------------------------
test-zookeeper-0 10.192.200.151 Service Ready
test-zookeeper-1 10.192.200.154 Service Ready
test-zookeeper-2 10.192.200.156 Service Ready
node group: master, instance number: 1
roles:[mapr_cldb, mapr_resourcemanager, mapr_nfs, mapr_webserver, mapr_fileserver, mapr_historyserver, mapr_metrics]
NAME IP STATUS TASK
-----------------------------------------------------
test-master-0 10.192.200.158 Bootstrap Failed
node group: worker, instance number: 2
roles:[mapr_nfs, mapr_fileserver, mapr_nodemanager]
NAME IP STATUS TASK
-----------------------------------------------------
test-worker-0 10.192.200.155 Bootstrap Failed
test-worker-1 10.192.200.157 Service Ready
node group: client, instance number: 2
roles:[mapr_pig, mapr_hive, mapr_client]
NAME IP STATUS TASK
-----------------------------------------------------
test-client-0 10.192.200.152 Bootstrap Failed
test-client-1 10.192.200.153 Bootstrap Failed
The failed nodes: 5
----------------------------------------------------------------------------
[NAME] test-mysql-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:28:32.764+0000] Cannot bootstrap node test-mysql-0.
yum_package[mapr-core] (mapr::prereqs line 105) had an error: Chef::Exceptions::Exec: returned 1, expected 0
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
----------------------------------------------------------------------------
[NAME] test-master-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:48:21.599+0000] Cannot bootstrap node test-master-0.
ruby_block[wait_for_mysql_server] (mapr::config_metrics line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-master-0.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
----------------------------------------------------------------------------
[NAME] test-worker-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:42:19.916+0000] Cannot bootstrap node test-worker-0.
ruby_block[wait_for_zookeeper_nodes] (mapr::startup line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-worker-0.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
----------------------------------------------------------------------------
[NAME] test-client-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:32:34.070+0000] Cannot bootstrap node test-client-0.
ruby_block[wait_for_zookeeper_nodes] (mapr::startup line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-client-0.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
----------------------------------------------------------------------------
[NAME] test-client-1
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:31:12.517+0000] Cannot bootstrap node test-client-1.
ruby_block[wait_for_zookeeper_nodes] (mapr::startup line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-client-1.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
----------------------------------------------------------------------------
cluster create failed: Task execution failed: Bootstrapping cluster test failed.
It seems the error comes from the mysql node?
Yes. The mysql node failed due to "yum_package[mapr-core] (mapr::prereqs line 105) had an error". Could you SSH to the mysql node (as user serengeti) and run 'sudo yum install mapr-core'? It might fail with a 'Timeout Error', which would probably mean the network between the mysql node and the yum server is not fast enough.
I ran the 'sudo yum install mapr-core' without any issue. (see the attached screenshot) It took about 3-4 minutes to download/install all the packages.
BTW All the nodes including the mysql node and my local yum server are on the same vlan network (1 Gb)
3-4 minutes is a little long and might cause the yum timeout error. I haven't hit the timeout issue on my 10Gb VLAN.
You can try to resolve this issue like this:
Log in to the BDE server as user serengeti and run these commands:
sed -i 's|^yum_package name| yum_package name do retries 8; retry_delay 10; end|' /opt/serengeti/chef/cookbooks/mapr/recipes/prereqs.rb
knife cookbook upload -a -V
Then retry creation of the failed MapR cluster via 'cluster create --name <cluster_name> --resume'.
This tells chef-client to retry up to 8 times (at 10-second intervals) when installing the mapr-core package.
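To make the retry behaviour concrete, here is a rough shell equivalent of what `retries 8; retry_delay 10` asks for: run the command once and, if it fails, try again up to 8 more times with a 10-second pause in between. The `retry` function is a hypothetical helper written for this illustration; it is not part of BDE or Chef, and Chef's own implementation may differ in detail.

```shell
# retry MAX_RETRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND once; on failure, re-runs it up to MAX_RETRIES more times,
# sleeping DELAY_SECONDS between attempts. Returns 0 on the first success,
# 1 if every attempt failed.
retry() {
  max_retries=$1
  delay=$2
  shift 2
  retries_used=0
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$retries_used" -ge "$max_retries" ]; then
      return 1
    fi
    retries_used=$((retries_used + 1))
    sleep "$delay"
  done
}
```

On a node, the manual equivalent of the patched recipe would be something like `retry 8 10 sudo yum -y install mapr-core`. Retrying helps here presumably because yum keeps what it has already downloaded in its cache, so a later attempt has less left to do before the timeout hits.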
The packages were already installed during the previous test. Should I remove them from the mysql node before resuming the cluster creation?
No need to remove them. The mysql node does install the mapr-core package, because it uses a SQL file from it. Sorry for the confusion.
Hi Jesse Hu,
With your fix below, the cluster creation succeeded:
sed -i 's|^yum_package name| yum_package name do retries 8; retry_delay 10; end|' /opt/serengeti/chef/cookbooks/mapr/recipes/prereqs.rb
knife cookbook upload -V (I have to add the -a parameter to execute this command)
Could we have a status update on all the "Bootstrap failed" issues I have hit in this thread? If I reinstall BDE 2.2 from scratch, which fixes should I apply? Do these fixes also work with the GUI? Will an official patch be available soon?
Thanks
Hi ztwy,
Please follow these steps to apply the patch on a fresh installation of BDE 2.2:
Log in to the BDE Server as user serengeti, then run these commands:
find /opt/serengeti/www/specs/Ironfan/mapr -name '*.json' | xargs sed -i '/mapr_resourcemanager/ a "mapr_historyserver",'
sed -i 's|^yum_package name| yum_package name do retries 8; retry_delay 10; end|' /opt/serengeti/chef/cookbooks/mapr/recipes/prereqs.rb
knife cookbook upload -a -V
These fixes work with both the CLI and the GUI.
You can contact the BDE support team (hadoop-support@vmware.com) or file an SR for a formal patch. For this specific bug, I think the above solution is enough.
OK, I just tried the cluster creation with the GUI and it works. Thank you for the information.