I inherited 3 Dell PowerEdge 2950s, 60 Dell PowerEdge 1950s, 3 Dell PowerVault MD3000s, some racks, power units, several switches and a few boxes of cables and wires. These are 17 months old, but have never been used. There has been an ongoing hassle getting sufficient power for these units. It's almost sickening. I just came into this at the very end, but I've sat through enough meetings to gag a maggot.
But it is worked out now. On November 25, we moved the systems into a server room in the KU Research & Graduate Studies unit, which has been most gracious 🙂
I test-installed Rocks 5.2 on several small test systems (ordinary PCs) in preparation for the big show. I learned that it DOES NOT work using CentOS 5.4 as the OS, but it does work with either the Rocks-supplied OS (CentOS 5.2) or the CentOS 5.3 distribution disks.
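That compatibility result is easy to write down as a guard for a pre-install script. This is only a sketch: the function name `rocks52_base_ok` is made up for illustration, and it assumes the script is handed a release string in the form found in `/etc/redhat-release` on CentOS 5 systems. It records what worked in my tests, nothing more.

```python
import re

# Base OS versions that worked with Rocks 5.2 in the tests described above.
# CentOS 5.4 is deliberately absent because the install failed with it.
WORKING_CENTOS_VERSIONS = {"5.2", "5.3"}


def rocks52_base_ok(release_line):
    """Return True if a release string names a CentOS version that worked.

    Assumes a string such as "CentOS release 5.3 (Final)" (hypothetical input).
    Anything that cannot be parsed is treated as not known to work.
    """
    match = re.search(r"CentOS release (\d+\.\d+)", release_line)
    if match is None:
        return False
    return match.group(1) in WORKING_CENTOS_VERSIONS


if __name__ == "__main__":
    for line in ("CentOS release 5.3 (Final)", "CentOS release 5.4 (Final)"):
        verdict = "ok" if rocks52_base_ok(line) else "not known to work"
        print(line, "->", verdict)
```

A plain set lookup keeps the list of tested versions in one place, so a later Rocks release only needs that one line changed.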
I brought back one 2950, one 1950, and one MD3000 to my office in order to do some testing. Wow, these are loud. Unbelievable, really. I can barely hear Jimi Hendrix with the volume all the way up.
The first problem for me has been figuring out "what is Dell's problem and what is my problem." The 2950 has a Remote Administration device with a separate ethernet connection. I gather that, if I can make that work, then I'll be able to reboot the system remotely when it hangs. That must be a priority for people who run Windows servers. Frankly, I've never hung a Linux server in 10 years. Nevertheless, I want to make the most of the hardware. So I used the Dell Service Tags to go looking for information on updates. Surely, there are BIOS and other updates required.
The Dell website is a complicated tangle of update scripts, jargony named things I don't understand, vague advice and weird warnings. I understand the RedHat RPM packaging system very well, but Dell pages are written for people who don't understand much of anything, except that they launch off into jargony technical abuse every other paragraph. Just getting the firmware updates together is a formidable task.
Finally, I think I figured out that the minimum necessary elements have been collected into a single DVD.
1. SUU (version 6.1.1) is a collection of Server Updates: BIOS, firmware and such.
That's not bootable. In order to use that, one needs a boot disk, the name of which Dell seems to change every few months. The current version is called
2. SMTD (the version I downloaded is OM_6.1.0_SMTD_A00.iso)
One of the really confusing things is that the BIOS & firmware can be updated using the SMTD disk BEFORE the Linux OS is installed, but Dell also provides individual DUPs (Dell Update Packages). I am hoping that, if I miss some firmware updates with the SMTD/SUU disk combination, then I can get them after installing Rocks.
The MD3000 apparently needs a separate firmware & driver update and it can only be installed after the OS is running.
In the SMTD, the options are pretty obvious. I approved all the suggested BIOS & firmware upgrades. I was a little bothered by the options on installing the OS. They, of course, don't have Rocks or CentOS, and so choosing an OS leads to a blind alley in which Dell's SMTD wants to partition my hard disk. I had to back out of that because Rocks does not support LVM (logical volume management), which is what Dell uses by default. I checked with a Dell rep online and he said it is OK because the SMTD is really only needed for MS Windows installs: that OS needs driver upgrades before installation, but Linux does not.
I wish I could get RedHat 5.3 disks, but I'm ashamed to ask my tech support for them. After RH 5.4 was released, one of our systems malfunctioned because of an automatic RPM update, and I needed the RH5.4 ISO (disk) file "right away". After about 2 weeks, tech support uploaded it for me (a full 6 weeks after RedHat had released it). Now I can get 5.4, but they removed 5.3 from their server. (complain, complain!). I'd need the RedHat Advanced Server disk, and I'm afraid we only get the Enterprise Server, so maybe it is not so bad to ignore them.
Nevertheless, the Rocks system uses CentOS as its default distribution, and it may be that I buy trouble by using the authentic RedHat. I'm a little worried that CentOS is not in the list of "supported distributions" from Dell and so their customized storage drivers won't install. But there is some promising chatter in the Dell community site, and the Dell software repository uses RedHat and CentOS interchangeably in at least one spot.
One trouble I have is that the local DHCP server is not configured to give me an IP number on this 2950 system. I'm afraid that is slowing firmware updates. It appears the SMTD/SUU firmware update process tries to go on the Internet to check for updates. It does that even though, in the configuration, I checked the "use disk" option for firmware.
Oh, well. It dies at 5% completion, "firmware deployment failed". Reboot required.
Let's see what happens if I plug a live ethernet cable into one of the 5 ethernet jacks. I'm guessing which one is the "live" one.
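Guessing at jacks is only necessary while the box is sitting in Dell's boot disk. Once a Linux OS is running, the kernel reports link state for each interface under `/sys/class/net`, and a few lines of Python can list which jacks actually have a cable with signal. This is a minimal sketch, not something I could run from inside the SMTD; the function name `live_interfaces` and its `sysfs_root` parameter are my own inventions (the parameter just makes it easy to test against a fake directory).

```python
import os


def live_interfaces(sysfs_root="/sys/class/net"):
    """Return sorted names of interfaces whose carrier flag reads "1"."""
    live = []
    for name in sorted(os.listdir(sysfs_root)):
        carrier_path = os.path.join(sysfs_root, name, "carrier")
        try:
            with open(carrier_path) as handle:
                state = handle.read().strip()
        except OSError:
            # The file is missing, or the interface is down and the
            # kernel refuses the read; either way, no link is reported.
            continue
        if state == "1":
            live.append(name)
    return live


if __name__ == "__main__":
    print(live_interfaces())
```

Expect the loopback interface to show up in the list as well; the names that matter are the ethernet ones.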
Interesting. After rebooting into the SMTD, the firmware configuration panel shows that some drivers have been updated since the first time I tried this. Hopefully, that means some changes were actually applied the first time, even though the "Dell Systems Build and Upgrade Utility" never moved off 5% completion, indicating "Collecting Server Info: Checking For Firmware Updates". It's frustrating because the configurator already did the required checking and told me what I need to do. The un-hopeful interpretation would be that the first try fouled the firmware updates.
On the second try, it stuck again at 5% for a long time: "checking for firmware updates." That's the same place it died before. But, what light through yonder window breaks! Screen says "attempting to update BIOS" and then it rebooted.
It's not a very smart update process because the SUU nonbootable driver disk is still in the CD drive, so when the system restarts it just stalls saying there is no operating system. So I put the SMTD back in & reboot. Then I re-run the Dell Systems Build and Upgrade Utility and most of the firmware has been updated. Still 2 ethernet cards need firmware. Awesome, here we go again. Wait at 5% done for 10 more minutes. After that, it rebooted and seems OK.
What a hassle. They want me to do this for each and every one of the 63 blades? Cut me some slack.
I'll continue with another post, Cluster Journal Entry 2 (for originality).