The redo log is corrupted. If the problem persists, discard the redo log.

Yesterday, about an hour before the end of my work day, one of our critical servers fell over and displayed the following message in the vSphere client.

The redo log of VisualSVNServer_1-000001.vmdk is corrupted. If the problem persists, discard the redo log.

The error message refers to a redo log, but this is legacy VMware terminology. From ESXi 3.1 VMware started to use the term snapshot to mean the same thing, but for some reason the error messages still use the old term.

The server was named Subversion and was a VisualSVN Server.

There was a snapshot dated 15th December 2013 in Snapshot Manager for the Subversion VM, so reverting to it would have meant going back several weeks and then trying to import the backup of the repository that was made on the night of 29th January.

The underlying cause of the corruption cannot be definitively determined, but I think it was due to the amount of disk activity on the physical disk that constitutes Datastore3_2 on the host server s003-esxi. This caused the system to fail to write to the log and to create the updated delta disks, which contain all the changes to the disks since the point of the snapshot.

I believe that if there had not been a snapshot the data corruption probably wouldn’t have happened. I have since educated staff that taking snapshots in vSphere is really not the same as backing up the server and they shouldn’t be doing it on the Subversion server at all.

I resolved the issue with Subversion by carrying out the following steps.

I clicked OK to the error message in the slim hope that the VM could overcome the glitch itself upon a simple reboot.

This didn’t work, so I started the process of backing up the VM: I forced a shutdown of the machine by virtually cutting off the power, and then made a copy of the virtual machine folder on the datastore.

Whilst the copy was running I checked the virtual machine logs: vmware-3.log was completely corrupt and vmware.log was showing some corruption.

The copy process took over an hour as the folder was 150GB in total, mostly due to the two virtual disks: the first, VisualSVNServer.vmdk, which constitutes the C: drive of the server, is 40GB and the second, VisualSVNServer_1.vmdk, which is the E: drive, is 100GB.

Having made a copy of everything I attempted to fix the snapshots. I made sure that there was sufficient space on the datastore and then using Snapshot Manager in vSphere created a new snapshot of the Subversion VM.

This operation was successful, so I then tried to commit the changes and consolidate the disks. This worked for VisualSVNServer.vmdk, merging all the changes, but not entirely for VisualSVNServer_1.vmdk. It did, however, reduce the size of the delta disks significantly, meaning that only minimal data was likely to be lost.

Nothing more could be done through the vSphere client so I then started a process of trying to manually consolidate the following disks into a single disk.
VisualSVNServer_1.vmdk
VisualSVNServer_1-000001.vmdk
VisualSVNServer_1-000002.vmdk

I enabled SSH on the host server s003-esxi.

Using PuTTY I logged into the command line of the host and changed to the directory that contained the virtual machine files for Subversion: /vmfs/volumes/Datastore3_2/VisualSVNServer

I then ran the command ls -lrt *.vmdk to display all virtual disk components.

Then, starting with the highest-numbered snapshot, I ran the following command to clone the disk in a way that would merge the delta disks into a copy of the main disk.

vmkfstools -i VisualSVNServer_1-000002.vmdk VisualSVNServer-Recovered_1.vmdk

This process took another hour or so as it was trying to create a 100GB file.

This failed with the following error message displayed:

Failed to clone disk: Bad File descriptor (589833)

Then, starting with the next highest-numbered snapshot, I ran the command to clone the disk without the most recent changes.

vmkfstools -i VisualSVNServer_1-000001.vmdk VisualSVNServer-Recovered_1.vmdk

This process again took about an hour, as once more it was trying to create a 100GB file.

Again this failed with the following error message displayed:
Failed to clone disk: Bad File descriptor (589833)
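
The order of the clone attempts above (newest delta disk first, working back towards the base disk) can be sketched in Python. This is only an illustration: the function name clone_candidates is hypothetical, and it assumes the descriptor files follow the name-00000N.vmdk pattern seen in the listing above.

```python
import re

# Matches delta descriptor names such as "VisualSVNServer_1-000002.vmdk".
# Names ending in "-delta.vmdk" or "-flat.vmdk" do not match this pattern.
_DELTA = re.compile(r"^(?P<base>.+)-(?P<num>\d{6})\.vmdk$")


def clone_candidates(names, base_name):
    """Return the delta descriptors for base_name, newest first, then the base."""
    deltas = []
    for name in names:
        match = _DELTA.match(name)
        if match and match.group("base") + ".vmdk" == base_name:
            deltas.append((int(match.group("num")), name))
    ordered = [name for _, name in sorted(deltas, reverse=True)]
    if base_name in names:
        ordered.append(base_name)
    return ordered


if __name__ == "__main__":
    listing = [
        "VisualSVNServer.vmdk",
        "VisualSVNServer_1.vmdk",
        "VisualSVNServer_1-000001.vmdk",
        "VisualSVNServer_1-000002.vmdk",
    ]
    for candidate in clone_candidates(listing, "VisualSVNServer_1.vmdk"):
        print(candidate)
```

Each name in the returned list is a possible source for vmkfstools -i, tried in order until one clones cleanly.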

Having abandoned the idea of merging the disks, I removed the VM from the inventory in vSphere and then moved all but the following files into a separate folder.
VisualSVNServer.nvram
VisualSVNServer.vmx
VisualSVNServer.vmdk
VisualSVNServer_1.vmdk

I could then recreate the VM from these files. I downloaded the file VisualSVNServer.vmx which is the virtual machine’s configuration file and stores the settings regarding the virtual devices that make up a virtual machine. I edited the file to change all references to VisualSVNServer_1-000002.vmdk to VisualSVNServer_1.vmdk so that the machine could be booted up ignoring the delta disks and any data they might contain.
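
I made the edit by hand in a text editor, but the same change can be sketched in Python. This is a minimal sketch, assuming the disk entries in the .vmx file take the form scsi0:1.fileName = "name.vmdk"; the function name repoint_disks and the sample text are hypothetical, and the sample is not the contents of the real file.

```python
def repoint_disks(vmx_text, old_name, new_name):
    """Return vmx_text with fileName entries equal to old_name set to new_name."""
    out = []
    for line in vmx_text.splitlines():
        key, sep, value = line.partition("=")
        is_disk_entry = sep and key.strip().lower().endswith(".filename")
        if is_disk_entry and value.strip().strip('"') == old_name:
            # Rewrite only the value; the device key is kept as it was.
            line = f'{key.rstrip()} = "{new_name}"'
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative sample only, not the real configuration file.
    sample = (
        'scsi0:0.fileName = "VisualSVNServer.vmdk"\n'
        'scsi0:1.fileName = "VisualSVNServer_1-000002.vmdk"\n'
    )
    print(repoint_disks(sample,
                        "VisualSVNServer_1-000002.vmdk",
                        "VisualSVNServer_1.vmdk"))
```

Working on a downloaded copy, as I did, means the original .vmx is still on the datastore if the edit goes wrong.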

I added the VM back into the inventory and then booted up the machine. It booted up fine. I checked the E: drive and there appeared to be data written to the disk all the way up to the time that the server fell over, so it appeared that minimal data, if any, had been lost.

Thanks to XtraVirt for the necessary steps.

Microsoft renames SkyDrive to OneDrive

Following the threat of legal action from BSkyB for trademark infringement over the use of the name SkyDrive, Microsoft came to a settlement whereby they agreed to change the name of the service.

I speculated at the time that Microsoft might have been thinking about rebranding the service anyway as they didn’t seek to fight the case at all. It’s been a while but they have announced that they are changing the name of the service to OneDrive.

I think that it is a very good change, particularly the tag line ‘OneDrive for Everything in Your Life’. It is true that as the number of devices a person has increases, a single repository for important files that can be seamlessly accessed from any of those devices becomes more necessary.

I’ve already started the process of storing all my photographs online in SkyDrive as a backup for my home PC, but being able to access them on my phone is great and will likely become something that I’ll be wanting to do more often once I become a father this April.

Free Microsoft Virtualization Training for VMware IT Professionals

Microsoft are really pushing the idea that system administrators who have VMware experience should become bilingual in server virtualisation and get up to speed on Hyper-V too. So, following on from the Microsoft Virtualization for VMware Professionals Jump Start a year or so ago, comes Free Microsoft Virtualization Training for VMware IT Professionals, on December 11th from 9am to 12.30pm PST (5pm to 8.30pm GMT).

Get the edge in your technical career! Attend the online Virtualization IT Camp for VMware IT professionals and expand your virtualization skills. Seasoned experts will demonstrate key scenarios and cover equivalent technologies from Microsoft and VMware. Here’s your chance to upgrade your Microsoft Virtualization skills for FREE.

I consider myself already fairly bilingual as I have a Windows Server 2012 Hyper-V host at work with a couple of production servers on it now to go with the 108 virtual machines on our VMware infrastructure. I passed the Windows Server 2008 R2, Server Virtualization exam a couple of years ago when Microsoft was giving away free exam vouchers for it.

Plus I attended the Server Virtualization w/ Windows Server Hyper-V & System Center Jump Start online last month. I just need to schedule the 74-409 exam whilst my free exam voucher is still valid. The vouchers can still be obtained here (limited availability).

New virtual server in DMZ not accessible

I had created a new server as a test environment for a new client of the company and configured it to reside in the DMZ with an external IP address so that people at the client could test the system from their location.

I tested connectivity to this new IP address and the server was reachable and everything seemed fine.

However, one of our implementation consultants reported that he wasn’t able to access the server from his location using the IP address that I had provided to him. I tested it again and, again, it all appeared fine.

I then tried connecting to it from a different network outside of the company and I hit the exact same problem as my colleague: ‘TTL Expired In Transit’. So I then tried a traceroute to see if this revealed where the issue might be.

At first glance it appeared okay: traffic was being bounced back correctly from each router along the way. Then I saw the problem. A configuration error in our ISP’s routers meant that traffic coming from outside of their network and destined for the IP address I had assigned to the server was being routed to a particular pair of routers, which were then just bouncing it back and forth between the two of them until the TTL expired.
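
The pattern I eventually spotted in the traceroute output, two routers alternating until the TTL runs out, can be checked for mechanically. The sketch below is a hypothetical helper (find_routing_loop is my own name for it) that takes the responding address of each hop in order; the addresses in the example come from the reserved documentation ranges and are not the real routers.

```python
def find_routing_loop(hops):
    """Return the pair of addresses that traffic alternates between, or None.

    hops is the list of responding router addresses, one per traceroute hop.
    A loop is reported when the sequence A, B, A, B appears with A != B.
    """
    for i in range(len(hops) - 3):
        a, b, c, d = hops[i:i + 4]
        if a != b and a == c and b == d:
            return (a, b)
    return None


if __name__ == "__main__":
    # Example addresses only, taken from the documentation ranges.
    trace = [
        "192.0.2.1",
        "198.51.100.7",
        "203.0.113.5",
        "203.0.113.6",
        "203.0.113.5",
        "203.0.113.6",
    ]
    print(find_routing_loop(trace))
```

A result other than None points at the two routers to name when raising the fault with the ISP.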

Converting Windows Server 2012 Standard to Windows Server 2012 Datacenter

It probably won’t come to this as I have now convinced the management to allow me to purchase vSphere Essentials Plus, but I was curious about whether I could convert my Windows Server 2012 Standard server to Windows Server 2012 Datacenter without having to do a complete reinstall.

Good news! It is possible and is dead simple to do. Via http://technet.microsoft.com/en-us/library/jj574204.aspx

From an elevated command prompt run the DISM tool and pop your new key in.

DISM /online /Set-Edition:ServerDatacenter /ProductKey:[Datacenter key, e.g. XXXXX-XXXXX-XXXXX-XXXXX-XXXXX] /AcceptEula
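
If the command ever needed to be scripted, a small Python helper could assemble it and catch a mistyped key before DISM is invoked. This is only a sketch: the function name set_edition_command is hypothetical, the key format check (five groups of five characters) is my assumption, and the flags are exactly the ones in the command above.

```python
import re

# Assumed key shape: five groups of five letters or digits, hyphen separated.
_KEY = re.compile(r"(?:[A-Z0-9]{5}-){4}[A-Z0-9]{5}")


def set_edition_command(product_key, edition="ServerDatacenter"):
    """Build the DISM argument list for an in-place edition change."""
    key = product_key.strip().upper()
    if not _KEY.fullmatch(key):
        raise ValueError("product key is not in XXXXX-XXXXX-XXXXX-XXXXX-XXXXX form")
    return [
        "DISM",
        "/online",
        f"/Set-Edition:{edition}",
        f"/ProductKey:{key}",
        "/AcceptEula",
    ]


if __name__ == "__main__":
    # Placeholder key for illustration only.
    print(" ".join(set_edition_command("ABCDE-12345-FGHIJ-67890-KLMNO")))
```

The resulting list could be passed to subprocess.run from an elevated prompt on the server being converted.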

MCSA: Server 2012 exam success

I passed Exam 70-417: Upgrading Your Skills to MCSA Windows Server 2012!

Thanks to Keith Mayer, J C Mackin, Ed Liberman, Rick Claus and of course my beautiful and understanding wife Amy for putting up with me studying for and stressing over yet another Microsoft exam.

Next step will be working towards the MCSE: Server Infrastructure certification by studying for Exam 70-413: Designing and Implementing a Server Infrastructure.

Bringing a laptop back from death

My responsibilities in this job are exceptionally broad so although I’m the systems administrator with overall responsibility for the company’s IT infrastructure I’m also on occasion called upon to do things such as repair a dead laptop.

I could have delegated it to another staff member but in this case it was important that the issue should be resolved quickly. I thrive under pressure, have a real knack for troubleshooting weird computer problems and generally just enjoy getting my hands dirty when it is something that I’d never encountered before.

It was mid-afternoon when I took the call from one of our consultants, who works out of the office with customers on their sites, and he told me that very weird things were happening with his laptop. It was reportedly dead and completely unresponsive, and it had a strange effect on any laptop charger that was plugged into it: the LED on the charger would turn off whenever it was plugged into the laptop, and the charger would then not work in any other laptop until the mains power had been turned off and on.

This sounded to me like the laptop was shorting out the power supplies, fortunately not permanently, as it might otherwise have killed off a swath of power supplies belonging to one of our major customers. I asked my colleague to bring it into the office so I could take a look, but I wasn’t holding out much hope that it could be fixed.

I ran through some basic logical checks to see if the laptop was behaving as described and that it wasn’t something stupid like the wrong laptop charger was being used.
The charger was indeed the correct one for that model and hadn’t been swapped or mixed up with someone else’s.
I plugged it into the mains and then the laptop, and the LED was extinguished.
Did the charger work with a different laptop? Yes, but only after cycling the mains electricity.
Did swapping the battery help? No, and the battery appeared good in another laptop.

The problem then must lie with the laptop itself and with the power input socket. If we had had a spare laptop I’d have pulled the hard disk, installed it in the replacement, given that to my colleague and then stuck the dead laptop on the junk pile. Unfortunately we had no spares, and he needed a working laptop for Monday and couldn’t come via the office, so he needed me to fix it then and there if possible.

No choice then but to completely dismantle the laptop, so that’s what I did. About 20 screws later I finally had the case apart and could see the wiring of the power input. Theorising that it was a short of some sort, I examined the wiring, and the wiring from the power input socket appeared to be good. However, there was a metal bracket that kept the power input socket connected to the chassis, and by removing that I could see that there was a metal contact on the socket that would then form an electrical connection to the chassis.

The chassis looked dull and therefore might have been corroded, preventing the electrical contact that was required (I’d had a similar issue with the starter motor on my car last year). After a bit of abrasion on the chassis at the right point to make it nice and shiny, I reassembled the power input assembly and tested it by plugging in the charger before I completely reassembled the laptop. Moment of truth. I switched the mains electricity on and the LED on the charger stayed lit, so it wasn’t being shorted out by the laptop any longer. Unplugged the power. Put the laptop back together again. Plugged the power in once more and hit the on button. Lights up. Dell logo shows on screen briefly and then Windows starts its boot up.

An hour and a bit after being given a mysteriously dead laptop I gave back a working machine and all was right with the world once more.

CCNA, MCITP and MCSE: Server Infrastructure