Fix Alibaba Cloud ECS Boot Problems Step by Step
Easy steps to repair the system disk and boot your server again
When an Alibaba Cloud ECS instance fails to boot, do not panic. This problem is common, and in most cases the server itself is not broken: only the operating system failed to start.
Typical signs: the console shows the instance as Running, but SSH does not connect, and the VNC screen is blank or stuck. This usually means the operating system could not start.
The most important thing to know: your data is usually safe. In most cases the problem can be fixed by repairing the system disk.
Follow the steps below slowly and carefully.
Step 1: Stop the ECS instance
• Open the Alibaba Cloud console
• Stop the ECS instance
• Do not restart it again and again
• Wait until the status shows stopped
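The same stop can be scripted. Below is a minimal sketch, assuming the Alibaba Cloud CLI (aliyun) is installed and configured for the instance's region. stop_and_wait is a hypothetical helper name, not part of the CLI. It sends one StopInstance request, then polls DescribeInstanceAttribute until the status reads Stopped.

```shell
# Hypothetical helper: stop an ECS instance once, then poll until it reports Stopped.
# Assumes the aliyun CLI is installed and configured (aliyun configure) for the right region.
# Usage: stop_and_wait INSTANCE_ID [MAX_CHECKS]
stop_and_wait() {
  local id=$1 max=${2:-30} i=0
  # Send a single stop request; do not loop on restarts.
  aliyun ecs StopInstance --InstanceId "$id" >/dev/null || return 1
  while [ "$i" -lt "$max" ]; do
    # DescribeInstanceAttribute returns JSON that includes the instance "Status".
    if aliyun ecs DescribeInstanceAttribute --InstanceId "$id" |
      grep -q '"Status": *"Stopped"'; then
      echo "$id is stopped"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "$id did not reach Stopped after $max checks" >&2
  return 1
}
```

For example, run stop_and_wait with your own instance ID in place of a placeholder such as i-your-instance-id.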
Step 2: Detach the system disk
• Open the disk section of the instance
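• Find the system disk and note its disk ID
• Create a snapshot of the system disk, so you can roll back if the repair goes wrong
• Detach the disk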
• Do not delete the disk
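With the CLI, you can look up the system disk ID and take the snapshot before detaching. This is a sketch under the same assumption (a configured aliyun CLI). system_disk_id and snapshot_then_detach are hypothetical helper names, and the snapshot name is only an example. To be safe, wait until the console shows the snapshot as completed before you modify the disk.

```shell
# Hypothetical helpers for Step 2. Assumes a configured aliyun CLI; IDs are placeholders.

# Print the system disk ID of an instance, parsed from the DescribeDisks JSON output.
system_disk_id() {
  aliyun ecs DescribeDisks --InstanceId "$1" --DiskType system |
    grep -o '"DiskId": *"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Take a safety snapshot of the disk, then detach it from the stopped instance.
snapshot_then_detach() {
  local instance=$1 disk=$2
  aliyun ecs CreateSnapshot --DiskId "$disk" --SnapshotName before-boot-repair || return 1
  aliyun ecs DetachDisk --InstanceId "$instance" --DiskId "$disk"
}
```

Typical use: disk=$(system_disk_id i-your-instance-id), then snapshot_then_detach i-your-instance-id "$disk".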
Step 3: Create a rescue ECS
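• Create a new ECS in the same region and zone (a disk can only be attached to an instance in its own zone)
• Use a basic Linux image. If possible, pick a different image from the broken server, so the two disks do not share the same file system UUIDs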
• This server is only for fixing the disk
Step 4: Attach the broken disk
• Attach the system disk to the rescue ECS
• Attach it as a data disk
• Start the rescue ECS
• Log in using SSH as root (or prefix the commands below with sudo)
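Attaching the disk to the rescue ECS can be scripted the same way. This is a minimal sketch, assuming a configured aliyun CLI and a rescue ECS that is still stopped. attach_to_rescue is a hypothetical helper name. Leaving out AttachDisk's Bootable option should attach the disk as a normal data disk.

```shell
# Hypothetical helper for Step 4. Assumes a configured aliyun CLI and a stopped rescue ECS.
# Usage: attach_to_rescue RESCUE_INSTANCE_ID DISK_ID
attach_to_rescue() {
  local rescue=$1 disk=$2
  # No Bootable option: the disk is attached as an ordinary data disk.
  aliyun ecs AttachDisk --InstanceId "$rescue" --DiskId "$disk" || return 1
  aliyun ecs StartInstance --InstanceId "$rescue"
}
```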
Step 5: Find the disk
List the disks and their file systems.
lsblk -f
You will see a new disk, often named vdb (on NVMe-based instance types it may appear as nvme1n1 instead). This is the broken system disk.
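On a server with several disks, it helps to list only the disks that have nothing mounted. This sketch assumes a util-linux lsblk recent enough to support the -P (key="value" pairs) output and the PKNAME column. unmounted_disks is a hypothetical helper, and it only handles plain partitions, not LVM.

```shell
# Hypothetical helper: print whole disks that have no mounted partitions.
# Input: the output of `lsblk -Pno NAME,PKNAME,TYPE,MOUNTPOINT` (key="value" pairs per line).
# Usage: lsblk -Pno NAME,PKNAME,TYPE,MOUNTPOINT | unmounted_disks
unmounted_disks() {
  awk '
    # val(key): return the value of key="..." on the current line, or "" if absent.
    function val(key,   line) {
      line = " " $0
      if (match(line, " " key "=\"[^\"]*\""))
        return substr(line, RSTART + length(key) + 3, RLENGTH - length(key) - 4)
      return ""
    }
    {
      name = val("NAME"); parent = val("PKNAME")
      if (val("TYPE") == "disk") disks[++n] = name
      # A mounted partition marks its parent disk as in use; a mounted disk marks itself.
      if (val("MOUNTPOINT") != "") {
        if (parent != "") used[parent] = 1
        else used[name] = 1
      }
    }
    END { for (i = 1; i <= n; i++) if (!(disks[i] in used)) print disks[i] }
  '
}
```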
Step 6: Mount the disk
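Create a mount point.
mkdir -p /mnt/rescue
Mount the partition that holds the root file system.
mount /dev/vdb1 /mnt/rescue
If vdb1 is not the root partition, check lsblk -f again and pick the ext4 or xfs partition that holds the system (often the largest one, for example vdb2 or vdb3). If an XFS partition refuses to mount because of a duplicate UUID, mount it with the nouuid option: mount -o nouuid /dev/vdb1 /mnt/rescue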
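The mount step, including the XFS duplicate UUID case, can be wrapped in a small function. This is a sketch: mount_broken_root is a hypothetical name, it must run as root, and it relies on blkid printing the file system type and on the XFS nouuid mount option.

```shell
# Hypothetical helper: mount the broken root partition, handling the XFS duplicate UUID case.
# Must run as root. Usage: mount_broken_root /dev/vdb1 [/mnt/rescue]
mount_broken_root() {
  local part=$1 target=${2:-/mnt/rescue} fstype
  mkdir -p "$target"
  # With these options blkid prints only the file system type, for example ext4 or xfs.
  fstype=$(blkid -o value -s TYPE "$part")
  if [ "$fstype" = "xfs" ]; then
    # XFS will not mount a file system whose UUID is already mounted
    # (same image on both disks); nouuid skips that check.
    mount -o nouuid "$part" "$target"
  else
    mount "$part" "$target"
  fi
}
```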
Step 7: Check the fstab file
This file is a very common cause of boot failure.
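• Open the file
• Look for disks or UUID values that do not exist in the output of blkid
• Comment out the broken lines by putting # at the start of the line
cat /mnt/rescue/etc/fstab
blkid
vi /mnt/rescue/etc/fstab
If you are not sure about a data disk line, comment it out and test later, or add nofail to its options (for example defaults,nofail) so a missing disk no longer stops the boot. Do not comment out the line for / (or /boot, if it has its own line), because the system needs them. Also remember that data disks of the original server are not attached to the rescue ECS, so their UUIDs will be missing from blkid here even if they are fine.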
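To spot suspicious fstab lines faster, you can compare the UUID= entries against the UUIDs that blkid reports. This sketch (report_unknown_uuids is a hypothetical helper) only prints candidate lines with their line numbers. It does not change the file, because data disks of the original server are not attached to the rescue ECS, so a missing UUID is not always an error. Lines that use device names such as /dev/vdb1 are not checked.

```shell
# Hypothetical helper: print fstab lines whose UUID= value is not in the given list.
# Usage: report_unknown_uuids /mnt/rescue/etc/fstab $(blkid -s UUID -o value)
report_unknown_uuids() {
  local fstab=$1
  shift
  awk -v known="$*" '
    BEGIN { n = split(known, k, " "); for (i = 1; i <= n; i++) ok[k[i]] = 1 }
    /^[ \t]*#/ { next }            # skip comment lines
    $1 ~ /^UUID=/ {
      u = substr($1, 6)
      gsub(/"/, "", u)             # accept UUID="..." as well as UUID=...
      if (!(u in ok)) print NR ": " $0
    }
  ' "$fstab"
}
```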
Step 8: Check boot files
Make sure the boot folder is not empty.
ls /mnt/rescue/boot
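You should see kernel files (vmlinuz-*), initramfs or initrd files, and a grub2 or grub folder. If fstab lists a separate /boot partition, mount it first (for example mount /dev/vdb1 /mnt/rescue/boot), or the folder will look empty. If the folder is really empty or missing files, the system cannot boot.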
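The boot folder check can be made a little more specific. This sketch assumes the common file names: vmlinuz-* kernels, initramfs-* on the RHEL family or initrd.img-* on Debian and Ubuntu, and grub2/grub.cfg or grub/grub.cfg. check_boot and has_match are hypothetical helpers. A missing grub.cfg alone is not fatal, since grub-mkconfig in the next step writes a new one.

```shell
# has_match: succeed if at least one of the given paths exists.
# An unmatched glob stays as literal text, which does not exist, so it is skipped.
has_match() {
  local f
  for f in "$@"; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Hypothetical helper: report what a boot folder is missing; non-zero status if anything is.
# Usage: check_boot /mnt/rescue/boot
check_boot() {
  local boot=$1 status=0
  if ! has_match "$boot"/vmlinuz-*; then echo "missing: kernel (vmlinuz-*)"; status=1; fi
  if ! has_match "$boot"/initramfs-* "$boot"/initrd.img-*; then echo "missing: initramfs/initrd"; status=1; fi
  if ! has_match "$boot"/grub2/grub.cfg "$boot"/grub/grub.cfg; then echo "missing: grub.cfg"; status=1; fi
  return "$status"
}
```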
Step 9: Fix the bootloader
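Prepare the environment by bind-mounting the system folders.
mount --bind /dev /mnt/rescue/dev
mount --bind /proc /mnt/rescue/proc
mount --bind /sys /mnt/rescue/sys
Enter the broken system with chroot.
chroot /mnt/rescue
Reinstall the bootloader on the whole disk (vdb, not the partition vdb1) and rebuild its configuration. The command names depend on the distribution on the broken disk.
On CentOS, RHEL, and Alibaba Cloud Linux:
grub2-install /dev/vdb
grub2-mkconfig -o /boot/grub2/grub.cfg
On Debian and Ubuntu:
grub-install /dev/vdb
grub-mkconfig -o /boot/grub/grub.cfg
These commands assume the instance boots in legacy BIOS mode. Instances that boot with UEFI also need the EFI partition mounted and different grub-install options.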
Exit.
exit
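Two details in this step are easy to get wrong: grub-install needs the whole disk, not a partition, and the tool names differ between distributions. This sketch only prints the commands to run inside the chroot; parent_disk and grub_plan are hypothetical helpers. It assumes legacy BIOS boot and that grub2-install, when present, lives in /usr/sbin or /sbin. On a real system, lsblk -no PKNAME /dev/vdb1 also prints the parent disk name.

```shell
# Hypothetical helper: map a partition to its whole disk.
# /dev/vdb1 -> /dev/vdb, nvme1n1p2 -> /dev/nvme1n1 (expects a partition name).
parent_disk() {
  local part=${1#/dev/}
  case $part in
    nvme*p[0-9]*) echo "/dev/${part%p[0-9]*}" ;;
    *) echo "/dev/${part%%[0-9]*}" ;;
  esac
}

# Hypothetical helper: print the GRUB commands to run after "chroot ROOT".
# Usage: grub_plan /mnt/rescue /dev/vdb
grub_plan() {
  local root=$1 disk=$2
  if [ -x "$root/usr/sbin/grub2-install" ] || [ -x "$root/sbin/grub2-install" ]; then
    # RHEL-family systems name the tools grub2-*.
    echo "grub2-install $disk"
    echo "grub2-mkconfig -o /boot/grub2/grub.cfg"
  else
    # Debian and Ubuntu use the grub-* names.
    echo "grub-install $disk"
    echo "grub-mkconfig -o /boot/grub/grub.cfg"
  fi
}
```

Run it on the rescue ECS before entering the chroot, for example grub_plan /mnt/rescue "$(parent_disk /dev/vdb1)", then type the printed commands inside the chroot.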
Step 10: Check logs if needed
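Logs can show what failed. boot.log and messages are used on CentOS-style systems and syslog on Debian and Ubuntu. If the system keeps a persistent journal in /var/log/journal, read it with journalctl.
cat /mnt/rescue/var/log/boot.log
cat /mnt/rescue/var/log/messages
cat /mnt/rescue/var/log/syslog
journalctl -D /mnt/rescue/var/log/journal -p err --no-pager
A boot that failed very early may not have written anything to these logs.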
If you see clear errors, fix them before continuing.
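A quick way to pull out only the suspicious lines. This sketch assumes the usual log locations (boot.log and messages on the RHEL family, syslog on Debian and Ubuntu, and a persistent journal in /var/log/journal). scan_logs is a hypothetical helper, and matching on "error" or "fail" is a rough filter, not a diagnosis.

```shell
# Hypothetical helper: print error-looking lines from the broken disk's logs.
# Usage: scan_logs [/mnt/rescue]
scan_logs() {
  local root=${1:-/mnt/rescue} f
  # boot.log and messages: RHEL family; syslog: Debian and Ubuntu.
  for f in "$root/var/log/boot.log" "$root/var/log/messages" "$root/var/log/syslog"; do
    [ -f "$f" ] || continue
    echo "== $f =="
    # Keep only the last 20 lines that mention an error or a failure.
    grep -iE 'error|fail' "$f" | tail -n 20
  done
  # A persistent systemd journal is binary; journalctl -D reads it from a directory.
  if [ -d "$root/var/log/journal" ] && command -v journalctl >/dev/null 2>&1; then
    echo "== journal =="
    journalctl -D "$root/var/log/journal" -p err --no-pager -n 50
  fi
}
```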
Step 11: Unmount the disk
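Unmount everything, starting with the bind mounts. If you mounted a separate /boot partition, unmount /mnt/rescue/boot before /mnt/rescue. (umount -R /mnt/rescue does all of this in the right order.)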
umount /mnt/rescue/dev
umount /mnt/rescue/proc
umount /mnt/rescue/sys
umount /mnt/rescue
Detach the disk from the rescue ECS.
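If you are unsure what is still mounted, you can list every mount under /mnt/rescue, deepest first, from /proc/mounts. This is a sketch; mounts_under is a hypothetical helper. A child path always sorts after its parent, so a reverse sort puts children first.

```shell
# Hypothetical helper: list mount points at or below ROOT, deepest first,
# read from a mounts table in /proc/mounts format (column 2 is the mount point).
# Usage: mounts_under /mnt/rescue | while read -r m; do umount "$m"; done
mounts_under() {
  local root=${1%/} table=${2:-/proc/mounts}
  awk -v r="$root" '$2 == r || index($2, r "/") == 1 { print $2 }' "$table" |
    LC_ALL=C sort -r
}
```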
Step 12: Boot the original ECS
• Attach the disk back as the system disk
• Start the ECS instance
• Try SSH again
If SSH works, the recovery is complete.
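Moving the disk back can also be scripted. This sketch assumes a configured aliyun CLI; reattach_system_disk is a hypothetical helper. Passing Bootable=true to AttachDisk to attach a system disk is our reading of the ECS API, so check the current API reference. Depending on the image, attaching a system disk may also ask for a key pair or password. If StartInstance reports that the instance is busy, wait a moment and run it again.

```shell
# Hypothetical helper for Step 12. Assumes a configured aliyun CLI; IDs are placeholders.
# Usage: reattach_system_disk RESCUE_ID ORIGINAL_ID DISK_ID [MAX_CHECKS]
reattach_system_disk() {
  local rescue=$1 original=$2 disk=$3 max=${4:-30} i=0
  aliyun ecs DetachDisk --InstanceId "$rescue" --DiskId "$disk" || return 1
  # Detaching is asynchronous: wait until DescribeDisks reports the disk as Available.
  until aliyun ecs DescribeDisks --DiskIds "[\"$disk\"]" | grep -q '"Status": *"Available"'; do
    i=$((i + 1))
    if [ "$i" -ge "$max" ]; then
      echo "$disk is still not Available" >&2
      return 1
    fi
    sleep 10
  done
  # Assumption: Bootable=true asks AttachDisk to attach the disk as the system disk.
  aliyun ecs AttachDisk --InstanceId "$original" --DiskId "$disk" --Bootable true || return 1
  aliyun ecs StartInstance --InstanceId "$original"
}
```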
If it still does not work
• Go back and recheck fstab
• Check boot files again
• If needed, build a new server, attach the old system disk to it as a data disk, and copy your data from it
Simple rules to avoid this problem
• Always take snapshots before changes
• Be very careful with fstab
• Do not rush disk or boot changes
• Test updates on a test server first
Alibaba Cloud ECS boot problems look scary, but they are usually easy to fix. If you stay calm and follow the steps, you can recover most servers without losing data.


