Troubleshooting Confidential VMs
This guide provides solutions for common issues you might encounter when working with confidential virtual machines on Aleph Cloud.
Diagnosing Issues
Fetch VM Logs
If your VM is supposed to be running but cannot be reached, start by checking the logs:
aleph instance logs <vm_hash>
If you don't know the hash of your VM, use:
aleph instance list
In a properly functioning VM, you should see disk unlocking, boot logs, and the system invite. The last lines should look similar to:
cloud-init[502]: Cloud-init v. 24.1.3-0ubuntu1~22.04.5 finished at Thu, 05 Sep 2024 19:36:07 +0000. Datasource DataSourceNoCloud [seed=/dev/sr0][dsmode=net]. Up 23.63 seconds
[ OK ] Finished Execute cloud user/final scripts.
[ OK ] Reached target Cloud-init target.
Ubuntu 22.04.4 LTS dbnszzoulvoea7crseir75egj4xbm5zzzaaut2nbpknjadidp3ua ttyS0
Test Your VM Image Locally
If the VM fails to start on Aleph Cloud, try launching your disk image locally with QEMU to verify it was properly built:
sudo qemu-system-x86_64 \
-enable-kvm \
-m 2048 \
-nic user,model=virtio \
-nographic \
-serial mon:stdio \
-drive if=pflash,format=raw,unit=0,file=/usr/share/ovmf/OVMF.fd,readonly=on \
-drive format=raw,file=</path/to/your/image.img>
Note
After entering your password, you might need to wait a minute or so for the disk to decrypt and boot.
To exit QEMU: press Ctrl + a
, then x
, and then [Enter]
.
Common Issues and Solutions
Cannot SSH Into the VM
If you're unable to connect to your VM via SSH:
Validate IPv6 Connectivity
Ensure that IPv6 is functioning correctly on your local network. Test this by visiting https://test-ipv6.com/.Verify the VM's IP Address
Retrieve the correct IP address:bashaleph instance list
Try Different User Logins
Default users vary by distribution:- Debian-based distributions:
root
- Ubuntu:
ubuntu
- Debian-based distributions:
Check SSH Key Configuration
Ensure you're using the correct SSH key that was installed during VM creation:bashssh -i /path/to/your/private_key <user>@<ip>
Check VM Logs
Look for SSH service startup or errors:bashaleph instance logs <vm_hash>
Wrong Decryption Password
If you entered the wrong decryption password while starting the VM, you need to reboot it:
aleph instance reboot <vm_hash> <node_url>
Then restart the confidential initialization process:
aleph instance confidential-init-session <vm_hash>
aleph instance confidential-start <vm_hash>
"Bad Measurement" Error
If you see this error in the logs:
qemu-system-x86_64: sev_launch_start: LAUNCH_START ret=1 fw_error=11 'Bad measurement'
qemu-system-x86_64: sev_kvm_init: failed to create encryption context
qemu-system-x86_64: failed to initialize kvm: Operation not permitted
Probable Causes
Policy Mismatch
The policy requested by the client doesn't match the start packet sent by the client.Platform Certificate Issues
The CRN's platform certificate may not match the start packet due to:- A session generated for one CRN was reused for another
- The CRN platform certificate was rotated after session generation
- The CRN is returning an outdated platform certificate
Resolution Steps
Regenerate the Session Certificate
bashaleph instance confidential-init-session <vm_hash>
When prompted to remove existing certificates, answer "yes." Then continue with:
bashaleph instance confidential-start <vm_hash>
If the Error Persists
Contact the Aleph Cloud support team with your VM hash and the node URL. The node operator may need to verify and update their platform certificate.
VM Starts But Services Don't Work
If your VM boots but your application or services aren't functioning:
Check Cloud-Init Logs
SSH into your VM and check:bashcat /var/log/cloud-init.log
Verify Network Configuration
Ensure network interfaces are properly configured:baship addr
Check Service Status
Verify your application services are running:bashsystemctl status <your-service>
Review Application Logs
Check application-specific logs for errors.
Disk Encryption Issues
If the VM fails to decrypt the disk:
Verify Password
Ensure you're using the exact same password you used when creating the encrypted disk.Check GRUB Configuration
If testing locally, you might see GRUB errors if the encryption setup is incorrect.Recreate the Encrypted Disk
If persistent issues occur, consider recreating your encrypted disk image following the encrypted disk guide.
Advanced Troubleshooting
Checking SEV Status
To verify if SEV is active in your VM:
dmesg | grep -i sev
You should see output indicating that AMD SEV is active.
Debugging Session Initialization
For more verbose output during session initialization:
aleph instance confidential-init-session <vm_hash> --debug
Getting Help
If you've tried the solutions above and are still experiencing issues:
- Join the Aleph Cloud Discord for community support
- Open an issue on the Aleph-VM GitHub repository
- Contact the Aleph Cloud team directly through the official website