Proxmox

Fix Intel Ethernet NIC Hang

Problem

If the Ethernet interface hangs and the journal shows a kernel message like this:

Mar 29 05:14:04 pve-01 kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                                 TDH                  <3>
                                 TDT                  <75>
                                 next_to_use          <75>
                                 next_to_clean        <2>
                               buffer_info[next_to_clean]:
                                 time_stamp           <1525c7e78>
                                 next_to_watch        <3>
                                 jiffies              <15287e140>
                                 next_to_watch.status <0>
                               MAC Status             <40080083>
                               PHY Status             <796d>
                               PHY 1000BASE-T Status  <3800>
                               PHY Extended Status    <3000>
                               PCI Status             <10>

Symptoms: the node loses network connectivity and becomes inaccessible remotely.
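
To check whether a node has hit this problem, the kernel log can be searched for the message above. The following is a minimal sketch; the function name count_unit_hangs is made up for illustration, and it only counts lines containing the "Detected Hardware Unit Hang" string read from standard input.

```shell
#!/bin/sh
# Count kernel log lines that report an e1000e hardware unit hang.
# Reads the log from stdin so it works with journalctl, dmesg, or a saved file.
count_unit_hangs() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the function's status at 0.
    grep -c 'Detected Hardware Unit Hang' || true
}
```

Possible usage on the node: journalctl -k -b | count_unit_hangs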

Solution

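Edit /etc/network/interfaces and add a post-up line to the bridge stanza, replacing <ethernet device> with the name of the physical NIC (enp0s31f6 in the log above):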

iface vmbr0 inet static
        ...
        post-up ethtool -K <ethernet device> gso off gro off tso off tx off rx off
        ...

source /etc/network/interfaces.d/*

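Make sure the ethtool package is installed on the system (apt install ethtool). The post-up line takes effect the next time the interface is brought up, for example after a reboot.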
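
The same offload settings can also be applied at runtime, which is useful for testing the fix before editing the interfaces file. This is a sketch only: disable_offloads and its --apply switch are hypothetical names, and without --apply the function just prints the command it would run.

```shell
#!/bin/sh
# Print (default) or run (--apply) the ethtool command from the post-up line.
# $1 = interface name, $2 = optional "--apply"
disable_offloads() {
    iface="$1"
    if [ "${2:-}" = "--apply" ]; then
        # Needs root and the ethtool package; settings are lost on reboot.
        ethtool -K "$iface" gso off gro off tso off tx off rx off
    else
        echo "ethtool -K $iface gso off gro off tso off tx off rx off"
    fi
}
```

For example, disable_offloads enp0s31f6 --apply changes the running interface, and ethtool -k enp0s31f6 (lowercase k) lists the resulting feature states.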

Sources:

https://forum.proxmox.com/threads/intel-nic-e1000e-hardware-unit-hang.106001/

https://www.reddit.com/r/Proxmox/comments/1drs89s/intel_nic_e1000e_hardware_unit_hang/

Import a qcow2 file

qm importdisk <vm_id> file.qcow2 <storage-backend>
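
After the import, the new volume is normally listed in the VM configuration as an unused disk and still has to be attached. A rough sketch of that follow-up, assuming the function name import_qcow2 (made up here) and that the imported volume appears as an unusedN entry:

```shell
#!/bin/sh
# Import a qcow2 image, then show the unused-disk entries of the VM.
# $1 = VM ID, $2 = path to the qcow2 file, $3 = storage backend
import_qcow2() (
    vmid="$1"; image="$2"; storage="$3"
    qm importdisk "$vmid" "$image" "$storage" || exit 1
    # Imported volumes are expected to show up as "unusedN: <volume>" lines.
    qm config "$vmid" | grep '^unused'
)
```

The volume printed on the unused line can then be attached with qm set <vm_id> --scsi0 <volume>; the exact volume name depends on the storage type, so it is read from the config rather than guessed.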

 

Rename a node

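#!/usr/bin/bash
# Usage: rename-node.sh <original_hostname> <new_hostname>

original_hostname="${1:?usage: $0 <original_hostname> <new_hostname>}"
new_hostname="${2:?usage: $0 <original_hostname> <new_hostname>}"

mkdir -p /tmp/qemu ## temp dir for holding the VM config files
cp "/etc/pve/nodes/$original_hostname/qemu-server/"* /tmp/qemu/

## change the hostname and update /etc/hosts to match
hostnamectl set-hostname "$new_hostname"
sed -i "s/$original_hostname/$new_hostname/g" /etc/hosts

## restart the PVE services so they pick up the new hostname
services=("pveproxy.service" "pvebanner.service" "pve-cluster.service" "pvestatd.service" "pvedaemon.service")
for service in "${services[@]}"
do
    systemctl restart "$service"
done

## move the VM configs under the new node name
rm -rf "/etc/pve/nodes/$original_hostname"
mkdir -p "/etc/pve/nodes/$new_hostname/qemu-server"
cp /tmp/qemu/* "/etc/pve/nodes/$new_hostname/qemu-server/"
rm /tmp/qemu/*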

Afterwards, copy the contents of /var/lib/rrdcached/db/pve2-{node,storage}/old-hostname to /var/lib/rrdcached/db/pve2-{node,storage}/new-hostname and remove the old entries.
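
The RRD copy step can be scripted as well. The sketch below is one possible way to do it; migrate_rrd is a hypothetical helper, and the optional third argument (a base directory other than /var/lib/rrdcached/db) exists only so the function can be tried on a scratch directory first.

```shell
#!/bin/sh
# Copy RRD data from the old hostname to the new one, then remove the old entry.
# $1 = old hostname, $2 = new hostname, $3 = optional base directory
migrate_rrd() (
    old="$1"; new="$2"
    base="${3:-/var/lib/rrdcached/db}"
    for kind in pve2-node pve2-storage; do
        src="$base/$kind/$old"
        dst="$base/$kind/$new"
        # Skip kinds that have no data for the old hostname.
        [ -e "$src" ] || continue
        if [ -d "$src" ]; then
            # Directory: merge its contents into the (possibly existing) target.
            mkdir -p "$dst" && cp -a "$src/." "$dst/"
        else
            # Single file: copy it under the new name.
            cp -a "$src" "$dst"
        fi || exit 1
        rm -rf "$src"
    done
)
```

Example call after the rename: migrate_rrd old-hostname new-hostname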

User Provisioning

Perform the following steps:

pveum user add etorres@pam
pveum group add wheel -comment 'System admins'
pveum acl modify / -group wheel -role Administrator
pveum acl modify / -group wheel -role PVEAdmin
pveum acl modify / -group wheel -role PVEVMAdmin
pveum group modify wheel -privs Sys.PowerMgmt VM.PowerMgmt Permissions.M
pveum user modify etorres@pam -group wheel
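
The repeated acl modify calls can be collapsed into a loop when several roles go to the same group. A small sketch, assuming a helper named grant_roles (not part of pveum) that grants each listed role on the root path /:

```shell
#!/bin/sh
# Grant one or more roles to a group on the root ACL path.
# $1 = group name, remaining arguments = role names
grant_roles() (
    group="$1"; shift
    for role in "$@"; do
        pveum acl modify / -group "$group" -role "$role" || exit 1
    done
)
```

For the group above this would be: grant_roles wheel Administrator PVEAdmin PVEVMAdmin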

Terraform Setup

Creating the Terraform role in PVE

pveum user add terraform@pve

pveum role add Terraform -privs "Realm.AllocateUser, VM.PowerMgmt, VM.GuestAgent.Unrestricted, Sys.Console, Sys.Audit, Sys.AccessNetwork, VM.Config.Cloudinit, VM.Replicate, Pool.Allocate, SDN.Audit, Realm.Allocate, SDN.Use, Mapping.Modify, VM.Config.Memory, VM.GuestAgent.FileSystemMgmt, VM.Allocate, SDN.Allocate, VM.Console, VM.Clone, VM.Backup, Datastore.AllocateTemplate, VM.Snapshot, VM.Config.Network, Sys.Incoming, Sys.Modify, VM.Snapshot.Rollback, VM.Config.Disk, Datastore.Allocate, VM.Config.CPU, VM.Config.CDROM, Group.Allocate, Datastore.Audit, VM.Migrate, VM.GuestAgent.FileWrite, Mapping.Use, Datastore.AllocateSpace, Sys.Syslog, VM.Config.Options, Pool.Audit, User.Modify, VM.Config.HWType, VM.Audit, Sys.PowerMgmt, VM.GuestAgent.Audit, Mapping.Audit, VM.GuestAgent.FileRead, Permissions.Modify"

pveum acl modify / -user terraform@pve -role Terraform

pveum user token add terraform@pve <TokenName> --privsep=0

Configuring the Terraform provider

provider "proxmox" {
  endpoint  = var.virtual_environment_endpoint
  api_token = "terraform@pve!<TokenName>=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  insecure  = true
  #ssh {
  #  agent    = true
  #  username = "terraform"
  #}
}
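
Values such as the endpoint and token can be kept out of the .tf files by passing them through the environment, since Terraform reads any TF_VAR_<name> variable as the input variable <name>. The sketch below only builds the token string in the user!token=secret form shown above; pve_api_token is a made-up helper name.

```shell
#!/bin/sh
# Assemble an API token string in the "user@realm!tokenname=secret" form.
# $1 = user (e.g. terraform@pve), $2 = token name, $3 = token secret
pve_api_token() {
    printf '%s!%s=%s\n' "$1" "$2" "$3"
}
```

The endpoint used by the provider block could then be supplied with export TF_VAR_virtual_environment_endpoint="https://<pve-host>:8006/". Passing the token the same way would require a second input variable and a matching api_token = var.<name> line, which is an assumption and not part of the configuration above.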