[Infiniband] Fix unbinding of physical functions when configuring Infiniband virtual functions #1

Draft
wants to merge 62 commits into
base: master
62 commits
85feccd
Refactor some conformance tests to utilize SRIOV_NODE_AND_DEVICE_NAME…
evgenLevin Sep 3, 2024
91e04f6
metrics: Add PrometheusRule for namespaced metrics
zeeke Jul 10, 2024
b49cf15
metrics: Add permissions to remove monitor objects
zeeke Aug 28, 2024
aecb4bb
Merge pull request #732 from zeeke/metrics-exporter-rules
adrianchiris Sep 19, 2024
60c6404
Merge pull request #771 from zeeke/us/e2e-filter-devices
zeeke Sep 19, 2024
6aedb8c
Fix merge annotation function
SchSeba Sep 18, 2024
644fcf2
Delete webhooks when SriovOperatorConfig is deleted
zeeke Sep 19, 2024
e2d0611
Merge pull request #776 from SchSeba/fix_render
zeeke Sep 20, 2024
f17bb2a
metrics: Fix typo in `METRICS_EXPORTER_PROMETHEUS_DEPLOY_RULES`
zeeke Sep 19, 2024
f94fa64
Fix syntax for RDMA_CNI_IMAGE var substitution
mandre Sep 20, 2024
4bae6ce
Merge pull request #780 from mandre/fix-RDMA_CNI_IMAGE
zeeke Sep 22, 2024
3ff1b85
metrics: Add `node` label to `sriov_*` metrics
zeeke Sep 12, 2024
084810a
openstack: dynamically mount the config-drive
EmilienM Sep 11, 2024
ba21df0
Enclose array expansions in double quote
mandre Sep 23, 2024
3d553bf
Add missing shebang
mandre Sep 23, 2024
63246d6
Explicitly expand array values
mandre Sep 23, 2024
3529811
Iterate over globs.
mandre Sep 23, 2024
61aacb5
Fix: GetDevlinkDeviceParam to handle edge-cases correctly
ykulazhenkov Sep 23, 2024
6f44ae5
Merge pull request #779 from zeeke/us/OCPBUGS-41897
e0ne Oct 1, 2024
31175eb
Merge pull request #782 from ykulazhenkov/pr-fix-getdevlinkdeviceparam
e0ne Oct 2, 2024
aecf473
Merge pull request #774 from zeeke/metrics-exporter-drop-labels
SchSeba Oct 7, 2024
a01a139
metrics: Fix `Metrics should have the correct labels` test
zeeke Oct 7, 2024
6abdfe6
Fix NRI rbac
SchSeba Oct 8, 2024
9143c95
Merge pull request #787 from SchSeba/add_rbac_to_nri
zeeke Oct 9, 2024
fb193e8
Use grep for matching args with sh
mandre Sep 26, 2024
5394d21
CI: Add a bash linter to pre-submits
mandre Sep 23, 2024
e35f3d4
Merge pull request #785 from zeeke/us/metrics-e2e-fix
ykulazhenkov Oct 9, 2024
c02e517
Merge pull request #781 from mandre/shellcheck
ykulazhenkov Oct 9, 2024
92cf81c
Merge pull request #773 from EmilienM/configDrive
zeeke Oct 9, 2024
f286a04
config-daemon: Restart all instances of device-plugin
zeeke Oct 4, 2024
a85ab70
Merge pull request #783 from zeeke/us/multiple-device-plugins
zeeke Oct 9, 2024
85063dc
Add Intel Corporation Ethernet Controller E810-XXV for backplane, E82…
wizhaoredhat Oct 10, 2024
6556c92
Add NVIDIA ConnectX-8 to supported NICs list
e0ne Sep 19, 2024
8fe7a5e
Merge pull request #790 from wizhaoredhat/add_netsec_ethernet_control…
bn222 Oct 11, 2024
9782923
logging: Reduce device discovering verbosity
zeeke Oct 18, 2024
68b6c02
Merge pull request #793 from zeeke/us/config-daemon-reduce-logging
adrianchiris Oct 21, 2024
b5b0d6b
Add a note in documentation regarding systemd mode
souleb Oct 22, 2024
dc299c4
Fixing daemon sriov VFs config, where PF pci address got unbind inste…
heyvister1 Oct 27, 2024
df1407d
Fix k8s CI
SchSeba Oct 29, 2024
eb96108
Merge pull request #778 from e0ne/connectx-8-support
SchSeba Oct 29, 2024
ab79e2c
Merge pull request #794 from souleb/post-delete-systemd
SchSeba Oct 29, 2024
8d9e8da
Merge pull request #797 from heyvister1/fix-daemon-unbind-pf
SchSeba Oct 29, 2024
09a3af9
Merge pull request #801 from SchSeba/fix_k8s_ci_virtual_cluster
adrianchiris Oct 29, 2024
0d9a707
adding sriov operator config finalizer, to control generated cluster …
heyvister1 Oct 13, 2024
b1bb044
adding sriov operator config cleanup binary, to be used under helm un…
heyvister1 Oct 28, 2024
5009e99
Merge pull request #791 from heyvister1/webhooks-k8s-objects-deletion
zeeke Oct 30, 2024
6d32ec0
kernel: Set arguments based on CPU architecture
zeeke Oct 25, 2024
5522c96
Update `github.com/jaypipes/ghw`
zeeke Oct 25, 2024
2b02ba1
Merge pull request #796 from zeeke/us/OCPBUGS-43654
SchSeba Oct 31, 2024
73c1f81
RDMA subsystem is implemented via ib_core module config.
e0ne Mar 23, 2024
02c6b00
Add kernel args for rdma mode to complement the modprobe file
SchSeba Oct 28, 2024
92fee7b
Merge pull request #799 from SchSeba/rdma-subsytem-mode
SchSeba Nov 11, 2024
baa41c9
redesign device plugin
SchSeba Nov 7, 2024
8950f76
deploy: relax Operator node affinity
EmilienM Nov 14, 2024
623c339
Merge pull request #747 from SchSeba/device_plugin_redesign
SchSeba Nov 19, 2024
5fce462
Merge pull request #806 from EmilienM/nodeAffinity
SchSeba Nov 20, 2024
8a91004
Upgrade golangci-lint to work with Go 1.23
clarkzinzow Sep 18, 2024
1d92064
Add platform build arg.
clarkzinzow Nov 24, 2024
b98d857
Comment out Mellanox plugin's draining + rebooting for totalVfs + SRI…
clarkzinzow Jan 31, 2025
82b5147
Scan GUIDs (#7)
punkerpunker Feb 11, 2025
5e0f66e
ENG-21048 - KernelArgIommuOn instead of KernelArgIommuPt (to enable A…
punkerpunker Feb 18, 2025
1fb8405
ENG-19808 mlxfwreset before reboot in SR-IOV operator (#8)
punkerpunker Feb 20, 2025
10 changes: 10 additions & 0 deletions .github/workflows/test.yml
@@ -100,6 +100,16 @@ jobs:
           # Required: the version of golangci-lint is required and must be specified without patch version: we always use the latest patch version.
           version: v1.55.2

+  shellcheck:
+    name: Shellcheck
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run ShellCheck
+        uses: ludeeus/action-shellcheck@master
+        with:
+          severity: error
+
   test-coverage:
     name: test-coverage
     runs-on: ubuntu-latest
2 changes: 2 additions & 0 deletions Dockerfile
@@ -2,9 +2,11 @@ FROM golang:1.22 AS builder
 WORKDIR /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator
 COPY . .
 RUN make _build-manager BIN_PATH=build/_output/cmd
+RUN make _build-sriov-network-operator-config-cleanup BIN_PATH=build/_output/cmd

 FROM quay.io/centos/centos:stream9
 COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/build/_output/cmd/manager /usr/bin/sriov-network-operator
+COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/build/_output/cmd/sriov-network-operator-config-cleanup /usr/bin/sriov-network-operator-config-cleanup
 COPY bindata /bindata
 ENV OPERATOR_NAME=sriov-network-operator
 CMD ["/usr/bin/sriov-network-operator"]
8 changes: 4 additions & 4 deletions Makefile
@@ -11,7 +11,7 @@ export OPERATOR_EXEC?=oc

 BUILD_GOPATH=$(TARGET_DIR):$(TARGET_DIR)/vendor:$(CURPATH)/cmd
 IMAGE_BUILDER?=docker
-IMAGE_BUILD_OPTS?=
+IMAGE_BUILD_OPTS?=--platform linux/amd64
 DOCKERFILE?=Dockerfile
 DOCKERFILE_CONFIG_DAEMON?=Dockerfile.sriov-network-config-daemon
 DOCKERFILE_WEBHOOK?=Dockerfile.webhook
@@ -53,14 +53,14 @@ GOLANGCI_LINT = $(BIN_DIR)/golangci-lint
 # golangci-lint version should be updated periodically
 # we keep it fixed to avoid it from unexpectedly failing on the project
 # in case of a version bump
-GOLANGCI_LINT_VER = v1.55.2
+GOLANGCI_LINT_VER = v1.61.0


 .PHONY: all build clean gendeepcopy test test-e2e test-e2e-k8s run image fmt sync-manifests test-e2e-conformance manifests update-codegen

 all: generate lint build

-build: manager _build-sriov-network-config-daemon _build-webhook
+build: manager _build-sriov-network-config-daemon _build-webhook _build-sriov-network-operator-config-cleanup

 _build-%:
 	WHAT=$* hack/build-go.sh
@@ -226,7 +226,7 @@ test-e2e-k8s: export NAMESPACE=sriov-network-operator
 test-e2e-k8s: test-e2e

 test-bindata-scripts: fakechroot
-	fakechroot ./test/scripts/enable-kargs_test.sh
+	fakechroot ./test/scripts/kargs_test.sh

 test-%: generate manifests envtest
 	KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir=/tmp -p path)" HOME="$(shell pwd)" go test ./$*/... -coverprofile cover-$*.out -coverpkg ./... -v
21 changes: 14 additions & 7 deletions api/v1/helper.go
@@ -26,11 +26,12 @@ import (
 )

 const (
-	LASTNETWORKNAMESPACE    = "operator.sriovnetwork.openshift.io/last-network-namespace"
-	NETATTDEFFINALIZERNAME  = "netattdef.finalizers.sriovnetwork.openshift.io"
-	POOLCONFIGFINALIZERNAME = "poolconfig.finalizers.sriovnetwork.openshift.io"
-	ESwithModeLegacy        = "legacy"
-	ESwithModeSwitchDev     = "switchdev"
+	LASTNETWORKNAMESPACE        = "operator.sriovnetwork.openshift.io/last-network-namespace"
+	NETATTDEFFINALIZERNAME      = "netattdef.finalizers.sriovnetwork.openshift.io"
+	POOLCONFIGFINALIZERNAME     = "poolconfig.finalizers.sriovnetwork.openshift.io"
+	OPERATORCONFIGFINALIZERNAME = "operatorconfig.finalizers.sriovnetwork.openshift.io"
+	ESwithModeLegacy            = "legacy"
+	ESwithModeSwitchDev         = "switchdev"

 	SriovCniStateEnable  = "enable"
 	SriovCniStateDisable = "disable"
@@ -721,7 +722,12 @@ func (cr *SriovIBNetwork) RenderNetAttDef() (*uns.Unstructured, error) {
 		data.Data["CapabilitiesConfigured"] = true
 		data.Data["SriovCniCapabilities"] = cr.Spec.Capabilities
 	}
-
+	if cr.Spec.PKey == "" {
+		data.Data["pKeyConfigured"] = false
+	} else {
+		data.Data["pKeyConfigured"] = true
+		data.Data["pKey"] = cr.Spec.PKey
+	}
 	if cr.Spec.IPAM != "" {
 		data.Data["SriovCniIpam"] = SriovCniIpam + ":" + strings.Join(strings.Fields(cr.Spec.IPAM), "")
 	} else {
@@ -764,6 +770,7 @@ func (cr *SriovNetwork) RenderNetAttDef() (*uns.Unstructured, error) {
 	data := render.MakeRenderData()
 	data.Data["CniType"] = "sriov"
 	data.Data["SriovNetworkName"] = cr.Name
+	data.Data["pKeyConfigured"] = false
 	if cr.Spec.NetworkNamespace == "" {
 		data.Data["SriovNetworkNamespace"] = cr.Namespace
 	} else {
@@ -784,7 +791,6 @@ func (cr *SriovNetwork) RenderNetAttDef() (*uns.Unstructured, error) {
 		data.Data["VlanProtoConfigured"] = true
 		data.Data["SriovCniVlanProto"] = cr.Spec.VlanProto
 	}
-
 	if cr.Spec.Capabilities == "" {
 		data.Data["CapabilitiesConfigured"] = false
 	} else {
@@ -882,6 +888,7 @@ func (cr *OVSNetwork) RenderNetAttDef() (*uns.Unstructured, error) {
 	data := render.MakeRenderData()
 	data.Data["CniType"] = "ovs"
 	data.Data["NetworkName"] = cr.Name
+	data.Data["pKeyConfigured"] = false
 	if cr.Spec.NetworkNamespace == "" {
 		data.Data["NetworkNamespace"] = cr.Namespace
 	} else {
1 change: 1 addition & 0 deletions api/v1/sriovibnetwork_types.go
@@ -43,6 +43,7 @@ type SriovIBNetworkSpec struct {
 	// MetaPluginsConfig configuration to be used in order to chain metaplugins to the sriov interface returned
 	// by the operator.
 	MetaPluginsConfig string `json:"metaPlugins,omitempty"`
+	PKey              string `json:"pKey,omitempty"`
 }

 // SriovIBNetworkStatus defines the observed state of SriovIBNetwork
8 changes: 8 additions & 0 deletions api/v1/sriovnetworknodestate_types.go
@@ -27,6 +27,7 @@ import (
 type SriovNetworkNodeStateSpec struct {
 	Interfaces Interfaces `json:"interfaces,omitempty"`
 	Bridges    Bridges    `json:"bridges,omitempty"`
+	System     System     `json:"system,omitempty"`
 }

 type Interfaces []Interface
@@ -114,10 +115,17 @@ type OVSUplinkConfigExt struct {
 	Interface OVSInterfaceConfig `json:"interface,omitempty"`
 }

+type System struct {
+	// +kubebuilder:validation:Enum=shared;exclusive
+	//RDMA subsystem. Allowed value "shared", "exclusive".
+	RdmaMode string `json:"rdmaMode,omitempty"`
+}
+
 // SriovNetworkNodeStateStatus defines the observed state of SriovNetworkNodeState
 type SriovNetworkNodeStateStatus struct {
 	Interfaces    InterfaceExts `json:"interfaces,omitempty"`
 	Bridges       Bridges       `json:"bridges,omitempty"`
+	System        System        `json:"system,omitempty"`
 	SyncStatus    string        `json:"syncStatus,omitempty"`
 	LastSyncError string        `json:"lastSyncError,omitempty"`
 }
4 changes: 4 additions & 0 deletions api/v1/sriovnetworkpoolconfig_types.go
@@ -21,6 +21,10 @@ type SriovNetworkPoolConfigSpec struct {
 	// Drain will respect Pod Disruption Budgets (PDBs) such as etcd quorum guards,
 	// even if maxUnavailable is greater than one.
 	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
+
+	// +kubebuilder:validation:Enum=shared;exclusive
+	// RDMA subsystem. Allowed value "shared", "exclusive".
+	RdmaMode string `json:"rdmaMode,omitempty"`
 }

 type OvsHardwareOffloadConfig struct {
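
Reviewer note: both `System.RdmaMode` and `SriovNetworkPoolConfigSpec.RdmaMode` carry the kubebuilder marker `Enum=shared;exclusive`, so the check is presumably enforced by the generated CRD schema. For callers that build these structs in Go (unit tests, for example), an equivalent check might look like the sketch below. `validateRdmaMode` is a hypothetical helper, not part of this PR, and treating the empty string as "unset" is an assumption based on the `omitempty` tag.

```go
package main

import "fmt"

// validateRdmaMode is a hypothetical helper (not part of this PR) that mirrors
// the kubebuilder enum "shared;exclusive" on the RdmaMode fields.
// Assumption: an empty value means "not set", since the field is omitempty.
func validateRdmaMode(mode string) error {
	switch mode {
	case "", "shared", "exclusive":
		return nil
	default:
		return fmt.Errorf("invalid rdmaMode %q: allowed values are \"shared\" and \"exclusive\"", mode)
	}
}

func main() {
	// "isolated" is just an arbitrary invalid value for illustration.
	for _, m := range []string{"", "shared", "exclusive", "isolated"} {
		fmt.Printf("%q -> %v\n", m, validateRdmaMode(m))
	}
}
```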
17 changes: 17 additions & 0 deletions api/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

3 changes: 3 additions & 0 deletions bindata/manifests/cni-config/sriov/sriov-cni-config.yaml
@@ -38,6 +38,9 @@ spec:
     {{- if .CapabilitiesConfigured -}}
     "capabilities":{{.SriovCniCapabilities}},
     {{- end -}}
+    {{- if .pKeyConfigured -}}
+    "pkey":"{{.pKey}}",
+    {{- end -}}
     {{- if .StateConfigured -}}
     "link_state":"{{.SriovCniState}}",
     {{- end -}}
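
Reviewer note: this template block is driven by the `pKeyConfigured` and `pKey` keys set in `RenderNetAttDef`. The sketch below renders just that fragment with Go's standard `text/template`. The `renderFragment` helper and the surrounding `"type"`/`"name"` keys are placeholders for illustration, and the use of `Option("missingkey=error")` is an assumption about how strict the operator's renderer is. If it is that strict, it would explain why `SriovNetwork` and `OVSNetwork` now set `pKeyConfigured` to `false` explicitly.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// fragment reproduces only the pkey block added in this PR; the "type" and
// "name" keys around it are placeholders for illustration.
const fragment = `{"type":"example",` +
	`{{- if .pKeyConfigured -}}"pkey":"{{.pKey}}",{{- end -}}` +
	`"name":"example-net"}`

// renderFragment is a hypothetical helper. It assumes a renderer that fails
// on missing map keys (missingkey=error), which is an assumption here.
func renderFragment(data map[string]interface{}) (string, error) {
	tmpl, err := template.New("cni").Option("missingkey=error").Parse(fragment)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// PKey set: the "pkey" entry is emitted.
	withKey, _ := renderFragment(map[string]interface{}{"pKeyConfigured": true, "pKey": "0x8001"})
	fmt.Println(withKey)

	// PKey empty: the block is skipped entirely.
	withoutKey, _ := renderFragment(map[string]interface{}{"pKeyConfigured": false})
	fmt.Println(withoutKey)

	// Key never set: a strict renderer reports an error instead of rendering.
	_, err := renderFragment(map[string]interface{}{})
	fmt.Println("missing key is an error:", err != nil)
}
```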
38 changes: 38 additions & 0 deletions bindata/manifests/metrics-exporter/metrics-prometheus-rule.yaml
@@ -0,0 +1,38 @@
+---
+{{ if and .IsPrometheusOperatorInstalled .PrometheusOperatorDeployRules }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: sriov-vf-rules
+  namespace: {{.Namespace}}
+spec:
+  groups:
+  - name: sriov-network-metrics-operator.rules
+    interval: 30s
+    rules:
+    - expr: |
+        sriov_vf_tx_packets * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_tx_packets
+    - expr: |
+        sriov_vf_rx_packets * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_rx_packets
+    - expr: |
+        sriov_vf_tx_bytes * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_tx_bytes
+    - expr: |
+        sriov_vf_rx_bytes * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_rx_bytes
+    - expr: |
+        sriov_vf_tx_dropped * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_tx_dropped
+    - expr: |
+        sriov_vf_rx_dropped * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_rx_dropped
+    - expr: |
+        sriov_vf_rx_broadcast * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_rx_broadcast
+    - expr: |
+        sriov_vf_rx_multicast * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
+      record: network:sriov_vf_rx_multicast
+{{ end }}

11 changes: 11 additions & 0 deletions bindata/manifests/metrics-exporter/metrics-prometheus.yaml
@@ -12,6 +12,17 @@ spec:
       bearerTokenFile: "/var/run/secrets/kubernetes.io/serviceaccount/token"
       scheme: "https"
       honorLabels: true
+      relabelings:
+        - action: replace
+          sourceLabels:
+            - __meta_kubernetes_endpoint_node_name
+          targetLabel: node
+        - action: labeldrop
+          regex: pod
+        - action: labeldrop
+          regex: container
+        - action: labeldrop
+          regex: namespace
       tlsConfig:
         serverName: sriov-network-metrics-exporter-service.{{.Namespace}}.svc
         caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
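
Reviewer note: the added `relabelings` can be read as a small pipeline over the scrape target's label set. It copies the endpoint's node name into `node`, then drops the target-level `pod`, `container` and `namespace` labels, presumably so they do not collide with the `pod` and `namespace` labels that `sriov_kubepoddevice` carries. The Go model below is a rough sketch only. It assumes standard Prometheus relabeling semantics (fully anchored regexes, `__`-prefixed labels discarded once relabeling completes); `relabel` is a hypothetical function and does not reproduce Prometheus's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabel is a hypothetical model of the three relabeling steps in this diff.
func relabel(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}

	// action: replace. Copy the discovered node name into the "node" label.
	if v, ok := out["__meta_kubernetes_endpoint_node_name"]; ok && v != "" {
		out["node"] = v
	}

	// action: labeldrop. Remove labels whose name fully matches the regex.
	for _, pattern := range []string{"pod", "container", "namespace"} {
		re := regexp.MustCompile("^(?:" + pattern + ")$")
		for name := range out {
			if re.MatchString(name) {
				delete(out, name)
			}
		}
	}

	// Labels starting with "__" are discarded once relabeling is complete.
	for name := range out {
		if strings.HasPrefix(name, "__") {
			delete(out, name)
		}
	}
	return out
}

func main() {
	// Example label values are made up for illustration.
	target := map[string]string{
		"__meta_kubernetes_endpoint_node_name": "worker-0",
		"pod":       "sriov-network-metrics-exporter-abcde",
		"container": "metrics-exporter",
		"namespace": "sriov-network-operator",
		"job":       "example-job",
	}
	fmt.Println(relabel(target))
}
```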
2 changes: 1 addition & 1 deletion bindata/manifests/plugins/sriov-device-plugin.yaml
@@ -27,7 +27,7 @@ spec:
       hostNetwork: true
       nodeSelector:
         {{- range $key, $value := .NodeSelectorField }}
-        {{ $key }}: {{ $value }}
+        {{ $key }}: "{{ $value }}"
         {{- end }}
       tolerations:
       - operator: Exists
2 changes: 1 addition & 1 deletion bindata/manifests/webhook/002-rbac.yaml
@@ -21,7 +21,7 @@ rules:
 - apiGroups:
   - ""
   resources:
-  - configmap
+  - configmaps
   verbs:
   - 'watch'
   - 'list'
33 changes: 0 additions & 33 deletions bindata/scripts/enable-kargs.sh

This file was deleted.

55 changes: 55 additions & 0 deletions bindata/scripts/kargs.sh
@@ -0,0 +1,55 @@
+#!/bin/bash
+set -x
+
+command=$1
+shift
+declare -a kargs=( "$@" )
+ret=0
+args=$(chroot /host/ cat /proc/cmdline)
+
+if chroot /host/ test -f /run/ostree-booted ; then
+    for t in "${kargs[@]}";do
+        if [[ $command == "add" ]];then
+            if [[ $args != *${t}* ]];then
+                if chroot /host/ rpm-ostree kargs | grep -vq ${t}; then
+                    chroot /host/ rpm-ostree kargs --append ${t} > /dev/null 2>&1
+                fi
+                let ret++
+            fi
+        fi
+        if [[ $command == "remove" ]];then
+            if [[ $args == *${t}* ]];then
+                if chroot /host/ rpm-ostree kargs | grep -q ${t}; then
+                    chroot /host/ rpm-ostree kargs --delete ${t} > /dev/null 2>&1
+                fi
+                let ret++
+            fi
+        fi
+    done
+else
+    chroot /host/ which grubby > /dev/null 2>&1
+    # if grubby is not there, let's tell it
+    if [ $? -ne 0 ]; then
+        exit 127
+    fi
+    for t in "${kargs[@]}";do
+        if [[ $command == "add" ]];then
+            if [[ $args != *${t}* ]];then
+                if chroot /host/ grubby --info=DEFAULT | grep args | grep -vq ${t}; then
+                    chroot /host/ grubby --update-kernel=DEFAULT --args=${t} > /dev/null 2>&1
+                fi
+                let ret++
+            fi
+        fi
+        if [[ $command == "remove" ]];then
+            if [[ $args == *${t}* ]];then
+                if chroot /host/ grubby --info=DEFAULT | grep args | grep -q ${t}; then
+                    chroot /host/ grubby --update-kernel=DEFAULT --remove-args=${t} > /dev/null 2>&1
+                fi
+                let ret++
+            fi
+        fi
+    done
+fi
+
+echo $ret
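
Reviewer note: independent of the `rpm-ostree` and `grubby` branches, the value the script echoes is a count of requested arguments whose presence on the running `/proc/cmdline` differs from what the command asks for. The Go sketch below models only that counting step. `countPendingKargs` is a hypothetical name, and the idea that a caller would treat a non-zero count as "reboot needed" is an assumption rather than something this diff shows. Like the script's `*${t}*` pattern, the match is a plain substring test, so a short argument such as `iommu=on` would also match inside `intel_iommu=on`.

```go
package main

import (
	"fmt"
	"strings"
)

// countPendingKargs is a hypothetical model of the counter in kargs.sh.
// For "add" it counts arguments absent from the running cmdline; for
// "remove" it counts arguments still present. Matching is by substring,
// mirroring the script's [[ $args == *${t}* ]] test.
func countPendingKargs(command, cmdline string, kargs []string) int {
	pending := 0
	for _, k := range kargs {
		present := strings.Contains(cmdline, k)
		switch {
		case command == "add" && !present:
			pending++
		case command == "remove" && present:
			pending++
		}
	}
	return pending
}

func main() {
	// Example cmdline, made up for illustration.
	cmdline := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on"
	wanted := []string{"intel_iommu=on", "iommu=pt"}

	fmt.Println(countPendingKargs("add", cmdline, wanted))    // only iommu=pt is missing
	fmt.Println(countPendingKargs("remove", cmdline, wanted)) // only intel_iommu=on is present
}
```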