Seldon-Core Basics: Installing Istio and Seldon Core

I. Installing Istio

1. Istio

Istio is an open-source service mesh. If you are not familiar with the term "service mesh", it is worth reading a bit more about Istio first.

Seldon Core can be used together with Istio. Istio provides an ingress gateway that Seldon Core can automatically wire new deployments to. The steps for using Istio are described below.

1.1 Download

For Linux and macOS, the easiest way to download Istio is with the following command:

# curl -L https://istio.io/downloadIstio | sh -
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.11.4 sh -
# curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.6.8 TARGET_ARCH=x86_64 sh -
# If that URL is unreachable, fetch the release directly from GitHub
wget  https://github.com/istio/istio/releases/download/1.11.4/istio-1.11.4-linux-amd64.tar.gz
tar -zxvf istio-1.11.4-linux-amd64.tar.gz  -C /opt/modules/
[root@centos03 istio]# cd /opt/modules/istio-1.11.4/

[root@centos03 istio-1.11.4]# ls -l
total 28
drwxr-x---.  2 root root    22 Oct 13 22:50 bin
-rw-r--r--.  1 root root 11348 Oct 13 22:50 LICENSE
drwxr-xr-x.  5 root root    52 Oct 13 22:50 manifests
-rw-r-----.  1 root root   854 Oct 13 22:50 manifest.yaml
-rw-r--r--.  1 root root  5866 Oct 13 22:50 README.md
drwxr-xr-x. 21 root root  4096 Oct 13 22:50 samples
drwxr-xr-x.  3 root root    57 Oct 13 22:50 tools

Change into the package directory:

cd istio-1.11.4

Add the istioctl client to your PATH (Linux or macOS):

export PATH=$PWD/bin:$PATH

1.2 Install Istio

Istio provides a command-line tool, istioctl, to simplify the installation. The demo configuration profile comes with a good set of defaults suitable for running on a local cluster.

> istioctl install --set profile=demo -y

Running it:

[root@centos03 istio-1.11.4]# istioctl install --set profile=demo -y
✔ Istio core installed                                                                                                           
✔ Istiod installed                                                                                                               
✔ Ingress gateways installed                                                                                                     
✔ Egress gateways installed                                                                                                      
✔ Installation complete                                                                                                          
Thank you for installing Istio 1.11.  Please take a few minutes to tell us about your install/upgrade experience!  https://forms.gle/kWULBRjUv7hHci7T6
[root@centos03 istio-1.11.4]# 

The namespace label istio-injection=enabled instructs Istio to automatically inject sidecar proxies alongside anything we deploy in that namespace. We will set it on the default namespace:

kubectl label namespace default istio-injection=enabled

1.3 Create an Istio Gateway

For Seldon Core to use Istio's features to manage cluster traffic, we need to create an Istio Gateway by running the following:

kubectl apply -f - << END
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: seldon-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
END

Running it:

[root@centos03 istio-1.11.4]# kubectl apply -f - << END
> apiVersion: networking.istio.io/v1alpha3
> kind: Gateway
> metadata:
>   name: seldon-gateway
>   namespace: istio-system
> spec:
>   selector:
>     istio: ingressgateway # use istio default controller
>   servers:
>   - port:
>       number: 80
>       name: http
>       protocol: HTTP
>     hosts:
>     - "*"
> END
gateway.networking.istio.io/seldon-gateway created

II. Installing Seldon Core on Kubernetes

1. Requirements

Installation requirements:

  • k8s >= 1.18
  • Helm >= 3.0
  • Istio >= 1.5
[root@centos03 ~]# helm version
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}

2. Installing Seldon Core with Helm

2.1 Create the namespace

kubectl create namespace seldon-system
[root@centos03 ~]# kubectl create namespace seldon-system
namespace/seldon-system created

2.2 Install

Now we can install Seldon Core in the seldon-system namespace.

helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usageMetrics.enabled=true \
    --set istio.enabled=true \
    --namespace seldon-system

Successful deployment:

[root@centos03 ~]# helm install seldon-core seldon-core-operator \
>     --repo https://storage.googleapis.com/seldon-charts \
>     --set usageMetrics.enabled=true \
>     --set istio.enabled=true \
>     --namespace seldon-system
NAME: seldon-core
LAST DEPLOYED: Fri Dec  3 08:33:58 2021
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

You can check whether your Seldon controller is running by executing:

kubectl get pods -n seldon-system

You should see a seldon-controller-manager pod with STATUS=Running.

2.3 Deleting a Seldon Core deployment that failed to start

Find the deployment name of the service that failed to start in the seldon-system namespace:

[root@centos03 istio-1.11.4]# kubectl get deployment -n seldon-system
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
seldon-controller-manager   0/1     1            0           86m

Delete the deployment:

[root@centos03 istio-1.11.4]# kubectl delete deployment seldon-controller-manager -n seldon-system
deployment.apps "seldon-controller-manager" deleted
[root@centos03 istio-1.11.4]# 

Reinstall:

[root@centos03 ~]# helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usageMetrics.enabled=true \
    --set istio.enabled=true \
    --namespace seldon-system

Reinstalling seldon-core fails with the following error:
Error: cannot re-use a name that is still in use

The workaround is as follows:

helm ls --all-namespaces
[root@centos03 ~]# helm ls --all-namespaces
NAME                    NAMESPACE                       REVISION    UPDATED                                 STATUS      CHART                       APP VERSION
notification-manager    kubesphere-monitoring-system    1           2021-10-09 19:43:37.02050295 +0800 CST  deployed    notification-manager-1.0.0  1.0.0      
seldon-core             seldon-system                   1           2021-12-03 08:33:58.992570322 +0800 CST deployed    seldon-core-operator-1.11.2 1.11.2     
snapshot-controller     kube-system                     8           2021-12-03 08:28:11.818233147 +0800 CST deployed    snapshot-controller-0.1.0   2.1.1      
[root@centos03 ~]# 
[root@centos03 ~]# kubectl delete namespace seldon-system
namespace "seldon-system" deleted

[root@centos03 ~]#  kubectl create namespace seldon-system
[root@centos03 ~]# helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usageMetrics.enabled=true \
    --set istio.enabled=true \
    --namespace seldon-system

After reinstalling, the pod stays stuck in the Pending state.

Check the pod:

# kubectl get pods -n seldon-system
kubectl describe pod seldon-controller-manager-7b77d5988-7qnkk   -n seldon-system

Inspect the pod description:

[root@centos03 ~]# kubectl describe pod seldon-controller-manager-7b77d5988-7qnkk   -n seldon-system
Name:           seldon-controller-manager-7b77d5988-7qnkk
Namespace:      seldon-system
Priority:       0
Node:           <none>
Labels:         app=seldon
                app.kubernetes.io/instance=seldon1
                app.kubernetes.io/name=seldon
                app.kubernetes.io/version=v0.5
                control-plane=seldon-controller-manager
                pod-template-hash=7b77d5988
Annotations:    prometheus.io/scrape: true
                sidecar.istio.io/inject: false
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/seldon-controller-manager-7b77d5988
Containers:
  manager:
    Image:       docker.io/seldonio/seldon-core-operator:1.11.2
    Ports:       4443/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /manager
    Args:
      --enable-leader-election
      --webhook-port=4443
      --create-resources=$(MANAGER_CREATE_RESOURCES)
      --log-level=$(MANAGER_LOG_LEVEL)
      --leader-election-id=$(MANAGER_LEADER_ELECTION_ID)

    Limits:
      cpu:     500m
      memory:  300Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      MANAGER_LEADER_ELECTION_ID:                   a33bd623.machinelearning.seldon.io
      MANAGER_LOG_LEVEL:                            INFO
      WATCH_NAMESPACE:                              
      RELATED_IMAGE_EXECUTOR:                       
      RELATED_IMAGE_ENGINE:                         
      RELATED_IMAGE_STORAGE_INITIALIZER:            
      RELATED_IMAGE_SKLEARNSERVER:                  
      RELATED_IMAGE_XGBOOSTSERVER:                  
      RELATED_IMAGE_MLFLOWSERVER:                   
      RELATED_IMAGE_TFPROXY:                        
      RELATED_IMAGE_TENSORFLOW:                     
      RELATED_IMAGE_EXPLAINER:                      
      RELATED_IMAGE_MOCK_CLASSIFIER:                
      MANAGER_CREATE_RESOURCES:                     false
      POD_NAMESPACE:                                seldon-system (v1:metadata.namespace)
      CONTROLLER_ID:                                
      AMBASSADOR_ENABLED:                           true
      AMBASSADOR_SINGLE_NAMESPACE:                  false
      ENGINE_CONTAINER_IMAGE_AND_VERSION:           docker.io/seldonio/engine:1.11.2
      ENGINE_CONTAINER_IMAGE_PULL_POLICY:           IfNotPresent
      ENGINE_CONTAINER_SERVICE_ACCOUNT_NAME:        default
      ENGINE_CONTAINER_USER:                        8888
      ENGINE_LOG_MESSAGES_EXTERNALLY:               false
      PREDICTIVE_UNIT_HTTP_SERVICE_PORT:            9000
      PREDICTIVE_UNIT_GRPC_SERVICE_PORT:            9500
      PREDICTIVE_UNIT_DEFAULT_ENV_SECRET_REF_NAME:  
      PREDICTIVE_UNIT_METRICS_PORT_NAME:            metrics
      ENGINE_SERVER_GRPC_PORT:                      5001
      ENGINE_SERVER_PORT:                           8000
      ENGINE_PROMETHEUS_PATH:                       /prometheus
      ISTIO_ENABLED:                                true
      KEDA_ENABLED:                                 false
      ISTIO_GATEWAY:                                istio-system/seldon-gateway
      ISTIO_TLS_MODE:                               
      USE_EXECUTOR:                                 true
      EXECUTOR_CONTAINER_IMAGE_AND_VERSION:         docker.io/seldonio/seldon-core-executor:1.11.2
      EXECUTOR_CONTAINER_IMAGE_PULL_POLICY:         IfNotPresent
      EXECUTOR_PROMETHEUS_PATH:                     /prometheus
      EXECUTOR_SERVER_PORT:                         8000
      EXECUTOR_CONTAINER_USER:                      8888
      EXECUTOR_CONTAINER_SERVICE_ACCOUNT_NAME:      default
      EXECUTOR_SERVER_METRICS_PORT_NAME:            metrics
      EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT:     http://default-broker
      DEFAULT_USER_ID:                              8888
      EXECUTOR_DEFAULT_CPU_REQUEST:                 500m
      EXECUTOR_DEFAULT_MEMORY_REQUEST:              512Mi
      EXECUTOR_DEFAULT_CPU_LIMIT:                   500m
      EXECUTOR_DEFAULT_MEMORY_LIMIT:                512Mi
      ENGINE_DEFAULT_CPU_REQUEST:                   500m
      ENGINE_DEFAULT_MEMORY_REQUEST:                512Mi
      ENGINE_DEFAULT_CPU_LIMIT:                     500m
      ENGINE_DEFAULT_MEMORY_LIMIT:                  512Mi
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from seldon-manager-token-j4sgs (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  seldon-webhook-server-cert
    Optional:    false
  seldon-manager-token-j4sgs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  seldon-manager-token-j4sgs
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  38h                 default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
  Warning  FailedScheduling  38h                 default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
  Warning  FailedScheduling  16s (x29 over 27m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
[root@centos03 ~]# 

How do you judge whether a node has enough resources? Run kubectl describe node and look at the following fields:

Allocatable: the total amount of resources on this node that pods can request
Allocated resources: the resources already allocated on this node (the sum of the requests of all pods running on it)

Subtracting the latter from the former gives the remaining requestable resources. If that value is smaller than a pod's request, the node cannot satisfy the pod, so the scheduler filters the node out during the Predicates (filtering) stage and the pod is never scheduled onto it.
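The subtraction above can be sketched in a few lines of Python. `parse_cpu` and `fits` are hypothetical helpers written for illustration, not part of any Kubernetes tooling; they convert CPU quantities such as `500m` into cores:

```python
def parse_cpu(q: str) -> float:
    """Convert a Kubernetes CPU quantity ('500m' or '2') into cores."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def fits(allocatable: str, allocated: str, request: str) -> bool:
    """A pod fits on a node if Allocatable - Allocated >= its request."""
    remaining = parse_cpu(allocatable) - parse_cpu(allocated)
    return remaining >= parse_cpu(request)

# With 1 CPU allocatable and 950m already requested, a pod asking for
# 100m (the seldon-controller-manager request above) does not fit:
print(fits("1", "950m", "100m"))   # -> False
print(fits("2", "950m", "100m"))   # -> True
```

The same reasoning applies to memory quantities (Mi/Gi); only the unit parsing differs.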

Handling Kubernetes node resource exhaustion

Solution:

[root@centos03 ~]# cd /etc/systemd/system/kubelet.service.d/
[root@centos03 kubelet.service.d]# ls -l
total 4
-rw-r--r--. 1 root root 991 Oct  9 19:36 10-kubeadm.conf
[root@centos03 kubelet.service.d]# cat 10-kubeadm.conf 
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generate at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.222.12 --hostname-override=centos03 "
ExecStart=
ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
[root@centos03 kubelet.service.d]# 

When systemd starts the kubelet, the ExecStart in 10-kubeadm.conf overrides the ExecStart in /lib/systemd/system/kubelet.service, which is why the kubelet ends up with the long list of command-line flags shown above. All we need to do is append our desired nodefs.available threshold to that startup line.

For consistency with the existing configuration style, we define a new environment variable, say KUBELET_EVICTION_POLICY_ARGS:

Environment="KUBELET_EVICTION_POLICY_ARGS=--eviction-hard=nodefs.available<5%"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_EXTRA_ARGS $KUBELET_EVICTION_POLICY_ARGS


Then restart the kubelet:

systemctl daemon-reload
systemctl restart kubelet

The underlying problem here was that the virtual machine had been allocated too little CPU, which caused the startup failure; increasing the VM's CPU and memory resolves it.


Local port forwarding

Because your Kubernetes cluster runs locally, we need to forward a port on your local machine to a port inside the cluster so that it can be reached from outside. You can do this by running:

kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80

This forwards any traffic arriving at port 8080 on your local machine to port 80 inside the cluster.

III. Deploying a Model

You have now successfully installed Seldon Core on your local cluster and are ready to start deploying models as production microservices.

Deploy your model using a pre-packaged model server

We provide optimized model servers for some of the most popular deep learning and machine learning frameworks, allowing you to deploy your trained model binaries/weights without having to containerize or modify them.

You simply upload your model binary to your preferred object store; in this case we have a trained scikit-learn iris model in a Google Cloud Storage bucket:

gs://seldon-models/v1.12.0-dev/sklearn/iris/model.joblib

Create a namespace to run your model in:

kubectl create namespace seldon

Then we can deploy the model to our Kubernetes cluster with Seldon Core, using the pre-packaged model server for scikit-learn (SKLEARN_SERVER), by running kubectl apply:

$ kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: seldon
spec:
  name: iris
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v1.12.0-dev/sklearn/iris
      name: classifier
    name: default
    replicas: 1
END

Running it:

[root@centos03 ~]#  kubectl apply -f - << END
> apiVersion: machinelearning.seldon.io/v1
> kind: SeldonDeployment
> metadata:
>   name: iris-model
>   namespace: seldon
> spec:
>   name: iris
>   predictors:
>   - graph:
>       implementation: SKLEARN_SERVER
>       modelUri: gs://seldon-models/v1.12.0-dev/sklearn/iris
>       name: classifier
>     name: default
>     replicas: 1
> END
seldondeployment.machinelearning.seldon.io/iris-model created
[root@centos03 ~]# 
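When you deploy many models, the same manifest can be templated programmatically. The `seldon_deployment` helper below is a hypothetical sketch (not part of any Seldon SDK) that rebuilds the manifest above as a Python dict:

```python
def seldon_deployment(name, namespace, model_uri,
                      implementation="SKLEARN_SERVER", replicas=1):
    """Build a minimal SeldonDeployment manifest as a Python dict."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": "iris",  # deployment graph name from the example above
            "predictors": [{
                "graph": {
                    "implementation": implementation,
                    "modelUri": model_uri,
                    "name": "classifier",
                },
                "name": "default",
                "replicas": replicas,
            }],
        },
    }

manifest = seldon_deployment(
    "iris-model", "seldon", "gs://seldon-models/v1.12.0-dev/sklearn/iris")
print(manifest["metadata"]["name"])  # -> iris-model
```

The dict can then be serialized with json.dumps (or PyYAML's yaml.safe_dump) and piped to kubectl apply -f -.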

Send an API request to your deployed model

Every deployed model exposes a standardized user interface for sending requests using our OpenAPI schema.

It can be accessed through the endpoint http://<ingress_url>/seldon/<namespace>/<model-name>/api/v1.0/doc/, which lets you send requests directly from your browser.


http://192.168.222.12:8080/seldon/seldon/iris-model/api/v1.0/doc/
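The ingress path follows a fixed template, so the endpoint can be assembled from its parts. `seldon_url` is a hypothetical helper, and the host is this walkthrough's forwarded ingress address:

```python
def seldon_url(ingress, namespace, model, suffix="api/v1.0/doc/"):
    """Build http://<ingress>/seldon/<namespace>/<model>/<suffix>."""
    return f"http://{ingress}/seldon/{namespace}/{model}/{suffix}"

print(seldon_url("192.168.222.12:8080", "seldon", "iris-model"))
# -> http://192.168.222.12:8080/seldon/seldon/iris-model/api/v1.0/doc/
```

Swapping the suffix for api/v1.0/predictions gives the prediction endpoint used below.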

View the services exposed for this deployment:

[root@centos03 ~]# kubectl get svc --all-namespaces | grep seldon
seldon-system                  seldon-webhook-service                    ClusterIP      10.233.57.127   <none>        443/TCP                                                                      16d
seldon                         iris-model-default                        ClusterIP      10.233.53.254   <none>        8000/TCP,5001/TCP                                                            6d15h
seldon                         iris-model-default-classifier             ClusterIP      10.233.61.74    <none>        9000/TCP,9500/TCP                                                            14d

Test the service from within the k8s cluster:

$ curl -X POST http://10.233.53.254:8000/api/v1.0/predictions \
    -H 'Content-Type: application/json' \
    -d '{ "data": { "ndarray": [[1,2,3,4]] } }'

Run it on the server. Note that the first attempt below hits the ClusterIP service with the ingress path prefix and gets a 404; when calling the service directly, use the bare /api/v1.0/predictions path:

[root@centos03 ~]# kubectl get svc --all-namespaces | grep seldon
seldon-system                  seldon-webhook-service                    ClusterIP      10.233.57.127   <none>        443/TCP                                                                      16d
seldon                         iris-model-default                        ClusterIP      10.233.53.254   <none>        8000/TCP,5001/TCP                                                            6d15h
seldon                         iris-model-default-classifier             ClusterIP      10.233.61.74    <none>        9000/TCP,9500/TCP                                                            14d
[root@centos03 ~]# curl -X POST http://10.233.53.254:8000/seldon/seldon/iris-model/api/v1.0/predictions \
>     -H 'Content-Type: application/json' \
>     -d '{ "data": { "ndarray": [[1,2,3,4]] } }'
404 page not found
[root@centos03 ~]#  curl -X POST http://10.233.53.254:8000/api/v1.0/predictions \
>     -H 'Content-Type: application/json' \
>     -d '{ "data": { "ndarray": [[1,2,3,4]] } }'
{"data":{"names":["t:0","t:1","t:2"],"ndarray":[[0.0006985194531162835,0.00366803903943666,0.995633441507447]]},"meta":{"requestPath":{"classifier":"seldonio/sklearnserver:1.11.2"}}}
[root@centos03 ~]# 
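The response pairs each output name in data.names with a column of data.ndarray, so the predicted class is the argmax over the probability row. A quick sketch parsing the response above:

```python
import json

# The JSON body returned by the predictions endpoint above
response = ('{"data":{"names":["t:0","t:1","t:2"],'
            '"ndarray":[[0.0006985194531162835,0.00366803903943666,0.995633441507447]]}}')
body = json.loads(response)

names = body["data"]["names"]
probs = body["data"]["ndarray"][0]
# Pick the output name with the highest probability
predicted = names[max(range(len(probs)), key=probs.__getitem__)]
print(predicted)  # -> t:2 (the third iris class, ~99.6% probability)
```

For a batch request, data.ndarray contains one row per input sample and the same argmax is applied per row.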

Deploy your custom model with language wrappers

For more custom deep learning and machine learning use cases with custom dependencies (such as third-party libraries, operating-system binaries, or even external systems), we can use any of the Seldon Core language wrappers.

You only need to write a class wrapper that exposes your model's logic; for example, in Python we can create a file Model.py:

import pickle

class Model:
    def __init__(self):
        # pickle.load reads from a file object (pickle.loads would require bytes)
        with open("model.pickle", "rb") as f:
            self._model = pickle.load(f)

    def predict(self, X):
        # scikit-learn estimators expose predict(); adjust for other frameworks
        output = self._model.predict(X)
        return output
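The wrapper's load/predict cycle can be smoke-tested locally before building an image. `StubModel` is a stand-in for a trained estimator, and this sketch assumes the wrapper calls predict() on the unpickled object:

```python
import pickle

class StubModel:
    """Stand-in for a trained estimator: predicts class 0 for every row."""
    def predict(self, X):
        return [0] * len(X)

# Serialize the model the way a training script would:
with open("model.pickle", "wb") as f:
    pickle.dump(StubModel(), f)

# Deserialize it the way the wrapper's __init__ does:
with open("model.pickle", "rb") as f:
    model = pickle.load(f)

print(model.predict([[1, 2, 3, 4]]))  # -> [0]
```

If this round trip works, the same model.pickle can be baked into the s2i image below.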

We can now containerize our class file with the Seldon Core s2i utils to produce the sklearn_iris image:

s2i build . seldonio/seldon-core-s2i-python3:0.18 sklearn_iris:0.1

Now we deploy it to our Seldon Core Kubernetes cluster:

$ kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: model-namespace
spec:
  name: iris
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: sklearn_iris:0.1
    graph:
      name: classifier
    name: default
    replicas: 1
END


Related articles:
Install Locally — Seldon Core official documentation
Seldon-Core on K8S — official documentation
Istio official documentation
Istio service mesh deployment in practice
Handling Kubernetes node resource exhaustion
Pod stuck in Pending state
A deeper look at advanced production machine learning integrations

Those who act often succeed; those who keep walking often arrive.