
Recommendation Algorithms: Pitfall Notes, Round Two


Last year I finally got EasyRec working, and I didn't expect to need recommendation stuff again this year. Fine, one more round, and one more set of pitfalls.

1. Downloading the EasyRec training/test data

After git clone, enter the EasyRec directory and run: bash scripts/init.sh, which downloads all of the data used. ✅

2. Model deployment (following a blog post on deploying tf-serving with docker):

First, copy everything under the final directory into /models/half_plus_two/.
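As a sketch, that copy step can be scripted with Python's shutil; the directory names below are placeholders standing in for final/ and /models/half_plus_two/, so adjust them to your layout:

```python
import shutil
import tempfile
from pathlib import Path

def copy_model(src_dir: str, dst_dir: str) -> Path:
    """Copy an exported SavedModel tree into the tf-serving model path."""
    # dirs_exist_ok lets the copy be re-run over an existing target
    return Path(shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True))

# Demo with throwaway directories standing in for final/ and /models/half_plus_two/
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "final" / "00000123"
    src.mkdir(parents=True)
    (src / "saved_model.pb").write_bytes(b"")  # placeholder for the real graph file
    dst = copy_model(str(Path(tmp) / "final"),
                     str(Path(tmp) / "models" / "half_plus_two"))
    copied = sorted(p.name for p in dst.rglob("*"))

print(copied)  # ['00000123', 'saved_model.pb']
```

The version subdirectory (00000123 here) must survive the copy, since tf-serving picks the model version from it.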

2.1 Inspecting the model's basic parameters:

saved_model_cli show --dir /models/half_plus_two/00000123/ --tag_set serve --signature_def serving_default
The given SavedModel SignatureDef contains the following input(s):
  inputs['x'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: x:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['y'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: y:0
Method name is: tensorflow/serving/predict

2.2 Checking whether the model supports GPU

saved_model_cli show --dir /models/half_plus_two/00000123/
The given SavedModel contains the following tag-sets:
serve    # "serve" alone means no GPU support; a GPU-capable export would list "serve, gpu"
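A tiny helper (my own, not part of saved_model_cli) can classify that tag-set line:

```python
def supports_gpu(tag_set_line: str) -> bool:
    """True if the SavedModel's tag-set list includes 'gpu'."""
    tags = [t.strip() for t in tag_set_line.split(",")]
    return "gpu" in tags

# The CPU export above prints just "serve"; a GPU export would print "serve, gpu"
print(supports_gpu("serve"))       # False
print(supports_gpu("serve, gpu"))  # True
```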

2.3 Feeding in input data and checking the result

saved_model_cli run --dir /models/half_plus_two/00000123/ --tag_set serve --signature_def serving_default --input_exprs="x=[[1],[9]]"
[[2.5]
 [6.5]]
The result is correct: half plus two, i.e. 1*0.5+2 = 2.5 and 9*0.5+2 = 6.5, matching the output.
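Since the toy model is just y = 0.5·x + 2 over a (-1, 1)-shaped batch, the same check is one line of plain Python:

```python
def half_plus_two(batch):
    """Mirror of the toy model: y = 0.5 * x + 2, applied row-wise to a batch."""
    return [[0.5 * x + 2.0] for (x,) in batch]

print(half_plus_two([[1], [9]]))  # [[2.5], [6.5]]
```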

2.4 curl: (56) Recv failure: Connection reset by peer

It turned out the port specified with docker -p wasn't usable (something odd there, since specifying a custom port worked fine on the previous server), so I switched back to port 8501.

The docker startup command is in the post linked above.

2.5 Stopping and removing the container; without this step you cannot restart a container under the same name.

Note that both a model name and a container name appear below; I suggest using the same name for both, e.g. half_plus_two.

docker kill half_plus_two
docker rm half_plus_two
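The two commands can be wrapped in a small cleanup helper; this is my own subprocess sketch, with a dry_run flag so the command list can be inspected without docker installed:

```python
import subprocess

def cleanup_container(name: str, dry_run: bool = False):
    """Kill and remove a container so the same name can be reused."""
    cmds = [["docker", "kill", name], ["docker", "rm", name]]
    if not dry_run:
        for cmd in cmds:
            # check=False: it's fine if the container is already stopped or gone
            subprocess.run(cmd, check=False)
    return cmds

print(cleanup_container("half_plus_two", dry_run=True))
```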

3. Deploying the trained DSSM model

3.1 Inspecting the model's input parameters

saved_model_cli show --dir /models/mydssm/163333/ --tag_set serve --signature_def serving_default
The given SavedModel SignatureDef contains the following input(s):
  inputs['app_category']      dtype: DT_STRING  shape: (-1)  name: input_9:0
  inputs['app_domain']        dtype: DT_STRING  shape: (-1)  name: input_8:0
  inputs['app_id']            dtype: DT_STRING  shape: (-1)  name: input_7:0
  inputs['banner_pos']        dtype: DT_STRING  shape: (-1)  name: input_3:0
  inputs['c1']                dtype: DT_STRING  shape: (-1)  name: input_2:0
  inputs['c14']               dtype: DT_STRING  shape: (-1)  name: input_15:0
  inputs['c15']               dtype: DT_STRING  shape: (-1)  name: input_16:0
  inputs['c16']               dtype: DT_STRING  shape: (-1)  name: input_17:0
  inputs['c17']               dtype: DT_STRING  shape: (-1)  name: input_18:0
  inputs['c18']               dtype: DT_STRING  shape: (-1)  name: input_19:0
  inputs['c19']               dtype: DT_STRING  shape: (-1)  name: input_20:0
  inputs['c20']               dtype: DT_STRING  shape: (-1)  name: input_21:0
  inputs['c21']               dtype: DT_STRING  shape: (-1)  name: input_22:0
  inputs['device_conn_type']  dtype: DT_STRING  shape: (-1)  name: input_14:0
  inputs['device_id']         dtype: DT_STRING  shape: (-1)  name: input_10:0
  inputs['device_ip']         dtype: DT_STRING  shape: (-1)  name: input_11:0
  inputs['device_model']      dtype: DT_STRING  shape: (-1)  name: input_12:0
  inputs['device_type']       dtype: DT_STRING  shape: (-1)  name: input_13:0
  inputs['hour']              dtype: DT_STRING  shape: (-1)  name: input_1:0
  inputs['site_category']     dtype: DT_STRING  shape: (-1)  name: input_6:0
  inputs['site_domain']       dtype: DT_STRING  shape: (-1)  name: input_5:0
  inputs['site_id']           dtype: DT_STRING  shape: (-1)  name: input_4:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['logits']  dtype: DT_FLOAT  shape: (-1)  name: Squeeze:0
  outputs['probs']   dtype: DT_FLOAT  shape: (-1)  name: Sigmoid:0
Method name is: tensorflow/serving/predict

# Likewise, no GPU support:
saved_model_cli show --dir /models/mydssm/13339343/
The given SavedModel contains the following tag-sets:
serve
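With 22 string inputs it's easy to drop or misspell a key when building a request by hand. A small helper (my own, with the input names copied from the signature above) can validate an instance before it goes over the wire:

```python
# Input names taken from the serving_default signature above
FEATURES = [
    "hour", "c1", "banner_pos", "site_id", "site_domain", "site_category",
    "app_id", "app_domain", "app_category", "device_id", "device_ip",
    "device_model", "device_type", "device_conn_type",
    "c14", "c15", "c16", "c17", "c18", "c19", "c20", "c21",
]

def make_instance(values: dict) -> dict:
    """Build one request instance, failing loudly on missing or extra features."""
    missing = set(FEATURES) - set(values)
    extra = set(values) - set(FEATURES)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    # Every input is DT_STRING, so stringify defensively
    return {k: str(values[k]) for k in FEATURES}

inst = make_instance({f: str(i + 1) for i, f in enumerate(FEATURES)})
print(len(inst))  # 22
```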

3.2 Test request

curl -d '{"instances": [{"app_category":"1","app_domain":"2","app_id":"3","banner_pos":"4","c1":"5","c14":"6","c15":"7","c16":"8","c17":"9","c18":"10","c19":"11","c20":"12","c21":"13","device_conn_type":"14","device_id":"15","device_ip":"16","device_model":"17","device_type":"18","hour":"19","site_category":"20","site_domain":"21","site_id":"22"}]}'     -X POST http://localhost:8501/v1/models/mydssm:predict
{"predictions": [{"logits": -2.64485741, "probs": 0.0663066804}]}
#https://github.com/tensorflow/serving/issues/2104
>>> import json, requests
>>> url = "http://localhost:8501/v1/models/mydssm:predict"
>>> heads = {"content-type": "application/json"}
>>> jd={"signature_name": "serving_default","instances":[{"app_category":"1","app_domain":"2","app_id":"3","banner_pos":"4","c1":"5","c14":"6","c15":"7","c16":"8","c17":"9","c18":"10","c19":"11","c20":"12","c21":"13","device_conn_type":"14","device_id":"15","device_ip":"16","device_model":"17","device_type":"18","hour":"19","site_category":"20","site_domain":"21","site_id":"22"}]}
>>> requests.post(url,data=json.dumps(jd),headers=heads).json()
{'predictions': [{'logits': -2.64485741, 'probs': 0.0663066804}]}
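The two output fields are consistent with each other: probs is just the sigmoid of logits, which is quick to verify offline:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Values from the prediction above
logits, probs = -2.64485741, 0.0663066804
print(abs(sigmoid(logits) - probs) < 1e-6)  # True
```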

4. Deploying tf-serving with GPU (requires the GPU docker image)

4.1 Pull the image

docker pull tensorflow/serving:latest-gpu

4.2 Install the NVIDIA container toolkit

CentOS

sudo dnf clean expire-cache && sudo dnf install -y nvidia-container-toolkit-base
nvidia-ctk --version
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
grep "  name:" /etc/cdi/nvidia.yaml

Ubuntu

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
nvidia-ctk --version
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
grep "  name:" /etc/cdi/nvidia.yaml

4.3 Launch

docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu \
  --per_process_gpu_memory_fraction=0.5
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
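Once the container is up, tf-serving's REST API exposes a model-status endpoint (GET /v1/models/&lt;name&gt;) that you can poll before sending predictions. A stdlib-only sketch (host/port/model name here match the run command above):

```python
import json
import urllib.error
import urllib.request

def model_status_url(host: str, port: int, model: str) -> str:
    return f"http://{host}:{port}/v1/models/{model}"

def is_available(url: str, timeout: float = 2.0) -> bool:
    """True if tf-serving reports some version of the model as AVAILABLE."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = json.load(resp)
        return any(v.get("state") == "AVAILABLE"
                   for v in status.get("model_version_status", []))
    except (urllib.error.URLError, OSError):
        # Server not up yet (or wrong port): treat as not available
        return False

url = model_status_url("localhost", 8501, "half_plus_two")
print(url, is_available(url))
```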

To wrap up: for anything I've missed here, see NVIDIA's official documentation and the tf-serving GitHub repo.

Bye!