Quickly Building a Steam Game Recommender System with Flask


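Some people are hooked on scrolling Douyin, others on scrolling Zhihu; recommender systems now influence, and even control, how people live. Starting from the simplest algorithms and workflow, this article uses Flask and gorse to quickly build a Steam game recommender system.

Recommender System Architecture

Before starting development, we need to design the architecture of our recommender system, as shown in the figure below:

It can be divided into three parts:

  • gorse: gorse is an offline recommender system. Submit user-game purchase records to it, and it automatically trains a model and generates lists of recommended games;
  • Flask: a web service written with Flask handles user login, requests the user's library from Steam, pushes the library to gorse, and pulls the recommendation results;
  • Steam: provides library information through its API, as well as game cover images.

This Steam game recommender has been deployed at steamlens.gorse.io. If you have a Steam account and a way to reach the Steam community (you know what I mean), you can try its personalized recommendations. The code is also open source on GitHub; if you have a VPS that can reach the Steam community servers, you can try deploying it yourself.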

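Creating the Recommendation Server

Installation

First we need to install the recommendation backend, gorse. If a Go environment is already installed and $GOBIN has been added to $PATH, gorse can be installed directly with the following command: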

$ go get github.com/zhenghaoz/gorse/...

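Preparing the Data

Everything depends on data. Fortunately, someone has already shared a Steam dataset online. The original is very large, so for ease of demonstration it has been sampled down to games.csv. We create a folder and download the data: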

$ mkdir SteamLens
$ cd SteamLens
$ wget http://cdn.sine-x.com/backups/games.csv
...
$ head games.csv
76561197960272226,10,505
76561197960272226,20,0
76561197960272226,30,0
76561197960272226,40,0
76561197960272226,50,0
76561197960272226,60,0
76561197960272226,70,0
76561197960272226,130,0
76561197960272226,80,0
76561197960272226,100,0

The data has three columns: user, game, and playtime.
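
To get a feel for the file before feeding it to gorse, a short script can tally it. This is only a minimal sketch: it assumes the columns are user ID, game ID, and playtime in that order with no header row (as the head output suggests), and summarize is a helper name made up for this example.

```python
import csv
import io
from collections import Counter

# The first rows of games.csv, as printed by `head` above.
SAMPLE = """\
76561197960272226,10,505
76561197960272226,20,0
76561197960272226,30,0
"""


def summarize(lines):
    """Count rows, distinct users, distinct games, and zero-playtime rows."""
    users, games = Counter(), Counter()
    zero_playtime = 0
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        user_id, app_id, playtime = row
        users[user_id] += 1
        games[app_id] += 1
        if float(playtime) == 0:
            zero_playtime += 1
    return {
        "rows": sum(users.values()),
        "users": len(users),
        "games": len(games),
        "zero_playtime": zero_playtime,
    }


print(summarize(io.StringIO(SAMPLE)))
# For the real file:
#   with open("games.csv", newline="") as f:
#       print(summarize(f))
```

Many rows have a playtime of 0 (owned but never played), which is worth keeping in mind when treating playtime as a weight.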

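Testing Models

Before creating the recommendation service, we need to choose the most suitable recommendation algorithm. gorse provides a tool for evaluating the various models; run gorse test -h or consult the online documentation to learn how to use it. Our dataset is weighted implicit feedback (the weight being playtime), and based on the input each model supports, four models can be used: item-pop, knn_implicit, bpr, and wrmf.

First, test non-personalized recommendation as a baseline: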

$ gorse test item-pop --load-csv games.csv --csv-sep ',' --eval-precision --eval-recall --eval-ndcg --eval-map --eval-mrr
...
+--------------+----------+----------+----------+----------+----------+----------------------+
|              |  FOLD 1  |  FOLD 2  |  FOLD 3  |  FOLD 4  |  FOLD 5  |         MEAN         |
+--------------+----------+----------+----------+----------+----------+----------------------+
| Precision@10 | 0.080942 | 0.080655 | 0.080253 | 0.078880 | 0.078248 | 0.079796(±0.001548)  |
| Recall@10    | 0.308894 | 0.310532 | 0.312299 | 0.305665 | 0.308428 | 0.309163(±0.003498)  |
| NDCG@10      | 0.211919 | 0.209796 | 0.209004 | 0.209945 | 0.210466 | 0.210226(±0.001693)  |
| MAP@10       | 0.133684 | 0.132018 | 0.130520 | 0.133500 | 0.135297 | 0.133004(±0.002484)  |
| MRR@10       | 0.247601 | 0.242664 | 0.240176 | 0.244244 | 0.241920 | 0.243321(±0.004280)  |
+--------------+----------+----------+----------+----------+----------+----------------------+
2019/11/07 09:56:51 Complete cross validation (22.037387763s)
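
The rows of this table are top-10 ranking metrics. As a rough guide to reading them, here is a sketch of Precision@k, Recall@k, and the reciprocal rank for a single user, using the textbook definitions; gorse's own implementation may differ in details such as how the held-out items are chosen, and the item IDs in the example are made up.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the k recommendation slots filled by held-out items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the held-out items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(recommended, relevant, k=10):
    """1 / rank of the first held-out item in the top k, or 0 if there is none."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical user: ten recommended games, three games held out for testing.
recommended = [10, 20, 30, 40, 50, 60, 70, 80, 100, 130]
relevant = {20, 70, 240}

print(precision_at_k(recommended, relevant))   # 2 hits out of 10 slots
print(recall_at_k(recommended, relevant))      # 2 of the 3 held-out games found
print(reciprocal_rank(recommended, relevant))  # first hit at rank 2
```

Note that Precision@10 can never exceed the number of held-out items divided by 10, which is one reason it can sit far below Recall@10 when users have only a few test items.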

Test implicit KNN:

$ gorse test knn_implicit --load-csv games.csv --csv-sep ',' --eval-precision --eval-recall --eval-ndcg --eval-map --eval-mrr
...
+--------------+----------+----------+----------+----------+----------+----------------------+
|              |  FOLD 1  |  FOLD 2  |  FOLD 3  |  FOLD 4  |  FOLD 5  |         MEAN         |
+--------------+----------+----------+----------+----------+----------+----------------------+
| Precision@10 | 0.150892 | 0.153211 | 0.147429 | 0.152162 | 0.150013 | 0.150742(±0.003312)  |
| Recall@10    | 0.529160 | 0.546523 | 0.533619 | 0.543382 | 0.533702 | 0.537277(±0.009245)  |
| NDCG@10      | 0.528442 | 0.546386 | 0.529590 | 0.545167 | 0.530433 | 0.536004(±0.010383)  |
| MAP@10       | 0.451220 | 0.469989 | 0.453748 | 0.468641 | 0.453865 | 0.459493(±0.010497)  |
| MRR@10       | 0.635610 | 0.656008 | 0.636238 | 0.658769 | 0.636045 | 0.644534(±0.014235)  |
+--------------+----------+----------+----------+----------+----------+----------------------+
2019/11/07 09:59:14 Complete cross validation (1m4.169339752s)

Next, test BPR:

$ gorse test bpr --load-csv games.csv --csv-sep ',' --eval-precision --eval-recall --eval-ndcg --eval-map --eval-mrr
...
+--------------+----------+----------+----------+----------+----------+----------------------+
|              |  FOLD 1  |  FOLD 2  |  FOLD 3  |  FOLD 4  |  FOLD 5  |         MEAN         |
+--------------+----------+----------+----------+----------+----------+----------------------+
| Precision@10 | 0.127123 | 0.128440 | 0.129396 | 0.124914 | 0.126719 | 0.127318(±0.002405)  |
| Recall@10    | 0.502971 | 0.511863 | 0.515385 | 0.503914 | 0.505500 | 0.507926(±0.007458)  |
| NDCG@10      | 0.434958 | 0.421336 | 0.427279 | 0.405582 | 0.424385 | 0.422708(±0.017126)  |
| MAP@10       | 0.350960 | 0.332219 | 0.336659 | 0.313238 | 0.337824 | 0.334180(±0.020942)  |
| MRR@10       | 0.495087 | 0.466407 | 0.477137 | 0.447885 | 0.475176 | 0.472338(±0.024453)  |
+--------------+----------+----------+----------+----------+----------+----------------------+
2019/11/07 10:01:51 Complete cross validation (56.85278659s)

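Finally, test WRMF. Because the playtime values are very large, we need to set a small weight coefficient, \alpha=0.001 (the value passed to --set-alpha below):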

$ gorse test wrmf --load-csv games.csv --csv-sep ',' --eval-precision --eval-recall --eval-ndcg --eval-map --eval-mrr --set-alpha 0.001
...
+--------------+----------+----------+----------+----------+----------+----------------------+
|              |  FOLD 1  |  FOLD 2  |  FOLD 3  |  FOLD 4  |  FOLD 5  |         MEAN         |
+--------------+----------+----------+----------+----------+----------+----------------------+
| Precision@10 | 0.145834 | 0.148021 | 0.147034 | 0.146564 | 0.143163 | 0.146123(±0.002960)  |
| Recall@10    | 0.524673 | 0.533390 | 0.533113 | 0.535772 | 0.525784 | 0.530546(±0.005873)  |
| NDCG@10      | 0.499655 | 0.504544 | 0.506967 | 0.513855 | 0.501728 | 0.505350(±0.008505)  |
| MAP@10       | 0.415299 | 0.419840 | 0.423166 | 0.431339 | 0.421243 | 0.422177(±0.009161)  |
| MRR@10       | 0.592257 | 0.592858 | 0.596109 | 0.610589 | 0.590023 | 0.596367(±0.014222)  |
+--------------+----------+----------+----------+----------+----------+----------------------+
2019/11/07 10:06:52 Complete cross validation (3m52.912709237s)
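
To see why a small \alpha matters: in the usual weighted matrix factorization formulation for implicit feedback (Hu, Koren, and Volinsky), a raw value r is turned into a confidence weight c = 1 + \alpha r. Whether gorse's wrmf uses exactly this form is an assumption here, so treat the following as an illustration of the scaling only. The playtime 505 comes from the sample rows above; 10000 is a made-up heavy-player value.

```python
def confidence(playtime, alpha):
    """Confidence weight c = 1 + alpha * r from the standard WRMF formulation."""
    return 1.0 + alpha * playtime


for alpha in (1.0, 0.01, 0.001):
    weights = [confidence(r, alpha) for r in (0, 505, 10000)]
    print(alpha, weights)
```

With \alpha = 1 a single heavily played game outweighs an unplayed one by four orders of magnitude, while a small \alpha keeps the weights within a range where the many zero-playtime entries still count.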

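So far (and we have not really tuned the hyperparameters), KNN performs best on our dataset, and its speed is satisfactory, so we choose KNN as the recommendation algorithm for this example. No recommendation algorithm is guaranteed to beat the others; the best one depends on the characteristics of the dataset. For example, on MovieLens 100K the best model is WRMF rather than KNN.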
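
For readers unfamiliar with neighborhood methods, the idea behind item-based KNN can be sketched in a few lines. This is not gorse's implementation: the article does not describe the similarity that gorse calls implicit, so cosine similarity over binary ownership sets is swapped in here, and the users and game IDs are toy data.

```python
import math
from collections import defaultdict


def item_similarities(owned):
    """owned maps user -> set of games; returns {(a, b): cosine similarity}."""
    owners = defaultdict(set)
    for user, games in owned.items():
        for game in games:
            owners[game].add(user)
    sims = {}
    for a in owners:
        for b in owners:
            if a == b:
                continue
            common = len(owners[a] & owners[b])
            if common:
                sims[(a, b)] = common / math.sqrt(len(owners[a]) * len(owners[b]))
    return sims


def recommend(user, owned, sims, n=10):
    """Score each unowned game by its summed similarity to the user's games."""
    scores = defaultdict(float)
    for (a, b), sim in sims.items():
        if a in owned[user] and b not in owned[user]:
            scores[b] += sim
    # Highest score first; ties broken by game ID for a stable order.
    return sorted(scores, key=lambda game: (-scores[game], game))[:n]


owned = {
    "u1": {10, 20},
    "u2": {10, 20, 30},
    "u3": {20, 30},
    "u4": {10, 40},
}
print(recommend("u1", owned, item_similarities(owned)))
```

Game 30 ranks ahead of game 40 for u1 because it is co-owned with both of u1's games, whereas 40 is co-owned with only one of them.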

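Importing the Data

With the model chosen, we import the data into gorse's built-in database. Create a folder named data to hold the data, and import it into data/gorse.db: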

$ mkdir data
$ gorse import-feedback data/gorse.db games.csv --sep ','

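Starting the Server

Next, create the recommendation service's configuration file, config/gorse.toml. It needs to set the server's listening address and port, the location of the database file, and a few miscellaneous recommendation settings. Implicit KNN needs no hyperparameters, so [params] is left empty.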

# This section declares settings for the server.
[server]
host = "0.0.0.0"        # server host
port = 8080             # server port

# This section declares setting for the database.
[database]
file = "data/gorse.db"  # database file

# This section declares settings for recommendation.
[recommend]
model = "knn_implicit"  # recommendation model
cache_size = 100        # the number of cached recommendations
update_threshold = 10   # update model when more than 10 ratings are added
check_period = 1        # check for update every one minute
similarity = "implicit" # similarity metric for neighbors

# This section declares hyperparameters for the recommendation model.
[params]

After saving the configuration file, run the recommendation server:

$ gorse serve -c config/gorse.toml
...
2019/11/07 16:45:05 update recommends
2019/11/07 16:47:02 update neighbors by implicit

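Once the last two lines appear, the recommendations have been generated.

Testing the Recommendation API

We can use the RESTful API provided by gorse to fetch recommendations: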

$ curl http://127.0.0.1:8080/recommends/76561197960272226?number=10
[
 {
  "ItemId": 4540,
  "Score": 23.479386364078838
 },
 ...
 {
  "ItemId": 57300,
  "Score": 22.156954153653245
 }
]

We get 10 recommendations, each with a game ID and a recommendation score.
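
The same call is easy to wrap for use from Python. The sketch below only builds the URL and parses a response body shaped like the curl output above, so it runs without a server; recommend_url and item_ids are names made up for this example, and the sample body contains just the two entries that the article shows.

```python
import json


def recommend_url(base_uri, user_id, number=10):
    """Build the /recommends/<user>?number=<n> URL used in the curl call."""
    return "%s/recommends/%s?number=%d" % (base_uri.rstrip("/"), user_id, number)


def item_ids(body):
    """Extract the game IDs, keeping the order returned by the server."""
    return [entry["ItemId"] for entry in json.loads(body)]


SAMPLE_BODY = """
[
 {"ItemId": 4540, "Score": 23.479386364078838},
 {"ItemId": 57300, "Score": 22.156954153653245}
]
"""

print(recommend_url("http://127.0.0.1:8080", 76561197960272226))
print(item_ids(SAMPLE_BODY))
```

Against a running server, the body would come from something like requests.get(recommend_url(...)).text.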

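Creating the Front-End Server

Applying for a Key

We need to connect to the user's Steam account to fetch the games in their library, which involves user login. Visit the "Register Steam Web API Key" page to apply to Steam for an API key for calling the API.

Flask Development Environment

Next, prepare the Python packages needed for Flask development, installing them in turn: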

$ pip install Flask
$ pip install Flask-OpenID
$ pip install Flask-SQLAlchemy
$ pip install uWSGI

Under SteamLens, we can create a folder named steamlens to hold the Flask application code:

$ mkdir steamlens

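Front-End Pages

Front-end design is not the focus of this article. The HTML templates can be found in steamlens/templates and the static assets in steamlens/static. The repository provides two kinds of pages:

Template | Purpose | Data
page_gallery.jinja2 | Displays a list of games | current_time: time, title: title, items: list of games, nickname: user's nickname
page_app.jinja2 | Displays one game and a list of similar games | current_time: time, item_id: game ID, title: title, items: list of similar games, nickname: user's nickname

Filling in the Configuration File

Before writing the back-end code, fill in the configuration: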

# Configuration for gorse
GORSE_API_URI = 'http://127.0.0.1:8080'
GORSE_NUM_ITEMS = 30

# Configuration for SQL
SQLALCHEMY_DATABASE_URI = 'sqlite:///../data/steamlens.db'
SQLALCHEMY_TRACK_MODIFICATIONS = False

# Configuration for OpenID
OPENID_STIRE = '../data/openid_store'
SECRET_KEY = 'STEAM_API_KEY'

Remember to replace STEAM_API_KEY with your Steam key.
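
A Flask configuration file like this is plain Python: Flask executes it and keeps only the upper-case names. The sketch below mimics that behavior without importing Flask, to show which settings end up in app.config; load_settings is a hypothetical helper, not part of Flask's API, and the lower-case helper variable is added only to show that it gets dropped.

```python
def load_settings(source, filename="steamlens.cfg"):
    """Execute config text as Python and keep only the upper-case names."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    return {key: value for key, value in namespace.items() if key.isupper()}


CFG = """
# Configuration for gorse
GORSE_API_URI = 'http://127.0.0.1:8080'
GORSE_NUM_ITEMS = 30

helper = 'lower-case names are ignored'
"""

settings = load_settings(CFG)
print(settings)
```

Keep in mind that Flask also uses SECRET_KEY to sign session cookies, so in this setup the Steam key serves both purposes.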

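User Login

We first write the basic skeleton and the Steam connection, in steamlens/app.py. The program does the following:

  1. Create a Flask app object and read the configuration from the path in the environment variable STEAMLENS_SETTINGS;
  2. Create an OpenID object for authenticating with Steam;
  3. Create a SQLAlchemy object for connecting to the database;
  4. After the user logs in, fetch the user's name and ID, save them to the database, and push the list of games in the user's library to the gorse server.
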
import json
import os.path
import re
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

import requests
from flask import Flask, render_template, redirect, session, g
from flask_openid import OpenID
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config.from_envvar('STEAMLENS_SETTINGS')

oid = OpenID(app, os.path.join(os.path.dirname(__file__), app.config['OPENID_STIRE']))
db = SQLAlchemy(app)

#################
# Steam Service #
#################

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    steam_id = db.Column(db.String(40))
    nickname = db.Column(db.String(80))

    @staticmethod
    def get_or_create(steam_id):
        rv = User.query.filter_by(steam_id=steam_id).first()
        if rv is None:
            rv = User()
            rv.steam_id = steam_id
            db.session.add(rv)
        return rv


@app.route("/login")
@oid.loginhandler
def login():
    if g.user is not None:
        return redirect(oid.get_next_url())
    else:
        return oid.try_login("http://steamcommunity.com/openid")


@app.route('/logout')
def logout():
    session.pop('user_id', None)
    return redirect('/pop')


@app.before_request
def before_request():
    g.user = None
    if 'user_id' in session:
        g.user = User.query.filter_by(id=session['user_id']).first()


@oid.after_login
def new_user(resp):
    _steam_id_re = re.compile('steamcommunity.com/openid/id/(.*?)$')
    match = _steam_id_re.search(resp.identity_url)
    g.user = User.get_or_create(match.group(1))
    steamdata = get_user_info(g.user.steam_id)
    g.user.nickname = steamdata['personaname']
    db.session.commit()
    session['user_id'] = g.user.id
    # Add games to gorse
    games = get_owned_games(g.user.steam_id)
    data = [{'UserId': int(g.user.steam_id), 'ItemId': int(v['appid']), 'Feedback': float(v['playtime_forever'])} for v in games]
    headers = {"Content-Type": "application/json"}
    requests.put('%s/feedback' % app.config['GORSE_API_URI'], data=json.dumps(data), headers=headers)
    return redirect(oid.get_next_url())


def get_user_info(steam_id):
    options = {
        'key': app.secret_key,
        'steamids': steam_id
    }
    url = 'http://api.steampowered.com/ISteamUser/' \
          'GetPlayerSummaries/v0001/?%s' % urlencode(options)
    rv = json.load(urlopen(url))
    return rv['response']['players']['player'][0] or {}


def get_owned_games(steam_id):
    options = {
        'key': app.secret_key,
        'steamid': steam_id,
        'format': 'json'
    }
    url = 'http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?%s' % urlencode(options)
    rv = json.load(urlopen(url))
    # 'games' can be missing from the response (e.g. when the library is not visible).
    return rv['response'].get('games', [])


# Create tables if not exists.
db.create_all()
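
The two data transformations inside new_user can be pulled out and checked without touching Steam or gorse. The following is a sketch under assumptions: the helper names steam_id_from_identity and to_feedback are made up here, the identity URL format is inferred from the regular expression in the code above, and the stricter pattern (escaped dots, digits only) is a choice of this example rather than something taken from the repository.

```python
import re

# Same shape as the pattern in new_user, with the dots escaped.
STEAM_ID_RE = re.compile(r"steamcommunity\.com/openid/id/(\d+)$")


def steam_id_from_identity(identity_url):
    """Return the Steam ID at the end of an OpenID identity URL, or None."""
    match = STEAM_ID_RE.search(identity_url)
    return match.group(1) if match else None


def to_feedback(steam_id, games):
    """Convert owned-game entries into the feedback records pushed to gorse."""
    return [
        {
            "UserId": int(steam_id),
            "ItemId": int(game["appid"]),
            "Feedback": float(game.get("playtime_forever", 0)),
        }
        for game in games
    ]


steam_id = steam_id_from_identity(
    "https://steamcommunity.com/openid/id/76561197960272226")
print(steam_id)
print(to_feedback(steam_id, [{"appid": 10, "playtime_forever": 505}]))
```

Returning None for an unexpected URL lets the caller reject the login instead of failing on match.group.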

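Displaying Recommendations

Next, add the recommendation display to steamlens/app.py, using the RESTful API provided by gorse to fetch popular games, random games, personalized recommendations, and the games similar to a given game.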

#######################
# Recommender Service #
#######################

@app.context_processor
def inject_current_time():
    return {'current_time': datetime.utcnow()}


@app.route('/')
def index():
    return redirect('/pop')


@app.route('/pop')
def pop():
    # Get nickname
    nickname = None
    if g.user:
        nickname = g.user.nickname
    # Get items
    r = requests.get('%s/popular?number=%d' % (app.config['GORSE_API_URI'], app.config['GORSE_NUM_ITEMS']))
    items = [v['ItemId'] for v in r.json()]
    # Render page
    return render_template('page_gallery.jinja2', title='Popular Games', items=items, nickname=nickname)


@app.route('/random')
def random():
    # Get nickname
    nickname = None
    if g.user:
        nickname = g.user.nickname
    # Get items
    r = requests.get('%s/random?number=%d' % (app.config['GORSE_API_URI'], app.config['GORSE_NUM_ITEMS']))
    items = [v['ItemId'] for v in r.json()]
    # Render page
    return render_template('page_gallery.jinja2', title='Random Games', items=items, nickname=nickname)


@app.route('/recommend')
def recommend():
    # Check login
    if g.user is None:
        return render_template('page_gallery.jinja2', title='Please login first', items=[])
    # Get items
    r = requests.get('%s/recommends/%s?number=%s' %
                     (app.config['GORSE_API_URI'], g.user.steam_id, app.config['GORSE_NUM_ITEMS']))
    # Render page
    if r.status_code == 200:
        items = [v['ItemId'] for v in r.json()]
        return render_template('page_gallery.jinja2', title='Recommended Games', items=items, nickname=g.user.nickname)
    return render_template('page_gallery.jinja2', title='Generating Recommended Games...', items=[], nickname=g.user.nickname)


@app.route('/item/<int:app_id>')
def item(app_id: int):
    # Get nickname
    nickname = None
    if g.user:
        nickname = g.user.nickname
    # Get items
    r = requests.get('%s/neighbors/%d?number=%d' %
                     (app.config['GORSE_API_URI'], app_id, app.config['GORSE_NUM_ITEMS']))
    items = [v['ItemId'] for v in r.json()]
    # Render page
    return render_template('page_app.jinja2', item_id=app_id, title='Similar Games', items=items, nickname=nickname)


@app.route('/user')
def user():
    # Check login
    if g.user is None:
        return render_template('page_gallery.jinja2', title='Please login first', items=[])
    # Get items
    r = requests.get('%s/user/%s' % (app.config['GORSE_API_URI'], g.user.steam_id))
    # Render page
    if r.status_code == 200:
        items = [v['ItemId'] for v in r.json()]
        return render_template('page_gallery.jinja2', title='Owned Games', items=items, nickname=g.user.nickname)
    return render_template('page_gallery.jinja2', title='Synchronizing Owned Games ...', items=[], nickname=g.user.nickname)

Running the Server

We use uWSGI to start the Flask server, so a uwsgi.ini needs to be created in the outermost folder, SteamLens:

[uwsgi]

# Bind to the specified UNIX/TCP socket using default protocol
socket=0.0.0.0:5000

# Point to the main directory of the Web Site
chdir=/path/to/SteamLens/steamlens/

# Python startup file
wsgi-file=app.py

# The application variable of Python Flask Core Object
callable=app

# The maximum numbers of Processes
processes=1

# The maximum numbers of Threads
threads=2

# Set internal buffer size 
buffer-size=8192

Remember to change chdir to the path of the SteamLens/steamlens folder. Finally, run the Flask application with the following command:

$ STEAMLENS_SETTINGS=../config/steamlens.cfg uwsgi --ini uwsgi.ini

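You can visit steamlens.gorse.io/ to see the online demo. After logging in, wait a moment and personalized recommendations will be generated. The recommendations for the author look like this:

The author loves FPS games, and it recommended a large number of them. However, the recommended games are all fairly old, because the dataset used by the project dates from around 2013. Since Steam updated its privacy policy, it is also no longer possible to fetch a user's library without the user's authorization.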