ALEXSONG_JY

2020-08-21   阅读量: 804

pandas的RFM模型

扫码加入数据分析学习群


13 RFM 模型



In [413]:

import numpy as np
import pandas as pd

In [414]:

data = pd.read_excel('PYTHON-RFM实战数据.xlsx')
data.head()

Out[414]:


品牌名称买家昵称付款日期订单状态实付金额邮费省份城市购买数量
0CDA数据分析叫我李22019-01-01 00:17:59交易成功1866上海上海市1
1CDA数据分析0cyb19922019-01-01 00:59:54交易成功1450广东省广州市1
2CDA数据分析萝污萌莉2019-01-01 07:48:48交易成功1948山东省东营市1
3CDA数据分析atblovemyy2019-01-01 09:15:49付款以后用户退款成功,交易自动关闭840江苏省镇江市1
4CDA数据分析小星期鱼2019-01-01 09:59:33付款以后用户退款成功,交易自动关闭740上海上海市1

In [416]:


data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28833 entries, 0 to 28832
Data columns (total 9 columns):
品牌名称    28833 non-null object
买家昵称    28833 non-null object
付款日期    28833 non-null datetime64[ns]
订单状态    28833 non-null object
实付金额    28833 non-null int64
邮费      28833 non-null int64
省份      28833 non-null object
城市      28832 non-null object
购买数量    28833 non-null int64
dtypes: datetime64[ns](1), int64(3), object(5)
memory usage: 2.0+ MB

In [417]:

#仅当交易成功时 ,才有FRM模型
data = data[data.订单状态=='交易成功']
data.shape

Out[417]:

(27793, 9)

13.1 关键字段提取

In [418]:

data = data[['买家昵称','付款日期','实付金额','购买数量']]
data.head()

Out[418]:


买家昵称付款日期实付金额购买数量
0叫我李22019-01-01 00:17:591861
10cyb19922019-01-01 00:59:541451
2萝污萌莉2019-01-01 07:48:481941
5重碎叠2019-01-01 10:00:071971
6iho_jann2019-01-01 10:00:081681

13.1.1 R

In [419]:

# 当前观测时间(数据提取时间)假设 是2019-07-1
data.付款日期.max()

Out[419]:

Timestamp('2019-06-30 22:46:22.511000')

In [420]:


data.买家昵称.shape

Out[420]:

(27793,)

In [422]:

data.买家昵称.nunique()

Out[422]:

25420

In [423]:

#由于同一个客户对应不同的订单,选取客户付款日期的最大值
r = data.groupby('买家昵称')['付款日期'].max().reset_index()
r.head()

Out[423]:


买家昵称付款日期
0.blue_ram2019-02-04 17:49:34.000
1.christiny2019-01-29 14:17:15.000
2.willn12019-01-11 03:46:18.000
3.托托m2019-01-11 02:26:33.000
40000妮2019-06-28 16:53:26.458

In [424]:


# 时间日期格式可以直接做差值
r['R'] = (pd.to_datetime('2019-7-1') - r.付款日期).dt.days
r

Out[424]:


买家昵称付款日期R
0.blue_ram2019-02-04 17:49:34.000146
1.christiny2019-01-29 14:17:15.000152
2.willn12019-01-11 03:46:18.000170
3.托托m2019-01-11 02:26:33.000170
40000妮2019-06-28 16:53:26.4582
............
25415龙火师2019-04-07 08:43:00.00084
25416龙魔鬼女2019-04-19 22:14:24.00072
25417龟mil宝2019-06-19 04:26:30.58911
25418!谢鹏逗逼?2019-06-06 11:14:52.00024
25419~小邱~2019-01-23 23:51:51.457158

25420 rows × 3 columns

In [425]:

# 删除无用特征
r = r[['买家昵称','R']]
r.head()

Out[425]:


买家昵称R
0.blue_ram146
1.christiny152
2.willn1170
3.托托m170
40000妮2

13.1.2 F

In [426]:


f = data.groupby('买家昵称')['付款日期'].count().reset_index()
f.columns=['买家昵称','F']
f.head()

Out[426]:


买家昵称F
0.blue_ram1
1.christiny1
2.willn11
3.托托m1
40000妮1

13.1.3 M

In [428]:


# 总金额
m = data.groupby('买家昵称')['实付金额'].sum().reset_index()
m.columns=['买家昵称','M']
m.head()

Out[428]:


买家昵称M
0.blue_ram49
1.christiny183
2.willn134
3.托托m37
40000妮164

13.2 三张表合并

In [431]:

rfm = pd.merge(r,f)
rfm = pd.merge(rfm,m)

rfm

Out[431]:


买家昵称RFM
0.blue_ram146149
1.christiny1521183
2.willn1170134
3.托托m170137
40000妮21164
...............
25415龙火师841175
25416龙魔鬼女72187
25417龟mil宝112497
25418!谢鹏逗逼?241137
25419~小邱~1581185

25420 rows × 4 columns

13.3 维度打分

In [432]:


rfm.describe().T

Out[432]:


countmeanstdmin25%50%75%max
R25420.097.12041758.4508900.037.0105.0155.0180.0
F25420.01.0933520.3478381.01.01.01.015.0
M25420.0138.13158996.59229330.076.0124.0191.06091.0

In [433]:


# 采用5分制对rfm进行打分
# R 0-30-60-90-120-以上
rfm['R-score'] = pd.cut(rfm['R'],bins=[-1,30,60,90,120,180],labels=[5,4,3,2,1]).astype(float)
rfm.head()
 

Out[433]:


买家昵称RFMR-score
0.blue_ram1461491.0
1.christiny15211831.0
2.willn11701341.0
3.托托m1701371.0
40000妮211645.0

In [436]:


# F
rfm['F-score'] = pd.cut(rfm['F'],bins=[1,2,3,4,5,20],labels=[1,2,3,4,5],right=False).astype(float)
# M
rfm['M-score'] = pd.cut(rfm['M'],bins=[0,50,100,150,200,10000],labels=[1,2,3,4,5],right=False).astype(float)

In [437]:

rfm.head()

Out[437]:


买家昵称RFMR-scoreF-scoreM-score
0.blue_ram1461491.01.01.0
1.christiny15211831.01.04.0
2.willn11701341.01.01.0
3.托托m1701371.01.01.0
40000妮211645.01.04.0

In [434]:

rfm.F.value_counts()

Out[434]:

1     23367
2      1795
3       217
4        33
5         5
15        1
7         1
6         1
Name: F, dtype: int64

In [435]:

rfm.M.value_counts()

Out[435]:

87     1289
94      746
144     658
34      643
89      634
       ... 
451       1
483       1
515       1
531       1
687       1
Name: M, Length: 536, dtype: int64

13.3.1 第二轮打分:和第一轮分数平均值作比较

In [438]:

rfm['R是否大于均值'] = (rfm['R-score'] > rfm['R-score'].mean())*1
rfm['F是否大于均值'] = (rfm['F-score'] > rfm['F-score'].mean())*1
rfm['M是否大于均值'] = (rfm['M-score'] > rfm['M-score'].mean())*1

In [439]:

rfm

Out[439]:


买家昵称RFMR-scoreF-scoreM-scoreR是否大于均值F是否大于均值M是否大于均值
0.blue_ram1461491.01.01.0000
1.christiny15211831.01.04.0001
2.willn11701341.01.01.0000
3.托托m1701371.01.01.0000
40000妮211645.01.04.0101
.................................
25415龙火师8411753.01.04.0101
25416龙魔鬼女721873.01.02.0100
25417龟mil宝1124975.02.05.0111
25418!谢鹏逗逼?2411375.01.03.0100
25419~小邱~15811851.01.04.0001

25420 rows × 10 columns

13.4 客户分层

In [444]:


# 通过数值的加法rfm组合在一起
rfm['人群数值'] = rfm['R是否大于均值']*100+rfm['F是否大于均值']*10+rfm['M是否大于均值']
rfm.head()

Out[444]:


买家昵称RFMR-scoreF-scoreM-scoreR是否大于均值F是否大于均值M是否大于均值人群数值
0.blue_ram1461491.01.01.00000
1.christiny15211831.01.04.00011
2.willn11701341.01.01.00000
3.托托m1701371.01.01.00000
40000妮211645.01.04.0101101

In [443]:

rfm.head()

Out[443]:


买家昵称RFMR-scoreF-scoreM-scoreR是否大于均值F是否大于均值M是否大于均值
0.blue_ram1461491.01.01.0000
1.christiny15211831.01.04.0001
2.willn11701341.01.01.0000
3.托托m1701371.01.01.0000
40000妮211645.01.04.0101

13.5 基于人群的数值为客户打上标签

In [445]:

def trans(x):
    if x==111:
        label = '高价值客户'
    elif x==11:
        label = '重要召回客户'
    elif x==101:
        label = '重要发展客户'
    elif x==1:
        label = '重要挽留客户'
    elif x==110:
        label = '潜力客户'
    elif x==100:
        label = '新客户'
    elif x==10:
        label = '一般客户'
    elif x==0:
        label = '流失客户'
    return label

In [446]:

rfm['人群类型'] = rfm['人群数值'].apply(trans)
rfm

Out[446]:


买家昵称RFMR-scoreF-scoreM-scoreR是否大于均值F是否大于均值M是否大于均值人群数值人群类型
0.blue_ram1461491.01.01.00000流失客户
1.christiny15211831.01.04.00011重要挽留客户
2.willn11701341.01.01.00000流失客户
3.托托m1701371.01.01.00000流失客户
40000妮211645.01.04.0101101重要发展客户
.......................................
25415龙火师8411753.01.04.0101101重要发展客户
25416龙魔鬼女721873.01.02.0100100新客户
25417龟mil宝1124975.02.05.0111111高价值客户
25418!谢鹏逗逼?2411375.01.03.0100100新客户
25419~小邱~15811851.01.04.00011重要挽留客户

25420 rows × 12 columns

13.5.1 统计描述:人群统计、金额统计

In [447]:


count = rfm['人群类型'].value_counts().reset_index()
count.columns=['客户类型','人数']
count

Out[447]:


客户类型人数
0流失客户9217
1新客户6328
2重要挽留客户4486
3重要发展客户3336
4高价值客户1029
5重要召回客户604
6潜力客户271
7一般客户149

In [448]:


rfm['M']

Out[448]:

0         49
1        183
2         34
3         37
4        164
        ... 
25415    175
25416     87
25417    497
25418    137
25419    185
Name: M, Length: 25420, dtype: int64

In [449]:

money = rfm.groupby('人群类型')['M'].sum().reset_index()

In [450]:

money

Out[450]:


人群类型M
0一般客户16985
1新客户544621
2流失客户778540
3潜力客户28723
4重要发展客户695750
5重要召回客户178848
6重要挽留客户934340
7高价值客户333498


添加CDA认证专家【维克多阿涛】,微信号:【cdashijiazhuang】,提供数据分析指导及CDA考试秘籍。已助千人通过CDA数字化人才认证。欢迎交流,共同成长!
29.9467 2 0 关注作者 收藏

评论(0)


暂无数据

推荐课程