2020-08-21

The RFM Model in pandas

13 RFM Model

In [413]:

import numpy as np

import pandas as pd

In [414]:

data = pd.read_excel('PYTHON-RFM实战数据.xlsx')

data.head()

Out[414]:

	品牌名称	买家昵称	付款日期	订单状态	实付金额	邮费	省份	城市	购买数量
0	CDA数据分析	叫我李2	2019-01-01 00:17:59	交易成功	186	6	上海	上海市	1
1	CDA数据分析	0cyb1992	2019-01-01 00:59:54	交易成功	145	0	广东省	广州市	1
2	CDA数据分析	萝污萌莉	2019-01-01 07:48:48	交易成功	194	8	山东省	东营市	1
3	CDA数据分析	atblovemyy	2019-01-01 09:15:49	付款以后用户退款成功，交易自动关闭	84	0	江苏省	镇江市	1
4	CDA数据分析	小星期鱼	2019-01-01 09:59:33	付款以后用户退款成功，交易自动关闭	74	0	上海	上海市	1

In [416]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28833 entries, 0 to 28832
Data columns (total 9 columns):
品牌名称    28833 non-null object
买家昵称    28833 non-null object
付款日期    28833 non-null datetime64[ns]
订单状态    28833 non-null object
实付金额    28833 non-null int64
邮费      28833 non-null int64
省份      28833 non-null object
城市      28832 non-null object
购买数量    28833 non-null int64
dtypes: datetime64[ns](1), int64(3), object(5)
memory usage: 2.0+ MB

In [417]:

# Only successfully completed orders count toward the RFM model

data = data[data.订单状态=='交易成功']

data.shape

Out[417]:

(27793, 9)

13.1 Extracting the key fields

In [418]:

data = data[['买家昵称','付款日期','实付金额','购买数量']]

data.head()

Out[418]:

	买家昵称	付款日期	实付金额	购买数量
0	叫我李2	2019-01-01 00:17:59	186	1
1	0cyb1992	2019-01-01 00:59:54	145	1
2	萝污萌莉	2019-01-01 07:48:48	194	1
5	重碎叠	2019-01-01 10:00:07	197	1
6	iho_jann	2019-01-01 10:00:08	168	1

13.1.1 R (recency)

In [419]:

# Assume the observation date (when the data was extracted) is 2019-07-01

data.付款日期.max()

Out[419]:

Timestamp('2019-06-30 22:46:22.511000')

In [420]:

data.买家昵称.shape

Out[420]:

(27793,)

In [422]:

data.买家昵称.nunique()

Out[422]:

25420

In [423]:

# A customer may have several orders, so take each customer's most recent payment date

r = data.groupby('买家昵称')['付款日期'].max().reset_index()

r.head()

Out[423]:

	买家昵称	付款日期
0	.blue_ram	2019-02-04 17:49:34.000
1	.christiny	2019-01-29 14:17:15.000
2	.willn1	2019-01-11 03:46:18.000
3	.托托m	2019-01-11 02:26:33.000
4	0000妮	2019-06-28 16:53:26.458

In [424]:

# Datetime values can be subtracted directly; .dt.days keeps the whole days

r['R'] = (pd.to_datetime('2019-7-1') - r.付款日期).dt.days

r

Out[424]:

	买家昵称	付款日期	R
0	.blue_ram	2019-02-04 17:49:34.000	146
1	.christiny	2019-01-29 14:17:15.000	152
2	.willn1	2019-01-11 03:46:18.000	170
3	.托托m	2019-01-11 02:26:33.000	170
4	0000妮	2019-06-28 16:53:26.458	2
...	...	...	...
25415	龙火师	2019-04-07 08:43:00.000	84
25416	龙魔鬼女	2019-04-19 22:14:24.000	72
25417	龟mil宝	2019-06-19 04:26:30.589	11
25418	！谢鹏逗逼？	2019-06-06 11:14:52.000	24
25419	～小邱～	2019-01-23 23:51:51.457	158

25420 rows × 3 columns

In [425]:

# Drop the column that is no longer needed

r = r[['买家昵称','R']]

r.head()

Out[425]:

	买家昵称	R
0	.blue_ram	146
1	.christiny	152
2	.willn1	170
3	.托托m	170
4	0000妮	2
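
The recency step can be sketched in isolation as follows. This is a minimal example on made-up toy orders, not the workbook's data, and `snapshot` is a name introduced here for the assumed observation date. It shows that `.dt.days` keeps only the whole-day part of each difference.

```python
import pandas as pd

# Toy orders (hypothetical data, not taken from the workbook)
orders = pd.DataFrame({
    '买家昵称': ['a', 'a', 'b'],
    '付款日期': pd.to_datetime(['2019-01-11 03:46:18',
                               '2019-06-28 16:53:26',
                               '2019-02-04 17:49:34']),
})

snapshot = pd.Timestamp('2019-07-01')  # assumed observation date

# Most recent payment per customer, then whole days elapsed since it
last_paid = orders.groupby('买家昵称')['付款日期'].max()
recency = (snapshot - last_paid).dt.days
print(recency.to_dict())
```

Customer `a` last paid a little over two days before the snapshot, so the fractional part is dropped and R is 2.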

13.1.2 F (frequency)

In [426]:

f = data.groupby('买家昵称')['付款日期'].count().reset_index()

f.columns=['买家昵称','F']

f.head()

Out[426]:

	买家昵称	F
0	.blue_ram	1
1	.christiny	1
2	.willn1	1
3	.托托m	1
4	0000妮	1

13.1.3 M (monetary)

In [428]:

# Total amount paid per customer

m = data.groupby('买家昵称')['实付金额'].sum().reset_index()

m.columns=['买家昵称','M']

m.head()

Out[428]:

	买家昵称	M
0	.blue_ram	49
1	.christiny	183
2	.willn1	34
3	.托托m	37
4	0000妮	164

13.2 Merging the three tables

In [431]:

# Join on the shared customer key

rfm = pd.merge(r, f, on='买家昵称')

rfm = pd.merge(rfm, m, on='买家昵称')

rfm

Out[431]:

	买家昵称	R	F	M
0	.blue_ram	146	1	49
1	.christiny	152	1	183
2	.willn1	170	1	34
3	.托托m	170	1	37
4	0000妮	2	1	164
...	...	...	...	...
25415	龙火师	84	1	175
25416	龙魔鬼女	72	1	87
25417	龟mil宝	11	2	497
25418	！谢鹏逗逼？	24	1	137
25419	～小邱～	158	1	185

25420 rows × 4 columns
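
As an alternative sketch (not the author's method), the same R/F/M table can plausibly be built in one pass with named aggregation, which needs pandas 0.25 or later, instead of three `groupby` calls and two merges. The toy orders, `snapshot`, and the intermediate `last_paid` column are assumptions introduced for illustration.

```python
import pandas as pd

# Toy orders (hypothetical data, not taken from the workbook)
orders = pd.DataFrame({
    '买家昵称': ['a', 'a', 'b'],
    '付款日期': pd.to_datetime(['2019-06-19', '2019-06-20', '2019-04-07']),
    '实付金额': [200, 297, 175],
})
snapshot = pd.Timestamp('2019-07-01')  # assumed observation date

# One groupby: latest payment date, number of orders, total amount
rfm = (
    orders.groupby('买家昵称')
    .agg(last_paid=('付款日期', 'max'),
         F=('付款日期', 'count'),
         M=('实付金额', 'sum'))
    .reset_index()
)
rfm['R'] = (snapshot - rfm['last_paid']).dt.days
rfm = rfm[['买家昵称', 'R', 'F', 'M']]
print(rfm)
```

Doing everything in one aggregation avoids any risk of the three intermediate tables drifting out of sync.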

13.3 Scoring each dimension

In [432]:

rfm.describe().T

Out[432]:

	count	mean	std	min	25%	50%	75%	max
R	25420.0	97.120417	58.450890	0.0	37.0	105.0	155.0	180.0
F	25420.0	1.093352	0.347838	1.0	1.0	1.0	1.0	15.0
M	25420.0	138.131589	96.592293	30.0	76.0	124.0	191.0	6091.0

In [433]:

# Score R, F and M on a 5-point scale

# R bins: 0-30, 30-60, 60-90, 90-120, above 120 (a smaller R earns a higher score)

rfm['R-score'] = pd.cut(rfm['R'],bins=[-1,30,60,90,120,180],labels=[5,4,3,2,1]).astype(float)

rfm.head()

Out[433]:

	买家昵称	R	F	M	R-score
0	.blue_ram	146	1	49	1.0
1	.christiny	152	1	183	1.0
2	.willn1	170	1	34	1.0
3	.托托m	170	1	37	1.0
4	0000妮	2	1	164	5.0

In [436]:

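# F: 1, 2, 3 or 4 orders score 1-4; 5 or more orders score 5 (left-closed bins)

rfm['F-score'] = pd.cut(rfm['F'],bins=[1,2,3,4,5,20],labels=[1,2,3,4,5],right=False).astype(float)

# M: 0-50, 50-100, 100-150, 150-200, 200 and above (left-closed bins)

rfm['M-score'] = pd.cut(rfm['M'],bins=[0,50,100,150,200,10000],labels=[1,2,3,4,5],right=False).astype(float)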

In [437]:

rfm.head()

Out[437]:

	买家昵称	R	F	M	R-score	F-score	M-score
0	.blue_ram	146	1	49	1.0	1.0	1.0
1	.christiny	152	1	183	1.0	1.0	4.0
2	.willn1	170	1	34	1.0	1.0	1.0
3	.托托m	170	1	37	1.0	1.0	1.0
4	0000妮	2	1	164	5.0	1.0	4.0
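
Which bin a boundary value lands in depends on the `right` argument of `pd.cut`. The sketch below uses hand-picked boundary values rather than the workbook's data to show both conventions used above: right-closed bins for R and left-closed bins for M.

```python
import pandas as pd

r = pd.Series([0, 30, 31, 180])
# Default right=True: bins are (-1, 30], (30, 60], ..., (120, 180]
r_score = pd.cut(r, bins=[-1, 30, 60, 90, 120, 180],
                 labels=[5, 4, 3, 2, 1]).astype(float)
print(r_score.tolist())

m = pd.Series([49, 50, 199, 200])
# right=False: bins are [0, 50), [50, 100), ..., [200, 10000)
m_score = pd.cut(m, bins=[0, 50, 100, 150, 200, 10000],
                 labels=[1, 2, 3, 4, 5], right=False).astype(float)
print(m_score.tolist())
```

Values outside the outermost edges become NaN, which is why the R bins start at -1: with right-closed bins, a lower edge of 0 would exclude customers whose R is exactly 0.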

In [434]:

rfm.F.value_counts()

Out[434]:

1     23367
2      1795
3       217
4        33
5         5
15        1
7         1
6         1
Name: F, dtype: int64

In [435]:

rfm.M.value_counts()

Out[435]:

87     1289
94      746
144     658
34      643
89      634
       ... 
451       1
483       1
515       1
531       1
687       1
Name: M, Length: 536, dtype: int64

13.3.1 Second round of scoring: comparing each score with its mean

In [438]:

rfm['R是否大于均值'] = (rfm['R-score'] > rfm['R-score'].mean())*1

rfm['F是否大于均值'] = (rfm['F-score'] > rfm['F-score'].mean())*1

rfm['M是否大于均值'] = (rfm['M-score'] > rfm['M-score'].mean())*1

In [439]:

rfm

Out[439]:

	买家昵称	R	F	M	R-score	F-score	M-score	R是否大于均值	F是否大于均值	M是否大于均值
0	.blue_ram	146	1	49	1.0	1.0	1.0	0	0	0
1	.christiny	152	1	183	1.0	1.0	4.0	0	0	1
2	.willn1	170	1	34	1.0	1.0	1.0	0	0	0
3	.托托m	170	1	37	1.0	1.0	1.0	0	0	0
4	0000妮	2	1	164	5.0	1.0	4.0	1	0	1
...	...	...	...	...	...	...	...	...	...	...
25415	龙火师	84	1	175	3.0	1.0	4.0	1	0	1
25416	龙魔鬼女	72	1	87	3.0	1.0	2.0	1	0	0
25417	龟mil宝	11	2	497	5.0	2.0	5.0	1	1	1
25418	！谢鹏逗逼？	24	1	137	5.0	1.0	3.0	1	0	0
25419	～小邱～	158	1	185	1.0	1.0	4.0	0	0	1

25420 rows × 10 columns
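
The flag columns above (`R是否大于均值` and its F and M counterparts, meaning "is the score above the mean") come from a boolean comparison turned into 0/1. A minimal sketch on four made-up scores, using `.astype(int)` as an equivalent to multiplying by 1:

```python
import pandas as pd

# Toy F-scores (hypothetical): three one-time buyers and one repeat buyer
scores = pd.DataFrame({'F-score': [1.0, 1.0, 1.0, 2.0]})

# The comparison yields booleans; astype(int) maps False/True to 0/1,
# the same result as multiplying the boolean Series by 1
above = (scores['F-score'] > scores['F-score'].mean()).astype(int)
print(above.tolist())
```

Because the comparison is strict, a score exactly equal to the mean is flagged 0. When most customers have an F-score of 1, as the `value_counts` output above suggests, the mean sits just above 1 and every repeat buyer is flagged 1.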

13.4 Customer segmentation

In [444]:

# Combine the three flags into one numeric code: R in the hundreds place, F in the tens, M in the units

rfm['人群数值'] = rfm['R是否大于均值']*100+rfm['F是否大于均值']*10+rfm['M是否大于均值']

rfm.head()

Out[444]:

	买家昵称	R	F	M	R-score	F-score	M-score	R是否大于均值	F是否大于均值	M是否大于均值	人群数值
0	.blue_ram	146	1	49	1.0	1.0	1.0	0	0	0	0
1	.christiny	152	1	183	1.0	1.0	4.0	0	0	1	1
2	.willn1	170	1	34	1.0	1.0	1.0	0	0	0	0
3	.托托m	170	1	37	1.0	1.0	1.0	0	0	0	0
4	0000妮	2	1	164	5.0	1.0	4.0	1	0	1	101

In [443]:

rfm.head()

Out[443]:

	买家昵称	R	F	M	R-score	F-score	M-score	R是否大于均值	F是否大于均值	M是否大于均值
0	.blue_ram	146	1	49	1.0	1.0	1.0	0	0	0
1	.christiny	152	1	183	1.0	1.0	4.0	0	0	1
2	.willn1	170	1	34	1.0	1.0	1.0	0	0	0
3	.托托m	170	1	37	1.0	1.0	1.0	0	0	0
4	0000妮	2	1	164	5.0	1.0	4.0	1	0	1
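
The segment code packs the three 0/1 flags into the digits of one integer. A small sketch with made-up flags shows the encoding, plus a zero-padded string form (an addition for illustration, not part of the original notebook) that keeps leading zeros visible:

```python
import pandas as pd

# Toy 0/1 flags (hypothetical), one row per customer
flags = pd.DataFrame({'R': [1, 0, 1], 'F': [0, 0, 1], 'M': [1, 1, 1]})

# Hundreds digit = R flag, tens digit = F flag, units digit = M flag
code = flags['R'] * 100 + flags['F'] * 10 + flags['M']
print(code.tolist())

# As an integer, 001 prints as 1; zero-padding restores all three digits
padded = code.astype(str).str.zfill(3)
print(padded.tolist())
```

The integer form is what the labeling function in the next section compares against, so codes such as 1 and 11 there stand for 001 and 011.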

13.5 Labeling customers by segment code

In [445]:

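def trans(x):

    # Map a segment code (R, F, M flags as digits) to a customer label

    if x==111:

        label = '高价值客户'    # high-value customer

    elif x==11:

        label = '重要召回客户'  # important customer to win back

    elif x==101:

        label = '重要发展客户'  # important customer to develop

    elif x==1:

        label = '重要挽留客户'  # important customer to retain

    elif x==110:

        label = '潜力客户'      # potential customer

    elif x==100:

        label = '新客户'        # new customer

    elif x==10:

        label = '一般客户'      # ordinary customer

    elif x==0:

        label = '流失客户'      # churned customer

    else:

        # Without this branch an unknown code would raise UnboundLocalError

        raise ValueError(f'unexpected segment code: {x}')

    return label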

In [446]:

rfm['人群类型'] = rfm['人群数值'].apply(trans)

rfm

Out[446]:

	买家昵称	R	F	M	R-score	F-score	M-score	R是否大于均值	F是否大于均值	M是否大于均值	人群数值	人群类型
0	.blue_ram	146	1	49	1.0	1.0	1.0	0	0	0	0	流失客户
1	.christiny	152	1	183	1.0	1.0	4.0	0	0	1	1	重要挽留客户
2	.willn1	170	1	34	1.0	1.0	1.0	0	0	0	0	流失客户
3	.托托m	170	1	37	1.0	1.0	1.0	0	0	0	0	流失客户
4	0000妮	2	1	164	5.0	1.0	4.0	1	0	1	101	重要发展客户
...	...	...	...	...	...	...	...	...	...	...	...	...
25415	龙火师	84	1	175	3.0	1.0	4.0	1	0	1	101	重要发展客户
25416	龙魔鬼女	72	1	87	3.0	1.0	2.0	1	0	0	100	新客户
25417	龟mil宝	11	2	497	5.0	2.0	5.0	1	1	1	111	高价值客户
25418	！谢鹏逗逼？	24	1	137	5.0	1.0	3.0	1	0	0	100	新客户
25419	～小邱～	158	1	185	1.0	1.0	4.0	0	0	1	1	重要挽留客户

25420 rows × 12 columns
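
An alternative sketch, not the author's code: the same lookup can be written as a dictionary passed to `Series.map`, which keeps the code-to-label table in one place. `LABELS` and the toy `codes` Series are names introduced here for illustration.

```python
import pandas as pd

# Segment code -> label, same pairs as in trans()
LABELS = {
    111: '高价值客户', 11: '重要召回客户', 101: '重要发展客户', 1: '重要挽留客户',
    110: '潜力客户', 100: '新客户', 10: '一般客户', 0: '流失客户',
}

codes = pd.Series([0, 1, 101, 111])  # toy segment codes
labels = codes.map(LABELS)
print(labels.tolist())
```

One difference in behavior: `map` returns NaN for a code that is missing from the dictionary instead of raising an error, so a follow-up check such as `labels.isna().any()` may be worth adding.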

13.5.1 Descriptive statistics: head count and total amount per segment

In [447]:

count = rfm['人群类型'].value_counts().reset_index()

count.columns=['客户类型','人数']

count

Out[447]:

	客户类型	人数
0	流失客户	9217
1	新客户	6328
2	重要挽留客户	4486
3	重要发展客户	3336
4	高价值客户	1029
5	重要召回客户	604
6	潜力客户	271
7	一般客户	149

In [448]:

rfm['M']

Out[448]:

0         49
1        183
2         34
3         37
4        164
        ... 
25415    175
25416     87
25417    497
25418    137
25419    185
Name: M, Length: 25420, dtype: int64

In [449]:

money = rfm.groupby('人群类型')['M'].sum().reset_index()

In [450]:

money

Out[450]:

	人群类型	M
0	一般客户	16985
1	新客户	544621
2	流失客户	778540
3	潜力客户	28723
4	重要发展客户	695750
5	重要召回客户	178848
6	重要挽留客户	934340
7	高价值客户	333498
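
To compare segments it may help to place head count and total amount side by side and add shares. The sketch below retypes the figures printed in the two outputs above; the English column names `count_share`, `revenue_share` and `avg_per_customer` are introduced here for illustration and do not appear in the original notebook.

```python
import pandas as pd

# Figures copied from Out[447] (人数) and Out[450] (M)
summary = pd.DataFrame({
    '人群类型': ['流失客户', '新客户', '重要挽留客户', '重要发展客户',
                 '高价值客户', '重要召回客户', '潜力客户', '一般客户'],
    '人数': [9217, 6328, 4486, 3336, 1029, 604, 271, 149],
    'M': [778540, 544621, 934340, 695750, 333498, 178848, 28723, 16985],
})

# Share of customers, share of total amount, and average amount per customer
summary['count_share'] = summary['人数'] / summary['人数'].sum()
summary['revenue_share'] = summary['M'] / summary['M'].sum()
summary['avg_per_customer'] = summary['M'] / summary['人数']

summary = summary.sort_values('revenue_share', ascending=False)
print(summary.round(3))
```

In the notebook itself, the two frames could instead be joined with `count.merge(money, left_on='客户类型', right_on='人群类型')`, since the label column is named differently in each.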