@创建于:.07.01
@修改于:.07.01,.07.09
文章目录
1、构建样例数据(无Null值和有Null值)2、agg常见聚合函数3、agg匿名聚合函数4、agg内的自定义方法5、transform应用6、apply应用7、参考链接1、构建样例数据(无Null值和有Null值)
import pandas as pdimport numpy as npdf = pd.DataFrame({'books':['book_1', 'book_2', 'book_2', 'book_1', 'book_2', 'book_3'],'price':[10, 20, 30, 30, 20, 10],'num':[4, 6, 2, 4, 2, 8]})dfbooks price num0 book_1 10 41 book_2 20 62 book_2 30 23 book_1 30 44 book_2 20 25 book_3 10 8
df2 = pd.DataFrame({'books':['book_1', 'book_1', 'book_2', 'book_2', 'book_2', 'book_3', 'book_4'],'price':[10, 20, 10, 30, None,10, np.NaN],'num':[2, 4, np.NaN, 2, 4, 8, 2]})df2books price num0 book_1 10.0 2.01 book_1 20.0 4.02 book_2 10.0 NaN3 book_2 30.0 2.04 book_2 NaN 4.05 book_3 10.0 8.06 book_4 NaN 2.0
2、agg常见聚合函数
f_agg = ['max','mean', 'sum']df.groupby('books')['price', 'num'].agg(func=f_agg)
3、agg匿名聚合函数
# 可以使用匿名聚合函数# 可以使用as_index=False是的分组字段不作为行索引df.groupby('books', as_index=False)['price', 'num'].agg(lambda x: x.eq(10).sum())books price num0 book_1 1 01 book_2 0 02 book_3 1 0
# 当agg中的聚合函数多于1个的时候,as_index失效df.groupby('books', as_index=False)['price'].agg([lambda x: x.eq(10).sum(), 'mean']).rename({'<lambda_0>':'eq10'}, axis=1)eq10 meanbooks book_1 1 20.000000book_2 0 23.333333book_3 1 10.000000
f_agg = ['max', lambda x: x.eq(10).sum(), 'mean', lambda x: x.isna().sum()]df.groupby('books')['price'].agg(func=f_agg).rename({'<lambda_0>':'eq10'}, axis=1)max eq10 mean <lambda_1>books book_1 30 1 20.000000 0book_2 30 0 23.333333 0book_3 10 1 10.000000 0
4、agg内的自定义方法
df2 = pd.DataFrame({'books':['book_1', 'book_1', 'book_2', 'book_2', 'book_2', 'book_3', 'book_4'],'price':[10, 20, 10, 30, None,10, np.NaN],'num':[2, 4, np.NaN, 2, 4, 8, 2]})def q25(x):return x.quantile(0.25)def q50(x):return x.quantile(0.50)def q75(x):return x.quantile(0.75)df2.groupby('books')['num'].agg(['mean', 'max', 'median', q25,q50, q75])mean max median q25 q50 q75books book_1 3.0 4.0 3.0 2.5 3.0 3.5book_2 3.0 4.0 3.0 2.5 3.0 3.5book_3 8.0 8.0 8.0 8.0 8.0 8.0book_4 2.0 2.0 2.0 2.0 2.0 2.0
参考链接:Pandas groupby quantile values
5、transform应用
df2.groupby('books').transform(lambda x: x.fillna(x.mean()))# 没有了books这一列price num0 10.0 2.01 20.0 4.02 10.0 3.03 30.0 2.04 20.0 4.05 10.0 8.06 NaN 2.0
6、apply应用
df2.groupby('books').apply(lambda x: x.fillna(x.mean()))
dft = df2.groupby('books').apply(lambda x: x.fillna(x.mean()))dft.index = dft.index.droplevel(0)dft
df2.groupby('books').transform(lambda x: x/x.max())df2.groupby('books').apply(lambda x: x/x.max())# 二者的结果是一致的,都没有books字段price num0 0.500000 0.51 1.000000 1.02 0.333333 NaN3 1.000000 0.54 NaN 1.05 1.000000 1.06 NaN 1.0
7、参考链接
Pandas教程 | 超好用的Groupby用法详解
DataFrame进行数据分组运算并筛选指定条件的group
Pandas groupby quantile values