
Grouping and Aggregating Data

news 2025/7/18 13:50:16

1: Grouping with df.groupby

# coding: utf-8
import pandas as pd
import numpy as np
file_path = './starbucks_store_worldwide.csv'
df = pd.read_csv(file_path)
# print(df.head(1))
# print(df.info())
grouped = df.groupby(by='Country')
print(grouped)
# Prints a DataFrameGroupBy object.
# It can be iterated over, and aggregation methods can be called on it.
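
The CSV file may not be at hand, so here is a minimal sketch of the same grouping on a small made-up DataFrame. The rows and the names `toy` and `toy_grouped` are assumptions for illustration only, not data from starbucks_store_worldwide.csv.

```python
import pandas as pd

# Hypothetical rows that mimic three columns of the Starbucks dataset.
toy = pd.DataFrame({
    'Brand': ['Starbucks', 'Starbucks', 'Teavana', 'Starbucks', 'Starbucks'],
    'Country': ['CN', 'US', 'US', 'CN', 'GB'],
    'State/Province': ['31', 'CA', 'CA', '11', 'ENG'],
})

toy_grouped = toy.groupby(by='Country')

# groupby is lazy: it returns a DataFrameGroupBy object, not a new table.
print(type(toy_grouped).__name__)
# Number of distinct keys found in the 'Country' column.
print(toy_grouped.ngroups)
# Mapping from each key to the row labels that belong to it.
print(toy_grouped.groups)
```

Nothing is computed until the groups are iterated over or an aggregation method is called.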

2: A DataFrameGroupBy can be iterated over

grouped = df.groupby(by='Country')
print(grouped)
# DataFrameGroupBy: iterating yields (group key, sub-DataFrame) pairs
for i, j in grouped:
    print(i)
    print('_' * 100)
    print(j, type(j))
    print('*' * 100)
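
To make the shape of each iteration step concrete, here is a small sketch on made-up rows. The DataFrame `toy` and the dictionary `group_sizes` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset.
toy = pd.DataFrame({
    'Brand': ['Starbucks', 'Starbucks', 'Teavana', 'Starbucks'],
    'Country': ['CN', 'US', 'US', 'CN'],
})

# Each iteration yields (group key, sub-DataFrame holding that group's rows).
group_sizes = {}
for country, rows in toy.groupby(by='Country'):
    group_sizes[country] = len(rows)
    print(country, type(rows).__name__, rows.index.tolist())

# get_group fetches a single group without looping.
cn_rows = toy.groupby(by='Country').get_group('CN')
print(cn_rows)
```

Each sub-DataFrame keeps the original row labels, so rows can be traced back to the source table.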

3: A DataFrameGroupBy can be aggregated

print(grouped.count()) applies a statistical operation to grouped: for each group it counts the non-null values in every column.

country_count=grouped['Brand'].count()
print(country_count['CN'])
print(country_count['US'])
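
Because count() skips missing values, the number it reports can differ from column to column. The sketch below shows this on made-up rows and contrasts it with size(), which counts rows regardless of missing values. The data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one US store has no city recorded.
toy = pd.DataFrame({
    'Brand': ['Starbucks', 'Starbucks', 'Teavana', 'Starbucks'],
    'Country': ['CN', 'US', 'US', 'CN'],
    'City': ['Shanghai', np.nan, 'Seattle', 'Beijing'],
})
toy_grouped = toy.groupby(by='Country')

brand_count = toy_grouped['Brand'].count()   # non-null 'Brand' values per country
city_count = toy_grouped['City'].count()     # non-null 'City' values per country
rows_per_country = toy_grouped.size()        # rows per country, NaN included

print(brand_count['US'], city_count['US'], rows_per_country['US'])
```

Counting a column that is never empty, as the code above does with 'Brand', gives the number of rows in each group.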

4: Counting the stores in each province of China

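# coding: utf-8
import pandas as pd
import numpy as np
file_path = './starbucks_store_worldwide.csv'
df = pd.read_csv(file_path)
# Keep only the rows for China
china_data = df[df['Country'] == 'CN']
# print(china_data)
# Group by 'State/Province' to count per province; group by 'City' to count per city instead
grouped = china_data.groupby(by='State/Province').count()['Brand']
print(grouped)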
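
The filter-then-group pattern can be sketched on a few made-up rows as follows. The province codes and the names `toy`, `china`, `per_province` and `same_counts` are hypothetical. The sketch also selects the 'Brand' column before calling count(), so only one column is counted, and shows value_counts() as an alternative for a single key.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset.
toy = pd.DataFrame({
    'Brand': ['Starbucks'] * 5,
    'Country': ['CN', 'CN', 'CN', 'US', 'US'],
    'State/Province': ['31', '31', '11', 'CA', 'WA'],
})

# Boolean indexing keeps only the rows where the condition is True.
china = toy[toy['Country'] == 'CN']

# Select the column first, then count, to avoid counting every column.
per_province = china.groupby(by='State/Province')['Brand'].count()
print(per_province)

# value_counts gives the same numbers, sorted from most to fewest stores.
same_counts = china['State/Province'].value_counts()
print(same_counts)
```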

5: Grouping by multiple keys

# coding: utf-8
import pandas as pd
import numpy as np
file_path = './starbucks_store_worldwide.csv'
df = pd.read_csv(file_path)
china_data = df[df['Country'] == 'CN']
# print(china_data)
# grouped = china_data.groupby(by='City').count()['Brand']
# Group the 'Brand' column by country and province; the result is a Series with a two-level index
grouped = df['Brand'].groupby(by=[df['Country'], df['State/Province']]).count()
print(grouped)
print(type(grouped))
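
Grouping by two keys produces a result indexed by both of them. The sketch below, on made-up rows, shows how such a result might be read back; the data and the name `multi` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset.
toy = pd.DataFrame({
    'Brand': ['Starbucks'] * 5,
    'Country': ['CN', 'CN', 'CN', 'US', 'US'],
    'State/Province': ['31', '31', '11', 'CA', 'WA'],
})

multi = toy['Brand'].groupby(by=[toy['Country'], toy['State/Province']]).count()

# The index has one level per grouping key.
print(list(multi.index.names))

# Selecting on the outer level returns the inner level for that country.
print(multi.loc['CN'])

# A tuple selects one (country, province) combination.
print(multi.loc[('CN', '31')])
```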

6: df['Brand'] is a Series, while df[['Brand']] is a DataFrame

# coding: utf-8
import pandas as pd
import numpy as np
file_path = './starbucks_store_worldwide.csv'
df = pd.read_csv(file_path)
keys = [df['Country'], df['State/Province']]
# Single brackets select a Series, so the grouped result is a Series
grouped_series = df['Brand'].groupby(by=keys).count()
print(type(grouped_series))
# Double brackets select a one-column DataFrame, so the grouped result is a DataFrame
grouped_frame = df[['Brand']].groupby(by=keys).count()
print(grouped_frame)
print(type(grouped_frame))
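
The difference between the two selections is visible even without grouping. This is a minimal sketch on made-up rows; `toy`, `as_series` and `as_frame` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset.
toy = pd.DataFrame({
    'Brand': ['Starbucks', 'Teavana', 'Starbucks'],
    'Country': ['CN', 'US', 'US'],
})

as_series = toy['Brand']      # a single label selects a one-dimensional Series
as_frame = toy[['Brand']]     # a list of labels selects a two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)

# to_frame() converts a Series into a one-column DataFrame.
print(as_series.to_frame().equals(as_frame))
```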

7: Indexes and composite (multi-level) indexes

# Use a column as the index: df.set_index

# Reassign the index labels: df.index = ['x', 'y']

df1 = pd.DataFrame(np.ones(8).reshape(2, 4))
df1.index = ['a', 'b']
# df1.reindex(['a', 'f'])
# print(df1)
df1.columns = ['c', 'd', 'e', 'f']
# print(df1)
df2 = df1.set_index('c')
print(df2)

df2 = df1.set_index('c', drop=False)
# c is now the index and is still kept as a column
print(df2)
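
Since df1 is filled with ones, every index label above is 1.0 and the effect of each call is hard to see. The sketch below repeats the same calls on distinct values; the name `frame` and the label 'z' are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Same shape and labels as df1, but with distinct values 0..7.
frame = pd.DataFrame(np.arange(8).reshape(2, 4),
                     index=['a', 'b'], columns=['c', 'd', 'e', 'f'])

# set_index returns a new DataFrame; 'c' moves out of the columns.
moved = frame.set_index('c')
print(moved)

# With drop=False, 'c' becomes the index and also stays as a column.
kept = frame.set_index('c', drop=False)
print(kept)

# reindex selects rows by label; a label that does not exist yields a NaN row.
reindexed = frame.reindex(['a', 'z'])
print(reindexed)
```

Assigning to `frame.index` renames the existing rows in place, whereas reindex looks rows up by label and returns a new DataFrame.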

# Unique index labels: index.unique()

df2 = df1.set_index('c', drop=False).index.unique()
print(df2)

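# An index is iterable, so both len() and list() work on it

df2 = len(df1.set_index('c', drop=False).index)
# Number of labels in the index
print(df2)
df2 = list(df1.set_index('c', drop=False).index)
# The index labels as a plain Python list
print(df2)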
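
It is easy to call len() or list() on the DataFrame instead of its index, which gives different answers. A minimal sketch on made-up values, with `frame` and `idx` as hypothetical names:

```python
import pandas as pd

# Hypothetical values; the label 1 appears twice on purpose.
frame = pd.DataFrame({'c': [1, 1, 2], 'd': [10, 20, 30]})
idx = frame.set_index('c').index

print(len(idx))               # number of index labels
print(list(idx))              # the labels themselves, duplicates included
print(idx.unique().tolist())  # each label once, in order of first appearance

# Applied to the DataFrame itself, list() returns the column names instead.
print(list(frame))
```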

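# Use two columns as the index

# drop must be the boolean False; the string 'false' is truthy and would drop the columns
df3 = df1.set_index(['c', 'd'], drop=False)
print(df3)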
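
With two columns as the index, each row is addressed by a pair of labels. The following sketch uses made-up values to show how rows of such a composite index might be selected; `frame`, `multi` and `kept` are hypothetical names.

```python
import pandas as pd

# Hypothetical values chosen so that every (c, d) pair is distinct.
frame = pd.DataFrame({
    'c': ['one', 'one', 'two'],
    'd': ['x', 'y', 'x'],
    'e': [1, 2, 3],
})

multi = frame.set_index(['c', 'd'])
print(multi.index.nlevels)

# A single outer label returns all rows under it, indexed by the inner level.
print(multi.loc['one'])

# A tuple of (outer, inner) labels addresses one row.
print(multi.loc[('one', 'y'), 'e'])

# drop=False keeps 'c' and 'd' as ordinary columns as well.
kept = frame.set_index(['c', 'd'], drop=False)
print(kept.columns.tolist())
```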

# Simple index operations
