An Introduction to Series, a Basic pandas Data Structure

云烟 • 2 hours ago • Programming

Welcome to the IT world; come explore by the lakeside of knowledge!

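pandas, NumPy, and Matplotlib are often called the "three swordsmen" of data analysis. pandas is an open-source, BSD-licensed Python package for data analysis, built on top of NumPy. It provides two basic data structures, Series (one-dimensional) and DataFrame (two-dimensional), which are enough to handle most typical use cases in finance, statistics, social science, engineering, and other fields. Data processing usually proceeds in stages: data munging and cleaning, analysis and modeling, then visualization and reporting. pandas is well suited to this work, performs well, and is widely used. This article gives a brief introduction to pandas' basic data structures, including their data types, indexing, and alignment.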

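Type	Dimensions	Description
Series	1	Labeled, one-dimensional, homogeneously typed array
DataFrame	2	Labeled, size-mutable, two-dimensional table with heterogeneously typed columns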

Series

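A Series is a one-dimensional array with an index. It can hold integers, floats, strings, Python objects, and other types of data. The axis labels are collectively called the index. Call pd.Series to create a Series; pandas supports creating one from the following kinds of data:

Arrays (Python lists or NumPy ndarrays)
Python dicts
Scalars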

The basic pattern for creating a Series is:

s = pandas.Series(data, index=index, dtype=dtype, copy=copy)


The parameters are:

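Parameter	Description
data	The input data: a list, scalar, ndarray, dict, etc.
index	The index labels, one per element of data. Labels do not have to be unique. If no index is passed, it defaults to np.arange(n).
dtype	The data type. If not provided, it is inferred from the data.
copy	Whether to copy data. Defaults to False.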
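As a rough illustration of the parameter table above, the sketch below passes each parameter once. The array contents and the labels "x", "y", "z" are made up for the example, and copy=True is used only to show that the Series no longer shares memory with the input array.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# data + index: one label per element
s = pd.Series(arr, index=["x", "y", "z"], copy=True)

# copy=True gives the Series its own buffer,
# so a later change to arr does not show up in s
arr[0] = 99
print(s["x"])          # 1

# dtype: force a type instead of letting pandas infer it
f = pd.Series([1, 2, 3], dtype="float64")
print(f.dtype)         # float64

# no index given: labels default to 0..n-1
print(list(f.index))   # [0, 1, 2]
```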

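The sections below show how to create a Series from each kind of data.

Creating a Series from a dict

When data is a dict and index is not given, the Series index follows the dict's insertion order; the keys are not re-sorted. If index is given, values are pulled from data by index label. If a label in index has no matching key in the dict, the value for that label is set to NaN.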

In [28]: # Assumes: import numpy as np; import pandas as pd
    ...: # Create a Series from a Python dict
    ...: d = {'a': 1, 'c': 2, 'b': 3}
    ...: s = pd.Series(d)
    ...: s
Out[28]:
a    1
c    2
b    3
dtype: int64

In [29]: # A label in index that is not a key in the dict gets NaN (Not a Number)
    ...: s = pd.Series(d, index=['a', 'b', 'c', 'd'])
    ...: s
Out[29]:
a    1.0
b    3.0
c    2.0
d    NaN
dtype: float64
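The dtype change from int64 to float64 in the second output is worth a closer look: NaN is a floating-point value, so a single missing label turns the whole Series into floats. A minimal sketch, reusing the same dict (the names s_int and s_nan are only for illustration):

```python
import pandas as pd

d = {"a": 1, "c": 2, "b": 3}

s_int = pd.Series(d)                               # every key is present
s_nan = pd.Series(d, index=["a", "b", "c", "d"])   # "d" has no value

print(s_int.dtype)             # int64
print(s_nan.dtype)             # float64, because NaN is a float
print(s_nan.isna().tolist())   # [False, False, False, True]

# Dropping the missing entry allows a cast back to integers
print(s_nan.dropna().astype("int64").tolist())     # [1, 3, 2]
```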

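Creating a Series from an ndarray

When data is a NumPy ndarray (which must be one-dimensional), index must be the same length as data. If index is not given, a default integer index [0, ..., len(data) - 1] is created. Note that index labels in pandas are allowed to repeat.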

In [30]: # Create a Series from a NumPy array, using the default index
    ...: d = np.random.randn(5)
    ...: s = pd.Series(d)
    ...: s
Out[30]:
0    0.
1    0.089687
2   -0.
3    0.
4    0.024762
dtype: float64

In [32]: # Create a Series from a NumPy array with an explicit index
    ...: d = np.random.randn(5)
    ...: s = pd.Series(d, index=['a', 'b', 'c', 'd', 'e'])
    ...: s
Out[32]:
a   -0.
b    0.
c    0.
d   -0.
e   -0.
dtype: float64

In [54]: # Create a Series from a NumPy array with repeated index labels
    ...: d = np.random.randn(5)
    ...: s = pd.Series(d, index=['a', 'a', 'c', 'c', 'e'])
    ...: s
Out[54]:
a   -0.
a    0.
c   -1.
c   -0.
e   -0.
dtype: float64
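To see the length rule in action, the sketch below passes an index that is shorter than the data and catches the resulting error. The seeded generator is an assumption added here only so the values are repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)    # seeded for repeatability (assumption)
values = rng.standard_normal(5)

# Default index: 0 .. len(values) - 1
s = pd.Series(values)
print(list(s.index))              # [0, 1, 2, 3, 4]

# An index of the wrong length is rejected
try:
    pd.Series(values, index=["a", "b", "c"])
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```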

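Creating a Series from a scalar

When data is a scalar, the Series repeats that value to match the length of index. If no index is given, the result is a Series of length 1 whose only label is 0.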

In [33]: # Create a Series from a scalar with an explicit index
    ...: s = pd.Series(6, index=['a', 'b', 'c', 'd', 'e'])
    ...: s
Out[33]:
a    6
b    6
c    6
d    6
e    6
dtype: int64

In [35]: # Create a Series from a scalar without an index
    ...: s = pd.Series(6)
    ...: s
Out[35]:
0    6
dtype: int64

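Working with a Series

A Series behaves much like a NumPy ndarray. It works with most NumPy functions and also supports indexing, slicing, and vectorized operations that align on labels.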

In [37]: d = np.random.randn(5)
    ...: s = pd.Series(d, index=['a', 'b', 'c', 'd', 'e'])
    ...: s
Out[37]:
a    0.
b    1.
c   -1.
d    0.045794
e    0.
dtype: float64

In [38]: s.iloc[0]  # positional access; plain s[0] is deprecated on a label index
Out[38]: 0.74275

In [39]: s[1:3]
Out[39]:
b    1.
c   -1.
dtype: float64

In [40]: s[s > s.median()]
Out[40]:
a    0.
b    1.
dtype: float64

In [41]: s.iloc[[4, 2, 1]]
Out[41]:
e    0.
c   -1.
b    1.
dtype: float64

In [42]: s ** 2
Out[42]:
a    0.
b    1.
c    2.
d    0.002097
e    0.046236
dtype: float64

In [44]: s + 3
Out[44]:
a    3.
b    4.
c    1.
d    3.045794
e    3.
dtype: float64

In [45]: np.exp(s)
Out[45]:
a    1.
b    3.
c    0.
d    1.046858
e    1.
dtype: float64

In [55]: s.to_numpy()
Out[55]: array([-0., 0., -1., -0., -0.])
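Label alignment is mentioned above but is easy to miss in the session, so here is a small sketch with made-up values. Two slices of the same Series are added together; pandas matches elements by label rather than by position, and labels present on only one side produce NaN.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=["a", "b", "c", "d", "e"])

left = s.iloc[1:]      # labels b, c, d, e
right = s.iloc[:-1]    # labels a, b, c, d

# Aligned by label: "a" and "e" exist on one side only, so they become NaN
total = left + right
print(total)

# add() with fill_value treats a missing side as 0 instead
filled = left.add(right, fill_value=0)
print(filled.tolist())   # [1.0, 4.0, 6.0, 8.0, 5.0]
```

If NaN is not what you want for unmatched labels, the add, sub, mul, and div methods accept fill_value as shown.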

A Series also supports dict-like operations.

In [58]: d = np.random.randn(5)
    ...: s = pd.Series(d, index=['a', 'b', 'c', 'd', 'e'])
    ...: s
Out[58]:
a    0.
b    1.
c   -0.
d    0.003032
e    0.
dtype: float64

In [60]: s['c']
Out[60]: -0.54173

In [61]: s['f']
--------------------------------------
KeyError: 'f'

In [62]: s.get('a')
Out[62]: 0.37758

In [64]: s.get('f', -1)
Out[64]: -1

In [65]: 'a' in s
Out[65]: True

In [66]: 'f' in s
Out[66]: False
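One consequence of the dict-like behavior that often surprises people: the in operator tests index labels, just as it tests keys on a dict, and not the stored values. A short sketch with made-up data:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print("a" in s)          # True: "a" is a label
print(10 in s)           # False: 10 is a value, not a label
print(10 in s.values)    # True: search the underlying array instead

# get() without a default returns None for a missing label
print(s.get("z"))        # None
```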

Although a Series supports dict-like operations, it allows duplicate keys. An operation on a duplicated index label therefore applies to every value stored under that label.

In [67]: d = np.random.randn(5)
    ...: s = pd.Series(d, index=['a', 'a', 'a', 'd', 'd'])
    ...: s
Out[67]:
a    0.
a   -0.
a    0.
d    0.
d   -0.
dtype: float64

In [68]: s['a']
Out[68]:
a    0.
a   -0.
a    0.
dtype: float64

In [69]: s['a'] = 3.14

In [70]: s
Out[70]:
a    3.140000
a    3.140000
a    3.140000
d    0.
d   -0.
dtype: float64
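Because repeated labels change what a lookup returns, it can help to check for them first. The following sketch, using made-up values, shows one way to detect duplicates and keep only the first value per label.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=["a", "a", "a", "d", "d"])

print(s.index.is_unique)               # False
print(s.index.duplicated().tolist())   # [False, True, True, False, True]

# A repeated label selects a Series, not a scalar
print(type(s["a"]).__name__)           # Series

# Keep only the first value stored under each label
first = s[~s.index.duplicated(keep="first")]
print(first.to_dict())                 # {'a': 1.0, 'd': 4.0}
```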

pandas also provides aggregation on a Series. Common operations include sum and mean, which can be combined with groupby.

In [76]: s
Out[76]:
a    3.140000
a    3.140000
a    3.140000
d    0.
d   -0.
dtype: float64

In [77]: s.mean()
Out[77]: 1.9894

In [78]: s.sum()
Out[78]: 9.947

In [79]: s.groupby(s.index).mean()
Out[79]:
a    3.140000
d   -0.007566
dtype: float64

In [80]: s.groupby(s.index).sum()
Out[80]:
a    9.420000
d   -0.015132
dtype: float64

In [81]: s.groupby(level=0).sum()
Out[81]:
a    9.420000
d   -0.015132
dtype: float64

In [83]: s.groupby(['1', '0', '1', '0', '1']).sum()
Out[83]:
0    3.
1    6.094505
dtype: float64
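The session above uses random data, so the numbers are hard to check by hand. Here is the same set of groupby calls on fixed, made-up values where the sums are easy to verify. The keys list is an arbitrary external grouping, as in the last call above.

```python
import pandas as pd

s = pd.Series([3.0, 3.0, 3.0, 1.0, 2.0], index=["a", "a", "a", "d", "d"])

# Grouping by the index object and by index level 0 give the same groups
by_index = s.groupby(s.index).sum()
by_level = s.groupby(level=0).sum()
print(by_index.to_dict())    # {'a': 9.0, 'd': 3.0}
print(by_level.to_dict())    # {'a': 9.0, 'd': 3.0}

print(s.groupby(level=0).mean().to_dict())   # {'a': 3.0, 'd': 1.5}

# Grouping by an external list of keys, one key per element
keys = ["1", "0", "1", "0", "1"]
by_keys = s.groupby(keys).sum()
print(by_keys.to_dict())     # {'0': 4.0, '1': 8.0}
```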

OK, that's it for this brief introduction to Series. When I have time, I'll follow up with a short introduction to DataFrame.
