Published 2016-11-17 | Updated 2016-11-17 | Category: Development | 7 min read (about 1033 words)

petl function list

petl

Documentation: https://petl.readthedocs.io/en/latest/index.html

Extract/Load

petl can read CSV, pickle, text, XML, HTML, JSON, database, Excel, array, HDF5, and other sources.

Transform

Basic

petl.transform.basics

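head(table, n=5): select the first n rows
tail(table, n=5): select the last n rows
rowslice(table, *sliceargs): select a slice of rows
cut(table, *args, **kwargs): keep only the given fields
cutout(table, *args, **kwargs): remove the given fields
movefield(table, field, index): move a field to a new position
cat(*tables, **kwargs): concatenate tables, matching columns by field name
stack(*tables, **kwargs): concatenate tables without matching headers
skipcomments(table, prefix): skip rows whose first value starts with the given prefix
addfield(table, field, value=None, index=None, missing=None): add a new field with a fixed or computed value
addcolumn(table, field, col, index=None, missing=None): add a column of data (similar to addfield)
addrownumbers(table, start=1, step=1, field='row'): add a row-number field
addfieldusingcontext(table, field, query): add a field; query is a callback that receives the previous, current, and next rows as its three arguments
annex(*tables, **kwargs): join two or more tables side by side, by row order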

Header

petl.transform.headers

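rename(table, *args, **kwargs): rename fields
setheader(table, header): replace the header
extendheader(table, fields): extend the header (useful for filling in missing header fields)
pushheader(table, header, *args): add a header, treating the existing first row as data rather than as a header
prefixheader(table, prefix): add a prefix to every field name
suffixheader(table, suffix): add a suffix to every field name
sortheader(table, reverse=False, missing=None): reorder columns so that the header is sorted
skip(table, n): skip the first n rows, including the header row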

Converting

petl.transform.conversions

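convert(table, *args, **kwargs): transform values (type conversion, formatting, replacement via a dictionary, lambda, and so on)
convertall(table, *args, **kwargs): like convert, applied to every field
convertnumbers(table, strict=False, **kwargs): convert values that look like numbers into numeric types
replace(table, field, a, b, **kwargs): replace occurrences of a with b in one field
replaceall(table, a, b, **kwargs): replace occurrences of a with b in all fields
format(table, field, fmt, **kwargs): format the values of one field with a format string
formatall(table, fmt, **kwargs): format the values of all fields
interpolate(table, field, fmt, **kwargs): apply %-style string interpolation to the values of one field
interpolateall(table, fmt, **kwargs): apply %-style string interpolation to all fields
update(table, field, value, **kwargs): set a field to the given value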

Selecting rows

petl.transform.selects

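select(table, *args, **kwargs): select rows that meet a condition (the various select shortcut functions, such as selecteq, build on it)
selectusingcontext(table, query): select rows based on the previous and next rows
rowlenselect(table, n, complement=False): select rows of length n
facet(table, key): return a dictionary mapping each value of key to a table of the rows with that value
biselect(table, *args, **kwargs): split into two tables, one with the selected rows and one with the rejected rows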

Regular expressions

petl.transform.regex

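search(table, *args, **kwargs): return the rows that match a pattern
searchcomplement(table, *args, **kwargs): return the rows that do not match a pattern
sub(table, field, pattern, repl, count=0, flags=0): regular expression substitution on one field
split(table, field, pattern, newfields=None, include_original=False, maxsplit=0, flags=0): split the values of a field on a pattern into new fields
capture(table, field, pattern, newfields=None, include_original=False, flags=0, fill=None): build new fields from the capture groups of a regular expression applied to a field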

Unpacking compound values

petl.transform.unpacks

unpack(table, field, newfields=None, include_original=False, missing=None): unpack list or tuple values into separate fields
unpackdict(table, field, keys=None, includeoriginal=False, samplesize=1000, missing=None): unpack dictionary values into separate fields

Transforming rows

petl.transform.maps

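fieldmap(table, mappings=None, failonerror=False, errorvalue=None): transform a table by mapping input fields to output fields
rowmap(table, rowmapper, header, failonerror=False): transform each row with a function
rowmapmany(table, rowgenerator, header, failonerror=False): transform a table by mapping each row to any number of output rows
rowgroupmap(table, key, mapper, header=None, presorted=False, buffersize=None, tempdir=None, cache=True): group rows by key and pass each group to a mapper function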

Sorting

petl.transform.sorts

sort: sort a table
mergesort: combine several tables into one sorted table
issorted: check whether a table is sorted

Joins

petl.transform.joins

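join: inner join of two tables on the given key
leftjoin: left join
lookupjoin: left join, but if the right table has several matching rows, one is chosen arbitrarily
rightjoin: right join
outerjoin: full outer join; keeps all rows and fills the gaps with None
crossjoin: Cartesian product
antijoin: return the rows of the left table whose key does not appear in the right table
unjoin: split one table into two tables
hashjoin: variant of join that builds an in-memory lookup of the right table
hashleftjoin: hash-based variant of leftjoin
hashlookupjoin: hash-based variant of lookupjoin
hashrightjoin: hash-based variant of rightjoin
hashantijoin: hash-based variant of antijoin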

Set operations

petl.transform.setops

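complement: return the rows of a that are not in b
diff: find the differences between two tables; returns two tables, the rows of b that are not in a and the rows of a that are not in b
recordcomplement: find the records of a that are not in b
recorddiff: find the differences between two tables, comparing records
intersection: return the rows of a that are also in b
hashcomplement: hash-based variant of complement
hashintersection: hash-based variant of intersection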

Deduplicating rows

petl.transform.dedup

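duplicates: select the rows whose key value is duplicated
unique: select the rows whose key value is unique
conflicts: select the rows that share a key value but differ in some other field
distinct: return the distinct rows of a table
isunique: check whether the values of a field are unique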

Reducing rows (aggregation)

petl.transform.reductions

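aggregate: group rows by the given key and aggregate each group
rowreduce: group rows by the given key and reduce each group to a single row
mergeduplicates: merge duplicate rows under the given key
merge: merge tables on the given key
fold: reduce rows under each key, like Python's reduce
groupcountdistinctvalues: group by key and count the distinct values of another field in each group
groupselectfirst: group by key and keep the first row of each group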

'''
There are many more, but I did not feel like writing them all down...
'''

Permalink: https://beixiu.net/dev/petl函数表/

Author: 北嗅

Tags: petl, etl
