dfply.set_ops

Module Contents

dfply.set_ops.validate_set_ops(df, other)

Helper function that checks whether two DataFrames are valid for set operations. The columns must have the same names in the same order, and the indices must have the same dimension (number of levels) with the same level names.
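The two documented rules can be illustrated with a minimal plain-pandas sketch. `check_set_compatible` is a hypothetical name, not part of dfply, and raising `ValueError` on a mismatch is an assumption made for this sketch rather than a statement about dfply's own error handling.

```python
import pandas as pd


def check_set_compatible(df: pd.DataFrame, other: pd.DataFrame) -> None:
    """Hypothetical restatement of the rules described for validate_set_ops.

    Raising ValueError on a mismatch is an assumption of this sketch.
    """
    # Rule 1: identical column names in identical order.
    if list(df.columns) != list(other.columns):
        raise ValueError(
            f"columns differ: {list(df.columns)} vs {list(other.columns)}"
        )
    # Rule 2a: the indices must have the same dimension (number of levels).
    if df.index.nlevels != other.index.nlevels:
        raise ValueError("index dimension mismatch")
    # Rule 2b: the index levels must carry the same names.
    if list(df.index.names) != list(other.index.names):
        raise ValueError("index names mismatch")


a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
check_set_compatible(a, a)  # compatible, so nothing is raised
# check_set_compatible(a, a[["y", "x"]]) would raise: same columns, different order
```

Note that a renamed index (for example `a.rename_axis("id")`) fails rule 2b even though the data is identical.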

dfply.set_ops.union(df, other, index=False, keep='first')

Returns rows that appear in either DataFrame.

Args:
    df (pandas.DataFrame): data passed in through the pipe.
    other (pandas.DataFrame): other DataFrame to use in the set operation
        with the first.

Kwargs:
    index (bool): whether to consider the pandas index as part of the set
        operation (default False).
    keep (str): which duplicate row to keep. Options are 'first' and 'last'
        (default 'first').

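A minimal plain-pandas sketch of the union semantics described above follows. It is not dfply's implementation: `union_sketch` is a hypothetical name, and stacking with `pd.concat` then de-duplicating is a choice made for this sketch. With `index=True`, the index labels are treated as part of each row when looking for duplicates.

```python
import pandas as pd


def union_sketch(df, other, index=False, keep="first"):
    """Rows found in either frame, with duplicates dropped (hypothetical helper)."""
    # Stack the two frames; the original index labels are carried along.
    stacked = pd.concat([df, other])
    if index:
        # Treat the index labels as part of each row: move them into columns
        # only to find duplicates, then filter the stacked frame positionally.
        dup = stacked.reset_index().duplicated(keep=keep).to_numpy()
        return stacked.loc[~dup]
    return stacked.drop_duplicates(keep=keep)


a = pd.DataFrame({"x": [1, 2, 2]})
b = pd.DataFrame({"x": [2, 3]})
print(union_sketch(a, b)["x"].tolist())  # [1, 2, 3]
```

Because `keep` is passed straight to pandas' duplicate detection, `keep="last"` retains the later copy of the repeated value 2 (the one that came from `b`) instead of the first copy from `a`.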
dfply.set_ops.intersect(df, other, index=False, keep='first')

Returns rows that appear in both DataFrames.

Args:
    df (pandas.DataFrame): data passed in through the pipe.
    other (pandas.DataFrame): other DataFrame to use in the set operation
        with the first.

Kwargs:
    index (bool): whether to consider the pandas index as part of the set
        operation (default False).
    keep (str): which duplicate row to keep. Options are 'first' and 'last'
        (default 'first').

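One way to express "rows that appear in both DataFrames" in plain pandas is an inner join on every column, sketched below. This is an illustration under stated assumptions, not dfply's code: `intersect_sketch` is a hypothetical name, and the join-based approach is a choice made for the sketch. With `index=True` the index levels are moved into ordinary columns so they take part in the comparison, then restored afterwards.

```python
import pandas as pd


def intersect_sketch(df, other, index=False, keep="first"):
    """Rows found in both frames, with duplicates dropped (hypothetical helper)."""
    # With index=True the index levels become ordinary columns (reset_index
    # places them first), so they take part in the row comparison.
    left = df.reset_index() if index else df
    right = other.reset_index() if index else other
    cols = list(left.columns)
    # Inner join on every column keeps only rows present in both frames;
    # de-duplicating `right` first stops one left row matching many times.
    out = left.merge(right.drop_duplicates(), how="inner", on=cols)
    out = out.drop_duplicates(keep=keep)
    if index:
        # Restore the original index levels and their names.
        out = out.set_index(cols[: df.index.nlevels])
        out.index.names = df.index.names
    return out


a = pd.DataFrame({"x": [1, 2, 2, 3]})
b = pd.DataFrame({"x": [2, 3, 4]})
print(sorted(intersect_sketch(a, b)["x"].tolist()))  # [2, 3]
```

A side effect of joining on columns is that, when `index=False`, the result gets a fresh default index rather than the labels from `df`.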
dfply.set_ops.set_diff(df, other, index=False, keep='first')

Returns rows that appear in the first DataFrame but not the second.

Args:
    df (pandas.DataFrame): data passed in through the pipe.
    other (pandas.DataFrame): other DataFrame to use in the set operation
        with the first.

Kwargs:
    index (bool): whether to consider the pandas index as part of the set
        operation (default False).
    keep (str): which duplicate row to keep. Options are 'first' and 'last'
        (default 'first').
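The set difference can be sketched in plain pandas with a left join and the `indicator=True` flag of `DataFrame.merge`, keeping only rows tagged `'left_only'`. This is an assumed illustration, not dfply's implementation; `set_diff_sketch` is a hypothetical name. As with the other operations, `index=True` moves the index levels into columns so that rows with equal values but different labels are treated as distinct.

```python
import pandas as pd


def set_diff_sketch(df, other, index=False, keep="first"):
    """Rows of df that do not appear in other (hypothetical helper)."""
    left = df.reset_index() if index else df
    right = other.reset_index() if index else other
    cols = list(left.columns)
    # A left join with indicator=True adds a '_merge' column recording where
    # each row was found; 'left_only' rows have no match in `other`.
    merged = left.merge(right.drop_duplicates(), how="left", on=cols, indicator=True)
    out = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    out = out.drop_duplicates(keep=keep)
    if index:
        # Restore the original index levels and their names.
        out = out.set_index(cols[: df.index.nlevels])
        out.index.names = df.index.names
    return out


a = pd.DataFrame({"x": [1, 2, 2, 3]})
b = pd.DataFrame({"x": [2]})
print(set_diff_sketch(a, b)["x"].tolist())  # [1, 3]
```

A left join preserves the row order of `df`, so the surviving rows come back in their original order.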