dfply.window_functions

Module Contents

dfply.window_functions.lead(series, i=1)

Returns the series shifted forward by i positions: each row takes the value found i rows later, and NaN fills the last i positions.

Same as a call to series.shift(-i).

Args:
series: column to shift forward.
i (int): number of positions to shift forward.

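Because lead fills NaN values at the end, the matching pandas call uses a negative shift. A minimal sketch that calls pandas directly rather than dfply; the sample values are arbitrary:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# lead with i=1: each row takes the value from the next row,
# and the last position has no successor, so it becomes NaN.
led = s.shift(-1)
print(led.tolist())  # [2.0, 3.0, 4.0, nan]
```

The integer series becomes float64 because NaN cannot be stored in an integer column.
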
dfply.window_functions.lag(series, i=1)

Returns the series shifted backward by i positions: each row takes the value found i rows earlier, and NaN fills the first i positions.

Same as a call to series.shift(i).

Args:
series: column to shift backward.
i (int): number of positions to shift backward.

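Since lag fills NaN values at the beginning, the matching pandas call uses a positive shift. A minimal pandas sketch with i=2 and illustrative values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# lag with i=2: each row takes the value from two rows earlier,
# so the first two positions have no predecessor and become NaN.
lagged = s.shift(2)
print(lagged.tolist())  # [nan, nan, 1.0, 2.0]
```
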
dfply.window_functions.between(series, a, b, inclusive=False)

Returns a boolean series specifying whether rows of the input series are between values a and b.

Args:
series: column to compare, typically symbolic.
a: value the series must be greater than (or equal to, if inclusive=True) for the output series to be True at that position.
b: value the series must be less than (or equal to, if inclusive=True) for the output series to be True at that position.
Kwargs:
inclusive (bool): If True, the comparison uses >= and <=. If False (the default), it uses > and <.

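The comparisons described above can be sketched in plain pandas. The helper name between_sketch is hypothetical (it is not part of dfply), and the data are made up:

```python
import pandas as pd

def between_sketch(series, a, b, inclusive=False):
    """Hypothetical helper mirroring the comparisons documented for between()."""
    if inclusive:
        # Closed interval: a <= value <= b
        return (series >= a) & (series <= b)
    # Open interval (the default): a < value < b
    return (series > a) & (series < b)

s = pd.Series([1, 2, 3, 4, 5])
print(between_sketch(s, 2, 4).tolist())                  # [False, False, True, False, False]
print(between_sketch(s, 2, 4, inclusive=True).tolist())  # [False, True, True, True, False]
```

Note that pandas has its own Series.between, whose inclusive parameter takes strings such as "both" or "neither" in recent pandas versions rather than a bool, so the two should not be confused.
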
dfply.window_functions.dense_rank(series, ascending=True)

Equivalent to series.rank(method='dense', ascending=ascending).

Args:
series: column to rank.
Kwargs:
ascending (bool): whether to rank in ascending order (default is True).
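To show how method='dense' treats ties, a short sketch calling pandas Series.rank directly; the values are arbitrary:

```python
import pandas as pd

s = pd.Series([10, 20, 20, 30])

# Tied values share a rank, and the next distinct value gets the next
# consecutive rank, so there are no gaps. rank() returns floats.
print(s.rank(method='dense', ascending=True).tolist())   # [1.0, 2.0, 2.0, 3.0]
print(s.rank(method='dense', ascending=False).tolist())  # [3.0, 2.0, 2.0, 1.0]
```
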
dfply.window_functions.min_rank(series, ascending=True)

Equivalent to series.rank(method='min', ascending=ascending).

Args:
series: column to rank.
Kwargs:
ascending (bool): whether to rank in ascending order (default is True).
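With method='min', ties take the lowest rank of their group and the following rank is skipped. A minimal pandas sketch with illustrative values:

```python
import pandas as pd

s = pd.Series([10, 20, 20, 30])

# Both 20s occupy positions 2 and 3 in sorted order and receive the minimum, 2.
# Rank 3 is therefore unused and 30 gets rank 4.
print(s.rank(method='min', ascending=True).tolist())  # [1.0, 2.0, 2.0, 4.0]
```
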
dfply.window_functions.cumsum(series)

Calculates the cumulative sum of values. Equivalent to series.cumsum().

Args:
series: column to compute cumulative sum for.
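A minimal pandas sketch of the running total, including how a missing value is handled under the default skipna=True; the data are illustrative:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
# Each position holds the running total up to and including that row.
print(s.cumsum().tolist())  # [1, 3, 6, 10]

# A missing value stays NaN in place, and the running total continues past it.
with_gap = pd.Series([1, None, 2])
print(with_gap.cumsum().tolist())  # [1.0, nan, 3.0]
```
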
dfply.window_functions.cummean(series)

Calculates the cumulative mean of values. Equivalent to series.expanding().mean().

Args:
series: column to compute cumulative mean for.
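An expanding window grows from the first row to the current row, so its mean is the running average. A short sketch, assuming a series without missing values, that also checks the result against running sum divided by running count:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Mean over rows 0..k for each k.
running_mean = s.expanding().mean()
print(running_mean.tolist())  # [1.0, 1.5, 2.0, 2.5]

# Same values as running sum / running count (valid here because nothing is missing).
manual = s.cumsum() / pd.Series(range(1, len(s) + 1))
print(manual.tolist())  # [1.0, 1.5, 2.0, 2.5]
```
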
dfply.window_functions.cummax(series)

Calculates the cumulative maximum of values. Equivalent to series.expanding().max().

Args:
series: column to compute cumulative maximum for.
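A minimal pandas sketch of the running maximum with arbitrary values. For data without missing values, the built-in Series.cummax() produces the same values, but keeps the integer dtype instead of converting to float:

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5])

# Largest value seen so far at each position (expanding windows return floats).
print(s.expanding().max().tolist())  # [3.0, 3.0, 4.0, 4.0, 5.0]

# Built-in alternative that preserves the integer dtype.
print(s.cummax().tolist())  # [3, 3, 4, 4, 5]
```
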
dfply.window_functions.cummin(series)

Calculates the cumulative minimum of values. Equivalent to series.expanding().min().

Args:
series: column to compute cumulative minimum for.
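The running minimum works the same way. A short pandas sketch with illustrative values, again comparing against the built-in Series.cummin() for data without missing values:

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5])

# Smallest value seen so far at each position.
print(s.expanding().min().tolist())  # [3.0, 1.0, 1.0, 1.0, 1.0]
print(s.cummin().tolist())           # [3, 1, 1, 1, 1]
```
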
dfply.window_functions.cumprod(series)

Calculates the cumulative product of values. Equivalent to series.cumprod().

Args:
series: column to compute cumulative product for.
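A minimal pandas sketch of the running product; the values are chosen to show that a single zero makes every later product zero:

```python
import pandas as pd

s = pd.Series([2, 3, 0, 5])

# Product of all values up to and including each row.
print(s.cumprod().tolist())  # [2, 6, 0, 0]
```
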
dfply.window_functions.cumany(series)

Calculates the cumulative logical OR of values: a position is True if any value up to and including it is truthy. Equivalent to series.expanding().apply(np.any).astype(bool).

Args:
series: column to compute cumulative any for.
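A sketch of the documented expanding-window form next to a vectorized alternative. Passing raw=True (each window arrives as a NumPy array) is an assumption added here for speed; the documented call omits it. The cumsum-based version is not from the dfply docs, only an equivalent formulation for this data:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, 1, 0])

# Documented form: apply np.any to every expanding window.
# apply() returns floats (0.0/1.0), hence the astype(bool).
documented = s.expanding().apply(np.any, raw=True).astype(bool)

# Vectorized alternative: True once the running count of truthy values is > 0.
vectorized = s.astype(bool).astype(int).cumsum() > 0

print(documented.tolist())  # [False, False, True, True]
print(vectorized.tolist())  # [False, False, True, True]
```

The expanding version calls a Python function once per row, so on long columns the vectorized form is usually much faster.
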
dfply.window_functions.cumall(series)

Calculates the cumulative logical AND of values: a position is True only if every value up to and including it is truthy. Equivalent to series.expanding().apply(np.all).astype(bool).

Args:
series: column to compute cumulative all for.
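The same comparison for the logical AND, under the same assumptions (raw=True is added here, and the cumprod-based alternative is not from the dfply docs):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 0, 1])

# Documented form: apply np.all to every expanding window.
documented = s.expanding().apply(np.all, raw=True).astype(bool)

# Vectorized alternative: the running product of 0/1 flags stays 1
# only while every value so far is truthy.
vectorized = s.astype(bool).astype(int).cumprod().astype(bool)

print(documented.tolist())  # [True, True, False, False]
print(vectorized.tolist())  # [True, True, False, False]
```
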
dfply.window_functions.percent_rank(series, ascending=True)
dfply.window_functions.row_number(series, ascending=True)

Returns row numbers based on the column's rank, with ties broken by order of appearance. Equivalent to series.rank(method='first', ascending=ascending).

Args:
series: column to rank.
Kwargs:
ascending (bool): whether to rank in ascending order (default is True).

Usage:

diamonds >> head() >> mutate(rn=row_number(X.x))

   carat      cut  color  clarity  depth  table  price     x     y     z   rn
0   0.23    Ideal      E      SI2   61.5   55.0    326  3.95  3.98  2.43  2.0
1   0.21  Premium      E      SI1   59.8   61.0    326  3.89  3.84  2.31  1.0
2   0.23     Good      E      VS1   56.9   65.0    327  4.05  4.07  2.31  3.0
3   0.29  Premium      I      VS2   62.4   58.0    334  4.20  4.23  2.63  4.0
4   0.31     Good      J      SI2   63.3   58.0    335  4.34  4.35  2.75  5.0
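The diamonds example has no ties in x. To show how method='first' handles ties, a minimal pandas sketch with made-up values:

```python
import pandas as pd

s = pd.Series([20, 10, 20, 30])

# method='first' breaks ties by order of appearance, so every row gets a
# distinct number: the 20 at index 0 ranks ahead of the 20 at index 2.
print(s.rank(method='first', ascending=True).tolist())  # [2.0, 1.0, 3.0, 4.0]
```

As in the rn column above, the row numbers come back as floats.
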