dfply.vector¶
Module Contents¶
-
dfply.vector.order_series_by(series, order_series)¶ Orders one series according to another series, or a list of other series. If a list of other series are specified, ordering is done hierarchically like when a list of columns is supplied to .sort_values().
- Args:
series (
pandas.Series): the pandas Series object to be reordered. order_series: either a pandas Series object or a list of pandas Seriesobjects. These will be sorted using .sort_values() with ascending=True, and the new order will be used to reorder the Series supplied in the first argument.- Returns:
- reordered pandas.Series object
-
dfply.vector.desc(series)¶ Mimics the functionality of the R desc function. Essentially inverts a series object to make ascending sort act like descending sort.
- Args:
- series (
pandas.Series): pandas series to be inverted prior to - ordering/sorting.
- series (
- Returns:
- inverted pandas.Series. The returned series will be numeric (integers),
- regardless of the type of the original series.
Example:
First group by cut, then find the first value of price when ordering by price ascending, and ordering by price descending using the desc function.
- diamonds >> group_by(X.cut) >> summarize(carat_low=first(X.price, order_by=X.price),
- carat_high=first(X.price, order_by=desc(X.price)))
cut carat_high carat_low
0 Fair 18574 337 1 Good 18788 327 2 Ideal 18806 326 3 Premium 18823 326 4 Very Good 18818 336
-
dfply.vector.coalesce(*series)¶ Takes the first non-NaN value in order across the specified series, returning a new series. Mimics the coalesce function in dplyr and SQL.
- Args:
- *series: Series objects, typically represented in their symbolic form
- (like X.series).
- Example:
- df = pd.DataFrame({
- ‘a’:[1,np.nan,np.nan,np.nan,np.nan], ‘b’:[2,3,np.nan,np.nan,np.nan], ‘c’:[np.nan,np.nan,4,5,np.nan], ‘d’:[6,7,8,9,np.nan]
}) df >> transmute(coal=coalesce(X.a, X.b, X.c, X.d))
coal0 1 1 3 2 4 3 5 4 np.nan
-
dfply.vector.case_when(*conditions)¶ Functions as a switch statement, creating a new series out of logical conditions specified by 2-item lists where the left-hand item is the logical condition and the right-hand item is the value where that condition is true.
Conditions should go from the most specific to the most general. A conditional that appears earlier in the series will “overwrite” one that appears later. Think of it like a series of if-else statements.
The logicals and values of the condition pairs must be all the same length, or length 1. Logicals can be vectors of booleans or a single boolean (True, for example, can be the logical statement for the final conditional to catch all remaining.).
- Args:
- *conditions: Each condition should be a list with two values. The first
- value is a boolean or vector of booleans that specify indices in which the condition is met. The second value is a vector of values or single value specifying the outcome where that condition is met.
- Example:
- df = pd.DataFrame({
- ‘num’:np.arange(16)
}) df >> mutate(strnum=case_when([X.num % 15 == 0, ‘fizzbuzz’],
[X.num % 3 == 0, ‘fizz’], [X.num % 5 == 0, ‘buzz’], [True, X.num.astype(str)]))num strnum
0 0 fizzbuzz 1 1 1 2 2 2 3 3 fizz 4 4 4 5 5 buzz 6 6 fizz 7 7 7 8 8 8 9 9 fizz 10 10 buzz 11 11 11 12 12 fizz 13 13 13 14 14 14 15 15 fizzbuzz
-
dfply.vector.if_else(condition, when_true, otherwise)¶ Wraps creation of a series based on if-else conditional logic into a function call.
Provide a boolean vector condition, value(s) when true, and value(s) when false, and a vector will be returned the same length as the conditional vector according to the logical statement.
- Args:
- condition: A boolean vector representing the condition. This is often
- a logical statement with a symbolic series.
- when_true: A vector the same length as the condition vector or a single
- value to apply when the condition is True.
- otherwise: A vector the same length as the condition vector or a single
- value to apply when the condition is False.
Example: df = pd.DataFrame