PairGroupedUniverse
API documentation for tradingstrategy.utils.groupeduniverse.PairGroupedUniverse Python class in Trading Strategy framework.
- class PairGroupedUniverse
Bases: object
A base class for manipulating columnar price/liquidity data by a pair.
The server streams the data for all pairs in a single continuous time-indexed format. For most use cases, we want to look up and manipulate data by pair. To achieve this, we use Pandas pd.GroupBy and recompile the data on the client side.
This works for:
OHLCV candles
Liquidity candles
Lending reserves (one PairGroupedUniverse per metric, such as supply APR and borrow APR)
The input pd.DataFrame is sorted by the timestamp column by default, and that column is then set as the index. This is not optimised (the operation is not done in place).
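The grouping idea described above can be illustrated with plain pandas. This is a minimal sketch on synthetic data, not the class's actual implementation; the column names `pair_id`, `timestamp` and `close` are assumptions chosen to match the defaults in the constructor signature.

```python
import pandas as pd

# Synthetic stand-in for the server feed: rows for several pairs
# interleaved in one frame. Column names are assumptions for this sketch.
raw = pd.DataFrame({
    "pair_id": [1, 2, 1, 2, 1],
    "timestamp": pd.to_datetime(
        ["2021-01-02", "2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03"]
    ),
    "close": [101.0, 50.0, 100.0, 51.0, 102.0],
})

# Sort by time and use the timestamp as the index (this returns a new frame)
indexed = raw.sort_values("timestamp").set_index("timestamp", drop=False)

# Group by the primary key column so each pair can be looked up on its own
grouped = indexed.groupby("pair_id")

# All samples of pair 1, in time order
pair_1 = grouped.get_group(1)
```

Each group is an ordinary time-indexed DataFrame, so per-pair lookups do not need to scan the rows of other pairs.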
- __init__(df, time_bucket=TimeBucket.d1, timestamp_column='timestamp', index_automatically=True, fix_wick_threshold=(0.1, 1.9), fix_inbetween_threshold=(-0.99, 5.0), primary_key_column='pair_id', remove_candles_with_zero_volume=True, forward_fill=False, bad_open_close_threshold=3.0, autoheal_pair_limit=200, forward_fill_until=None)
Set up new candle universe where data is grouped by trading pair.
- Parameters:
df (DataFrame) – DataFrame backing the data.
time_bucket (TimeBucket) –
What bar size candles we are operating at. Defaults to daily.
TODO: Currently not used. Will be removed in future versions.
timestamp_column (str) – Which column to use to build the time index. Used for QSTrader / Backtrader compatibility.
index_automatically (bool) – Convert the index to a time series index. You might want to avoid this with QSTrader-style data.
fix_wick_threshold (tuple | None) –
Apply the abnormal high/low wick fix filter.
Maximum allowed high/low wick relative to close. By default, fix values where low is more than 90% below close or high is more than 90% above close.
See fix_bad_wicks() for more information.
bad_open_close_threshold (float | None) – See fix_bad_wicks().
primary_key_column (str) – The pair/reserve id column name in the DataFrame.
remove_zero_candles –
Remove candles with zero values for OHLC.
To deal with abnormal data.
forward_fill (bool) –
Forward-fill gaps in the data.
See forward fill and forward filling data for more information.
autoheal_pair_limit –
Don’t try to autoheal data if the candle universe is too large.
Autohealing is a very taxing operation and should not be performed on large universes. Instead, you should preprocess the universe to a candles Parquet file and load directly from there.
autoheal_limit – If we have more than
fix_inbetween_threshold (tuple | None) –
remove_candles_with_zero_volume (bool) –
forward_fill_until (datetime.datetime | pandas._libs.tslibs.timestamps.Timestamp | None) –
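To make the default fix_wick_threshold=(0.1, 1.9) concrete, here is a hedged sketch of a wick filter in plain pandas. It assumes the tuple holds (low, high) multipliers of the close price, as the description of the defaults suggests. The repair step, clamping a bad wick to the open/close body, is an assumption made for illustration and may differ from what fix_bad_wicks() actually does. The helper name `clamp_bad_wicks` is hypothetical.

```python
import pandas as pd


def clamp_bad_wicks(df: pd.DataFrame, threshold=(0.1, 1.9)) -> pd.DataFrame:
    """Return a copy of `df` where abnormal high/low wicks are clamped.

    Assumption for this sketch: a bad wick is replaced by the candle body
    (the max or min of open and close).
    """
    fixed = df.copy()
    body_high = fixed[["open", "close"]].max(axis=1)
    body_low = fixed[["open", "close"]].min(axis=1)

    # Low is more than 90% below close, or high is more than 90% above close
    bad_low = fixed["low"] / fixed["close"] < threshold[0]
    bad_high = fixed["high"] / fixed["close"] > threshold[1]

    fixed.loc[bad_high, "high"] = body_high[bad_high]
    fixed.loc[bad_low, "low"] = body_low[bad_low]
    return fixed


candles = pd.DataFrame({
    "open":  [100.0, 100.0, 100.0],
    "high":  [105.0, 250.0, 104.0],   # second candle has an abnormal high
    "low":   [95.0, 98.0, 5.0],       # third candle has an abnormal low
    "close": [102.0, 101.0, 103.0],
})
fixed = clamp_bad_wicks(candles)
```

Only the offending side of each candle is changed, so healthy candles pass through untouched.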
Methods
__init__(df[, time_bucket, ...]) – Set up new candle universe where data is grouped by trading pair.
Clear candles cached by pair.
create_from_multiple_candle_dataframes(dfs[, autoheal_pair_limit]) – Construct universe based on multiple trading pairs.
create_from_single_pair_dataframe(df[, bucket]) – Construct universe based on a single trading pair data.
forward_fill([columns, drop_other_columns]) – Forward-fill sparse OHLCV candle data.
get_all_pairs([max_count]) – Go through all liquidity samples, one DataFrame per trading pair.
get_all_samples_by_range(start, end) – Get list of candles/samples for all pairs at a certain range.
get_all_samples_by_timestamp(ts) – Get list of candles/samples for all pairs at a certain timepoint.
get_columns() – Get column names from the underlying pandas.GroupBy object.
get_last_entries_by_pair_and_timestamp(pair, ...) – Get samples for a single pair before a timestamp.
get_pair_count() – Return the number of pairs in this dataset.
Get all pairs present in the dataset.
get_prior_timestamp(ts) – Get the first timestamp in the index that is before the given timestamp.
get_sample_count() – Return the dataset size: how many samples there are in total for all pairs.
get_samples_by_pair(pair_id) – Get samples for a single pair.
get_single_pair_data([timestamp, ...]) – Get all candles/liquidity samples for the only pair in the universe up to a certain timestamp.
get_single_value(asset_id, when, ...[, ...]) – Get a single value for a single pair/asset at a specific point of time.
get_timestamp_range([use_timezone]) – Return the time range for which we have data.
iterate_samples_by_pair_range(start, end) – Iterate over candles/samples pair by pair within a certain range.
Attributes
Grouped DataFrame cache for faster lookup
- get_columns()
Get column names from the underlying pandas.GroupBy object.
- Return type:
Index
- get_sample_count()
Return the dataset size: how many samples there are in total for all pairs.
- Return type:
- get_pair_count()
Return the number of pairs in this dataset.
TODO: Rename. Also used by lending reserves, and this then refers to count of reserves, not pairs.
- Return type:
- get_samples_by_pair(pair_id)
Get samples for a single pair.
After the samples have been extracted, set timestamp as the index for the data.
- get_last_entries_by_pair_and_timestamp(pair, timestamp, small_time=Timedelta('0 days 00:00:01'))
Get samples for a single pair before a timestamp.
Return a DataFrame slice containing all datapoints before the timestamp.
We assume the timestamp is the current decision frame. E.g. for daily close data, return the previous day's close to prevent any lookahead bias.
- Parameters:
pair_id – Integer id for a trading pair
timestamp (pandas._libs.tslibs.timestamps.Timestamp | datetime.datetime) – Get all samples excluding this timestamp.
pair (tradingstrategy.pair.DEXPair | int) –
- Returns:
Dataframe that contains samples for a single trading pair.
Indexed by timestamp.
- Raises:
KeyError – If we do not have data for pair_id
- Return type:
DataFrame
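The role of small_time can be sketched in plain pandas. Label-based slicing on a sorted DatetimeIndex includes the end label, so stepping back by a small delta is one way to exclude the decision timestamp itself. This is an illustration under that assumption, not the method's actual code; `samples_before` is a hypothetical helper.

```python
import pandas as pd


def samples_before(pair_df: pd.DataFrame, timestamp: pd.Timestamp,
                   small_time: pd.Timedelta = pd.Timedelta(seconds=1)) -> pd.DataFrame:
    """Slice a time-indexed frame to rows strictly before `timestamp`."""
    # .loc slicing is inclusive at the end, so step back by a small delta
    # to leave out the sample at `timestamp` itself.
    return pair_df.loc[: timestamp - small_time]


daily = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)

# Deciding at 2021-01-03: only the two earlier closes are visible
visible = samples_before(daily, pd.Timestamp("2021-01-03"))
```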
- get_all_pairs(max_count=None)
Go through all liquidity samples, one DataFrame per trading pair.
- get_all_samples_by_timestamp(ts)
Get list of candles/samples for all pairs at a certain timepoint.
- Raises:
KeyError – The universe does not contain a sample for the given timepoint
- Returns:
A DataFrame that contains candles/samples at the specific timepoint
- Parameters:
ts (Timestamp) –
- Return type:
DataFrame
- get_all_samples_by_range(start, end)
Get list of candles/samples for all pairs at a certain range.
Useful to get the last few samples for multiple pairs.
Example:
    from pandas import Timestamp
    from tradingstrategy.candle import GroupedCandleUniverse
    from tradingstrategy.timebucket import TimeBucket

    # `client` is assumed to be an already initialised Trading Strategy client

    # Set up timestamps for a 3-week range, with one week in the middle
    end = Timestamp('2021-10-25 00:00:00')
    start = Timestamp('2021-10-11 00:00:00')
    middle = start + (end - start) / 2

    # Get weekly candles
    raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
    candle_universe = GroupedCandleUniverse(raw_candles)
    candles = candle_universe.get_all_samples_by_range(start, end)

    # We have pair data for 3 different weeks
    assert len(candles.index.unique()) == 3

    # Each week has its own set of candles broken down by pair,
    # and they can be uniquely addressed by their pair_id
    assert len(candles.loc[start]) >= 1000
    assert len(candles.loc[middle]) >= 1000
    assert len(candles.loc[end]) >= 1000
- Parameters:
start (Timestamp) – start of the range (inclusive)
end (Timestamp) – end of the range (inclusive)
- Returns:
A DataFrame that contains candles/samples for all pairs at the range.
- Return type:
DataFrame
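The inclusive range semantics can also be tried without a server connection. The following sketch uses synthetic weekly data for two pairs and plain pandas slicing; it only illustrates the inclusive start and end behaviour and is not the library's implementation.

```python
import pandas as pd

# Five weekly timestamps (Mondays), two pairs per week
weeks = pd.date_range("2021-10-04", periods=5, freq="7D")
rows = [
    {"timestamp": ts, "pair_id": pair_id, "close": float(100 * pair_id + i)}
    for i, ts in enumerate(weeks)
    for pair_id in (1, 2)
]
df = pd.DataFrame(rows).set_index("timestamp").sort_index()

start = pd.Timestamp("2021-10-11")
end = pd.Timestamp("2021-10-25")

# Label slicing on a sorted DatetimeIndex includes both endpoints
in_range = df.loc[start:end]
```

Because the index is shared by all pairs, `in_range.loc[start]` returns one row per pair for that week.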
- iterate_samples_by_pair_range(start, end)
Iterate over candles/samples pair by pair within a certain range.
Useful to get the last few samples for multiple pairs.
Example:
    import pandas as pd
    from tradingstrategy.candle import GroupedCandleUniverse
    from tradingstrategy.timebucket import TimeBucket

    # `client` is assumed to be an already initialised Trading Strategy client
    raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
    candle_universe = GroupedCandleUniverse(raw_candles)

    # Calibrate our week
    random_date = pd.Timestamp("2021-10-29")
    end = candle_universe.get_prior_timestamp(random_date)
    assert end == pd.Timestamp("2021-10-25")

    # Because we are using weekly candles,
    # and start and end are inclusive endpoints,
    # we should get 3 weeks of samples
    start = pd.Timestamp(end) - pd.Timedelta(weeks=2)

    for pair_id, pair_df in candle_universe.iterate_samples_by_pair_range(start, end):
        # Because of missing samples, some pairs may have different ranges.
        # In this example, we iterate over 3-week ranges, so we can have
        # 1, 2 or 3 weekly candles.
        # If there was no data at all, pair_id is not present in the result.
        range_start = pair_df.index[0]
        range_end = pair_df.index[-1]
        assert range_start <= range_end

        # Calculate the momentum for the full range of all samples
        first_candle = pair_df.iloc[0]
        last_candle = pair_df.iloc[-1]

        # Relative change from the first open to the last close
        momentum = (last_candle["close"] - first_candle["open"]) / first_candle["open"]
- Parameters:
start (Timestamp) – start of the range (inclusive)
end (Timestamp) – end of the range (inclusive)
- Returns:
DataFrame.groupby result
- Return type:
DataFrame
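The per-pair iteration and the momentum calculation can be checked on synthetic data. This sketch uses plain pandas groupby over a sliced range, which is an assumption about how the iteration behaves rather than the library's code. Momentum is taken here as the relative change from the first open to the last close.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "pair_id": [1, 1, 1, 2, 2],
        "open": [100.0, 110.0, 120.0, 50.0, 40.0],
        "close": [110.0, 120.0, 150.0, 40.0, 45.0],
    },
    index=pd.to_datetime(
        ["2021-10-11", "2021-10-18", "2021-10-25", "2021-10-18", "2021-10-25"]
    ),
).sort_index()

start, end = pd.Timestamp("2021-10-11"), pd.Timestamp("2021-10-25")

momentum = {}
for pair_id, pair_df in df.loc[start:end].groupby("pair_id"):
    # Pair 2 has only two weekly candles in the range; that is fine
    first_open = pair_df.iloc[0]["open"]
    last_close = pair_df.iloc[-1]["close"]
    momentum[pair_id] = (last_close - first_open) / first_open
```

Pair 1 rises from 100 to 150 and pair 2 falls from 50 to 45, so the two results have opposite signs.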
- get_timestamp_range(use_timezone=False)
Return the time range for which we have data.
Note
Because we assume multipair data, the data is grouped by pair and not indexed as a single time series. Thus, this function can be a slow operation.
- Parameters:
use_timezone –
The resulting timestamps will have their timezone set to UTC. If not set then naive timestamps are generated.
Legacy option. Do not use.
- Returns:
(start timestamp, end timestamp) tuple, UTC-timezone aware
If the data frame is empty, return None, None.
- Return type:
- get_prior_timestamp(ts)
Get the first timestamp in the index that is before the given timestamp.
This allows us to calibrate weekly/4 hours/etc. indexes to any given time.
Example:
    import pandas as pd
    from tradingstrategy.candle import GroupedCandleUniverse
    from tradingstrategy.timebucket import TimeBucket

    # `client` is assumed to be an already initialised Trading Strategy client
    raw_candles = client.fetch_all_candles(TimeBucket.d7).to_pandas()
    candle_universe = GroupedCandleUniverse(raw_candles)

    # Calibrate our week
    random_date = pd.Timestamp("2021-10-29")
    weekly_ts_before = candle_universe.get_prior_timestamp(random_date)
    assert weekly_ts_before == pd.Timestamp("2021-10-25")
- Returns:
Any timestamp from the index that is before or at the same time of the given timestamp.
- Parameters:
ts (Timestamp) –
- Return type:
Timestamp
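One way to picture the calibration is a binary search on a sorted index. The sketch below uses `DatetimeIndex.searchsorted` and follows the "before or at the same time" wording of the return value; whether the library does it this way is an assumption, and `prior_timestamp` is a hypothetical helper.

```python
import pandas as pd


def prior_timestamp(index: pd.DatetimeIndex, ts: pd.Timestamp) -> pd.Timestamp:
    """Return the latest entry of a sorted index that is at or before `ts`."""
    # side="right" places the insertion point after any entry equal to `ts`
    pos = index.searchsorted(ts, side="right") - 1
    if pos < 0:
        raise KeyError(f"No timestamp at or before {ts}")
    return index[pos]


# Weekly index: 2021-10-04, 2021-10-11, 2021-10-18, 2021-10-25
weekly = pd.date_range("2021-10-04", periods=4, freq="7D")

mid_week = prior_timestamp(weekly, pd.Timestamp("2021-10-29"))
exact = prior_timestamp(weekly, pd.Timestamp("2021-10-25"))
```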
- get_single_pair_data(timestamp=None, sample_count=None, allow_current=False, raise_on_not_enough_data=True, time_range_epsilon_seconds=0.5)
Get all candles/liquidity samples for the only pair in the universe up to a certain timestamp.
A shortcut method for trading strategies that trade only one pair. Designed to be a backtesting and live trading friendly way to access candle data.
Note
By default, get_single_pair_data() returns the candles prior to the timestamp. The behaviour can be changed with get_single_pair_data(allow_current=True). At the start of the backtest, we do not have any previous candle available yet, so this function may raise NoDataAvailable.
- Parameters:
timestamp (Optional[Timestamp]) – Get samples up to this timestamp, including all previous samples.
allow_current –
Allow reading the candle precisely at the timestamp. If you read the candle of your current strategy cycle timestamp, bad things may happen.
In backtesting, reading the candle at the current timestamp introduces forward-looking bias. In live trading, reading the candle at the current timestamp may give you no candle or an incomplete candle (trades are still piling up on it).
sample_count (Optional[int]) –
Minimum candle/liquidity sample count needed.
Limit the number of returned candles to N candles before the timestamp.
If the data does not have enough samples before the timestamp, then raise NoDataAvailable.
raise_on_not_enough_data –
Raise an error if no data is available.
This can be e.g. because the trading pair has
time_range_epsilon_seconds – The time delta epsilon we use to determine between “current” and “previous” candle.
- Raises:
NoDataAvailable –
Raised when there is no data available at the range.
Set fail_on_empty=False to return an empty DataFrame instead.
- Return type:
DataFrame
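The interplay of allow_current, sample_count and the time epsilon can be sketched in plain pandas. This is an illustration of the documented behaviour under stated assumptions, not the method's code: `candles_for_decision` and `NotEnoughSamples` are hypothetical names, the latter standing in for the library's NoDataAvailable.

```python
import pandas as pd


class NotEnoughSamples(Exception):
    """Hypothetical stand-in for the library's NoDataAvailable."""


def candles_for_decision(candles, timestamp, sample_count=None,
                         allow_current=False,
                         epsilon=pd.Timedelta(seconds=0.5)):
    """Return the candles visible at `timestamp` from a time-indexed frame."""
    # Without allow_current, stop just short of the decision timestamp
    cutoff = timestamp if allow_current else timestamp - epsilon
    visible = candles.loc[:cutoff]

    # Assumption: with no sample_count, at least one candle is required
    needed = sample_count if sample_count is not None else 1
    if len(visible) < needed:
        raise NotEnoughSamples(f"Need {needed} samples, have {len(visible)}")

    if sample_count is not None:
        visible = visible.iloc[-sample_count:]
    return visible


candles = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)
now = pd.Timestamp("2021-01-04")

previous = candles_for_decision(candles, now)
last_two = candles_for_decision(candles, now, sample_count=2)
with_current = candles_for_decision(candles, now, allow_current=True)
```

On the very first candle there is nothing earlier to read, so the sketch raises, mirroring the note about the start of a backtest.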
- get_single_value(asset_id, when, data_lag_tolerance, kind='close', asset_name=None, link=None)
Get a single value for a single pair/asset at a specific point of time.
The data may be sparse. There might not be a sample available at the same time point or the immediately previous time point. In this case, the method looks back for the previous data point within the tolerance time range.
This method should be relatively fast and optimised for any price, volume and liquidity queries.
Example:
# TODO
- Parameters:
asset_id (int) – Trading pair id
when (pandas._libs.tslibs.timestamps.Timestamp | datetime.datetime) – Timestamp to query
kind – One of the OHLC data points: “open”, “close”, “low”, “high”
tolerance – If there is no liquidity sample available at the exact timepoint, look to the past to get the nearest sample. For example, if the candle time interval is 5 minutes and look_back_timeframes is 10, then accept a candle that is at most 50 minutes before the timepoint.
asset_name (str | None) –
Used in exception messages.
If not given, use asset_id.
link (str | None) –
Link to the asset page.
Used in exception messages.
If not given, use <link unavailable>.
data_lag_tolerance (Timedelta) –
- Returns:
Return (value, delay) tuple.
We always return a value. In the error cases an exception is raised. The delay is the timedelta between the wanted timestamp and the actual timestamp of the sampled value.
Candles are always timestamped by their opening.
- Raises:
NoDataAvailable – There were no samples available with the given condition.
- Return type:
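The look-back with a tolerance can be sketched on a sparse pandas Series. This is a simplified illustration of the (value, delay) contract described above, not the method's implementation; `value_with_tolerance` is a hypothetical helper and `LookupError` stands in for NoDataAvailable.

```python
import pandas as pd


def value_with_tolerance(series: pd.Series, when: pd.Timestamp,
                         tolerance: pd.Timedelta):
    """Return (value, delay) for the latest sample at or before `when`."""
    candidates = series.loc[:when]
    if candidates.empty:
        raise LookupError(f"No samples at or before {when}")

    # How stale the newest usable sample is
    delay = when - candidates.index[-1]
    if delay > tolerance:
        raise LookupError(f"Nearest sample is {delay} old, tolerance is {tolerance}")
    return candidates.iloc[-1], delay


# Sparse 5-minute closes: no samples at 00:10 and 00:15
closes = pd.Series(
    [1.0, 1.1, 1.3],
    index=pd.to_datetime(
        ["2021-01-01 00:00", "2021-01-01 00:05", "2021-01-01 00:20"]
    ),
)

value, delay = value_with_tolerance(
    closes, pd.Timestamp("2021-01-01 00:12"), pd.Timedelta(minutes=10)
)
```

Here the query at 00:12 falls back to the 00:05 sample, seven minutes earlier; with a five-minute tolerance the same query would raise instead.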
- forward_fill(columns=('open', 'close'), drop_other_columns=True)
Forward-fill sparse OHLCV candle data.
Forward-fills the missing candle values for non-existing candles. Trading Strategy data does not have candles unless there were actual trades happening in the markets.
See tradingstrategy.utils.forward_fill for details.
Note
Does not touch the original self.df DataFrame in any way. Only self.pairs is modified with forward-filled data.
- Parameters:
columns –
Columns to fill.
To save memory and time, only fill the columns you need. Usually open and close are enough, and they are also filled by default.
drop_other_columns –
Remove other columns before forward-fill to save memory.
The resulting DataFrame will only have columns listed in columns parameter.
The removed columns include ones like high and low, but also Trading Strategy specific columns like start_block and end_block. It’s unlikely we are going to need forward-filled data in these columns.
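The effect of columns and drop_other_columns can be approximated with plain pandas on a single pair. This is a simplified sketch: it reindexes to a full daily range and carries the last known values forward. The exact fill rules in tradingstrategy.utils.forward_fill may differ, so treat the details as assumptions.

```python
import pandas as pd

# Sparse daily candles: no trades on 2021-01-02 and 2021-01-03
sparse = pd.DataFrame(
    {"open": [10.0, 12.0], "close": [11.0, 13.0], "high": [11.5, 13.5]},
    index=pd.to_datetime(["2021-01-01", "2021-01-04"]),
)

columns = ["open", "close"]

# Keep only the requested columns, then fill the gaps on a full daily index
full_index = pd.date_range(sparse.index[0], sparse.index[-1], freq="D")
filled = sparse[columns].reindex(full_index).ffill()
```

The source frame `sparse` is left untouched, and the `high` column is dropped from the result, matching the behaviour described for drop_other_columns=True.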
- classmethod create_from_single_pair_dataframe(df, bucket=None)
Construct universe based on a single trading pair data.
Useful for synthetic data/testing.
- Parameters:
df (DataFrame) –
bucket (tradingstrategy.timebucket.TimeBucket | None) –
- Return type:
- classmethod create_from_multiple_candle_dataframes(dfs, autoheal_pair_limit=200)
Construct universe based on multiple trading pairs.
Useful for synthetic data/testing.
- Parameters:
dfs (Iterable[DataFrame]) – List of dataframes/series where each trading pair is an isolated OHLCV data feed.
- Return type: