Hyper Optimized Algorithmic Strategy Vs/+ Machine Learning Models Part -1 (K-Nearest Neighbors)
How useful is a Machine Learning Model for trading? A practical approach
Introduction
Welcome back, fellow traders and enthusiasts! In the previous expedition through the realms of algorithmic trading, we explored the thrilling landscape from crafting strategies to hyper-optimization, witnessing jaw-dropping profits in backtesting. Now, buckle up for the next leg of our journey, where we delve into the intricate world of K-Nearest Neighbors (KNN). Today, I unveil how KNN acted as our compass, leading us to segregate 138 crypto futures on Binance into five distinct categories based on volatility. Get ready to witness the magic unfold and discover how this strategic classification sets the stage for even greater adventures with Hidden Markov Models (HMM) and beyond.
Previous Article Link:
“How I achieved 3000+% Profit in Backtesting for Various Algorithmic Trading Bots and how you can do the same for your Trading Strategies — Using Python Code” — Link
Previous posts related to Algorithmic Trading and the development of strategies for cryptocurrency markets (these can be adapted to other markets too)
1st Edition: “Unlock 4450% Profit with Algorithm Trading on Cryptocurrency: A Freqtrade Case Study” — Link
2nd Edition: “2509% Profit Unlocked: A Case Study on Algorithmic Trading with Freqtrade” — LINK
3rd Edition: “Unleashing the Power: Unveiling a 10,000%+ Profit Surge in 2.5 Years with Advanced Cryptocurrency Algorithmic Trading Using Freqtrade” — LINK
4th Edition: “Unraveling the Cryptocurrency Market: How Pivot Points and Price Action Led to 6204%+ Profit in Backtesting using Freqtrade ” — Link
5th Edition: “ Whooping 3202%+ profit with Famous UTBot Alerts from TradingView using Python on Freqtrade” — Link
6th Edition: “Unlocking 3106+% Profits Using Algorithmic Trading on 130+ Crypto Assets! — From Pine Code to Python” — Link
7th Edition: “The 8787%+ ROI Algo Strategy Unveiled for Crypto Futures! Revolutionized With Famous RSI, MACD, Bollinger Bands, ADX, EMA” — Link
Basics of Machine Learning: Unveiling the Pillars of Algorithmic Mastery
Welcome, aspiring traders and curious minds! Before we delve into the intricate dance of K-Nearest Neighbors (KNN) and its transformative impact on our trading strategy, let’s take a moment to illuminate the fundamental concepts that underpin the realm of machine learning.
1. Supervised Learning:
In the vast landscape of machine learning, supervised learning stands as a beacon of guidance. Imagine it as a wise mentor, fed with historical data, meticulously teaching our algorithms the ropes. In this paradigm, the algorithm is provided with a set of labeled examples, mapping inputs to desired outputs. Much like a seasoned trader teaching a novice, the algorithm learns to make predictions or decisions based on this curated knowledge. Our journey into algorithmic mastery often begins with this structured form of learning.
2. Unsupervised Learning:
Now, picture the intrigue of unsupervised learning as a mysterious expedition into uncharted territories. In this domain, our algorithms step into the unknown without a guide, tasked with extracting patterns and relationships from raw data. Think of it as the algorithm exploring the market on its own, uncovering hidden gems without predefined labels. Clustering and dimensionality reduction are the tools of the trade, allowing algorithms to discern structure and uncover insights autonomously.
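To make the distinction concrete, here is a minimal, self-contained sketch (toy numbers, not part of the trading code) that puts a supervised K-Nearest Neighbors classifier, which needs labels, next to unsupervised K-Means, which discovers groups on its own — the same pairing of ideas this article leans on.
# Toy illustration only: supervised learning needs labeled examples, unsupervised does not
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # supervised: learns from labels
from sklearn.cluster import KMeans                   # unsupervised: finds groups itself
features = np.array([[0.1, 0.9], [0.2, 1.1], [1.8, 0.2], [2.0, 0.3]])
labels = np.array([0, 0, 1, 1])  # known answers, available only in supervised learning
knn = KNeighborsClassifier(n_neighbors=1).fit(features, labels)
print(knn.predict([[0.15, 1.0]]))  # predicts a label learned from the examples
kmeans = KMeans(n_clusters=2, n_init=10).fit(features)
print(kmeans.labels_)  # group assignments discovered without any labels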
As we embark on this voyage into the Basics of Machine Learning, keep these foundational concepts in mind. They serve as the bedrock upon which we build our understanding of how K-Nearest Neighbors (KNN) will ingeniously contribute to the evolution of our trading strategy. So, fasten your seatbelts, for the adventure has just begun!
Unveiling KNN’s Magic:
KNN is not just an algorithm; it’s a strategic companion that guides us through the labyrinth of market dynamics. In this installment, we’ll explore how KNN becomes the compass that navigates us through the vast sea of cryptocurrency assets, particularly on the Binance platform.
The Essence of Clustering:
At its core, this approach brings forth the art of clustering — a sophisticated technique that categorizes assets based on their inherent characteristics (in the code that follows, the clustering itself is performed with K-Means, the unsupervised sibling of KNN in the "k" family of algorithms). Imagine having the ability to group similar assets, allowing us to tailor our trading strategies with precision, reacting to market nuances with unparalleled finesse.
From 138 to 5: Categorizing Crypto Futures:
In this leg of our journey, I’ll unravel the intricacies of how KNN helped us segregate 138 crypto futures on Binance into five distinct categories based on volatility. This strategic categorization sets the stage for more refined and targeted trading approaches, laying the groundwork for what lies ahead in our exploration of Hidden Markov Models (HMM) and beyond.
So, tighten your seatbelts as we embark on this illuminating journey, where KNN transforms data into actionable insights, propelling our trading strategy to new heights. Let the clustering magic begin!
The Code Explanation
Step 1: Importing Libraries and Removing Warnings
# Remove unwanted warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
This code snippet ignores future warnings for cleaner output.
Step 2: Importing Necessary Libraries for Data Management
# Data extraction and management
import pandas as pd
import numpy as np
from pandas_datareader.data import DataReader
from pandas_datareader.nasdaq_trader import get_nasdaq_symbols
Here, Pandas and NumPy are imported for data manipulation, while pandas_datareader provides helpers (DataReader, get_nasdaq_symbols) for fetching financial data.
Step 3: Importing Libraries for Feature Engineering and Machine Learning
# Feature Engineering
from sklearn.preprocessing import StandardScaler
# Machine Learning
from sklearn.cluster import KMeans
from sklearn import metrics
from kneed import KneeLocator
These libraries handle feature scaling (StandardScaler), clustering (KMeans) with its evaluation metrics, and automatic elbow detection (KneeLocator).
Step 4: Importing Libraries for Cointegration and Statistics
# Cointegration and Statistics
from statsmodels.tsa.stattools import coint
import statsmodels.api as sm
These libraries are used for statistical analysis, particularly cointegration tests.
Step 5: Importing Libraries for Reporting Visualization
# Reporting visualization
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline
These libraries are for visualizing the results, including t-SNE for dimensionality reduction.
Step 6: Importing Cryptocurrency Exchange API Library (ccxt)
import ccxt
The ccxt library allows interaction with cryptocurrency exchanges. In this code, it is used to fetch market symbols and historical OHLCV candles from Binance.
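As a quick, hedged illustration of how ccxt exposes an exchange (this snippet is not part of the main workflow and requires network access), you can list Binance markets like this:
# Minimal ccxt sketch: list Binance markets and keep the USDT-quoted ones
import ccxt
exchange = ccxt.binance()
markets = exchange.load_markets()  # dict keyed by unified symbols such as "BTC/USDT"
usdt_pairs = [s for s in markets if s.endswith("/USDT")]
print(len(usdt_pairs), usdt_pairs[:5])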
Step 7: Defining a List of Cryptocurrency Symbols
symbols = ["BTC/USDT:USDT",
"GMT/USDT:USDT",
"ETH/USDT:USDT",
"MTL/USDT:USDT",
"NEAR/USDT:USDT",
"SOL/USDT:USDT",
"OGN/USDT:USDT",
"ZIL/USDT:USDT",
"APE/USDT:USDT",
"XRP/USDT:USDT",
"ADA/USDT:USDT",
"AVAX/USDT:USDT",
"KNC/USDT:USDT",
"DOGE/USDT:USDT",
"WAVES/USDT:USDT",
"1000SHIB/USDT:USDT",
"FTM/USDT:USDT",
"BNB/USDT:USDT",
"XMR/USDT:USDT",
"DOT/USDT:USDT",
"GALA/USDT:USDT",
"MATIC/USDT:USDT",
"LRC/USDT:USDT",
"RUNE/USDT:USDT",
"AUDIO/USDT:USDT",
"FIL/USDT:USDT",
"ETC/USDT:USDT",
"EOS/USDT:USDT",
"ZEC/USDT:USDT",
"AXS/USDT:USDT",
"LTC/USDT:USDT",
"SAND/USDT:USDT",
"LINK/USDT:USDT",
"SXP/USDT:USDT",
"ATOM/USDT:USDT",
"BCH/USDT:USDT",
"PEOPLE/USDT:USDT",
"MANA/USDT:USDT",
"AAVE/USDT:USDT",
"ALICE/USDT:USDT",
"BNX/USDT:USDT",
"KAVA/USDT:USDT",
"CRV/USDT:USDT",
"ONE/USDT:USDT",
"VET/USDT:USDT",
"THETA/USDT:USDT",
"DYDX/USDT:USDT",
"ICP/USDT:USDT",
"ALGO/USDT:USDT",
"SUSHI/USDT:USDT",
"REN/USDT:USDT",
"COMP/USDT:USDT",
"XLM/USDT:USDT",
"CHZ/USDT:USDT",
"TLM/USDT:USDT",
"TRX/USDT:USDT",
"XTZ/USDT:USDT",
"FTT/USDT:USDT",
"IMX/USDT:USDT",
"CELR/USDT:USDT",
"WOO/USDT:USDT",
"HNT/USDT:USDT",
"EGLD/USDT:USDT",
"ENJ/USDT:USDT",
"CELO/USDT:USDT",
"BAT/USDT:USDT",
"KSM/USDT:USDT",
"UNI/USDT:USDT",
"ROSE/USDT:USDT",
"BAKE/USDT:USDT",
"RSR/USDT:USDT",
"IOST/USDT:USDT",
"GRT/USDT:USDT",
"DASH/USDT:USDT",
"ALPHA/USDT:USDT",
"FLOW/USDT:USDT",
"OCEAN/USDT:USDT",
"DENT/USDT:USDT",
"CHR/USDT:USDT",
"OMG/USDT:USDT",
"HOT/USDT:USDT",
"LINA/USDT:USDT",
"SRM/USDT:USDT",
"COTI/USDT:USDT",
"SKL/USDT:USDT",
"NEO/USDT:USDT",
"SNX/USDT:USDT",
"ICX/USDT:USDT",
"AR/USDT:USDT",
"1INCH/USDT:USDT",
"API3/USDT:USDT",
"ANKR/USDT:USDT",
"DUSK/USDT:USDT",
"REEF/USDT:USDT",
"BAL/USDT:USDT",
"BAND/USDT:USDT",
"ZRX/USDT:USDT",
"C98/USDT:USDT",
"QTUM/USDT:USDT",
"STORJ/USDT:USDT",
"IOTA/USDT:USDT",
"ONT/USDT:USDT",
"MASK/USDT:USDT",
"GTC/USDT:USDT",
"HBAR/USDT:USDT",
"MKR/USDT:USDT",
"TOMO/USDT:USDT",
"ENS/USDT:USDT",
"ZEN/USDT:USDT",
"SFP/USDT:USDT",
"CVC/USDT:USDT",
"IOTX/USDT:USDT",
"CTK/USDT:USDT",
"FLM/USDT:USDT",
"NKN/USDT:USDT",
"YFI/USDT:USDT",
"RLC/USDT:USDT",
"BTS/USDT:USDT",
"KLAY/USDT:USDT",
"BEL/USDT:USDT",
"XEM/USDT:USDT",
"ANT/USDT:USDT",
"SC/USDT:USDT",
"LIT/USDT:USDT",
"CTSI/USDT:USDT",
"STMX/USDT:USDT",
"UNFI/USDT:USDT",
"RVN/USDT:USDT",
"1000XEC/USDT:USDT",
"RAY/USDT:USDT",
"BLZ/USDT:USDT",
"ATA/USDT:USDT",
"ARPA/USDT:USDT",
"DGB/USDT:USDT",
"LPT/USDT:USDT",
"TRB/USDT:USDT",
"OP/USDT:USDT",
"GAL/USDT:USDT"]
formatted_symbols = [symbol.replace("/USDT:USDT", "") for symbol in symbols]
A list of 138 cryptocurrency futures symbols is defined. The formatted_symbols list is created by stripping the "/USDT:USDT" suffix from the original symbols.
I use the same assets when backtesting on the Freqtrade platform for crypto trading (backtesting and live trading/dry-run).
Link to the above strategy explanation — Link
Step 8: Fetching Cryptocurrency Market Data from Binance
file_name = "./binance_data.csv"
file_name_coint = "./raw_data_coint_pairs.csv"
load_existing = True
load_coint_pairs = True
if not load_existing:
    # Initialize ccxt exchange instance for Binance
    exchange = ccxt.binance()
    # Fetch all market symbols
    symbols = exchange.fetch_markets()
    # Define the time period you're interested in
    since_timestamp = exchange.parse8601('2023-01-01T00:00:00Z')  # Replace with your desired start time
    # Set the timeframe (1h in this case)
    timeframe = '1h'
    # Initialize an empty DataFrame to store the OHLCV data
    all_data = pd.DataFrame()
    # Fetch OHLCV data for each symbol
    for symbol in symbols:
        if symbol['quote'] == 'USDT':  # Keep only USDT-quoted markets
            try:
                # Fetch OHLCV data
                ohlcv = exchange.fetch_ohlcv(symbol['symbol'], timeframe, since=since_timestamp)
                # Convert the data to a DataFrame
                df = pd.DataFrame(ohlcv, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
                # Add symbol column
                df['symbol'] = symbol['symbol']
                # Append the data to the overall DataFrame (DataFrame.append is removed in pandas 2.x, so use pd.concat)
                all_data = pd.concat([all_data, df], ignore_index=True)
            except ccxt.NetworkError as e:
                print(f"Network error while fetching {symbol['symbol']}: {e}")
            except ccxt.ExchangeError as e:
                print(f"Exchange error while fetching {symbol['symbol']}: {e}")
            except Exception as e:
                print(f"Error while fetching {symbol['symbol']}: {e}")
    # Save the data to a .csv or .json file
    output_file = './binance_data.csv'  # Replace with your desired output file name
    all_data.to_csv(output_file, index=False)  # Use to_json() for .json format
    print(f"Data saved to {output_file}")
The code fetches historical 1h OHLCV data for the USDT-quoted symbols from the Binance exchange and saves it to a CSV file; when load_existing is True, the download is skipped and the previously saved file is reused.
Step 9: Reading and Formatting Cryptocurrency Market Data and Preprocessing Cryptocurrency Market Data
# Specify the path to your saved .csv file
csv_file_path = file_name # Replace with the actual file path
# Read the .csv file into a DataFrame
df = pd.read_csv(csv_file_path)
# Rename columns for better readability
df.rename(columns={'timestamp': 'Date', 'open': 'Open', 'high': 'High', 'low': 'Low', 'close': 'Adj Close', 'volume': 'Volume', 'symbol': 'Symbol'}, inplace=True)
# Convert timestamps to datetime objects
df['Date'] = pd.to_datetime(df['Date'] / 1000, unit='s')
# Handle duplicate entries by aggregating 'Adj Close' values
df_agg = df.groupby(['Date', 'Symbol']).agg({'Adj Close': 'mean'}).reset_index()
# Pivot the DataFrame to shift rows to columns
df_pivoted = df_agg.pivot(index='Date', columns='Symbol', values='Adj Close')
df_pivoted = df_pivoted.apply(lambda col: col.fillna(col.mean()))
# Display the pivoted DataFrame
pd.set_option("display.max_columns", None) # Show all columns
pd.set_option("display.width", 1000) # Adjust display width
pd.set_option("display.precision", 2) # Set precision
# Filter columns with only "/USDT:USDT" in the suffix
filtered_columns = [col for col in df_pivoted.columns if col.endswith('/USDT:USDT')]
# Create a DataFrame with the filtered columns
filtered_df = df_pivoted[filtered_columns]
# Remove "/USDT:USDT" from column names
filtered_df.columns = [col.replace('/USDT:USDT', '') for col in filtered_df.columns]
print("filtered_columns Data:")
filtered_df = filtered_df[formatted_symbols]
filtered_df
The code reads the market data from the CSV file, renames the columns, converts the timestamps, aggregates duplicate entries, and pivots the data so that symbols become columns and dates the index.
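If the reshaping feels abstract, here is a toy illustration (made-up prices, same groupby and pivot calls as above) of how the long format becomes one column per symbol:
# Toy example: long format (one row per Date/Symbol) -> wide format (one column per symbol)
import pandas as pd
long_df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "Symbol": ["BTC", "ETH", "BTC", "ETH"],
    "Adj Close": [16500.0, 1200.0, 16700.0, 1215.0],
})
wide_df = (long_df.groupby(["Date", "Symbol"]).agg({"Adj Close": "mean"}).reset_index()
                  .pivot(index="Date", columns="Symbol", values="Adj Close"))
print(wide_df)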
Step 10: Performing Feature Scaling
# Use the pivoted price DataFrame from Step 9 as the working dataset
data = filtered_df
# Create DataFrame with Returns and Volatility information
df_returns = pd.DataFrame(data.pct_change().mean() * 365, columns=["Returns"])
# df_returns = pd.DataFrame(data.pct_change().mean() * 255, columns=["Returns"])
# df_returns["Volatility"] = data.pct_change().std() * np.sqrt(255)  # use for stock markets, which have approx. 255 trading days per year
df_returns["Volatility"] = data.pct_change().std() * np.sqrt(365)
df_returns.head()
# Feature scaling
scaler = StandardScaler()
scaled_values = scaler.fit_transform(df_returns)
df_scaled = pd.DataFrame(scaled_values, columns=df_returns.columns, index=df_returns.index)
The code builds a DataFrame holding "Returns" and "Volatility" for each asset and then scales those features with StandardScaler.
StandardScaler is used to scale the features before applying the K-Means clustering algorithm. Scaling is a crucial preprocessing step in many machine learning algorithms, including K-Means, because it ensures that all features are on the same scale. Here's why it's important in the context of K-Means clustering:
Equal Weight to Features:
- K-Means uses distances between data points to assign them to clusters.
- If features have different scales, K-Means might give more importance to features with larger scales during the clustering process.
- Scaling ensures that all features contribute equally to the distance computation.
Sensitivity to Initial Centers:
- K-Means is sensitive to the initial placement of cluster centers.
- Features with larger scales can dominate the initial placement, affecting the convergence of the algorithm.
- Scaling mitigates this issue by giving equal weight to all features.
Improves Convergence:
- Scaling can help K-Means converge faster and more reliably, as it normalizes the influence of each feature.
- It aids in achieving a more stable and consistent clustering result across different runs.
Here, the StandardScaler is fitted to the returns data (df_returns). The fit_transform method standardizes the features by removing the mean and scaling to unit variance. The resulting scaled DataFrame (df_scaled) is used in the subsequent steps of the clustering process.
In summary, using StandardScaler ensures that the features have a mean of 0 and a standard deviation of 1, making them compatible for K-Means clustering and promoting a fair contribution of each feature to the clustering algorithm.
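As a quick optional sanity check (a sketch assuming the df_scaled DataFrame created above), you can verify that each scaled column indeed has mean ~0 and standard deviation ~1:
# Each column of the scaled data should have mean ~0 and std ~1
print(df_scaled.mean().round(6))
print(df_scaled.std(ddof=0).round(6))  # StandardScaler uses the population std (ddof=0)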
Step 11: Applying K-Means Clustering
# Find the optimum number of clusters
X = df_scaled.copy()
K = range(1, 15)
distortions = []
for k in K:
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(X)
    distortions.append(kmeans.inertia_)
kl = KneeLocator(K, distortions, curve="convex", direction="decreasing")
c = kl.elbow
print("Optimum Clusters: ", c)
Initialization:
- X = df_scaled.copy(): Creates a copy of the scaled data for clustering.
Iteration over Possible Cluster Numbers (k):
- K = range(1, 15): Specifies a range of potential cluster numbers from 1 to 14.
- distortions = []: Initializes an empty list to store the distortion (inertia) for each value of k.
K-Means Clustering for Each k:
- A loop iterates through each value of k.
- kmeans = KMeans(n_clusters=k): Initializes a K-Means clustering model with the current k.
- kmeans.fit(X): Fits the model to the scaled data.
- distortions.append(kmeans.inertia_): Appends the inertia (sum of squared distances of samples to their closest cluster center) to the list.
Applying the Elbow Method:
- kl = KneeLocator(K, distortions, curve="convex", direction="decreasing"): Initializes the KneeLocator object, which helps find the "elbow" point in the distortion curve.
- K: The range of cluster numbers.
- distortions: The corresponding distortion values.
- curve="convex": Specifies the curve type as convex, as typically observed in the elbow method.
- direction="decreasing": Indicates that the distortion values are decreasing.
Determining the Optimal Number of Clusters c:
- c = kl.elbow: Retrieves the optimal number of clusters using the knee/elbow point determined by KneeLocator.
Output:
- print("Optimum Clusters: ", c): Displays the optimal number of clusters.
Optimum Clusters: 6
Explanation of the Elbow Method:
- The Elbow Method involves plotting the distortion values for different values of k and identifying the point where the rate of decrease sharply changes, resembling an “elbow” in the plot.
- The optimal number of clusters is often where the distortion starts to decrease at a slower rate (after the elbow point).
- KneeLocator helps automate the detection of this elbow point in the curve.
In summary, this part of the code is a systematic approach to finding the optimal number of clusters for the K-Means algorithm, enhancing the accuracy and effectiveness of the clustering process. The optimal cluster number (c) is then used in subsequent steps of the code.
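If you want to see the elbow rather than just print it, here is an optional plotting sketch that simply reuses K, distortions, and c from the loop above:
# Optional: visualize the distortion curve and mark the detected elbow
plt.figure(figsize=(10, 5))
plt.plot(list(K), distortions, marker="o")
plt.axvline(c, color="red", linestyle="--", label=f"elbow at k={c}")
plt.title("Elbow Method")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Distortion (inertia)")
plt.legend()
plt.show()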
Step 12: Visualizing K-Means Clustering Results
# Visualizing K-Means Clustering Results
# ...
# Fit K-Means Model
k_means = KMeans(n_clusters=6)
# k_means = KMeans(n_clusters=c)
k_means.fit(X)
prediction = k_means.predict(df_scaled)
Initialization of K-Means:
- k_means = KMeans(n_clusters=6): Initializes a K-Means clustering model with a predetermined number of clusters (in this case, 6).
- Alternatively, the commented line (# k_means = KMeans(n_clusters=c)) uses the optimal number of clusters (c) determined in the previous step.
Fitting the Model:
- k_means.fit(X): Fits the K-Means model to the scaled data (X).
Prediction:
- prediction = k_means.predict(df_scaled): Predicts the cluster labels for each data point in the scaled dataset. These labels represent the cluster to which each data point is assigned.
Next, the code visualizes the clustering results:
# Show Results
centroids = k_means.cluster_centers_
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(111)
scatter = ax.scatter(X.iloc[:, 0], X.iloc[:, 1], c=k_means.labels_, cmap="rainbow", label=X.index)
ax.set_title("K-Means Cluster Analysis Results")
ax.set_xlabel("Mean Return")
ax.set_ylabel("Volatility")
plt.colorbar(scatter)
plt.plot(centroids[:, 0], centroids[:, 1], "sg", markersize=10)
plt.show()
Centroids and Scatter Plot:
- centroids = k_means.cluster_centers_: Retrieves the coordinates of the cluster centroids.
- fig = plt.figure(figsize=(18, 10)): Creates a figure for plotting.
- ax = fig.add_subplot(111): Adds a subplot to the figure.
- scatter = ax.scatter(X.iloc[:, 0], X.iloc[:, 1], c=k_means.labels_, cmap="rainbow", label=X.index): Creates a scatter plot where each point is colored according to its assigned cluster label.
- ax.set_title("K-Means Cluster Analysis Results"): Sets the title of the plot.
- ax.set_xlabel("Mean Return") and ax.set_ylabel("Volatility"): Set labels for the x and y axes.
- plt.colorbar(scatter): Adds a colorbar to the plot.
- plt.plot(centroids[:, 0], centroids[:, 1], "sg", markersize=10): Plots the cluster centroids as green squares.
Display the Plot:
- plt.show(): Displays the final clustering plot.
In summary, this code segment applies the K-Means algorithm to the scaled data, predicts cluster labels, and visually represents the clustering results using a scatter plot with cluster centroids. The color of each point indicates its assigned cluster, and the green squares represent the centroids of the clusters.
You can find the whole code here — https://patreon.com/pppicasso
Step 13: Creating a DataFrame with Cluster Information
clustered_series = pd.Series(index=X.index, data=k_means.labels_.flatten())
clustered_series_all = pd.Series(index=X.index, data=k_means.labels_.flatten())
clustered_series = clustered_series[clustered_series != -1]
clustered_series[:7]
- Two pandas Series are created: clustered_series and clustered_series_all. Both are initialized with the cluster labels assigned by K-Means.
- The code then filters out points with a label of -1 (which might mark outliers or unassigned points; K-Means itself never assigns -1, so this acts only as a safeguard).
- The result is a Series (clustered_series) containing the cluster label for each asset.
Step 14: Creating DataFrame with Symbol and Cluster Number:
df = pd.DataFrame(clustered_series).reset_index()
df.columns = ["symbol", "cluster_number"]
df['symbol'] = df['symbol'].astype(str) + '/USDT:USDT'
df = df.sort_values(by="cluster_number")
- Converts the clustered_series into a DataFrame (df) with columns "symbol" and "cluster_number".
- Modifies the "symbol" column to include the '/USDT:USDT' suffix.
- Sorts the DataFrame by "cluster_number".
Grouping Symbols by Cluster:
grouped_data = df.groupby('cluster_number')['symbol'].agg(lambda x: ', '.join(f'"{symbol}"' for symbol in x))
- Groups the DataFrame by the “cluster_number” column.
- Aggregates the symbols within each cluster into a comma-separated string.
Converting Grouped Data to a Dictionary:
cluster_lists = grouped_data.to_dict()
Converts the grouped data to a dictionary (cluster_lists) for easy access.
Step 15: Printing and Writing to File:
for cluster_number, symbol_list in cluster_lists.items():
    print(f'Cluster {cluster_number}: [{symbol_list}]')

with open('./clustered_data_binance_futures/Binance_futures_SimilarVolatileAssets_cluster_data.txt', 'w') as file:
    for cluster_number, symbol_list in cluster_lists.items():
        file.write(f'Cluster {cluster_number}: [{symbol_list}]\n')
- Prints the clusters and their corresponding symbols.
- Writes the cluster information to a text file (Binance_futures_SimilarVolatileAssets_cluster_data.txt).
In summary, this code takes the results of the K-Means clustering, organizes the symbols into clusters, and prints/writes this information for further analysis or reference.
You can find the whole code here — https://patreon.com/pppicasso
Step 16: Viewing the Number of Features in Each Cluster
plt.figure(figsize=(10, 5))
plt.bar(range(len(clustered_series.value_counts())), clustered_series.value_counts())
plt.title("Clusters")
plt.xlabel("Cluster")
plt.ylabel("Features Count")
plt.show()
- This code uses Matplotlib to create a bar chart.
- The x-axis represents the cluster numbers, and the y-axis represents the count of features (data points) in each cluster.
- It provides a visual representation of how the data points are distributed across different clusters.
Step 17: Removing Items if Preferred
# Removing items if preferred
clusters_clean = clustered_series[clustered_series < 4]
print("Feature Number Previous: ", len(clustered_series))
print("Feature Number Current: ", len(clusters_clean))
- Creates a new Series (clusters_clean) by keeping only data points whose cluster label is less than 4 (clusters 4 and 5 are dropped).
- Prints the number of features before and after the removal process.
- This step is useful if you want to exclude or focus on specific clusters based on a certain criterion (in this case, keeping clusters 0 through 3).
In summary, the first part visually represents the distribution of features across clusters, and the second part filters out specific clusters based on a condition, providing information on the feature count before and after the removal.
Step 18: Co-integration
Cointegration is a statistical property that measures the long-term equilibrium relationship between two or more time series. In financial markets, it is often used to identify pairs of assets that move together over time and tend to return to a stable relationship after temporary divergences.
Here’s a brief explanation of cointegration and its benefits:
Cointegration:
- Definition: Two time series are said to be cointegrated if a linear combination of them is stationary, meaning it doesn’t have a trend.
- Stationarity: While individual time series might have trends, their combination (the cointegrated series) does not. This is valuable because it implies a stable, long-term relationship.
- Example: Consider two stocks, A and B, that are cointegrated. Even if each stock’s price might individually increase or decrease over time, a linear combination of their prices (e.g., the price of A minus a constant times the price of B) remains stationary.
Benefits of Cointegration in Trading:
- Pairs Trading: Cointegration is often used in pairs trading strategies. Traders identify cointegrated pairs of assets, expecting that any short-term divergence from their historical relationship will eventually revert to the mean. This can be exploited by simultaneously taking a long position in an undervalued asset and a short position in an overvalued one.
- Risk Management: Understanding cointegration helps in managing risk. When two assets are cointegrated, a trader can use the historical relationship between them to assess potential future movements. This information can guide decisions on position sizing and risk exposure.
- Portfolio Diversification: Cointegration analysis can be applied to build diversified portfolios. By selecting assets that are not cointegrated with each other, a portfolio can be constructed to minimize risk and capture different sources of return.
- Mean-Reversion Strategies: Cointegration provides a statistical basis for mean-reversion strategies. Traders can take advantage of short-term deviations from the long-term relationship between cointegrated assets, expecting them to revert to the historical mean.
- Hedging: Cointegrated assets can be used for hedging purposes. For example, if an investor holds a portfolio of stocks and wants to hedge against market risk, they might use cointegrated assets to construct a hedging strategy.
In summary, cointegration is a valuable concept in quantitative finance that helps traders and investors identify stable relationships between assets, leading to various trading and investment strategies aimed at exploiting or managing these relationships for financial gain.
# Calculate cointegration
def calculate_cointegration(series_1, series_2):
    coint_flag = 0
    coint_res = coint(series_1, series_2)
    coint_t = coint_res[0]
    p_value = coint_res[1]
    critical_value = coint_res[2][1]
    model = sm.OLS(series_1, series_2).fit()
    hedge_ratio = model.params[0]
    coint_flag = 1 if p_value < 0.05 and coint_t < critical_value else 0
    return coint_flag, hedge_ratio
- This function (calculate_cointegration) takes two time series (series_1 and series_2) as input and checks for cointegration.
- It uses the coint function from the statsmodels library to perform the cointegration test.
- The p-value and the 5% critical value are extracted from the cointegration test results.
- Additionally, it fits an Ordinary Least Squares (OLS) regression model to estimate the hedge ratio. A quick usage sketch on synthetic data follows below.
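Here is a quick usage sketch of calculate_cointegration on synthetic data (illustrative only): series_b is built from the same random-walk trend as series_a, so the pair should typically test as cointegrated.
# Synthetic check: two series sharing a common stochastic trend
import numpy as np
rng = np.random.default_rng(42)
trend = np.cumsum(rng.normal(0, 1, 500))           # shared random walk
series_a = trend + rng.normal(0, 0.5, 500)
series_b = 2.0 * trend + rng.normal(0, 0.5, 500)   # scaled copy of the trend plus noise
flag, hedge = calculate_cointegration(series_a, series_b)
print("Cointegrated:", flag, "Hedge ratio:", round(hedge, 3))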
# Loop through and calculate cointegrated pairs
# Allow 10 - 30 mins for calculation
tested_pairs = []
cointegrated_pairs = []
if not load_coint_pairs:
    for base_asset in clusters_clean.index:
        base_label = clusters_clean[base_asset]
        for compare_asset in clusters_clean.index:
            compare_label = clusters_clean[compare_asset]
            test_pair = base_asset + compare_asset
            test_pair = ''.join(sorted(test_pair))
            is_tested = test_pair in tested_pairs
            tested_pairs.append(test_pair)
            if compare_asset != base_asset and base_label == compare_label and not is_tested:
                series_1 = data[base_asset].values.astype(float)
                series_2 = data[compare_asset].values.astype(float)
                coint_flag, _ = calculate_cointegration(series_1, series_2)
                if coint_flag == 1:
                    cointegrated_pairs.append({"base_asset": base_asset,
                                               "compare_asset": compare_asset,
                                               "label": base_label})
    df_coint = pd.DataFrame(cointegrated_pairs).sort_values(by="label")
    df_coint.to_csv(file_name_coint)
- This part of the code loops through each pair of assets within the same cluster.
- It checks if the pair has already been tested to avoid redundancy.
- For each untested pair within the same cluster, it calculates cointegration using the calculate_cointegration function.
- If cointegration is detected (coint_flag equals 1), the pair is added to the cointegrated_pairs list.
- Finally, the resulting cointegrated pairs are stored in a DataFrame (df_coint) and saved to a CSV file (file_name_coint).
# Load Cointegrated Pairs
df_coint = pd.read_csv(file_name_coint).iloc[:, 1:]
df_coint.head(46)
You can find the whole code here — https://patreon.com/pppicasso
Summary of the Code:
The provided code outlines a comprehensive approach to developing and optimizing algorithmic trading strategies, specifically focusing on cryptocurrency markets. The key components include:
Data Collection:
- Fetching historical price data for a set of crypto assets from the Binance exchange.
- Storing the data in a CSV file for further analysis.
Feature Engineering:
- Extracting relevant features such as returns and volatility from the price data.
- Scaling the features using the StandardScaler.
Clustering with K-Means:
- Grouping assets into clusters based on their historical returns and volatility.
- Determining the optimal number of clusters using the elbow method.
- Applying K-Means clustering to categorize assets into distinct groups.
Cointegration Analysis:
- Assessing cointegration between pairs of assets within the same cluster.
- Identifying pairs that exhibit a stable, long-term relationship.
Visualization:
- Visualizing the clustering results and cointegrated pairs to gain insights.
Next Steps — Hidden Markov Model (HMM):
- Introducing the concept of Hidden Markov Models for further analysis.
- Utilizing HMM to identify and trade only during favorable market conditions, enhancing strategy performance.
Integration with FreqTrade:
- The ultimate goal is to integrate these strategies and insights into a FreqTrade bot.
- Implementing reinforcement learning methodologies for automatic trading.
Benefits and Future Directions:
- Pairs Trading: The cointegration analysis allows for the identification of pairs suitable for pairs trading strategies.
- Risk Management: Understanding asset relationships aids in managing risk and optimizing portfolio diversification.
- Mean-Reversion: Strategies capitalize on short-term deviations from the long-term relationship between cointegrated assets.
- Hedging: Cointegrated assets can be used for effective hedging.
Towards FreqAI Bot and Patreon:
- The ultimate aim is to leverage these strategies in a FreqTrade bot for automatic trading.
- Reinforcement learning methodologies will enhance the bot’s ability to adapt and improve over time.
- The bot is already running successfully, which demonstrates the viability of this approach in practice.
- Access to exclusive codes and advanced strategies is offered through a Patreon page, providing a community for enthusiasts and supporters.
Next Steps:
- Continue refining strategies, incorporating HMM, and optimizing the trading bot.
- Regularly update and share insights on the Patreon page for community engagement and support.
This holistic approach covers data analysis, clustering, cointegration, and sets the foundation for advanced modeling with HMM, culminating in the development of a sophisticated FreqAI bot for automated and efficient cryptocurrency trading.
You can find the whole code here — https://patreon.com/pppicasso
Future Scope:
- Integration with ML Models: The next step could be integrating machine learning models to predict market movements or to further enhance the strategy. The next article will cover Hidden Markov Models (HMM) to identify the prevailing trend and execute our strategy only during that trend.
- Live Testing: Test the optimized strategy in a simulated or live environment (do paper trading before going live) to assess its real-world performance.
- Continual Optimization: Periodically re-run the optimization as market conditions change.
This framework, once established, becomes a valuable tool in any trader’s arsenal, providing a methodical approach to enhancing trading performance.
Additional Resources and Readings
For those eager to delve deeper into the world of algorithmic trading and enhance their knowledge on the subject, consider exploring the following resources:
- “Algorithmic Trading: Winning Strategies and Their Rationale” by Ernie Chan
- “Python for Finance: Mastering Data-Driven Finance” by Yves Hilpisch
- “Technical Analysis of the Financial Markets” by John J. Murphy
- “Building Winning Algorithmic Trading Systems” by Kevin Davey
- Online courses on platforms like Coursera, Udemy, and edX that offer specific classes on algorithmic trading, Python programming, and financial analysis.
These resources will provide a comprehensive foundation for understanding the technical aspects of algo trading and the application of Python in finance. Additionally, participating in online forums and communities such as Stack Overflow, GitHub, and Reddit’s r/algotrading can offer practical insights and peer support.
Thank you, Readers.
I hope you have found this article on algorithmic strategy informative and helpful. As a creator, I am dedicated to providing valuable insights and analysis on cryptocurrency, stock markets, and other asset management.
If you have enjoyed this article and would like to support my ongoing efforts, I would be honored to have you as a member of my Patreon community. As a member, you will have access to exclusive content, early access to new analysis, and the opportunity to be a part of shaping the direction of my research.
Membership starts at just $10, and you can choose to contribute on a bi-monthly basis. Your support will help me to continue to produce high-quality content and bring you the latest insights on financial analytics.
Patreon — https://patreon.com/pppicasso
Regards,
Puranam Pradeep Picasso
Linkedin — https://www.linkedin.com/in/puranampradeeppicasso/
Patreon — https://patreon.com/pppicasso
Facebook — https://www.facebook.com/puranam.p.picasso/
Twitter — https://twitter.com/picasso_999