Easily get Binance historical transactions using Python
Collecting data for a trading strategy is not always simple. Some strategies require very fine-grained data while others get by with much coarser data, and elements such as infrastructure, availability, and connectivity can vary greatly depending on the data type.
You may be wondering why this article covers only "transaction" (trade) data, and why it uses the Binance API.
I would say that trade data endpoints are available on nearly every exchange. The data is fine-grained, provides enough detail (in some very specific cases) to backtest high-frequency trading (HFT) strategies, and can serve as the basis for OHLC candles (from 1 second to 24 hours, or longer if desired).
Trading data is versatile and allows for extensive experimentation using strategies with different frequencies.
Why choose Binance?
Simply because it is one of the exchanges I tend to backtest on, given its sheer trading volume.
We are going to create a Python script that receives a trading pair symbol, a start date, and an end date as command line arguments. It writes a CSV file containing all transactions to disk. The process consists of the following steps:
1. Parse the symbol, starting_date and ending_date arguments.
2. Get the first transaction that occurred on the start date to get the first transaction trade_id.
3. Loop to obtain 1,000 transactions per request (Binance API limit) until ending_date is reached.
4. Finally, save the data to disk. The example saves it as CSV, but other storage options work just as well.
We will use pandas, requests, time, sys, calendar, and datetime. Error validation is omitted from the code snippets because it adds nothing to the explanation.
The script will use the following parameters:
1. symbol: the symbol of the trading pair, as defined by Binance. It can be queried from Binance or copied from the URL of the Binance web application (excluding the _ character).
2. starting_date and ending_date: self-explanatory. The expected format is mm/dd/yyyy, or %m/%d/%Y in Python's strptime notation.
To get the parameters we will use the built-in sys module (nothing fancy here), and to parse the dates we will use the datetime library.
We will add one day and subtract one microsecond so that the time portion of ending_date is always 23:59:59.999, which makes it practical to request a single-day interval by passing the same date twice.
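The argument handling described above can be sketched as follows. This is a minimal illustration, not the original snippet: the function name parse_args and the constant DATE_FORMAT are hypothetical, and error validation is left out as in the rest of the article.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y"  # mm/dd/yyyy, as described above


def parse_args(argv):
    """Return (symbol, starting_date, ending_date) from a sys.argv-style list."""
    symbol = argv[1]
    starting_date = datetime.strptime(argv[2], DATE_FORMAT)
    # Add one day and subtract one microsecond so that the end date covers
    # the whole requested day.
    ending_date = (
        datetime.strptime(argv[3], DATE_FORMAT)
        + timedelta(days=1)
        - timedelta(microseconds=1)
    )
    return symbol, starting_date, ending_date
```

In the script itself this would be called as parse_args(sys.argv) after importing sys.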
With the aggTrades endpoint of the Binance API, we can get up to 1,000 transactions per request. If we pass the start and end time parameters, the interval between them can be at most one hour.
After some failed attempts at fetching by time interval (at some point or another, liquidity would go crazy and I would lose some valuable trades), I decided to try the from_id strategy.
aggTrades is the chosen endpoint because it returns compressed trades, so we do not lose any valuable information.
As Binance describes it: get compressed, aggregate trades. Trades that fill at the same time, from the same order, at the same price have their quantity aggregated.
The from_id strategy works like this:
We get the first transaction on starting_date by sending a date interval to the endpoint. After that, we fetch 1,000 transactions at a time, starting from the ID of that first transaction. After each request, we check whether the last transaction fetched occurred after our ending_date.
If so, we have covered the whole period and can save the results to a file. Otherwise, we set the from_id variable to the ID of the last transaction fetched and start the loop again.
First, we create a new_end_date, because this aggTrades request is made with the startTime and endTime parameters.
For now we only need the first transaction ID of the period, so new_end_date is the start date plus 60 seconds. For low-liquidity pairs this value can be increased, since a transaction is not guaranteed to occur within the first minute of the requested day.
Then we convert the dates to their Unix millisecond representation with a helper function built on calendar.timegm. timegm is the preferred function because it keeps dates in UTC.
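The conversion helper can be sketched as below. The name get_unix_ms_from_date is an assumption made for illustration; the only library call involved is calendar.timegm, which interprets the time tuple as UTC.

```python
import calendar
from datetime import datetime


def get_unix_ms_from_date(date):
    """Convert a naive datetime, interpreted as UTC, to Unix milliseconds."""
    seconds = calendar.timegm(date.timetuple())  # whole seconds since epoch
    # timetuple() drops the sub-second part, so add it back as milliseconds.
    return seconds * 1000 + date.microsecond // 1000
```

Using time.mktime here instead would apply the local time zone of the machine, which is why timegm is the safer choice for exchange timestamps.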
The response to the request is a list of trade objects sorted by date.
Since we need the first transaction ID, we return the response[0]["a"] value.
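A sketch of this first request follows. It assumes the public endpoint https://api.binance.com/api/v3/aggTrades and a requests.Session-like object passed in as session; the function names and the window_seconds parameter are hypothetical. The sample trade object in the comment uses illustrative values.

```python
import calendar
from datetime import datetime, timedelta

AGG_TRADES_URL = "https://api.binance.com/api/v3/aggTrades"

# Each element of the response is an object shaped like this
# (values are illustrative):
# {"a": 26129,            aggregate trade ID
#  "p": "0.01633102",     price
#  "q": "4.70443515",     quantity
#  "f": 27781,            first trade ID
#  "l": 27781,            last trade ID
#  "T": 1498793709153,    timestamp in Unix milliseconds
#  "m": true,             was the buyer the maker?
#  "M": true}             was the trade the best price match?


def get_unix_ms_from_date(date):
    """Convert a naive datetime, interpreted as UTC, to Unix milliseconds."""
    return calendar.timegm(date.timetuple()) * 1000 + date.microsecond // 1000


def get_first_trade_id_from_start_date(session, symbol, from_date, window_seconds=60):
    """Return the ID of the first aggregate trade at or after from_date."""
    new_end_date = from_date + timedelta(seconds=window_seconds)
    response = session.get(
        AGG_TRADES_URL,
        params={
            "symbol": symbol,
            "startTime": get_unix_ms_from_date(from_date),
            "endTime": get_unix_ms_from_date(new_end_date),
        },
    )
    response.raise_for_status()
    trades = response.json()
    if not trades:
        # Low-liquidity pairs may need a larger window_seconds.
        raise ValueError("no trades found in the requested window")
    return trades[0]["a"]
```

With the requests library installed, session can simply be requests.Session().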
Now that we have the first transaction ID, we can fetch 1,000 transactions at a time until ending_date is reached. The following code is called in our main loop. It executes the request using the from_id parameter and drops the startTime and endTime parameters.
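A minimal sketch of that request, again assuming a requests.Session-like session object and the aggTrades URL shown in the constant. The function name get_trades is hypothetical; fromId and limit are the query parameter names used by the Binance endpoint.

```python
AGG_TRADES_URL = "https://api.binance.com/api/v3/aggTrades"


def get_trades(session, symbol, from_id, limit=1000):
    """Fetch up to `limit` aggregate trades, starting at trade ID from_id."""
    response = session.get(
        AGG_TRADES_URL,
        params={"symbol": symbol, "limit": limit, "fromId": from_id},
    )
    response.raise_for_status()
    return response.json()
```

Because fromId is inclusive, a batch requested from the last ID of the previous batch repeats that one trade, which is one reason duplicates need to be removed later.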
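Now, this is our main loop, which executes the requests and builds our DataFrame.
We check whether current_time, which holds the timestamp of the most recently fetched transaction, is greater than to_date. If so, we stop; otherwise, we fetch the next batch, update from_id and current_time from the last transaction fetched, and append the batch to the DataFrame.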
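The loop can be sketched as below. To keep the example free of network and pandas dependencies, it makes two assumptions that differ from the description: the fetch step is passed in as a callable (get_trades takes a from_id and returns a list of trade objects), and the batches are collected in a plain list that can be handed to pandas.DataFrame afterwards. The names fetch_all_trades and pause_seconds are hypothetical.

```python
import time


def fetch_all_trades(get_trades, from_id, to_ms, pause_seconds=0.5):
    """Collect trades batch by batch until one is newer than to_ms."""
    trades = []
    current_time = 0
    while current_time < to_ms:
        batch = get_trades(from_id)
        if not batch:
            break  # nothing returned at all
        trades.extend(batch)
        last = batch[-1]
        if last["a"] == from_id:
            break  # no progress: we already have the newest trade
        # Continue from the last trade fetched (it will be repeated once).
        from_id = last["a"]
        current_time = last["T"]
        time.sleep(pause_seconds)  # be gentle with the API rate limits
    return trades
```

The returned list contains one repeated trade per batch boundary and usually a few trades past to_ms, so it still needs the cleaning step.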
After assembling the DataFrame, we need to perform some simple data cleaning. We remove duplicate transactions and trim those that occurred after to_date (this happens because we fetch transactions in batches of up to 1,000, so the last batch is expected to contain some transactions executed after the target end date).
We encapsulate the trimming in a trim function and then perform the data cleaning.
After that, we can save the result to a file using the DataFrame's to_csv method.
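The cleaning and saving steps can be sketched with pandas as follows. The column names "a" (aggregate trade ID) and "T" (timestamp in milliseconds) come from the aggTrades response; the function names clean and save_csv are hypothetical, and the helper get_unix_ms_from_date is repeated so the example is self-contained.

```python
import calendar
from datetime import datetime

import pandas as pd


def get_unix_ms_from_date(date):
    """Convert a naive datetime, interpreted as UTC, to Unix milliseconds."""
    return calendar.timegm(date.timetuple()) * 1000 + date.microsecond // 1000


def trim(df, to_date):
    """Keep only the trades that happened at or before to_date."""
    return df[df["T"] <= get_unix_ms_from_date(to_date)]


def clean(df, to_date):
    """Drop repeated trade IDs, then trim trades after to_date."""
    deduplicated = df.drop_duplicates(subset="a")
    return trim(deduplicated, to_date)


def save_csv(df, path):
    """Write the cleaned trades to a CSV file without the index column."""
    df.to_csv(path, index=False)
```

A typical call would be save_csv(clean(df, to_date), "trades.csv"), where the file name is only an example.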
We can also use other storage mechanisms, such as Arctic.
It is important that we can trust our data when using it in a trading strategy. With the fetched transaction data, this is easy to validate.
We convert the DataFrame to a NumPy array and iterate row by row, checking that the transaction ID increases by 1 on each row.
Binance transaction IDs are numbered incrementally and are created for each symbol, so it is easy to verify that the data is correct.
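One way to express this check is sketched below. It is a simplified variant that works on any sequence of IDs, including the NumPy array obtained from df["a"].to_numpy(); the function name find_id_gaps is hypothetical. Instead of a single pass/fail result, it reports every place where the sequence breaks.

```python
def find_id_gaps(trade_ids):
    """Return (previous_id, current_id) pairs where IDs do not increase by 1."""
    gaps = []
    previous = None
    for trade_id in trade_ids:
        if previous is not None and trade_id != previous + 1:
            gaps.append((previous, trade_id))
        previous = trade_id
    return gaps
```

An empty result means the IDs are strictly consecutive; any pair in the result marks a missing or repeated range that should be fetched again or cleaned.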
PS: The first step in creating a successful trading strategy is having the right data.