How to normalize the elements of an array within a time range?

PHPz
2024-02-08 21:30:35

This article shows how to spread the elements of an array evenly across a time range in Go. When working with time series data, entries sometimes carry only a date and no time of day. To give each entry a distinct, ordered timestamp, we can interpolate between a start time and an end time based on the entry's position in the array. The question and answer below explain how to do this correctly.

Question content

I am trying to normalize an array of elements within a time range. Suppose you have 20 bank transactions that occurred on January 1, 2022:

transaction  1 - 2022/01/01
transaction  2 - 2022/01/01
...
transaction 20 - 2022/01/01

We have no data other than the date on which they occurred, but we still want to assign each of them a time of day, so they end up as:

transaction  1 - 2022/01/01 00:00
transaction  2 - 2022/01/01 ??:??
...
transaction 20 - 2022/01/01 23:59

In Go, I have a function that tries to calculate the normalized time of day for an index in an array of elements:

func normal(start, end time.Time, arraysize, index float64) time.Time {
    delta := end.Sub(start)
    minutes := delta.Minutes()

    duration := minutes * ((index + 1) / arraysize)

    return start.Add(time.Duration(duration) * time.Minute)
}

However, I unexpectedly get 2022/1/1 05:59 for index 0 of a 4-element array over the time range 2022/1/1 00:00 to 2022/1/1 23:59. I would expect to see 2022/1/1 00:00 instead. The only index that works correctly under these conditions is index 3.

So, what am I doing wrong with normalization?

Edit:

This is the function after applying the fix suggested by @icza:

func timeindex(min, max time.Time, entries, position float64) time.Time {
    delta := max.Sub(min)
    minutes := delta.Minutes()

    // Clamp negative positions to the first entry.
    if position < 0 {
        position = 0
    }

    // n entries span n-1 segments, so divide by entries - 1.
    duration := minutes * (position / (entries - 1))

    return min.Add(time.Duration(duration) * time.Minute)
}

Here is an example: assume our start and end times are 2022/01/01 00:00 and 2022/01/01 00:03, our bank transaction array has 3 entries, and we want the normalized time of transaction No. 3 (index 2 in the array):

result := timeindex(
    time.Date(2022, time.January, 1, 0, 0, 0, 0, time.UTC),
    time.Date(2022, time.January, 1, 0, 3, 0, 0, time.UTC),
    3, 2,
)

Since the range only covers the minutes 00:00 through 00:03, and we want the normalized time of the last entry (index 2) in an array of size 3, the result should be:

fmt.Printf("%t", result.Equal(time.Date(2022, time.January, 1, 0, 3, 0, 0, time.UTC)))
// prints "true"

that is, the last minute in the range, 00:03.

Here is a reproducible example: https://go.dev/play/p/ezwkqanv1at

Solution

Between n points there are n-1 segments. This means that if you want to include both start and end in the interpolation, the number of segments that delta must be divided into is arraysize - 1.

Additionally, if you add 1 to index, the result can never be start (you will skip 00:00).

So the correct algorithm is this:

func normal(start, end time.Time, arraysize, index float64) time.Time {
    minutes := end.Sub(start).Minutes()

    // index 0 maps to start, index arraysize-1 maps to end.
    duration := minutes * (index / (arraysize - 1))

    return start.Add(time.Duration(duration) * time.Minute)
}

Try it on the Go Playground.

Also note that if you have a lot of transactions (on the order of the number of minutes in a day, which is roughly a thousand), you can easily end up with multiple transactions that share the same timestamp (same hour and minute). If you want to avoid this, use a precision finer than minutes, such as seconds or milliseconds:

func normal(start, end time.Time, arraysize, index float64) time.Time {
    sec := end.Sub(start).Seconds()

    duration := sec * (index / (arraysize - 1))

    return start.Add(time.Duration(duration) * time.Second)
}

Yes, this produces timestamps whose seconds part is not necessarily zero, but it keeps the timestamps distinct for higher transaction volumes.

If the number of your transactions is on the order of the number of seconds in a day (i.e. 86400), you can drop this "unit" entirely and work with time.Duration itself (i.e. nanoseconds). This keeps timestamps unique even for very large numbers of transactions:

func normal(start, end time.Time, arraysize, index float64) time.Time {
    // A time.Duration is a nanosecond count, so this is the range in nanoseconds.
    delta := float64(end.Sub(start))

    duration := delta * (index / (arraysize - 1))

    return start.Add(time.Duration(duration))
}

Testing this with 1 million transactions, here are the first 20 time parts (they differ only in the sub-second part):

0 - 00:00:00.00000
1 - 00:00:00.08634
2 - 00:00:00.17268
3 - 00:00:00.25902
4 - 00:00:00.34536
5 - 00:00:00.43170
6 - 00:00:00.51804
7 - 00:00:00.60438
8 - 00:00:00.69072
9 - 00:00:00.77706
10 - 00:00:00.86340
11 - 00:00:00.94974
12 - 00:00:01.03608
13 - 00:00:01.12242
14 - 00:00:01.20876
15 - 00:00:01.29510
16 - 00:00:01.38144
17 - 00:00:01.46778
18 - 00:00:01.55412
19 - 00:00:01.64046

Try this on the Go Playground.

Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it removed.