More often than not, I've got some kind of data in a PostgreSQL database that is described by timestamps in one way or another.
Simple test table like this:
create table testdata (
    id serial,
    d numeric,
    ts timestamp without time zone default now()
);
Using bash, I'll populate it a bit with not-too-unrealistic time intervals.
while true
do
    psql -c "insert into testdata (d) values (trunc(random()*100));"
    sleep 1.$RANDOM
done
And to make matters "worse", in another window I started a second loop:
while true
do
    psql -c "insert into testdata (d) values (trunc(random()*1000));"
    sleep 3.$RANDOM
done
And let it run for a bit...
So now I have some data I can start working with. Eventually data of this kind would probably end up being fed through matplotlib, or converted to JSON and handed to d3 to make fancy graphs. (Everybody likes a fancy graph.)
But looking at the data with the standard date_trunc function, I've got something like
select date_trunc('second',ts) as gt,
count(d) as samples,
min(d) as min,
max(d) as max,
round(avg(d)) as avg,
round(stddev_samp(d)) as devsamp,
round(stddev_pop(d)) as devpop
from testdata
group by gt
order by gt desc
limit 10;
Returning an output like
gt | samples | min | max | avg | devsamp | devpop
---------------------+---------+-----+-----+-----+---------+--------
2014-02-23 14:44:07 | 2 | 79 | 543 | 311 | 328 | 232
2014-02-23 14:44:06 | 1 | 62 | 62 | 62 | | 0
2014-02-23 14:44:04 | 1 | 57 | 57 | 57 | | 0
2014-02-23 14:44:03 | 2 | 33 | 65 | 49 | 23 | 16
2014-02-23 14:44:02 | 1 | 36 | 36 | 36 | | 0
2014-02-23 14:44:00 | 2 | 56 | 354 | 205 | 211 | 149
2014-02-23 14:43:59 | 1 | 69 | 69 | 69 | | 0
2014-02-23 14:43:58 | 1 | 11 | 11 | 11 | | 0
2014-02-23 14:43:57 | 1 | 945 | 945 | 945 | | 0
2014-02-23 14:43:56 | 1 | 96 | 96 | 96 | | 0
(10 rows)
And there's
select date_trunc('minute',ts) as gt,
count(d) as samples,
min(d) as min,
max(d) as max,
round(avg(d)) as avg,
round(stddev_samp(d)) as devsamp,
round(stddev_pop(d)) as devpop
from testdata
group by gt
order by gt desc
limit 10;
Returning
gt | samples | min | max | avg | devsamp | devpop
---------------------+---------+-----+-----+-----+---------+--------
2014-02-23 14:45:00 | 38 | 6 | 804 | 182 | 236 | 233
2014-02-23 14:44:00 | 62 | 2 | 879 | 137 | 193 | 191
2014-02-23 14:43:00 | 49 | 0 | 969 | 107 | 202 | 200
2014-02-23 14:42:00 | 44 | 0 | 99 | 54 | 29 | 29
2014-02-23 14:41:00 | 46 | 1 | 99 | 51 | 31 | 31
2014-02-23 14:40:00 | 4 | 7 | 68 | 48 | 29 | 25
2014-02-23 14:39:00 | 58 | 3 | 936 | 188 | 282 | 280
2014-02-23 14:38:00 | 62 | 1 | 908 | 161 | 244 | 242
2014-02-23 14:37:00 | 62 | 10 | 989 | 185 | 273 | 271
2014-02-23 14:36:00 | 62 | 6 | 951 | 173 | 251 | 249
(10 rows)
All of which gives me wildly varying sample sizes and a slightly awkward data resolution, making any graph drawn from the result somewhat ugly and jagged.
Next one up is
select date_trunc('hour',ts) as gt,
count(d) as samples,
min(d) as min,
max(d) as max,
round(avg(d)) as avg,
round(stddev_samp(d)) as devsamp,
round(stddev_pop(d)) as devpop
from testdata
group by gt
order by gt desc
limit 10;
Giving me
gt | samples | min | max | avg | devsamp | devpop
---------------------+---------+-----+-----+-----+---------+--------
2014-02-23 14:00:00 | 1771 | 0 | 989 | 92 | 151 | 151
(1 row)
This will actually work out for the grand overview over time. The next ones up are 'day', 'week', 'month' and 'year', and if those specific intervals happen to be what I want, I'm in luck, because they're the only ones date_trunc can give me.
What I gathered from the net and some experiments was a tiny function that aggregates time much more flexibly, letting me specify the size of the aggregation interval in seconds: 60 seconds = 1 minute, 300 seconds = 5 minutes, 3600 seconds = 1 hour, 86400 seconds = 1 day.
CREATE OR REPLACE FUNCTION round_time(integer,timestamp without time zone)
RETURNS timestamp without time zone AS
$BODY$
-- Truncate $2 down to the nearest multiple of $1 seconds since the epoch.
SELECT ('1970-01-01 00:00:00'::timestamp + trunc(extract(epoch from $2)/$1)*$1 * interval '1 second')::timestamp without time zone
$BODY$
LANGUAGE sql IMMUTABLE; -- deterministic for a given input, so IMMUTABLE rather than VOLATILE
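To see what the function actually does, here's a quick worked example: extract(epoch from ...) converts the timestamp to seconds since 1970-01-01, trunc(.../300)*300 snaps that number down to the nearest multiple of 300 seconds, and adding the result back onto the epoch gives the rounded timestamp. Since midnight is a multiple of 300 seconds, the 300-second boundaries land on :00, :05, :10 and so on.
select round_time(300, timestamp '2014-02-23 14:43:17');
-- returns 2014-02-23 14:40:00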
Using it is fairly straightforward:
select round_time(300,ts) as gt,
count(d) as samples,
min(d) as min,
max(d) as max,
round(avg(d)) as avg,
round(stddev_samp(d)) as devsamp,
round(stddev_pop(d)) as devpop
from testdata
group by gt
order by gt desc
limit 10;
Gives me 5-minute interval results
gt | samples | min | max | avg | devsamp | devpop
---------------------+---------+-----+-----+-----+---------+--------
2014-02-23 14:50:00 | 161 | 0 | 950 | 168 | 242 | 241
2014-02-23 14:45:00 | 316 | 0 | 973 | 186 | 249 | 249
2014-02-23 14:40:00 | 205 | 0 | 969 | 91 | 150 | 150
2014-02-23 14:35:00 | 286 | 1 | 989 | 180 | 254 | 253
2014-02-23 14:30:00 | 215 | 0 | 484 | 89 | 103 | 102
2014-02-23 14:25:00 | 223 | 0 | 99 | 50 | 28 | 28
2014-02-23 14:20:00 | 227 | 0 | 99 | 51 | 28 | 28
2014-02-23 14:15:00 | 221 | 0 | 99 | 50 | 30 | 30
2014-02-23 14:10:00 | 191 | 0 | 99 | 50 | 30 | 30
2014-02-23 14:05:00 | 60 | 0 | 99 | 48 | 31 | 31
(10 rows)
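As a sanity check, round_time with a 60-second interval should agree with date_trunc('minute', ...) on every row, since minute boundaries fall on exact multiples of 60 epoch seconds:
select bool_and(round_time(60, ts) = date_trunc('minute', ts)) as all_equal
from testdata;
-- all_equal should come back t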
select round_time(900,ts) as gt,
count(d) as samples,
min(d) as min,
max(d) as max,
round(avg(d)) as avg,
round(stddev_samp(d)) as devsamp,
round(stddev_pop(d)) as devpop
from testdata
group by gt
order by gt desc
limit 10;
Returns 15-minute interval results
gt | samples | min | max | avg | devsamp | devpop
---------------------+---------+-----+-----+-----+---------+--------
2014-02-23 14:45:00 | 547 | 0 | 973 | 183 | 253 | 253
2014-02-23 14:30:00 | 706 | 0 | 989 | 126 | 194 | 194
2014-02-23 14:15:00 | 671 | 0 | 99 | 51 | 29 | 29
2014-02-23 14:00:00 | 251 | 0 | 99 | 49 | 31 | 31
(4 rows)
Allowing me to pick a much more practical granularity, which in the end most likely means much more understandable output and much prettier graphs.
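And since the plan was to feed this to d3 anyway, the same aggregate can be shipped straight out of the database as a JSON array. This is just a sketch; it assumes PostgreSQL 9.3 or later for json_agg:
select json_agg(row_to_json(t))
from (select round_time(300,ts) as gt,
             count(d) as samples,
             round(avg(d)) as avg
      from testdata
      group by gt
      order by gt desc
      limit 10) t;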
Oh, and finally, although I don't normally use sub-minute intervals, selecting something like a 10-second interval:
select round_time(10,ts) as gt,
count(d) as samples,
min(d) as min,
max(d) as max,
round(avg(d)) as avg,
round(stddev_samp(d)) as devsamp,
round(stddev_pop(d)) as devpop
from testdata
group by gt
order by gt desc
limit 10;
gt | samples | min | max | avg | devsamp | devpop
---------------------+---------+-----+-----+-----+---------+--------
2014-02-23 15:01:10 | 10 | 28 | 833 | 201 | 259 | 245
2014-02-23 15:01:00 | 10 | 37 | 628 | 154 | 197 | 187
2014-02-23 15:00:50 | 11 | 1 | 867 | 165 | 275 | 262
2014-02-23 15:00:40 | 10 | 16 | 797 | 152 | 255 | 242
2014-02-23 15:00:30 | 10 | 7 | 868 | 276 | 387 | 367
2014-02-23 15:00:20 | 9 | 6 | 787 | 125 | 250 | 236
2014-02-23 15:00:10 | 10 | 2 | 583 | 137 | 179 | 170
2014-02-23 15:00:00 | 10 | 30 | 806 | 172 | 244 | 231
2014-02-23 14:59:50 | 10 | 0 | 824 | 178 | 287 | 272
2014-02-23 14:59:40 | 10 | 0 | 808 | 139 | 238 | 226
(10 rows)
I find it kind of amusing that even though it's two separate processes, sleeping different intervals (roughly 1 and 3 seconds), they fairly accurately manage to put 10 samples in every 10-second interval.