PostgreSQL has pretty awesome date/time functionality. I've used a lot of database and the functionality and thoroughness of the treatment of dates and times (and particularly timezones) is unparalleled. As much as I'm impressed with it, I knew there would come a time where the outcome of all that cleverness would backfire.
Recently, I was doing some data partitioning. I split a couple of largish (approximately billion row) tables up into month segments. I wrote a tiny little pl/pgsql function that takes a parent table, and creates an inherited child with the right indexes, triggers, check constraints (for constraint exclusion) and permissions. I renamed the big table something transient, created a new parent table with the old table's name, made the old unwieldy table a child of that table and then created a whole bunch of new partitions. This allowed me to pretty much ignore my shenanigans from the application side. Once I created the partitions, I need to back populate them. To do this, I did the following:
ALTER TABLE newchildtable_200903 DISABLE TRIGGER ALL;
INSERT INTO newchildtable_200903
SELECT * FROM oldcrappytable
WHERE whence >= '2009-03-01 00:00:00-00'::timestamptz
AND whence < '2009-03-01 00:00:00-00'::timstamptz + '1 month'::interval;
DELETE FROM oldcrappytable
WHERE whence >= '2009-03-01 00:00:00-00'::timestamptz
AND whence < '2009-03-01 00:00:00-00'::timstamptz + '1 month'::interval;
ALTER TABLE newchildtable_200903 ENABLE TRIGGER ALL;
Suffice it to say, this did not do what I wanted at all.
PostgreSQL's interval type is one of its more clever features. The idea that a month isn't always equal to a month is a Gregorian truism. So, PostgreSQL is design to "do the right thing" and consider a month in the context of another argument. A month in the above example is a month with respect to March. Right? No.
The lacking part here is the timezone. I do the partitioning in UTC (because I'm not insane). So, I need the month of March in UTC. Although I explicitly stated '2009-03-01 00:00:00-00' in UTC, PostgreSQL interprets that in the client's timezone... and then adds a month. I'm in US/Eastern which is trailing UTC by four or five hours and thus the reference starting point is actually in February of 2009, which only had 28 days. So, the latter inequality up there does not do through the end of March!
I would argue that this behavior is invalid, because of the extremely unexpected results of the following simple test case:
postgres=# select ('2009-03-01 00:00:00-00'::timestamptz + '1 month'::interval);
2009-03-28 19:00:00-04
postgres=# set timezone = 'utc';
SET
postgres=# select ('2009-03-01 00:00:00-00'::timestamptz + '1 month'::interval);
2009-04-01 00:00:00+00
Here I get a completely different date/time if I ask what appears to be a very unambiguous question depending on whether I'm left or right of the Prime Meridian.
Tuesday, June 16. 2009 at 16:32 (Reply)
I used to maintain a PostgreSQL database of "smart meter" hourly readings for a couple of years for about 30 residences. Timestamp and interval problems were tricky for monthly aggregate reports. Correctly accounting for daylight saving time for "user" data display could also be a problem sometimes (twice per year ;-)).
Wednesday, June 17. 2009 at 00:06 (Reply)
If you're already treating the logic as if it's UTC, but you don't like the results of when the *actual* timezone is *not* UTC, then why use the "timestamptz" data type? Why not just "timestamp"?
postgres=# show timezone;
US/Central
postgres=# select ('2009-03-01 00:00:00-00'::timestamp + '1 month'::interval);
2009-04-01 00:00:00
postgres=# set timezone = 'utc';
postgres=# select ('2009-03-01 00:00:00-00'::timestamp + '1 month'::interval);
2009-04-01 00:00:00
Wednesday, June 17. 2009 at 09:21 (Link) (Reply)
Your example further demonstrates the issue... You're in US/Central and you got 2009-04-01 00:00:00 as the answer to you question. With no timezone on the end of that, I would assume it is in US/Central -- which is wrong.
"my logic" should be in the timezone specified in the representation if I put the timezone in there. Dropping the timezone of (using pure timestamp) is a great way to have the wrong time when you convert it back into something with a real timezone representation.
Wednesday, June 17. 2009 at 03:13 (Reply)
If you need a specific timezone in your database, just set this parameter in the database itself:
ALTER DATABASE dbname SET TimeZone=utc;
All timestamptz-values and calculations will use this timezone.
Otherwise the database has no other option then using the clusterwide settings or even default systemsettings. And those might be completly different...
Wednesday, June 17. 2009 at 11:05 (Link) (Reply)
The problem with that is it's too heavy handed. Most of the time when querying within the database, you want to see the time information in your local time. (I find people, myself included, are generally bad about thinking in abstract time zones...)
I think the heart of Theo's problem was the assumption that specifying a utc offset in his timestamp would also set its time zone; it does not. But, Postgres does have a way to do that, which is via the AT TIME ZONE syntax:
select ('2009-03-01 00:00:00-00'::timestamptz at time zone 'UTC' + '1 month'::interval);