Thursday, June 26, 2014

Why I like PostgreSQL more than MySQL: UNSIGNED and ABS()

Let's say that you have a table with unsigned values:

CREATE TEMPORARY TABLE f (d INTEGER UNSIGNED, e INTEGER UNSIGNED);
INSERT INTO f VALUES (0,1);

You want to know the difference between the values of d and e, so you use ABS():

SELECT abs(d-e) FROM f;

And you get:

ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(`f`.`d` - `f`.`e`)'

I know that 0-1 is -1, which is negative, and yes, that would not fit in an UNSIGNED BIGINT, but where does the UNSIGNED BIGINT come from?

Apparently MySQL is smart enough to know that subtracting two INTEGER values may require a BIGINT to store the result, but for some bizarre reason it also copies the UNSIGNED flag from the source columns, which is odd for an operation that is used specifically to work with negative values. The query works if you change the column types of d and e to SIGNED, but that's not acceptable, because it changes the allowed values: it both permits negative numbers and halves the range of positive values that can be stored.

The only thing you can do about this, apparently, is to cast d and e to SIGNED inside the ABS(). I will let you guess how I feel about that.
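The workaround looks something like this (a sketch against the table above):

```sql
-- MySQL: cast both operands to SIGNED so the subtraction is done
-- in signed arithmetic before ABS() is applied.
SELECT ABS(CAST(d AS SIGNED) - CAST(e AS SIGNED)) FROM f;
-- returns 1 instead of raising ERROR 1690
```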


PostgreSQL, of course, does not have this problem. PostgreSQL does not do UNSIGNED in the first place, but they also test their functions before publishing them.
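For comparison, the equivalent in PostgreSQL (no UNSIGNED, so the columns are plain INTEGER) just works:

```sql
CREATE TEMPORARY TABLE f (d INTEGER, e INTEGER);
INSERT INTO f VALUES (0, 1);
SELECT abs(d - e) FROM f;  -- returns 1, no error
```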

Sunday, June 22, 2014

Code Spaces died an unnecessary death by naivety.

Some time ago a company named Code Spaces had a problem. An attacker gained access to their cloud account and deleted so much data that their core business was basically erased from existence.

So why do I call this unnecessary and naive? Because what happened to them is textbook stuff, and indeed you'll find it in any book about security. Even "The Daily WTF" runs stories like this every week.

1. Never put business-critical data on servers you do not fully control.
There is a reason why firewalls use IP-based security. Only a handful of people at Code Spaces needed to be able to log in; the rest of the world should not even have been able to ping the server, let alone reach the login page.
Controlling your own servers also means that you have the ability to physically disconnect them from the internet. When data begins to disappear, you unplug, stopping the attacker dead in his tracks. Downtime is a nuisance; losing data can, well, cost you your business.

2. Never use the same credentials for multiple servers.
Credentials get compromised, that's just how it is, and that's why you want to limit the amount of access each set of credentials grants. Putting everything behind the same set is just plain stupid. Sorry, there is no other word for it.

3. Always keep your backups separate from the server.
Backups are meant to help you recover from disasters like, say, a hacker stealing your admin credentials and erasing your data. In the 'good old days' people put data on tapes and physically took those tapes to a safe, and a copy to a safe in another building. Why? In case of a burglary, or a fire, or whatever else might destroy the data. Today, apparently, it is perfectly acceptable to just have a file "in the cloud" and stick it behind the same password that can erase the original data...

4. Standby servers are not a luxury.
When the attacker started to shut down servers, backup servers should have kicked in to continue the service. With different passwords and separate backups, customers would probably never even have noticed that there was a problem.


For those of you who think "well, hindsight is always 20/20"... no. These points are common knowledge that any sysadmin should know by heart. Code Spaces fell for the cloud hype and it cost them their company. It's sad, but any sysadmin would have told them this was a very bad setup.

Tuesday, June 17, 2014

Why I like PostgreSQL more than MySQL: The query log.

MySQL can log queries, but only those that succeed. Queries that fail are not logged at all. If you don't build a feature into your code that logs errors, and does so in a reliable way, then you're in for hours of debugging.

PostgreSQL logs errors to its main log, so you can find them regardless of what the application decides to do.
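For reference, the relevant postgresql.conf settings look something like this (a sketch; the first one is the default, so failing statements are logged out of the box):

```
# postgresql.conf
log_min_error_statement = error   # log the text of any statement that causes an error
log_statement = 'all'             # optionally also log every statement, failed or not
```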

Monday, June 16, 2014

The power of naivety: mobbing brewers to give up their secrets

Sometimes I wonder just how naive and gullible people can be.

The "Food Babe" of http://foodbabe.com/ was concerned that there might be something really nasty in beer, like glycol, so she started a petition asking brewers to tell their customers what's in their product. Lots of people signed it, and now brewers are "giving up their secrets".

So what's my problem? It is that apparently, 43k people think multi-billion-dollar industries that live off the fact that only they know how to produce a particular drink would ever actually tell you the recipe. They don't do that, end of story. Not for good ingredients, not for bad ones, and no, not even if they put glycol in there; they will never, ever tell you.

Yes, they are putting up lists of ingredients, and guess what: it's the generic recipe for beer. Wow, I'd never have guessed.

So what has this incredibly naive woman achieved? She's made 43k people believe that they have the power to do anything by signing a petition.

Well done.

Sunday, June 8, 2014

"SQL is hip again"

When NoSQL first became popular, the entire SQL community tried their best to point out that relationless databases are simply not an option for most applications. Still, people with little or no knowledge of SQL flocked to NoSQL because of how "easy" it was to add and find data... after learning the new query syntax.

But now people seem to be coming to their senses: http://www.theregister.co.uk/2013/08/30/google_f1_deepdive/

Obviously this is in response to a move by Google (because whatever Google does must be the best thing anybody can do, right? It's not like they started using NoSQL in the first place... oh wait...), but at least this makes people re-think their choices, and perhaps saves them from destroying their data through ignorance.

Saturday, June 7, 2014

Why I like PostgreSQL more than MySQL: GET_LOCK() works.

MySQL uses GET_LOCK('lock_name') to create an advisory lock. However, if you successfully get a lock, let's say 'A', and you then try to get lock 'B' during the same session, you will silently release lock 'A'.

Not only does this mean that you can never hold more than one lock at a time, which is just silly; it also means that you cannot trust any lock you create to actually stay in place until you release it. And you have no way of tracking this yourself, because managing locks is exactly why you are using the database in the first place.

But perhaps most worryingly, MySQL does not even issue a warning when it releases the first lock. You have no way of ever detecting that a lock was released implicitly. You don't notice the problem until your data is corrupted to the point where users start to complain, and then you'll have a lovely time debugging it.
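A sketch of that behaviour in a single MySQL session (as it worked at the time of writing):

```sql
SELECT GET_LOCK('A', 10);  -- returns 1: lock 'A' acquired
SELECT GET_LOCK('B', 10);  -- returns 1: lock 'B' acquired... and 'A' silently released
SELECT IS_USED_LOCK('A');  -- returns NULL: nobody holds 'A' anymore, and no warning was issued
```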


In PostgreSQL you can use pg_advisory_lock() as often as you like to hold any number of locks without ever accidentally releasing any of them.
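The PostgreSQL equivalent (note that the keys are integers rather than names):

```sql
SELECT pg_advisory_lock(1);    -- lock 1 acquired
SELECT pg_advisory_lock(2);    -- lock 2 acquired; lock 1 is still held
SELECT pg_advisory_unlock(1);  -- each lock is released explicitly, never implicitly
SELECT pg_advisory_unlock(2);
```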

What you want is not always what you need.

A few days ago I read a question on the PostgreSQL mailing lists where someone asked if there is an Arduino library to connect to PostgreSQL. He had a network of hundreds of sensors worldwide, measuring weather data and sending it back to a central database.

The short answer was "No, use HTTP". The poster was not happy with this answer and explained that his MySQL solution worked just fine, that MySQL was great for supporting it, and that it was a real shame that the PostgreSQL community flat-out refused to create this.

The community replied with a couple of arguments:
- An Arduino has limited memory. A library that speaks the native PostgreSQL protocol uses up quite a lot of that memory, while the application itself does not require any of its functionality.
- Is it wise to let a $20 piece of hardware, located somewhere in the world, log in directly to a database?
- What happens when the master database is forced to upgrade and the protocol changes? How do you upgrade all the Arduinos remotely, simultaneously, and with an option to revert if there is a problem?
- Connecting to the database directly requires that the database's port is open in all routers etc. between the node and the server, and the first thing any sysadmin will do is close all ports he doesn't think necessary, very much including 3306 and 5432.

Using HTTP solves all these problems: it takes only a handful of bytes to send the data, uses port 80, and connects to a regular web server that hides the database behind a protocol that never needs to change.
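The web server's job then reduces to validating the request and running a single parameterized statement, something like this (table and column names are made up for illustration):

```sql
-- Executed by the web application on behalf of the sensor node;
-- the node itself only ever speaks HTTP on port 80 and never
-- holds database credentials.
INSERT INTO measurements (sensor_id, measured_at, temperature, humidity)
VALUES ($1, $2, $3, $4);
```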

The moral of the story is that if you ask people a question, you may not get the answer you want, but you may get an answer you need. The key is to realise that your knowledge does have limits, as does your experience. You *will* encounter people who have thought things through, or who have more experience than you. It is up to you to investigate the claims and use the knowledge to your advantage.




What is a "slow query"?

A slow query is a query that takes longer than you'd like it to take.

Note the words "longer than you'd like". There is no hard limit, no number of milliseconds that defines how long "a query" should take. If it takes 6.2 seconds, then that's how long it takes, and that's NOT a problem until you have a need to do it in less than 6.2 seconds.

I have been in serious discussions (including rude words) with hosters who thought they could just pick '1 second' as a maximum runtime for a query, and kill it if it went over that limit. Their reasoning was literally that queries don't take more than a few milliseconds, so anything over a second must be a serious problem. This stems entirely from MySQL's history of not being able to cope with complex queries, and the resulting programming style of using lots and lots of very, very small queries. Small queries complete quickly, and if you have no experience working with databases it's logical to assume that sub-second queries are the way to go.

This is of course complete and utter <rude word>

If you need to update 1.5 million rows, there is simply no way that that is going to complete in less than a second. And if you instead run 1.5 million separate update queries that each take 2 ms, the whole operation is going to take 1,500,000 × 0.002 s = 3,000 seconds, which is many times as long as the single update query.

The only case where a query can take "too long" is when *you* need the task to be completed quicker.

There is no such thing as "too long"; there is only "too long for your requirements" and "longer than it needs to take".


Sunday, June 1, 2014

Use the framework Luke!

Every single developer I have met in my career has written his own framework, and every single one of them was full of workarounds for serious mistakes in the framework's design: mistakes he initially did not have time to fix, and now cannot afford to fix for fear of creating more errors.

I've never seen a homemade framework that implemented any kind of automated testing.

It's simply scary how many developers waste months of development time just so they can write some code themselves, knowing that what they write will not be up to scratch. It usually takes a new employee looking at the code before they realize what they did wrong, but by that time, again, fixing it is not an option.

So, do yourself a favour and go with an existing framework. Sure, it will not work the way you think it should, but realistically: you are not a framework designer, and you have not thought it through. They have, and they have already dealt with all the poop that you have probably never even seen before.