Correct Usage of Sql Transaction in Postgres From Go
Once upon a time I got a task to merge duplicate URLs in our production database. It turned out that there were a lot of same urls like
https://example.com?fbclid=5678 we wanted to merge. Code to normalize URL was easy
to develop. Code to migrate the database looked easy as weel. Until I turned the transaction on. Then very cryptic message appeared
The error message was telling me nothing. And what is worse ducking did not revealed anything. Except notes about Postgres protocol internals., which was something I did not wanted to dig into. It was a hint from @karolhrdina who have explained me the root cause well enough, so I can share my experience in my blog. Lessons learned - always work with a great colleagues, you can learn a lot from them!
Following text assume reader know about Go, particularly
lib/pq, Postgres or SQL databases in general.
Following snippet omits a lot of details, but shows the algorithm
- For all urls matching the bad pattern do
- Try to normalize and if there is normalized version then
- UPSERT new entry or increase a number of visits
- … and program crashes
The problem I was facing is connected with how Postgres protocol works. It turns out that you can’t call
Exec if you did not read all the data from previous
Query. The problem is that most of Postgres drivers including its own
libpq does fetch all the rows by default. It gives the false impression that following code is legal and works. Go driver
lib/pq does not do it.
plpgsql does handle this case well. Other language bindings fetch the data by default.
However fetching the data was not an option as there are millions of entries. And using different language than Go was impractical. Url normalization has been written in Go and there are huge differences between language parsers of URLs between languages. For example PHP handles query strings in really bizzare way incompatible with CGI BIN, Perl and any other language. One can build Go code as (C like) shared library, but that would be somewhat big effort.
Fetching data was an option, however given the number of entries, one would need to play with
OFFSET to achieve reasonable size of input data.
Do it as plpgsql
There is object called cursor. Cursors are intended exactly for this use case. They encapsulates the query and one can read a few results at the time. And
plpgsql uses cursors under the hood. It turns out
CURSOR can be used from Go code as well
So Go code changes a bit. There is no
*sql.Rows and no
Next() method to be used in the loop.
The advantages are
- No new language in the stack, neither complicated integration of Go code into another language
- It is efficient and works with one entry at the time
- There are minimal changes to Go code
- It works, data are written on
Logo by kubina@Flickr: [https://www.flickr.com/photos/kubina/912714753]