Two bugs in our Golang code that were easy to fix, but took us a long time to debug

Programmers may agree that writing application code is a venturesome task in itself, for you can’t predict which line could result in the most bizarre bug ever. Of course, we perform basic testing to eliminate the obvious errors 🤷‍♀️ But here I am talking about the most peculiarly perplexing ones. We have all been there, where we keep staring at the same code again and again, praying to every God in the world, for that buggy piece of code to reveal itself 🙏 🤯

Here, in this blog, we aim to summarise our journeys through 2 such bugs in our Golang application code. We hope to make your lives easier in case you encounter the same situation ✨

Bug # 1: Unexpected URL Encoding

Context: We, at BranchKey, are proud users of self-implemented bare-metal infrastructure. Hence, as a solution for object storage, we are exploring MinIO. A use case demands public access to private data, using pre-signed URLs, similar to what AWS S3 provides. We hoped to use the MinIO Golang client for the same.

Problem: When using the code example from the official client documentation, we got the status code 403 Forbidden with the following error:

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied.</Message>
<Key>test-file.txt</Key>
<BucketName>input-files-bucket</BucketName>
<Resource>/input-files-bucket/test-file.txt</Resource>
<RequestId>1712DE228B2B5EAB</RequestId>
<HostId>aa545e0e-5a90-42c0-850f-269fbf4ec210</HostId>
</Error>

Debugging: Since the error clearly states that it is an access issue, we started by looking into the bucket access policies. We had to read tonnes of documentation!! It says everywhere that MinIO is S3-compatible, hence S3 policy documentation was next 😢 The major distraction here was that the URL worked, and the file was downloaded, the moment we changed the bucket access to public. This made us more and more sure that we needed a custom bucket policy to resolve this issue, even though all the docs said pre-signed URLs work perfectly on private buckets.

Solution: After 2 days of tedious googling, reading MinIO and S3 documentation, and all the Stack Overflow answers on the topic, this appeared! And it worked 🥳 🤩 The issue turned out to be completely orthogonal to our predictions. Basically:

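When the URL string gets encoded (most likely at the JSON-encoding step, since Golang's encoding/json escapes &, < and > by default), all the & characters are replaced by \u0026. A pre-signed URL carries its signature in the query parameters, so the escaped URL no longer matches what the server expects. Just by replacing them back in the URL manually, we could make it work 😱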

That was one hell of a debugging session for such a dumb error 😂
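To make the escaping concrete, here is a minimal sketch using only the standard library. It assumes the \u0026 came from JSON encoding, which is the usual source of this symptom in Go. The helper name encodeURL and the example URL are hypothetical and not taken from our code base. It shows the default behaviour and one way to switch it off, as an alternative to replacing the characters manually.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// encodeURL JSON-encodes a URL string (hypothetical helper for illustration).
// With escapeHTML set to true it mirrors the default json.Marshal behaviour,
// which rewrites '&', '<' and '>' as \u0026, \u003c and \u003e.
func encodeURL(rawURL string, escapeHTML bool) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(rawURL); err != nil {
		return "", err
	}
	// Encode appends a trailing newline; drop it for easier comparison.
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	// Hypothetical pre-signed URL; the query parameters are placeholders.
	presigned := "https://minio.example.com/input-files-bucket/test-file.txt?X-Amz-Expires=60&X-Amz-Signature=abc"

	escaped, err := encodeURL(presigned, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(escaped) // '&' shows up as \u0026

	plain, err := encodeURL(presigned, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(plain) // '&' is left untouched
}
```

Note that \u0026 is still valid JSON: a client that parses the response with a JSON decoder gets the original & back. The problem only bites when the raw response text is copied or logged and then used as a URL directly.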

Bug # 2: Be mindful when using goroutines

Context: In BranchKey, our authentication system is a golang application with a PostgreSQL Database. There are different entities, each related to the other in the following way:

[Figure: BranchKey Entity Relation]

In terms of relational schema, each of the child entities keeps a record of its parent via a Foreign Key connection to the parent entity.

[Figure: relational schema, where [FK] denotes the corresponding Foreign Key connection]

Problem: We provide APIs to manage these entities. An API that deletes a parent entity, for example the DELETE tree API, should also delete all the corresponding child branches and leaves, along with the tree itself. Similarly, a DELETE branch request should delete all the child leaves of that branch, along with the branch itself.

We were doing all these tasks concurrently using goroutines and wait-groups, as shown in this code snippet:

wg := &sync.WaitGroup{}
for _, l := range leaves {
    wg.Add(2)
    go deleteLeafSession(ctx, l.ID, wg)
    go deleteLeafEventQueue(ctx, l.ID, wg)
}

wg.Add(7)
go deleteLeavesInTreeConfig(ctx, treeID, len(leaves), wg)
go deleteBranchSession(ctx, branchID, wg)
go deleteLeafRecordsForBranch(ctx, branchID, wg)
go deleteBranchConfig(ctx, branchID, wg)
go deleteBranchRunDetails(ctx, branchID, wg)
go deleteBranchEventExchange(ctx, branchID, wg)
go deleteBranch(ctx, branchID, wg)
wg.Wait()

Everything was working as expected, except the last function call, i.e., the record for this branch was not getting deleted from the database.

Debugging: First, we made all the calls sequential, which fixed the issue. This meant the code itself was right, but concurrency was creating a problem. Hence, as a next step, we added detailed logs to each of these function calls.

Solution: The logs helped us deduce that:

Since all the calls are fired at the same time (or at very small intervals), the leaf records are not yet deleted from the database when we attempt to delete the branch record. Hence the deleteBranch() call was failing with a pq: violates foreign key constraint error 🤦‍♀️

Thus, we pulled the deleteBranch() call out of the asynchronous list, and made it synchronously after all the other calls had completed.

wg := &sync.WaitGroup{}
for _, l := range leaves {
    wg.Add(2)
    go deleteLeafSession(ctx, l.ID, wg)
    go deleteLeafEventQueue(ctx, l.ID, wg)
}

wg.Add(6)
go deleteLeavesInTreeConfig(ctx, treeID, len(leaves), wg)
go deleteBranchSession(ctx, branchID, wg)
go deleteLeafRecordsForBranch(ctx, branchID, wg)
go deleteBranchConfig(ctx, branchID, wg)
go deleteBranchRunDetails(ctx, branchID, wg)
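go deleteBranchEventExchange(ctx, branchID, wg)
wg.Wait()

// All child records are gone now, so delete the branch itself synchronously.
// If deleteBranch still calls wg.Done(), the counter must be balanced first,
// otherwise the WaitGroup counter goes negative and panics.
wg.Add(1)
deleteBranch(ctx, branchID, wg)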

This fixed the issue. We had to make similar changes in the DELETE tree and DELETE user APIs as well. As a rule of thumb, being a little mindful while invoking concurrent functions never hurts 🤭
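The pattern can be tried out without a database. The sketch below is not our production code: the store type, errFK and deleteBranchCascade are hypothetical names, and a mutex-protected map stands in for PostgreSQL. The in-memory deleteBranch refuses to remove a branch that still has leaves, which imitates the foreign key constraint, and wg.Wait() acts as the barrier between the concurrent child deletions and the parent deletion.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errFK imitates the database's foreign key violation (hypothetical).
var errFK = errors.New("violates foreign key constraint")

// store is an in-memory stand-in for the database.
// leaves maps a leaf ID to its parent branch ID.
type store struct {
	mu       sync.Mutex
	leaves   map[string]string
	branches map[string]bool
}

func newStore() *store {
	return &store{
		leaves:   map[string]string{"l1": "b1", "l2": "b1", "l3": "b1"},
		branches: map[string]bool{"b1": true},
	}
}

func (s *store) deleteLeaf(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.leaves, id)
}

// deleteBranch fails while any leaf still references the branch.
func (s *store) deleteBranch(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, parent := range s.leaves {
		if parent == id {
			return errFK
		}
	}
	delete(s.branches, id)
	return nil
}

// deleteBranchCascade removes the leaves concurrently, waits for all of
// them to finish, and only then removes the branch.
func deleteBranchCascade(s *store, branchID string, leafIDs []string) error {
	var wg sync.WaitGroup
	for _, id := range leafIDs {
		wg.Add(1)
		go func(leafID string) {
			defer wg.Done()
			s.deleteLeaf(leafID)
		}(id)
	}
	wg.Wait() // barrier: no leaf references the branch past this point
	return s.deleteBranch(branchID)
}

func main() {
	s := newStore()
	// Deleting the parent first fails, just like our original race did.
	fmt.Println("parent first:", s.deleteBranch("b1"))
	// Children first, then the parent, succeeds.
	fmt.Println("after cascade:", deleteBranchCascade(s, "b1", []string{"l1", "l2", "l3"}))
}
```

Independent deletions can stay concurrent; only the call that depends on the others moves behind the Wait().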

Well, we continue to learn from our mistakes, and share our learnings hoping to make your life easier. For more such stories of interesting findings at BranchKey, follow this space 😎