Zeta - Cloud Banking | Digital Payments | Tax Benefits | Automated Cafeterias
Contributing to Android, backend, and core infrastructure. Masterminded cost savings of over $250,000 annually.
About the company
Zeta is an India-based FinTech company offering digitized enterprise payment solutions. Additionally, Zeta is the backbone of the technical implementations of Sodexo.
Product suite: Cloud banking, corporate tax benefits, automated cafeterias, and employee gifting.
Presence: India, Brazil, Philippines, Vietnam, United Kingdom, Spain, Italy, Indonesia, and Romania.
Active users: 10 million.
I was contributing to multiple repositories at Zeta, primarily the Android and server codebases. My contributions were to the following:
Entire Android monorepo consisting of over 25 modules used in 10+ apps shipped by Zeta.
In-house dynamic view rendering framework for mobile clients (both Android client and server-side implementations).
Communication service (sends SMS, push notifications, emails, and in-app notifications to users).
Server-side client (a service that aggregates data from multiple other services and relays a filtered result to the mobile clients).
Deployment service that deploys dynamic app components onto persistent storage through which mobile clients sync data directly.
Software Engineer II*
December 2016 - Present
My path to Zeta
Back in 2016, I was pursuing my bachelor’s degree in computer science at SASTRA University. One fine morning, I found this amazing Dribbble shot and got carried away. I wondered whether I could implement it, as I had a fairly good grasp of Android by then. After several iterations, I got the interaction pixel perfect. I then isolated the interaction into an Android library and hosted it on GitHub, where it became one of the top trending libraries (to know more, check my Card Printer library). People at Zeta saw it and were keen to hire me. While interviewing with Zeta, I found their products extremely appealing in terms of the quality of work and the scale. Eventually, I joined Zeta as a software engineering intern during the final semester of my bachelor's program.
Software Engineering Intern
Implemented a roulette wheel in Android for an internal project (we did not ship it to the public). My goal was to make it look and feel like a real roulette wheel. I cannot share more information here due to NDA.
Fixed bugs and crashes in different parts of the codebase which eventually improved the stability of the Android app from 97% to 99.9%.
Improved the user experience of the app by providing feedback for user actions. Minimized the number of clicks needed by a user to make a payment.
Writing production-level code and following best practices.
Application of software engineering principles on a real-world software solution.
One of the ways to understand a huge codebase is to fix bugs and crashes that are present in the existing system.
Software Engineer I
Having shown that I could maintain the codebase and improve the overall quality of the Android app, I was offered the position of Software Development Engineer I at Zeta.
My manager and one of the senior developers on the team had just moved to another team. The team was now a pair of senior developers and me. Realizing the vacuum this left in the team, I felt that I should take on more responsibilities to ease the pressure on my seniors. In crunch situations, I stayed agile and open to whatever tasks came to the team, which kept the senior developers focused on time-constrained projects.
Improving UX of the app
By that time, I had already fallen in love with writing blazingly responsive apps that woo customers with a best-in-class user experience. Since I wanted the users of the Zeta app to have a similar experience, I worked closely with the design team to offer them the best possible experience. I thought of two things that could make users more comfortable with the app:
Tweak the existing features to provide a better experience to users (with priority for the low-hanging fruit).
Build the new features meticulously, with a great emphasis on user experience.
Adding support for multiple IFIs
A Supercard is a debit card that has multiple cloud cards within it. When a transaction through a supercard is initiated, the best-suited cloud card is debited based on the merchant category code (e.g., Meal, Medical, Communication). At that time, Zeta supported only one International Finance Institution (IFI): Ratnakar Bank Limited (RBL). Each IFI can issue its supercard on the Zeta platform. The time had come to support multiple IFIs. The following changes had to be made:
The app should be revamped in such a way that it’s intuitive for a user when he has super cards of multiple IFIs.
At any screen, the user should be able to recognize the current IFI he’s on, as his current IFI will be used for transactions(there could be multiple supercards per IFI too).
When the user relaunches the app, his current super card should be the one that he last set in the previous session of the app.
Solution: App wide theming with configurability
User wants to check the current IFI
Use custom themes for each IFI and use it across the app.
Toggle between IFIs
Allow users to swipe between supercards to toggle the primary IFI. As a result, change the app theme accordingly.
Retain current supercard on app kill
Persist current supercard’s metadata on shared preferences. Avoid storing sensitive card information in the client app.
This solution reduces the cognitive workload on the user in remembering the current IFI of his wallet. At any point in time, he can use the current app theme to identify his current IFI. When the user switches between IFIs, sync the new IFI to the server only if it’s different from the current IFI(as there can be multiple supercards under an IFI).
Use a view pager that holds super cards. When the user swipes, gradually transition between the theme colours for a fluid user interface. One of the important challenges was sustaining 60 fps with a dynamically changing gradient background while the view pager is swiped. To change the background gradient, create a GradientDrawable and set its colours dynamically, based on the configuration, as the super cards are scrolled. For the gradual transition between colours, feed the swipe distance into an ARGB evaluator, which determines what percentage of the old and new colours should be used in the current frame.
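The colour transition can be illustrated outside Android. The following is a minimal sketch in Python, assuming plain per-channel linear interpolation of 0xAARRGGBB colours; Android's ArgbEvaluator may handle colour spaces differently, and the function name, theme colours, and `position_offset` values are hypothetical stand-ins for the pager's swipe progress.

```python
def lerp_argb(fraction: float, start: int, end: int) -> int:
    """Linearly interpolate two 0xAARRGGBB colours, channel by channel."""
    fraction = max(0.0, min(1.0, fraction))  # clamp swipe progress to [0, 1]
    result = 0
    for shift in (24, 16, 8, 0):  # alpha, red, green, blue
        start_channel = (start >> shift) & 0xFF
        end_channel = (end >> shift) & 0xFF
        channel = round(start_channel + (end_channel - start_channel) * fraction)
        result |= channel << shift
    return result


# Hypothetical theme colours for two IFIs.
OLD_THEME = 0xFF0000FF  # opaque blue
NEW_THEME = 0xFFFF0000  # opaque red

# position_offset stands in for how far the user has swiped (0.0 to 1.0).
for position_offset in (0.0, 0.25, 0.5, 1.0):
    colour = lerp_argb(position_offset, OLD_THEME, NEW_THEME)
    print(f"{position_offset:.2f} -> {colour:#010x}")
```

Because the function is a pure computation on integers, it is cheap enough to run once per frame; on a device, the result would be applied to the gradient drawable in the pager's scroll callback.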
Organize cloud cards of different supercards intuitively
Users had cloud cards in a separate tab on the home screen. Different super cards could have overlapping or non-overlapping cloud cards. Users with multiple IFIs could not easily tell which cloud cards belonged to which super cards.
Solution: Group cloud cards under different super cards
To solve this, we discussed possible UX improvements with the design team and the CTO. We decided to group the relevant cloud cards under super cards, as it provides an intuitive interface for the user to quickly scan through the list of cloud cards under his current super card.
Since I was given the freedom to implement the feature end to end, I had to think of an implementation that not only solves the current problem but also prepares the codebase for future changes. The following were my thoughts before the implementation:
Keep the solution generic: given a list of entities of type A, a list of entities of type B, and a mapping from each entity of type A to a filtered list of entities of type B, the UI should show the type B entities grouped under their type A entity. In this case, type A -> super cards and type B -> cloud cards.
Performance bottlenecks are highly probable, especially when the user switches from one super card to another (type A -> type A). In that case, I should refresh the cloud cards (type B) and show only the cloud cards under the newly selected super card.
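The generic A-to-B grouping described above can be sketched as follows. This is an illustrative Python sketch, not the Android implementation: the class name, the `belongs_to` predicate, and the sample cards are all assumptions made for the example.

```python
from typing import Callable, Dict, Generic, List, TypeVar

A = TypeVar("A")  # parent type, e.g. a super card (must be hashable)
B = TypeVar("B")  # child type, e.g. a cloud card


class GroupedItems(Generic[A, B]):
    """Groups items of type B under items of type A."""

    def __init__(self, parents: List[A], children: List[B],
                 belongs_to: Callable[[A, B], bool]) -> None:
        self.parents = parents
        # Build the A -> [B] mapping once, so switching parents is a lookup.
        self._children: Dict[A, List[B]] = {
            parent: [child for child in children if belongs_to(parent, child)]
            for parent in parents
        }

    def children_of(self, parent: A) -> List[B]:
        """Return only the children that belong to the selected parent."""
        return self._children[parent]


# Hypothetical data: card names and memberships are made up for illustration.
super_cards = ["super-card-1", "super-card-2"]
cloud_cards = [
    ("Meal", {"super-card-1", "super-card-2"}),  # shared by both super cards
    ("Medical", {"super-card-1"}),
    ("Communication", {"super-card-2"}),
]

grouped = GroupedItems(
    super_cards,
    cloud_cards,
    belongs_to=lambda super_card, cloud_card: super_card in cloud_card[1],
)
print([name for name, _ in grouped.children_of("super-card-2")])
```

Precomputing the mapping keeps the work done on each swipe to a single dictionary lookup, which matters when the list has to refresh while the theme is also animating.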
Constraints and tricky parts
Performance: Swiping between items of type A (super cards) should refresh the app theme and update the recycler view with the latest type B items (cloud cards) for that super card. The frame rate would suffer if the implementation were inefficient.
View caching: Leverage recycler view’s multi-layer cache for reusing different views across different items of type B (cloud cards).
Posting to UI thread: The message queue of recycler view and view pager should be used carefully to preserve the order of animations.
Use a recycler view that holds items of type B (cloud cards) below each view in the view pager holding items of type A (super cards). When the user horizontally swipes between items of type A (super cards), animate the recycler view subtly to tell the user that the items in the recycler view list (of type B) have been updated.
We’ll see in later sections how this design made it easy to implement more type A items in addition to supercards and more type B items in addition to cloud cards.
October 2017 outage @ Zeta
It was the day when both Amazon and Cisco employees were to start using Zeta for the benefits offered by their employers. A major chunk of our transactions comes through digital meal vouchers, where the number of transactions is high and the amount per transaction is low. Consequently, our servers see high traffic during breakfast, lunch, and dinner. With an additional 20k users that day, our systems couldn’t handle the extra traffic.
Impact of the incident
On that day due to a significantly higher volume of traffic, our systems failed terribly and almost none of the transactions went through. The message queues of our backend services were unable to handle new requests and started dropping new payment requests. On the other hand, older requests were timing out. Everybody went into a lockdown mode and started contributing to things that could improve the situation and mitigate the impact.
Our users were frustrated and started posting negative reviews in the play store/app store and in social media. Amidst all the chaos, I came across a message from one of the users during the outage:
"Dear Zeta, I’m a pregnant woman waiting for lunch and I’m unable to pay for it. Please fix the issue as soon as possible, as it’s one of the phases of life where I have to take extra care of my health. "
On seeing this message, I was tormented by the fact that I was directly or indirectly responsible for that woman’s issue in some way. I did the best I could to help the team out. From the mobile app perspective, we stopped bombarding our servers with redundant and unwanted requests and rolled out a patch release. We reflected on the incident later and learned from our mistakes.
It was one of the turning points of my life and brought out the better developer in me, as I realized that I was working at such a huge scale that a blunder from my end could disrupt people's everyday lives. After that, I started testing my features thoroughly so as not to ship any bugs or code that could cause a performance bottleneck. As a team, we matured over time, and we now have a detailed procedure for deployment and incident response. We had very few incidents after that, with most of them falling under the low-severity category.
Innovation - AI based payments
It was the time when Sodexo was considering Zeta to be their technical platform. Zeta had to pitch what they do and what they are capable of and why Sodexo should use Zeta as their technical platform. CEOs and other high profile executives of Sodexo from different countries had come in. Though we were already well prepared for the pitch, we thought we could do something that is out of the box - that one feature that could seal the deal for Zeta.
Payments through Google Assistant
Though highly ambitious, my peer and I decided to implement payments via the Google Assistant platform, leveraging the Zeta wallet present on the device. We built it over a weekend, which put us in high spirits. You just say, “Hey Google, get me a sandwich” and the order automatically goes to your cafeteria, the cafeteria prepares your order and delivers it to your place through a butler. You can monitor the status of your order in the Zeta app and you get push notifications when your order passes different stages.
We had to prepare a working prototype that is capable of making payments through the Google Assistant platform on the production environment. Assuming that the user has authorized Google Assistant to make payments through Zeta, we routed the user's intentions from Google Assistant to our proxy server that sits in front of the Zeta’s backend services. This proxy server resolves the user’s intentions into a payment request and then initiates the payment, as the user has already authorized the Google Assistant platform to make payments on behalf of the user. This payment notifies the cafeteria about the order which the user had placed. The cafeteria prepares the food and serves it to the user.
Optimizing developer efficiency
Older way of testing zetlets
A typical testing cycle for a zetlet required the developer to push the changes to the servers in the dev environment and the app had to sync the latest version of the zetlet from the server before verifying the changes. Each step in the testing cycle takes a considerable amount of time and so building a simple zetlet with a couple of iterations took around half an hour for a novice zetlet developer. Experienced developers who build complex zetlets had to write the code in one shot and then deploy to test it. If something broke, the developer had to debug the entire logic as there was no way of knowing what broke. Obviously, this wasted a lot of developers’ time and I wanted to solve it by some means.
Local testing and hot reloading of zetlets
I thought of testing zetlets locally and made an app, zebugger, through which you can visualize the zetlets you are developing side by side as you write code. Developers can see the rendered UI for the updated code almost instantly, as the response time for updating the UI from the code is less than a second on average. This is faster than Android Studio's Instant Run and doesn't require a separate APK to be installed on the developer's device. The following diagram shows how zebugger works internally:
Newer way of testing zetlets with zebugger
The new way of testing zetlets takes very little time: you need not deploy your code to test it, as zebugger allows you to test locally. The following is the new way of testing zetlets:
Metrics from 7 developers were gathered and the following improvements were observed:
Average time to develop zetlets without zebugger: 30 minutes (5 iterations)
Average time to develop zetlets with zebugger: 5 minutes (5 iterations)
In addition to developer efficiency, developers had the option to attach the test plan for the pull requests through the preview available on zebugger. This in turn reduced the time taken for code review, as it was easier for the reviewers to visualize what’s changed/introduced in the new code.
App to app(A2A) payments
With an increasing demand for letting users pay for their orders through other apps, Zeta was approached to expose a mobile API that would allow users to pay through their Zeta wallet from within a third-party app. For example, a third-party food vendor like Swiggy or Zomato could integrate with Zeta’s A2A mobile API to support seamless payments through Zeta. I was given the golden opportunity to build the feature end to end in Android and decide what the implementation would be.
Build a seamless mobile API for making app to app payments through Zeta
The feature should be built with best in class user experience on top of a highly secure payment channel, without breaking PCI-DSS security compliance.
Transitions between the third-party vendor and the Zeta app should be so seamless that the user never notices that the Zeta app was opened for payment authorization.
Since the implementation was one of its kind and no payment app had ever come out with a seamless payment experience, the problem was very interesting and I had to think through all the possible edge cases before designing the solution. The following thoughts were running in my mind:
The Zeta app has an API exposed for the third-party vendor to initiate app to app payments at any time. The following is the flow diagram of an app to app payment:
Saving the deal for the organization
One of the cafeteria solutions that approached us for app-to-app payments (Tonguestun) had a hybrid app implementation and found it difficult to integrate Zeta's A2A mobile API. The deal was at stake, as they had mailed us saying that the integration was not working out and that they wanted to call it off. I took over, spoke to the executives and developers of Tonguestun, and convinced them that I would take care of the implementation and help them build the feature. I quickly built a hybrid demo app using Apache Cordova with app-to-app payments integration and sent the sample to the Tonguestun team. That was the day when I stood up for Zeta and saved my team and the deal. My colleagues appreciated my initiative and the responsibility I took in handling the project end to end single-handedly.
20k transactions a day through Hungerbox and Tonguestun cafeteria solutions.
No failures to date since release.
UX Champion honour from the CTO.
Software Development Engineer II
I was given a promotion to Software Development Engineer II as I showed the following traits:
Capability to handle projects end to end; breaking a complex task into small tasks and working on them.
Passion and involvement in building user centric products and features.
Taking care of software design to build scalable solutions.
Monitoring and ensuring app stability proactively.
Taking turns in app releases and being on-call throughout the release cycle.
Optimizing developer efficiency.
Ability to communicate effectively with external stakeholders.
Reacting to production issues and resolving them ASAP if developer intervention is necessary.
By now, we had hired two bright minds fresh out of college and I took that opportunity to mentor them. Having learned what a typical fresher might experience in the industry, I used my accumulated knowledge to guide them on the right path. I was open to discussions and we discussed different possible solutions to a problem before we settled with the best one.
In addition to what I was doing earlier, I strongly felt that I should take more responsibilities, as it was just me with one of the senior developers and two freshers in the team by now. There were a couple of things that I brought into action:
Bring in new knowledge from outside and share it with the team. If something is new and nobody on the team knows about it, take a small session on what you've learned. It need not be precise or cover everything in one shot. A simple superficial introduction to the topic would do, explaining what it offers, the trade-offs, and how that solution actually solves the problem at hand.
Since Zeta had grown in terms of the number of products and the headcount, it was necessary to tell people at Zeta what the mobile engineering team was doing and why. The intent behind this initiative was that our findings and patterns might help someone else in the organization solve a similar problem.
Improving the app performance
Additionally, I started picking up the tech debts that we had. My top priority was to improve the responsiveness of the app and the app startup time.
I performed the following tasks to improve the app performance:
Identified operations that could be offloaded from the main thread and converted them into asynchronous callbacks running on a background thread. For example, reads and writes to databases and shared preferences were offloaded to IO thread pools.
Processed multiple listeners of the same event on a separate background thread pool, executing them asynchronously so that they do not block each other.
Used an on-demand cache instead of eagerly populating the cache (with data from disk) unless eager loading was needed.
Removed overdraw to improve the UI rendering performance.
Flattened complex view hierarchies to reduce the number of passes needed to render the UI.
Used the GPU efficiently for alpha value computations.
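Two of the ideas in this list, dispatching event listeners on a background thread pool and populating a cache on demand, can be sketched in a platform-neutral way. The following Python sketch is only an illustration under assumptions: `EventBus`, `OnDemandCache`, the listeners, and the event name are hypothetical, and the app itself would use Android's threading facilities rather than this code.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class EventBus:
    """Runs every listener of an event as its own task on a thread pool."""

    def __init__(self, max_workers: int = 4) -> None:
        self._listeners = []
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def register(self, listener) -> None:
        self._listeners.append(listener)

    def publish(self, event):
        # One task per listener, so a slow listener cannot block the others.
        return [self._pool.submit(listener, event) for listener in self._listeners]

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


class OnDemandCache:
    """Loads a value the first time it is requested instead of up front."""

    def __init__(self, loader) -> None:
        self._loader = loader
        self._values = {}
        self._lock = threading.Lock()
        self.loads = 0  # counts how often the (expensive) loader actually ran

    def get(self, key):
        with self._lock:
            if key not in self._values:
                self.loads += 1
                self._values[key] = self._loader(key)
            return self._values[key]


def slow_listener(event):
    time.sleep(0.05)  # simulates a disk or network operation
    return f"slow handled {event}"


def fast_listener(event):
    return f"fast handled {event}"


bus = EventBus()
bus.register(slow_listener)
bus.register(fast_listener)
results = [future.result() for future in bus.publish("card-switched")]
bus.shutdown()

cache = OnDemandCache(loader=lambda key: f"value-for-{key}")
cache.get("theme")
cache.get("theme")  # second call is served from memory
print(results, cache.loads)
```

The caller of `publish` gets futures back immediately, so the publishing thread (the main thread, in an app) is free to keep rendering while listeners run.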
Implementing these changes boosted the app performance: the 90th percentile of the time taken to render a frame was 13 ms, well under the 16 ms frame budget.
Total frames rendered: 1304
Janky frames: 68 (5.21%)
50th percentile: 6ms
90th percentile: 13ms
95th percentile: 16ms
99th percentile: 30ms
Number Missed Vsync: 10
Number High input latency: 1042
Number Slow UI thread: 20
Number Slow bitmap uploads: 1
Number Slow issue draw commands: 4
Number Frame deadline missed: 23
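As a quick sanity check of the figures above, the janky-frame percentage and the per-frame time budget at 60 fps can be recomputed from the raw counts. The snippet below is a small worked example in Python; the variable names are mine, and the numbers are copied from the statistics listed above.

```python
total_frames = 1304
janky_frames = 68
percentiles_ms = {50: 6, 90: 13, 95: 16, 99: 30}

frame_budget_ms = 1000 / 60  # time available per frame at 60 fps
janky_percent = 100 * janky_frames / total_frames
within_budget = [p for p, ms in percentiles_ms.items() if ms <= frame_budget_ms]

print(f"Frame budget: {frame_budget_ms:.2f} ms")           # 16.67 ms
print(f"Janky frames: {janky_percent:.2f}%")               # 5.21%
print(f"Percentiles within budget: {within_budget}")       # [50, 90, 95]
```

In other words, at least 95% of the frames in this sample were rendered within the 60 fps budget.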
New features on zetlets
Having worked on top of zetlets for some time, I felt that more features could be added to the framework to make it even more powerful. To make it easier for zetlet developers to build complex user interfaces, I did the following:
Support for making API calls with the ability to handle success and error responses.
Integration with databases and support for live data.
Support for storing data both in memory and in persistent storage, with data scoped either to an individual zetlet or globally to share data throughout the app. The global data was designed to be accessible by zetlets, native code, and web views.
Zetlets - deployment optimizations
The existing script for deploying all the zetlets on the server was very slow. To reduce the deployment time, I looked for bottlenecks in the existing deployment script. On investigation, I found that all the network IO requests were processed serially. Since zetlets are independent components that are decoupled from one another, they could be deployed concurrently. By using multi-threading efficiently, the time to deploy all the templates was brought down significantly. The scripts were written in Ruby.
*TTD - Time to deploy
*All metrics were measured on a 15-inch MacBook Pro (2.2 GHz Intel Core i7, 16 GB RAM)
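The move from serial to concurrent deployment described in this section can be sketched as below. The original scripts were written in Ruby; this is an illustrative Python sketch, and `deploy`, the zetlet names, and the simulated 50 ms of network latency are assumptions rather than details of the real deployment script.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def deploy(zetlet: str) -> str:
    """Stand-in for uploading one zetlet; the sleep simulates network IO."""
    time.sleep(0.05)
    return f"{zetlet}: deployed"


def deploy_serially(zetlets):
    return [deploy(zetlet) for zetlet in zetlets]


def deploy_concurrently(zetlets, max_workers: int = 8):
    # Zetlets are independent, so their uploads can overlap on a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(deploy, zetlets))  # map keeps the input order


zetlets = [f"zetlet-{i}" for i in range(8)]  # hypothetical zetlet names

start = time.perf_counter()
serial_results = deploy_serially(zetlets)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
concurrent_results = deploy_concurrently(zetlets)
concurrent_seconds = time.perf_counter() - start

print(f"serial: {serial_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

Threads suit this workload because the time is spent waiting on network IO rather than on computation; the achievable speed-up is bounded by the pool size and by how many uploads the server accepts in parallel.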
Test driven development
Since Zeta was scaling exponentially, I strongly felt the need for a test-driven development (TDD) environment to ship stable features. Software built with TDD is more robust and easier to maintain, as tests cover the edge cases of the logic and help developers catch new bugs introduced when changing even minor parts of the codebase.
At Zeta, the zetlet based bugs were mostly around the following:
Missing null/undefined checks. If some data is not defined, the entire zetlet doesn’t render.
Missed edge cases in some of the logic.
No record of what test cases are being covered by the existing logic.
The existing zebugger-based solution was good for UI testing, but not for testing logic, as the tests for business logic should be decoupled from UI tests. Introducing a unit testing framework would greatly reduce the time and effort spent on development and on fixing bugs later. Thus, I felt that a unit testing framework for zetlets was the need of the hour.
I added support for writing unit tests for zetlets, in addition to the UI testing through zebugger that I had already built. The unit tests ran in a Node.js environment. To ensure stability, I created a Jenkins CI job that automatically runs all the unit tests for every deployment and alerts the developers if any test fails.
The initial version of the unit testing framework had the following basic functions:
Running all tests
Running individual tests
Support for testing the actual functions involved in computation with ease.
Support for multiple tests with customizable data per test.
Since building a minimum viable test framework was the need at that time, the focus was not on building a variety of features in the framework.
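The four basic functions listed above can be illustrated with a very small runner. The real framework ran zetlet logic in Node.js; the following Python sketch only mirrors the idea, and `greeting`, the case names, and the test data are hypothetical.

```python
from typing import Any, Callable, Dict, NamedTuple


class Case(NamedTuple):
    function: Callable[[Dict[str, Any]], Any]  # the logic under test
    data: Dict[str, Any]                       # input data for this case only
    expected: Any


def greeting(data: Dict[str, Any]) -> str:
    """Hypothetical zetlet logic that guards against missing or null data."""
    user = data.get("user") or {}
    return "Hello, " + (user.get("name") or "guest")


CASES = {
    "greets a named user": Case(greeting, {"user": {"name": "Asha"}}, "Hello, Asha"),
    "falls back when user is missing": Case(greeting, {}, "Hello, guest"),
    "falls back when user is null": Case(greeting, {"user": None}, "Hello, guest"),
}


def run_one(name: str) -> bool:
    """Run an individual case by name."""
    case = CASES[name]
    return case.function(case.data) == case.expected


def run_all() -> Dict[str, bool]:
    """Run every registered case and report pass/fail per case."""
    return {name: run_one(name) for name in CASES}


for name, passed in run_all().items():
    print(("PASS" if passed else "FAIL"), name)
```

Keeping each case as plain data (function, input, expected value) is what makes it easy to add many cases with different inputs, including the missing-data cases that caused most zetlet bugs.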
The introduction of a unit testing framework greatly reduced the chances of shipping buggy code. A pull request for a zetlet change would be approved only if the logic had a sufficient number of unit tests and all the tests passed. This change, in turn, had the following impact:
The number of runtime errors leading to incorrect UI diminished.
Crashes based on invalid data access were reduced.
Contribution to the backend
Zeta follows a microservices architecture for the backend and I got the opportunity to work on some of the services. There were times when we had a severe shortage of developers around the organization. In that situation, where people already had a lot to do in the backend (since Zeta was scaling so fast), it would have been unfair to burden them with more mobile client-specific server tasks. I stood up and started picking up server tasks. This in turn improved the efficiency of the mobile engineering team, as we no longer had to wait for backend engineers to finish building a feature. I built features end to end, starting from the backend to the integration of the features on the mobile client. The following are some of the microservices to which I had contributed:
Zetlet deployments: Responsible for taking in zetlets and deploying them on to persistent storage through which mobile clients can sync directly.
Server side client: Responsible for manipulating and sending data to mobile clients to present the same on UI with backward compatibility. This service was brought into action to reduce the number of resources used by the mobile clients, especially the low-end devices.
Over time, I started concentrating more on contributing to the backend services at Zeta. I was then presented with an interesting problem of building the backend of a notification management system.
Build a highly scalable, customizable, multi-tenant notification management system
Decouple backend services from the burden of manually contacting the notification server to send any form of communication to the user.
Avoid sending unnecessary notifications, to cut down expenditure.
Flexibility to add new channels of notification (like WhatsApp) on the fly.
Native support for multi-tenancy.
Custom notification content for various business units under a client.
GitOps driven deployment and data storage.
Initially, we wanted to provide a notification management system as a value-added service to our banking suite of products, but during the ideation phase, we realized that we could generalize a lot of components in the system to make it fit any business use case. Hence, we built it as an independent product. To achieve what was conceptualized, the notions of 'Platform' and 'Tenant' were introduced. A platform is an organization or a product that leverages the notification management system, and a tenant is a business unit within an organization or a consumer of a product.
Notifications should reach the user as soon as possible to sustain customer engagement, so building a high-performance system was one of the key areas of focus in the design. We weighed the pros and cons of different approaches to arrive at the best possible strategy for providing high performance while retaining the dynamism of the system. The main bottleneck was the computation of notification policies, which are highly dynamic and take up most of the compute time.
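To make the platform/tenant idea concrete, here is a minimal sketch of how a policy lookup with tenant-level overrides might look. It is purely illustrative and not Zeta's design: `Policy`, `PolicyStore`, `plan_notifications`, the platform, tenant, and event names, and the fallback rule (tenant policy first, then platform default, otherwise send nothing) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Policy:
    channels: Tuple[str, ...]  # e.g. ("push", "sms")
    template: str              # message content with named placeholders


class PolicyStore:
    def __init__(self) -> None:
        # Key: (platform, tenant or None for the platform default, event)
        self._policies: Dict[Tuple[str, Optional[str], str], Policy] = {}

    def put(self, platform: str, event: str, policy: Policy,
            tenant: Optional[str] = None) -> None:
        self._policies[(platform, tenant, event)] = policy

    def resolve(self, platform: str, tenant: str, event: str) -> Optional[Policy]:
        # A tenant-specific policy wins; otherwise use the platform default.
        specific = self._policies.get((platform, tenant, event))
        if specific is not None:
            return specific
        return self._policies.get((platform, None, event))


def plan_notifications(store: PolicyStore, platform: str, tenant: str,
                       event: str, params: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (channel, message) pairs; an empty list means nothing is sent."""
    policy = store.resolve(platform, tenant, event)
    if policy is None:
        return []
    message = policy.template.format(**params)
    return [(channel, message) for channel in policy.channels]


store = PolicyStore()
store.put("wallet", "payment_success", Policy(("push", "sms"), "Paid {amount}"))
store.put("wallet", "payment_success",
          Policy(("push",), "Your cafeteria order of {amount} is paid"),
          tenant="cafeteria")

print(plan_notifications(store, "wallet", "cafeteria", "payment_success", {"amount": "INR 120"}))
print(plan_notifications(store, "wallet", "gifting", "payment_success", {"amount": "INR 500"}))
print(plan_notifications(store, "wallet", "gifting", "card_viewed", {}))
```

In this sketch a channel is just a string, so adding a new one means adding a policy entry rather than changing code, and an event without a policy produces no notification at all.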
Resilience and fault tolerance
To ensure that no message is dropped at any point in the system, we leveraged Kafka streams, consuming the various events occurring in the system in an error-resilient manner. Using Kafka saved a lot of development time, as the fault tolerance and resilience of the system were already taken care of.
Git driven data storage
We were also building a dashboard for managing the notification policies, content, and mode of communication for every event in the system (with restricted access at various user levels). To track the changes made to the data over time, a version control system was necessary. Hence, we routed all our requests to git repositories where we maintain the data changes, with only the current snapshot loaded into the database for quick access. At any point, any version of the data could be deployed.
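The "full history in version control, current snapshot in the database" idea can be illustrated with a toy model. In this Python sketch an in-memory list stands in for the git history and a dictionary stands in for the database snapshot; `VersionedStore` and the sample keys are hypothetical and do not reflect the actual system.

```python
from typing import Dict, List, Tuple


class VersionedStore:
    def __init__(self) -> None:
        self._history: List[Tuple[str, Dict[str, str]]] = []  # stands in for git
        self.current: Dict[str, str] = {}                     # stands in for the DB

    def commit(self, changes: Dict[str, str], message: str) -> int:
        """Record a new version on top of the latest one and make it current."""
        base = self._history[-1][1] if self._history else {}
        snapshot = {**base, **changes}
        self._history.append((message, snapshot))
        self.current = dict(snapshot)
        return len(self._history) - 1  # version number

    def deploy(self, version: int) -> None:
        """Load any recorded version as the current snapshot."""
        self.current = dict(self._history[version][1])

    def log(self) -> List[str]:
        return [message for message, _ in self._history]


store = VersionedStore()
v0 = store.commit({"payment_success.channel": "sms"}, "Initial policy")
v1 = store.commit({"payment_success.channel": "push"}, "Switch to push")

store.deploy(v0)  # roll the live snapshot back to the first version
print(store.current, store.log())
```

Reads only ever touch `current`, which is why lookups stay fast even as the history grows.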
As we had a basic notification service in our backend ecosystem, whatever was there already had to be ported into the new system with backward compatibility.
The diagram below shows a very basic outline of what the system does internally. However, I'm not supposed to disclose more details as per Zeta's NDA.
The initial version (V1) that I built saves approximately $250,000/year for Zeta by cutting down unwanted expenses on notifications, with the main culprit being the high cost per SMS notification through third-party libraries.
The V2 implementation handles 13 million requests per hour (roughly 3,600 per second), with a 99th percentile latency of just 8 ms for notification policy computation.
The notification management system is now an independent product, which is being sold to other businesses.
1. The contributions mentioned previously were authored by me. Two developers joined me towards the latter half of the project and they made the following contributions:
Generic modes/media of notification configurable in runtime.
Aggregating metrics to visualize what’s going on in the system.
Receivers and receiver groups (users can opt out of notifications for specific or all channels of communication for all the categories of available notifications).
2. Front end dashboard for customers was built by a separate team.
Tech Thursdays are fortnightly sync-ups between all the tech teams at Zeta. Developers update their peers on what’s changing in the organization in terms of tech and the interesting problems that they’ve solved over the last couple of weeks. I’ve represented the mobile engineering team and have presented talks concerning Android and backend. The talks covered the interesting work being done in mobile clients, how we solve those problems given the limited computing resources available on mobile clients, and interesting components that I've built to optimize backend developers' productivity.
Zeta Annual Confluence
Zeta’s annual confluence is about what has changed in the organization over the last year and what challenges were overcome by different business units. I’ve spoken at three of the confluences, representing the teams I worked for.
Knowledge sharing sessions
Weekly knowledge sharing sessions help developers learn what’s new in the market for them. I initiated this learning process within the team so that we are never behind the world in terms of the latest tech. Developers give a basic overview of some latest tech or a library that solves a problem that we face. People are not expected to be masters of those implementations as such. A superficial talk on what problem is being solved and how that solution could be used by us to build better apps is the expectation from the speaker.
Learnings and takeaways
Collaboration with multiple teams to build a feature
Some features needed collaboration with multiple teams in the organization. To ship a feature successfully, collaboration between teams is key. Understanding the trade-offs between different implementations and how they affect other teams is essential to understanding a system end to end.
Focusing more on software design over implementation
A good design will help you extend the features easily. Jumping directly into the implementation might not only lead to poor software design but also increase the time spent refactoring the logic if changes are required at a later point in time.
Writing testable code
Writing tests for your system is one of the best ways to ensure the robustness of the system. Tests give you confidence when you ship code to production, as you know in advance that the cases covered by tests behave as expected.
Optimizing developer efficiency of a big team
When the size of the team increases, doing redundant and inefficient work impacts not only you but also the whole team. If you know something that could improve the performance of the team, proactively pick it up to save developer bandwidth. This is also very useful in small teams, where there are fewer developers but more features have to be shipped. A few instances where I did this were when I built a local testing framework for zetlets and an annotation-based cache for backend services that use an internal framework; the cache reduces the average number of lines of caching code from 35 to 5 per Java class and supports runtime configurations, sync/async operations, and sync/async cache.
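The appeal of a declarative cache is that the caching logic lives in one place and each method opts in with a single line. The original was a Java annotation on top of an internal framework; the sketch below uses a Python decorator as the closest analogue, and `cached`, `ttl_seconds`, and `load_profile` are hypothetical names chosen for the example. It covers only the synchronous case.

```python
import functools
import time


def cached(ttl_seconds: float):
    """Cache a function's results per argument tuple for ttl_seconds."""
    def decorator(function):
        entries = {}  # args -> (expiry timestamp, cached value)

        @functools.wraps(function)
        def wrapper(*args):
            now = time.monotonic()
            entry = entries.get(args)
            if entry is not None and entry[0] > now:
                return entry[1]  # still fresh: skip the expensive call
            value = function(*args)
            entries[args] = (now + ttl_seconds, value)
            return value

        return wrapper
    return decorator


calls = []  # records how often the underlying function really runs


@cached(ttl_seconds=60)
def load_profile(user_id: int) -> dict:
    calls.append(user_id)  # stands in for a slow service or database call
    return {"id": user_id}


load_profile(1)
load_profile(1)  # served from the cache
load_profile(2)
print(calls)
```

With this shape, the per-method cost of caching is the one-line decorator, which is the kind of reduction in boilerplate the annotation-based approach aims for.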
Analyzing trade-offs of various approaches and taking the best approach
Mentoring people and getting to know their perspectives
Failures are the stepping stones of success. When your juniors make a mistake, accept that it’s a part of the software development process and every human is liable to make mistakes. Even if the error impacts the production environment, keep them comfortable, and do what could be done right away to fix it/mitigate the impact.
Engineered and implemented the entire Android app single-handedly
Contributed to a dynamic UI framework on Android, used by National Payments Corporation of India(NPCI) and Axis Bank.