2015 — 2017 at Omega Group (KetLogistic, LLC)
Project role — Solution architect + PM
About the company
Omega Group is a leading distributor of automotive parts in Russia, Belarus and Kazakhstan for many commercial vehicle brands, including MERCEDES, MAN, SCANIA, VOLVO, DAF, IVECO and RENAULT. The range covers more than 150 brands. The company has a solid reputation built on on-time deliveries and a wide assortment in stock.
I joined the IT team in August 2015 as Deputy Head of the IT department. During my time at the company, the IT stack was built primarily on Microsoft technologies.
The project goals
In 2015, due to the steep change in the exchange rate of the Russian ruble, reducing IT expenses became important. At the same time, there was a need to improve reliability.
The progress and results of the project
This project could not be limited to infrastructure: it became more complex because the company's key system (a fully custom ERP system) required a change in topology.
2015, baseline analysis
The key feature of the ERP system, built deep into its architecture, was the autonomy of the regional nodes (instances in the regional shops and warehouses). Synchronization was implemented using database replication. By 2015, however, this feature was no longer needed: data channels were reliable and all the data was held in a single central location.
All the database replicas were co-located but unequal: none of them could take over the load of another because of differences in indexes, capacities, pieces of logic (functions and stored procedures), etc. Therefore, there was no fault tolerance. This was reliability problem no. 2.
The replication caused multiple failures in data transmission because it was a single channel in which all changes, even unrelated ones of differing importance, were queued one by one. The more replicas we had, the more conflicts we had. Every conflict stopped the queue, so the DBA had to react frequently and rapidly, yet carefully. I identified this as reliability problem no. 1.
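To illustrate why a single ordered queue was so fragile, here is a minimal Python sketch of head-of-line blocking. It is a toy model, not the ERP system's replication code: the `Change` class, the `conflicts` flag and the table names are hypothetical and exist only for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Change:
    table: str               # hypothetical table name
    conflicts: bool = False  # hypothetical flag: needs manual resolution by a DBA


def drain(queue):
    """Apply changes strictly in order and stop at the first conflict."""
    applied = []
    while queue:
        if queue[0].conflicts:
            break  # everything behind the conflicting change has to wait
        applied.append(queue.popleft())
    return applied


queue = deque([
    Change("prices"),
    Change("stock", conflicts=True),  # one conflict in an unrelated table
    Change("orders"),
    Change("customers"),
])

applied = drain(queue)
blocked = len(queue)  # the conflicting change plus everything queued after it
print(f"applied={len(applied)}, blocked={blocked}")
```

In this model a single conflicting change holds back every later change, however unrelated, until someone resolves it by hand.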
We decided that our target solution for ERP data would consist of two identical database clusters in different locations, each able to take the full load. This solution gave us simpler support, disaster tolerance and fault tolerance.
In addition to the ERP system, there was a lot of inefficiently deployed enterprise software, and there was also a problem with the cost of further scaling the digital PBX solution.
To consolidate the load, we ran experiments with Resource Governor in SQL Server Enterprise Edition and found that we could segregate the load by CPU and RAM, provided that I/O wasn't a constraint.
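A plan for segregating load into resource pools can be sanity-checked before it is applied. The sketch below is an illustration in Python, not the configuration we used: the pool names and percentages are hypothetical. It checks one rule that, as far as I know, Resource Governor enforces for MIN_CPU_PERCENT and MIN_MEMORY_PERCENT: the guaranteed minimums across all pools must not add up to more than 100%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str     # hypothetical pool name
    min_cpu: int  # guaranteed CPU share, percent
    max_cpu: int  # CPU cap, percent
    min_mem: int  # guaranteed memory share, percent
    max_mem: int  # memory cap, percent


def validate(pools):
    """Return a list of human-readable problems found in the plan."""
    problems = []
    for p in pools:
        if p.min_cpu > p.max_cpu:
            problems.append(f"{p.name}: minimum CPU exceeds maximum CPU")
        if p.min_mem > p.max_mem:
            problems.append(f"{p.name}: minimum memory exceeds maximum memory")
    if sum(p.min_cpu for p in pools) > 100:
        problems.append("CPU minimums add up to more than 100%")
    if sum(p.min_mem for p in pools) > 100:
        problems.append("memory minimums add up to more than 100%")
    return problems


# Illustrative numbers only, not the values from the project.
plan = [
    Pool("erp_oltp", min_cpu=50, max_cpu=100, min_mem=50, max_mem=80),
    Pool("reporting", min_cpu=20, max_cpu=40, min_mem=20, max_mem=40),
    Pool("batch_jobs", min_cpu=10, max_cpu=30, min_mem=10, max_mem=30),
]
problems = validate(plan)

# Adding one more pool pushes both sums of minimums to 110%.
overcommitted = plan + [Pool("extra", min_cpu=30, max_cpu=50, min_mem=30, max_mem=50)]
overcommitted_problems = validate(overcommitted)
print(problems, overcommitted_problems)
```

The check covers CPU and memory only, which matches the caveat above: it says nothing about whether the consolidated workloads would compete for I/O.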
We determined the throughput and latency required of the SAN serving the ERP system DBMS, as well as the additional loads this storage system could carry. We compared several systems offered by suppliers and chose an economically and technically feasible solution (the metrics were calculated precisely).
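The kind of arithmetic behind such sizing can be sketched as follows. This is a simplified Python illustration under stated assumptions: every number is a placeholder rather than a measurement from the project, and the function names are made up for the example. Throughput is derived from IOPS and block size, and the number of I/O requests in flight follows from Little's law (requests per second multiplied by latency).

```python
def throughput_mb_s(iops: float, block_kb: float) -> float:
    """Sustained throughput in MB/s for a given IOPS rate and block size."""
    return iops * block_kb / 1024


def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average number of requests in flight (Little's law)."""
    return iops * latency_ms / 1000


# Placeholder inputs, not the project's real figures.
erp_peak_iops = 20_000    # assumed peak load of the ERP DBMS
block_kb = 8              # assumed I/O size in KB
target_latency_ms = 2     # assumed latency target
array_rated_iops = 50_000 # assumed rating of a candidate storage system

required_mb_s = throughput_mb_s(erp_peak_iops, block_kb)
in_flight = outstanding_ios(erp_peak_iops, target_latency_ms)
spare_iops = array_rated_iops - erp_peak_iops  # headroom for additional loads

print(f"{required_mb_s:.2f} MB/s, {in_flight:.0f} requests in flight, {spare_iops} IOPS spare")
```

A real comparison would use measured peaks and the suppliers' figures at the target latency, since rated IOPS are rarely quoted under the same conditions.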
Similarly, we chose a server solution (computing power) for the DBMS and confirmed the scaling limits of the entire system experimentally.
In addition, changes in the ERP system technologies were implemented, including switching to memory-optimized tables (instead of TempDB), unifying indexes, changing their maintenance plans, and so on.
Beyond the ERP system, we performed consolidation calculations for the other server loads. We started a project for fault-tolerant virtualization using Hyper-V and made various changes to the infrastructure.
In September 2016, we had to re-sign the Enterprise Agreement with Microsoft, so all experiments and calculations had to be completed before then, and we had to decide which products to renew with Software Assurance, which to renew without it, and which to drop.
It was a comprehensive modernization of computing power and storage systems, covering hardware, licensing, the ERP system and virtualization. The full list of results is as follows:
- Optimizing license agreements (Microsoft EA)
- Optimizing data storage systems (SAN)
- Reducing the number of SQL Server instances, moving to the Enterprise edition and consolidating the loads (+High-Availability)
- Consolidating server capacity and moving to virtualization (+High-Availability)
- Reducing the need for additional PBX licenses by switching to software telephony (Asterisk)
My role in the project was to identify possible solutions, formulate hypotheses and experiments to run, consolidate results, coordinate and communicate with the business.
While gradually implementing this project until mid-2017, I started leading the next one in 2016, related to the ERP and CRM systems.