To minimize the impact of backups, memoQ server uses Microsoft’s Volume Shadow Copy Service (VSS), which creates a snapshot of your disk. Thanks to the snapshot, you can keep working and changing the data on the disk, while the backup reads and processes the very same data exactly as it was when the snapshot was taken. To put it simply: imagine that an artist painting your portrait takes a photo of you, so you do not have to stand still for the rest of the day until the work of art is finished!
The files in the snapshot can be accessed via external code libraries, while the files in the “regular” file system can be accessed natively. To create a backup, memoQ needs to access all of the files in the snapshot and make a copy of them.
When memoQ requested access to a file in the snapshot, our old VSS library built a catalog of all the files and passed it to memoQ as a giant in-memory array. When a memoQ server had hundreds of thousands of files, building this catalog could take hours, for two reasons: first, the catalog grew to many gigabytes; second, all of it had to be kept in memory, so Windows had to use the paging file heavily. This led to excessive memory usage, consumed a lot of computing power, and further degraded the server’s performance.
We recently updated the library, and it now returns files quickly, one by one, instead of building that gigantic catalog (it uses an iterator instead of an array). The updated library is used by memoQ server versions 8.4 and 7.8.13. On servers with HDD storage, this change reduced backup time by about 50%. The improvement is much smaller on servers with SSD drives, since they handle the paging file a lot faster. However, if your server does not have much RAM, the overall performance gain may be significant even on SSD-based instances, because memory consumption also drops significantly.
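The difference between the old catalog and the new iterator can be illustrated with a minimal Python sketch. It walks an ordinary directory with `os.walk` rather than a VSS snapshot, and the function names `catalog_all_files` and `iter_files` are hypothetical; this is not the code of memoQ or of its VSS library. The first function returns nothing until it has stored every path in a list, while the second hands each path to the caller as soon as it is found, so its memory use stays roughly independent of the total number of files.

```python
import os
from typing import Iterator, List


def catalog_all_files(root: str) -> List[str]:
    """Old-style approach (hypothetical): collect every path into one in-memory list."""
    paths: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # The caller gets nothing until the whole tree has been walked and stored.
    return paths


def iter_files(root: str) -> Iterator[str]:
    """Iterator approach (hypothetical): yield each path as soon as it is found."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only the current path is handed over; no full catalog is built.
            yield os.path.join(dirpath, name)
```

With the iterator, a backup loop such as `for path in iter_files(root): ...` can start copying the first file immediately instead of waiting for the complete catalog to be built.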
Another factor in backup time is the progress bar for the backup task. Until now, memoQ server iterated over all of the files twice during a backup: the first pass calculated the total size of each task, and the second measured the progress of each of them. We observed that this double pass accounted for 10% of the backup time.
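To make the cost of this double pass concrete, here is a hedged Python sketch of a two-pass backup: the first pass walks the tree only to add up file sizes, and the second copies each file and reports the fraction of bytes done. The function `backup_two_pass` and its `report` callback are hypothetical stand-ins, not memoQ server’s actual implementation; the point is simply that every file is visited twice.

```python
import os
import shutil
from typing import Callable


def backup_two_pass(src: str, dst: str, report: Callable[[float], None]) -> None:
    """Copy src to dst, reporting progress as a fraction of bytes copied (hypothetical)."""
    # Pass 1: walk the whole tree only to learn the total number of bytes.
    total = 0
    for dirpath, _dirs, files in os.walk(src):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))

    # Pass 2: walk the tree again, copy each file, and report progress.
    done = 0
    for dirpath, _dirs, files in os.walk(src):
        target_dir = os.path.join(dst, os.path.relpath(dirpath, src))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(dirpath, name)
            shutil.copy2(source, os.path.join(target_dir, name))
            done += os.path.getsize(source)
            # Guard against division by zero when every file is empty.
            report(done / total if total else 1.0)
```

Removing the first pass saves a full traversal of the file system, at the price of having to estimate the total instead of measuring it.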
Backup time also depends on external factors, like the load other applications put on the system and possible network congestion. This is why the progress status cannot reliably predict how long the task will take to finish: progress may be at 87% when the system suddenly comes under heavy load, slowing the process down.
We were at a crossroads. How could we speed up backups without losing the progress information? We certainly could not keep iterating over every single file a second time, since that was exactly what slowed the process down.
After some research, we found a solution. We now build a map that compares folders, and instead of iterating over all of the files, we scan only the large files and track their total number. This gives us a good estimate of the progress status. Our measurements showed that using this kind of estimate reduces backup time by 10%.
While doing all of this research, we also found that a big chunk of the backed-up data consists of outdated log files and other dead data. Believe me, memoQ server logs can accumulate tens of gigabytes over the years.
Best practice says these logs should be deleted regularly, but not everyone follows that rule. Because we know this is likely to continue, starting with version 8.4, memoQ server includes only the relevant log files in backups. Don’t panic! We always include the logs you may need to debug issues or to have Kilgray Support fix them for you. For the memoQ server log specifically, only the two latest backup files are included.
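The “only the two latest files” rule can be sketched in Python as below. We do not know how memoQ server names its log files or decides which ones are relevant, so the `prefix` parameter, the function name `latest_log_files`, and the file names used with it are assumptions; the sketch only shows how to keep the N most recently modified files from a folder.

```python
import os
from typing import List


def latest_log_files(log_dir: str, prefix: str, keep: int = 2) -> List[str]:
    """Return the `keep` most recently modified files in log_dir whose names start with prefix.

    Hypothetical helper: the naming convention (a shared prefix) is an assumption.
    """
    candidates = [
        os.path.join(log_dir, name)
        for name in os.listdir(log_dir)
        if name.startswith(prefix) and os.path.isfile(os.path.join(log_dir, name))
    ]
    # Newest first, ordered by last-modification time.
    candidates.sort(key=os.path.getmtime, reverse=True)
    return candidates[:keep]
```

Everything the function does not return would simply be left out of the backup; the files themselves stay on disk untouched.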
Of course, the benefit of this improvement depends on how many old log files are sitting on your server. If you clean up your disks regularly, you will see little improvement, but if you keep forgetting to delete these files, you will benefit more.