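What is your most valuable asset? You guessed it: your data! Over the past decade, thanks to factors such as growing overall consumption and cloud technology, the costs of storage, computing, and licenses have all fallen, while the value of data has gone up.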
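If a chair wears out, we can replace it for a small sum, and the same goes for a company PC. If we use a virtualized environment, replacing one server with another is still affordable. If we lose data, however, we can easily find ourselves in trouble, because data cannot simply be bought again. Business may grind to a halt. Deliveries to customers will be at risk. In short, data loss can bring a company to the brink of failure unless we protect our data effectively.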
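One way to protect data is to create backups, so that you can restore your data after an incident. Hardware and software failures, malware attacks, and unexpected natural disasters are all too common. To guard against them, memoQ server lets you back up your data and prevent data loss. A memoQ server typically holds a large amount of data of many different kinds: possibly hundreds of thousands or even millions of files, and tens or hundreds of gigabytes of data. This is why backing up takes time. Backing up the data in a memoQ server requires stopping the server, even if only to create a snapshot of the current data and make a copy of it. If you ran the backup on a fully functional server, the server’s performance would be limited by the disk and CPU load of the backup.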
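To make backups easier and keep your data protected, we have improved memoQ to reduce the time needed to create a backup (let us call it backup time), and we are happy to announce that we have achieved remarkable results. I am sure you would like concrete numbers or percentages showing how much faster backing up a server is now! Well, it is not that straightforward: the benefit you will experience depends on your setup. Let us take a closer look.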
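• If your company has a memoQ server whose data is stored on conventional SATA hard drives or SAS disks, or if your server is deployed on a virtual machine that runs on such disks, you will see backup time drop by up to 80%. If your server backup used to take up to 10 hours, it now takes less than 5 hours.
• In SSD environments, the gain is smaller. You may see server backup time drop by up to 10%.
• In general, small server instances with few projects and documents will benefit less than large instances with many projects.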
Geek alert: if you are interested in the technical details, read on!
Let us look in more detail at what led to the performance gains in backups. Overall, they are the result of three improvements.
To minimize the impact of backing up your data, memoQ server uses Microsoft’s Volume Shadow Copy Service (VSS), which creates a snapshot of your disk. The snapshot lets you keep working and changing the data on the disk while the backup reads and processes that same data as it was when the snapshot was taken. In simple words: imagine an artist painting your portrait who takes a picture of you first, so you do not need to stand still for the rest of the day until the work of art is ready!
The files in the snapshot can be accessed via external code libraries, while the files in the “regular” file system can be accessed natively. To create a backup, memoQ needs to access all of the files in the snapshot and make a copy of them.
When memoQ requested access to a file in the snapshot, our old VSS library built up a catalog of all the files and passed it to memoQ as one giant in-memory array. When a memoQ server had hundreds of thousands of files, building this catalog could take hours, for two reasons. First, the catalog grew to many gigabytes. Second, all of that data had to be kept in memory, so Windows had to use the paging file heavily. The result was very high memory usage and heavy CPU load, which further degraded the server’s performance.
Recently, we updated the library. It now returns files quickly, one by one, instead of building up that gigantic catalog (it uses an iterator instead of the array). The updated library is used by memoQ server versions 8.4 and 7.8.13. On servers with HDD storage, this change reduced backup time by about 50%. The improvement is much smaller on servers with SSD drives, since they handle paging files a lot faster. However, if your server does not have much RAM, the overall gain may be significant even on SSD-based instances, because memory consumption also drops significantly.
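To make the difference between the array and the iterator concrete, here is a minimal Python sketch. It is not memoQ’s VSS library (which is not shown in this post); the function names iter_files, build_catalog, and count_files are made up for illustration, and the sketch walks an ordinary folder rather than a VSS snapshot.

```python
import os
from typing import Iterator, List


def iter_files(root: str) -> Iterator[str]:
    """Lazily yield file paths under root, one at a time (hypothetical helper)."""
    pending = [root]  # directories still to visit
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path


def build_catalog(root: str) -> List[str]:
    """Array style: materialize every path first; memory grows with file count."""
    return list(iter_files(root))


def count_files(root: str) -> int:
    """Iterator style: holds one file path at a time, plus the pending directories."""
    return sum(1 for _ in iter_files(root))
```

A backup loop written against iter_files can start copying the first file immediately, while one written against build_catalog has to wait until the whole list exists.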
Another factor in backup time is the progress bar for the backup task. Until now, memoQ server had to iterate over all of the files twice during a backup: first to calculate the total size of each task, and then again to measure progress on each of them. We observed that this double pass accounted for 10% of the backup time.
Backup time also depends on external factors, such as the load other applications put on the system and possible network congestion. This is why the progress status cannot reliably predict the time needed to finish the task: progress may be at 87%, but the system may suddenly come under heavy load, slowing the process down.
We were at a crossroads. How could we improve backup time without affecting the progress status information? We certainly could not keep iterating over every single file; this was the reason why the process was so slow.
After some research, we found a solution. We can now build a map that compares folders, and instead of iterating over all of the files, we scan only the large files and note how many there are. This gives us a good estimate of progress. Our measurements showed that this kind of estimation cuts backup time by 10%.
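The post does not spell out the estimation itself, so the following Python sketch is only one possible reading of it: progress is estimated from the large files alone, and small files are ignored. The class name LargeFileProgress and its fields are hypothetical, not part of memoQ.

```python
from dataclasses import dataclass


@dataclass
class LargeFileProgress:
    """Estimate backup progress from large files only (illustrative assumption)."""
    large_file_count: int       # how many large files were found
    large_bytes_total: int      # their combined size in bytes
    large_files_done: int = 0
    large_bytes_done: int = 0

    def record(self, size: int) -> None:
        """Call once for each large file that has been copied."""
        self.large_files_done += 1
        self.large_bytes_done += size

    def fraction_done(self) -> float:
        """Return estimated progress between 0.0 and 1.0."""
        if self.large_bytes_total == 0:
            return 0.0  # no large files: nothing to base an estimate on
        return min(1.0, self.large_bytes_done / self.large_bytes_total)
```

The trade-off is the one described above: the estimate is rougher than an exact byte count over every file, but it avoids a full extra pass over the file system.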
While doing this research, we also found that a big chunk of the backed-up data consists of outdated log files and other dead data. Believe me, memoQ server logs accumulate over the years and can add up to tens of gigabytes.
Best practice says these logs should be deleted regularly, but not everyone follows the rule. Because we know this may go on for a while, from now on memoQ server version 8.4 includes only relevant log files in backups. Don’t panic! We always include the ones you may need to debug issues or to have Kilgray Support fix them for you. For the memoQ server log specifically, only the two latest backup files are included.
Of course, the benefit of this improvement depends on how many old log files are sitting on your server. Those of you who clean up your disks on a regular basis will see little improvement, but those who keep forgetting to delete these files will get more benefits.
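As a rough illustration of keeping only the most recent log files, here is a short Python sketch. The function name latest_logs, the ".log" suffix, and the use of modification time to decide what is "latest" are all assumptions made for this example; the post does not say how memoQ server selects the files.

```python
import os
from typing import List


def latest_logs(log_dir: str, keep: int = 2, suffix: str = ".log") -> List[str]:
    """Return paths of the `keep` most recently modified log files in log_dir."""
    with os.scandir(log_dir) as entries:
        logs = [e for e in entries if e.is_file() and e.name.endswith(suffix)]
        # Newest first, judged by modification time (an assumption of this sketch).
        logs.sort(key=lambda e: e.stat().st_mtime, reverse=True)
    return [e.path for e in logs[:keep]]
```

A backup routine could then copy only the paths returned by latest_logs and skip every older log in the same folder.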