TONT 39723 Why can't the system hibernate just one process?

Ten thousand years later...

Original post: https://blogs.msdn.microsoft.com/oldnewthing/20040420-00/?p=39723

Windows lets you hibernate the entire machine, but why can’t it hibernate just one process? Record the state of the process and then resume it later.

Windows允许你休眠整台机器,但为什么不能只休眠单个进程呢?记录下进程的状态,并在稍后再恢复就好了嘛。

Because there is state in the system that is not part of the process.

因为系统中的某些状态并非是进程的一部分。

For example, suppose your program has taken a mutex, and then it gets process-hibernated. Oops, now that mutex is abandoned and is now up for grabs. If that mutex was protecting some state, then when the process is resumed from hibernation, it thinks it still owns the mutex and the state should therefore be safe from tampering, only to find that it doesn’t own the mutex any more and its state is corrupted.

例如,假设你的应用程序获得了一个互斥锁,然后这个进程被休眠了。哎呀,现在这个互斥锁就被抛弃了,开放给了其它进程进行获取。如果这个互斥锁是用来守护某种状态的,那么当刚才那个进程从休眠中恢复之后,它会认为它仍然拥有这个互斥锁,而这个锁保护的状态应该不会被修改,然而实际上不光是这个进程已经不再拥有这个互斥锁,其所守护的状态也已经完全不是那个样子了。

Imagine all the code that does something like this:

设想如下代码:

// assume hmtx is a mutex handle that
// protects some shared object G
// (假设hmtx是某个互斥锁的句柄,用以守护某个共享的对象G)
WaitForSingleObject(hmtx, INFINITE);
// do stuff with G(对G进行操作)
...
// do more stuff with G on the assumption that
// G hasn’t changed.
// (在认为G不会被其它外来操作更改的前提下,对G进行更多的操作)
ReleaseMutex(hmtx);

Nobody expects that the mutex could secretly get released during the “…” (which is what would happen if the process got hibernated). That goes against everything mutexes stand for!

不会有人期望进程所持有的互斥锁在『…』这部分时会被悄悄释放掉(然而如果进程被休眠了的话则有可能发生),这就完全违背了互斥锁存在的意义。
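
The hazard described above can be modeled in a few lines of portable C. This is only a sketch under stated assumptions: `SharedState`, `acquire`, `release`, and `hibernate_process` are hypothetical names invented for illustration, not Win32 APIs, and "hibernating a process" is modeled simply as its mutex becoming free. The scenario function reports whether the resumed process's two assumptions (it still owns the mutex, and G is unchanged) survive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one mutex plus the shared object G it protects. */
typedef struct {
    int owner;  /* id of the process holding the mutex, 0 if free */
    int g;      /* the shared object G */
} SharedState;

/* Take the mutex if it is free; returns true on success. */
bool acquire(SharedState *s, int pid) {
    if (s->owner != 0) return false;
    s->owner = pid;
    return true;
}

/* Release the mutex; only the current owner may do so. */
bool release(SharedState *s, int pid) {
    if (s->owner != pid) return false;
    s->owner = 0;
    return true;
}

/* Imagined per-process hibernation (an assumption of this model):
 * the process goes away, so a mutex it held is abandoned and free. */
void hibernate_process(SharedState *s, int pid) {
    if (s->owner == pid) s->owner = 0;
}

/* Runs the scenario from the text. Returns true only if process A's
 * assumptions (still the owner, G unchanged) hold after it resumes. */
bool invariant_survives_hibernation(void) {
    SharedState s = { 0, 10 };
    const int A = 1, B = 2;

    if (!acquire(&s, A)) return false;
    int seen_by_a = s.g;          /* "do stuff with G" */

    hibernate_process(&s, A);     /* happens during the "..." */

    /* While A is away, B legitimately takes the free mutex and edits G. */
    if (acquire(&s, B)) {
        s.g = 99;
        release(&s, B);
    }

    /* A resumes and carries on as if nothing had happened. */
    bool g_unchanged = (s.g == seen_by_a);
    bool still_owner = (s.owner == A);
    return g_unchanged && still_owner;
}
```

In this model `invariant_survives_hibernation()` returns false: both of A's assumptions are broken, and nothing in A's own code could have detected it.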

Consider, as another example, the case where you have a file that was opened for exclusive access. The program will happily run on the assumption that nobody can modify the file except that program. But if you process-hibernate it, then some other process can now open the file (the exclusive owner is no longer around), tamper with it, then resume the original program. The original program on resumption will see a tampered-with file and may crash or (worse) be tricked into a security vulnerability.

再举个例子,你打开了一个文件,用的是独占打开的方式,这时候程序会很高兴地认为除了它自己没有人能修改这个文件。但如果此时对程序进行了进程休眠,那么其它进程现在就可以打开这个文件了(因为独占所有权已经不存在了),在里面搞搞乱,然后把被休眠的进程再喊醒。此时从休眠中醒来的程序看到的将是修改后的文件,搞不好会崩溃,或者(更糟糕的是)掉进了一个安全陷阱里。

One alternative would be to keep all objects that belong to a process-hibernated program still open. Then you would have the problem of a file that can’t be deleted because it is being held open by a program that isn’t even running! (And indeed, for the resumption to be successful across a reboot, the file would have to be re-opened upon reboot. So now you have a file that can’t be deleted even after a reboot because it’s being held open by a program that isn’t running. Think of the amazing denial-of-service you could launch against somebody: Create and hold open a 20GB file, then hibernate the process and then delete the hibernation file. Ha-ha, you just created a permanently undeletable 20GB file.)

一种替代方案是将属于被休眠进程的所有对象进行保留,但这样一来会产生新的问题——删不掉的文件,因为它被保持了打开状态,而打开它的进程此时根本就没有在运行呢!(甚至于,为了在重启之后这一机制仍然有效,重启之后这个文件还得重新被置于打开状态,现在你就有一个连重启之后也删不掉的文件了,即使打开它的程序根本没有在运行也一样。基于这一机制,你可以向别人发起绝妙的拒绝服务攻击:建立一个20GB的文件,保持其打开状态,然后将进程休眠,再删掉休眠文件,吼吼,这下你就创建了一个永久无法删除的20GB文件了。)

Now what if the hibernated program had created windows? Should the window handles still be valid while the program is hibernated? What happens if you send it a message? If the window handles should not remain valid, then what happens to broadcast messages? Are they “saved somewhere” to be replayed when the program is resumed? (And what if the broadcast message was something like “I am about to remove this USB hard drive, here is your last chance to flush your data”? The hibernated program wouldn’t get a chance to flush its data. Result: Corrupted USB hard drive.)

那么,如果被休眠的程序创建了窗体,那么当进程处于休眠状态时,这些窗体的句柄是否还应当有效呢?此时向这些句柄发送窗体消息又会怎么样呢?假设这些句柄不应保持有效的话,那广播的窗体消息对其来说又如何呢?这些广播的消息是否应该『保存在某处』,然后等程序从休眠中被唤醒时,再将这些广播的消息『重放』给它呢?(比如,广播的消息是类似『即将卸载USB硬盘,现在是你将数据进行写入的最后机会』这样,那这个休眠了的程序就没有机会完成数据写入动作,结果就是USB硬盘上的数据损毁。)

And imagine the havoc if you could take the hibernated process and copy it to another machine, and then attempt to restore it there.

还可以再考虑一下将休眠后的程序复制到其它机器上、然后在新机器上对其进行唤醒时将造成的混乱。

If you want some sort of “checkpoint / fast restore” functionality in your program, you’ll have to write it yourself. Then you will have to deal explicitly with issues like the above. (“I want to open this file, but somebody deleted it in the meantime. What should I do?” Or “Okay, I’m about to create a checkpoint, I’d better purge all my buffers and mark all my cached data as invalid because the thing I’m caching might change while I’m in suspended animation.”)

如果你想给自己的程序增加『检查点/快速唤醒』之类的功能,应当自行进行实现,而这也不可避免地要应对如上列出的各种问题。(比如『我想打开这个文件,但(休眠)期间有人把这个文件删掉了,我该怎么办?』、『好的,现在我要创建检查点了,最好是把缓冲区里的数据处理好,然后把所有缓存下来的数据都标记为无效,毕竟休眠期间我还存下来的这些数据都有可能会发生更改』之类的事情。)
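
As a rough illustration of what "write it yourself" might involve for the file case alone, here is a minimal sketch in portable C. Everything in it is an assumption made for illustration: the `Checkpoint` record, the `checkpoint_create` and `checkpoint_restore` functions, and the choice of a simple non-cryptographic checksum are hypothetical, not part of any Windows API. The idea is that a checkpoint stores how to find the file again (path, offset, fingerprint) rather than a live handle, and that restore explicitly reports "file unavailable" or "file changed" instead of assuming nothing happened.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical checkpoint record: how to re-find the file, not a handle. */
typedef struct {
    char path[260];
    long offset;        /* read position at checkpoint time */
    long size;          /* file size at checkpoint time */
    unsigned long sum;  /* simple content checksum at checkpoint time */
} Checkpoint;

typedef enum {
    RESTORE_OK,
    RESTORE_FILE_UNAVAILABLE,  /* deleted, or cannot be opened */
    RESTORE_FILE_CHANGED       /* contents differ, or unreadable */
} RestoreResult;

/* Computes the size and a simple (non-cryptographic) checksum of a file.
 * Leaves the file position at end of file. */
int fingerprint(FILE *f, long *size, unsigned long *sum) {
    int c;
    *size = 0;
    *sum = 5381;
    if (fseek(f, 0, SEEK_SET) != 0) return -1;
    while ((c = fgetc(f)) != EOF) {
        *sum = *sum * 33 + (unsigned char)c;
        (*size)++;
    }
    return ferror(f) ? -1 : 0;
}

/* Records path, offset, and fingerprint, then puts the read position
 * back. The caller is expected to close the file afterwards. */
int checkpoint_create(Checkpoint *cp, const char *path, FILE *f) {
    long pos = ftell(f);
    if (pos < 0 || strlen(path) >= sizeof cp->path) return -1;
    strcpy(cp->path, path);
    cp->offset = pos;
    if (fingerprint(f, &cp->size, &cp->sum) != 0) return -1;
    return fseek(f, pos, SEEK_SET);
}

/* Reopens the file and verifies it against the checkpoint. On RESTORE_OK,
 * *out is open and positioned at the saved offset; otherwise *out is NULL
 * and the caller has to decide what to do. */
RestoreResult checkpoint_restore(const Checkpoint *cp, FILE **out) {
    long size;
    unsigned long sum;
    FILE *f;

    *out = NULL;
    f = fopen(cp->path, "rb");
    if (f == NULL) return RESTORE_FILE_UNAVAILABLE;
    if (fingerprint(f, &size, &sum) != 0 ||
        size != cp->size || sum != cp->sum ||
        fseek(f, cp->offset, SEEK_SET) != 0) {
        fclose(f);
        return RESTORE_FILE_CHANGED;
    }
    *out = f;
    return RESTORE_OK;
}
```

A caller would open the file in binary mode, call `checkpoint_create`, close the file, and later call `checkpoint_restore`, handling the two failure results explicitly. The sketch deliberately leaves out the other concerns raised above, such as purging buffers and invalidating cached data; those are application-specific.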

Comments

  1. The original article is already fifteen years old; back then, iOS and Docker had not even been born yet.
