For those who do click links: flamewar one, flamewar two
Context: PyParallel is a fork of py3 that keeps the GIL and everything else, yet can use multiple cores and scales linearly (!) while doing so. The project is in alpha, with the aim of being merged into py4. Right now it consists of platform-agnostic changes to cpython plus a platform-dependent part tied hard to the "offtopic" OS (Windows); if it does get merged into cpython, it will have to become cross-platform sooner or later. Performance looks decent for an alpha.
Trent Nelson, the developer of all this and also a core Python committer, has this to say:
(quote for those who don't click links)
You could port PyParallel to Linux or OS X — there are two parts to the work I’ve done: a) the changes to the CPython interpreter to facilitate simultaneous multithreading (platform agnostic), and b) the pairing of those changes with Windows kernel primitives that provide completion-oriented thread-agnostic high performance I/O. That part is obviously very tied to Windows currently.
So if you were to port it to POSIX, you’d need to implement all the scaffolding Windows gives you at the kernel level in user space. <...> you’d have to manage your threadpools yourself, and each thread would have to have its own epoll/kqueue event loop. The problem with adding a file descriptor to a per-thread event loop’s epoll/kqueue set is that it’s just not optimal if you want to continually ensure you’re saturating your hardware (either CPU cores or I/O). <...>
As soon as you issue a blocking file I/O call on one of those threads, you have one thread less doing useful work, which means you’re increasing the time before any other file descriptors associated with that thread’s multiplex set can be served, which adversely affects latency. And if you’re using the threads == ncpu pattern, you’re going to have idle CPU cycles because, say, only 6 out of your 8 threads are in a runnable state. <...> The best example of how that manifests as an issue in real life is make -jN world, where N is some magic number derived from experimentation, usually around ncpu*2. Too low, you’ll have idle CPUs at some point, too high and the CPU is spending time doing work that isn’t directly useful. There’s no way to say make -j<just-do-whatever-you-need-to-do-to-either-saturate-my-I/O-channels-or-CPU-cores-or-both>
<...>with Windows, it’s a completely different situation. The whole kernel is architected around the notion of I/O completion and waitable events, not “file descriptor readiness”. This seems subtle but it pervades every single aspect of the system. The cache manager is tightly linked to the memory management and I/O manager – once you factor in asynchronous I/O this becomes incredibly important because of the way you need to handle memory locking for the duration of the I/O request and the conditions for synchronously serving data from the cache manager versus reading it from disk. The waitable events aspect is important too – there’s not really an analog on UNIX. Then there’s the notion of APCs instead of signals which again, are fundamentally different paradigms. The digger you deep the more you appreciate the complexity of what Windows is doing under the hood.
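To make the latency point from the quote concrete, here is a minimal sketch of my own (not PyParallel code). It runs one event loop on one thread using the stdlib `selectors` module, which picks epoll/kqueue where available. The names `measure_stall` and `BLOCK_SECONDS` are made up for the illustration, and `time.sleep` merely stands in for a blocking file I/O call. It measures how long a socket that is already readable waits because its loop thread is stuck in another handler.

```python
import selectors
import socket
import threading
import time

BLOCK_SECONDS = 0.2  # assumed duration of the stand-in "blocking file I/O" call


def measure_stall():
    """Return how long a ready socket waited behind a blocking handler."""
    sel = selectors.DefaultSelector()  # epoll/kqueue where the OS provides it
    slow_rx, slow_tx = socket.socketpair()
    fast_rx, fast_tx = socket.socketpair()
    blocking_started = threading.Event()
    served_at = {}

    def on_slow(sock):
        sock.recv(1)
        blocking_started.set()
        time.sleep(BLOCK_SECONDS)  # the loop thread is stuck here

    def on_fast(sock):
        sock.recv(1)
        served_at["fast"] = time.monotonic()

    # Both descriptors belong to the same per-thread multiplex set.
    sel.register(slow_rx, selectors.EVENT_READ, on_slow)
    sel.register(fast_rx, selectors.EVENT_READ, on_fast)

    def event_loop():
        # One loop per thread, as in the threads == ncpu pattern.
        while "fast" not in served_at:
            for key, _ in sel.select(timeout=1.0):
                key.data(key.fileobj)

    worker = threading.Thread(target=event_loop)
    worker.start()

    slow_tx.send(b"x")
    blocking_started.wait()   # the handler is now inside its blocking call
    sent_at = time.monotonic()
    fast_tx.send(b"x")        # readable right away, but nobody is polling
    worker.join()

    sel.close()
    for s in (slow_rx, slow_tx, fast_rx, fast_tx):
        s.close()
    return served_at["fast"] - sent_at


if __name__ == "__main__":
    print(f"ready socket waited {measure_stall():.3f}s")
```

The second socket is readable the moment the byte is sent, but it is only served after the first handler returns, so its wait is roughly the whole blocking time. With one such loop per core, every blocked thread stalls its whole descriptor set the same way.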
Discuss.
TL;DR: he can't be bothered to write a pile of low-level glue for Linux when Windows unexpectedly turned out to have suitable primitives with which everything just works.
Moved by true_admin from talks