Discussion:
OpenVPN and Multi-Core processor
Mike
2010-05-26 03:17:37 UTC
Hi All!

I have found that OpenVPN is a single-threaded program under Windows XP.

How can I make OpenVPN use all four cores of an Intel i7 processor?

And what about FreeBSD? Is OpenVPN single-threaded there as well?


Thanks!
Toby Thain
2010-05-26 03:43:59 UTC
Post by Mike
Hi All!
I have found out, that under Windows XP OpenVPN the one-thread
program.
How I can involve all four cores of the processor intel i7 for
OpenVPN?
What are you hoping to achieve by this? On either client or server,
OpenVPN is typically not the only, or even most important, role, and
the operating system already schedules competing processes across the
available cores.

--Toby
Post by Mike
And that for FreeBSD? OpenVPN also one-thread program?
Thanks!
------------------------------------------------------------------------------
_______________________________________________
Openvpn-users mailing list
https://lists.sourceforge.net/lists/listinfo/openvpn-users
Jan Just Keijser
2010-05-26 06:55:08 UTC
Post by Mike
Hi All!
I have found out, that under Windows XP OpenVPN the one-thread program.
How I can involve all four cores of the processor intel i7 for OpenVPN?
And that for FreeBSD? OpenVPN also one-thread program?
The current version of openvpn (2.x series) is a single-threaded program
on every platform. There are advantages and drawbacks to this. The main
advantages are simplicity of the code and robustness. The main drawback
is that it does not scale very well on modern
multicore/multithreaded CPUs. There is currently a discussion ongoing
about how openvpn 3.x can be made more scalable (i.e. multi-threaded etc).

Having said that, there is hardly a need for openvpn to be
multi-threaded unless you want to host 1000+ VPN connections on a single server.
There are workarounds for that where you can run multiple instances of
openvpn on a single machine. Thus a real need for a multi-threaded
approach has also been lacking.

HTH,

JJK
Jan Just Keijser
2010-05-26 08:28:39 UTC
Hi Mike,
Post by Jan Just Keijser
The current version of openvpn (2.x series) is a single-threaded program
on every platform. There are advantages and drawbacks to this . The main
advantages are simplicity of the code and robustness. The main drawbacks
are the fact that it does not scale very well on modern
multicore/multithreaded CPUs. There is currently a discussion ongoing
about how openvpn 3.x can be made more scalable (i.e. multi-threaded etc).
Having said that, there is hardly a need for openvpn to be
multi-threaded unless you want to host a 1000+ VPN on a single server.
Post by Mike
But in my tests, the two connected clients generate an 80 percent load on one processor core.
That is due to a routing issue, not an openvpn issue. On modern CPUs
openvpn takes up < 1% CPU for 2 connected clients.
I've seen openvpn go to 80% CPU when it was trying to tunnel traffic
back into itself (which is a routing conflict).
Post by Jan Just Keijser
There are workarounds for that where you can run multiple instances of
openvpn on a single machine. Thus the real need for a multi-threaded
approach has been also lacking.
Post by Mike
Yes, but here there are problems with the routing configuration.
I need the subnets of all clients to be accessible to
each other.
Again, this is a routing issue (or network setup issue). Are you using
bridging? If so, why? Do you really need it? If not, then how are all
subnets made visible to each other?

cheers,

JJK
Erich Titl
2010-05-26 08:36:05 UTC
Hi JJK

at 26.05.2010 08:55, Jan Just Keijser wrote:
...
Post by Jan Just Keijser
Having said that, there is hardly a need for openvpn to be
multi-threaded unless you want to host a 1000+ VPN on a single server.
I am not convinced, that OpenVPN would scale that well in such a big
environment. One of the big assets (and drawbacks) of OpenVPN is that it
is implemented in user space, so the switching overhead, depending on
the architecture, may become quite important.
Post by Jan Just Keijser
There are workarounds for that where you can run multiple instances of
openvpn on a single machine. Thus the real need for a multi-threaded
approach has been also lacking.
cheers

Erich
Eike Lohmann
2010-05-26 11:14:42 UTC
We use quagga for dynamic routing and scale ~2k openvpn connections to 4
hosts.
560 tun users with low traffic need:

PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
418 root 15  0 1542m 1.0g 2152 R  5.3 66.0 2111:25 openvpn
Erich Titl
2010-05-27 07:00:37 UTC
Eike
Post by Eike Lohmann
We use quagga for dynamic routing and scale ~2k openvpn connections to 4
hosts.
OK, that scales it down to roughly 500 connections with low traffic. As
I said it also depends on processor architecture, some ARMs allegedly
had trouble with process switching.

But good to know...

cheers

Erich
Alexander Hoogerhuis
2010-05-28 13:48:28 UTC
To lighten internal congestion you could run several OpenVPN instances
on different ports and then use a load balancing frontend to multiplex
the different sockets.

That way if you do have multiple cores, you can run multiple instances
that can use the resources better. :)

-A
mvh,
A
--
Alexander Hoogerhuis | http://no.linkedin.com/in/alexh
Boxed Solutions AS | +47 908 21 485 - ***@boxed.no
"Given enough eyeballs, all bugs are shallow." -Eric S. Raymond
Goofy
2014-08-04 20:02:32 UTC
I think it would be good to have multi-core support.
I believe it would be beneficial when you reboot your server and
more than 200 VPN connections have to reconnect.

And all new hardware has multi-core support, so why should openvpn not use it?

Goofy
Gert Doering
2014-08-04 20:36:06 UTC
Hi,
Post by Goofy
I think it will be fine to have multiple core support.
I belive it is beneficial when you reboot your Server and
more than 200 VPN connections will be re-connected ?
Indeed, it would be great.
Post by Goofy
And all new Hardware have multi core support why should openvpn not use this ?
"Nobody did the code yet".

This is a complex problem. You need a programmer that understands
parallel processes or threads, network, security, and is willing to
spend quite a bit of personal time on it - implementation, code review,
testing.

Pay me for about 6-8 weeks, and I think I can do it... but in my copious
spare time, I won't even start this, as it's too complex a task.

gert
--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany ***@greenie.muc.de
fax: +49-89-35655025 ***@net.informatik.tu-muenchen.de
Mathias Jeschke
2014-08-04 21:24:28 UTC
Post by Gert Doering
Pay me for about 6-8 weeks, and I think I can do it... but in my copious
spare time, I won't even start this, as it's too complex a task.
You should think about starting a crowdfunding campaign ;)

Mathias.
Ryan Whelan
2014-08-05 03:58:52 UTC
Post by Mathias Jeschke
Post by Gert Doering
Pay me for about 6-8 weeks, and I think I can do it... but in my copious
spare time, I won't even start this, as it's too complex a task.
You should think about starting a crowdfunding campaign ;)
Mathias.
I would so contribute some money to this!
Gert Doering
2014-08-05 07:43:33 UTC
Hi,
Post by Mathias Jeschke
You should think about starting a crowdfunding campaign ;)
This is an interesting idea.

It's not going to happen "soonish" (next thing is 2.4, which should see
the light of day in the next few months), but maybe after that. I'll
keep it in mind...

gert
Leonardo Rodrigues
2014-08-05 10:48:44 UTC
Post by Gert Doering
Hi,
Post by Mathias Jeschke
You should think about starting a crowdfunding campaign ;)
This is an interesting idea.
It's not going to happen "soonish" (next thing is 2.4, which should see
the light of day in the next few months), but maybe after that. I'll
keep it in mind...
Even though I don't have any OpenVPN setup running with that many
users, so it wouldn't be much of an advantage to me, I would
certainly donate some bucks to that! :)
--
Atenciosamente / Sincerily,
Leonardo Rodrigues
Solutti Tecnologia
http://www.solutti.com.br

My SPAMTRAP, do not email it
***@solutti.com.br
Jason Haar
2014-08-05 00:00:53 UTC
Post by Gert Doering
"Nobody did the code yet".
This is a complex problem. You need a programmer that understands
parallel processes or threads, network, security, and is willing to
spend quite a bit of personal time on it - implementation, code review,
testing.
I think it can be hacked into place (with the right choice of OS of course)

I've effectively "multi-processor"-ed openvpn by running multiple copies
on different ports, and then using iptables to round-robin new
connections onto those backend services. ie on a 4-core processor, have
4 copies of openvpn (well, I actually have 8: 4 for udp and 4 for tcp)
running. The trick is to use "client-connect" to enable you to use a
shared ip pool amongst the different instances, but it seems to work
well (I haven't tested it at load, all I know is that incoming users are
allocated different openvpn processors and it all seems to work)

eg

iptables -t nat -A PREROUTING -i eth1 -p udp -m udp -m multiport \
  --dports 443,500,1194,4500 -j DNAT --to-destination srv.ip.addr:3000-3003 --random
iptables -t nat -A PREROUTING -i eth1 -p tcp -m tcp -m multiport \
  --dports 1194,3389,443 -j DNAT --to-destination srv.ip.addr:3000-3003 --random

That enables a complex openvpn client config that iterates through a
range of UDP ports and then TCP ports before giving up. Any connection that
makes it out through whatever local firewall the client is behind is then
redirected onto local ports 3000-3003, each of which has a separate
copy of openvpn running.
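A rough sketch of how the "remote" lines for such a client profile could be generated, assuming the port lists from the iptables rules above. The host name vpn.example.com and the helper remote_lines are made up for illustration; only the "remote <host> <port> <proto>" directive itself comes from OpenVPN, and by default the client tries the entries in the order listed.

```python
def remote_lines(host, udp_ports, tcp_ports):
    """Build client 'remote' directives: all UDP ports first, then TCP."""
    lines = [f"remote {host} {port} udp" for port in udp_ports]
    lines += [f"remote {host} {port} tcp" for port in tcp_ports]
    return lines


if __name__ == "__main__":
    # Port lists copied from the DNAT rules; the host name is hypothetical.
    for line in remote_lines("vpn.example.com",
                             [443, 500, 1194, 4500],
                             [1194, 3389, 443]):
        print(line)
```

The output can be pasted into a client config in place of a single remote line.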

I use client-connect to give a local shared ip pool and in fact make the
addresses "sticky" - ie you always get the IP address you got the first
time you connected. Obviously the pool would always need to be bigger
than the maximum number of clients - but that isn't a big deal on our
10/8 network.
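A minimal sketch of what such a client-connect script might look like, not the script actually in use here. It assumes "topology subnet" on the server, that the first host address of the pool belongs to the server, and a lease file shared by all instances; LEASE_FILE, POOL and assign_sticky_ip are hypothetical names. What OpenVPN provides is the common_name environment variable and, as the last argument, the path of a temporary file into which the script writes directives such as ifconfig-push.

```python
import fcntl
import ipaddress
import json
import os
import sys

LEASE_FILE = "/var/lib/openvpn/sticky-leases.json"  # hypothetical shared file
POOL = "10.8.0.0/24"                                 # hypothetical pool


def assign_sticky_ip(leases, common_name, pool=POOL):
    """Return the address already leased to common_name, or lease a new one."""
    if common_name in leases:
        return leases[common_name]
    used = set(leases.values())
    hosts = iter(ipaddress.ip_network(pool).hosts())
    next(hosts, None)  # assumption: first host address is the server itself
    for host in hosts:
        if str(host) not in used:
            leases[common_name] = str(host)
            return leases[common_name]
    raise RuntimeError("address pool exhausted")


def main():
    out_path = sys.argv[-1]                  # temp file supplied by OpenVPN
    common_name = os.environ["common_name"]  # set by OpenVPN for the script
    fd = os.open(LEASE_FILE, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as lease_file:
        # Exclusive lock so several openvpn instances can share the file.
        fcntl.flock(lease_file, fcntl.LOCK_EX)
        text = lease_file.read()
        leases = json.loads(text) if text.strip() else {}
        address = assign_sticky_ip(leases, common_name)
        lease_file.seek(0)
        lease_file.truncate()
        json.dump(leases, lease_file)
    netmask = ipaddress.ip_network(POOL).netmask
    with open(out_path, "w") as out:
        out.write(f"ifconfig-push {address} {netmask}\n")


# Only act when invoked by OpenVPN, which sets common_name.
if __name__ == "__main__" and "common_name" in os.environ:
    main()
```

Leases are never expired in this sketch, which is why the pool has to be larger than the number of clients that will ever connect.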

This is the biggest thing I love about openvpn: the scripting triggers
it supports. You can basically make it do anything :-)
--
Cheers

Jason Haar
Corporate Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
Gert Doering
2014-08-05 05:27:32 UTC
Hi,
Post by Jason Haar
Post by Gert Doering
"Nobody did the code yet".
This is a complex problem. You need a programmer that understands
parallel processes or threads, network, security, and is willing to
spend quite a bit of personal time on it - implementation, code review,
testing.
I think it can be hacked into place (with the right choice of OS of course)
I've effectively "multi-processor"-ed openvpn by running multiple copies
on different ports, and then using iptables to round-robin new
connections onto those backend services.
Yes, this can be done (and this is what OpenVPN AS does "under the hood",
with slightly more magic regarding the distribution of incoming connections).

It will scale better than just one OpenVPN process, but is still not ideal
from a load distribution perspective, and - as you point out - needs help
from a client-connect script to get IP address assignment right.

gert
Joe Patterson
2014-08-05 15:14:50 UTC
So maybe what's really needed is less having multi-threading support within
a single openvpn process, but more adding some functionality that makes it
easier to get to the desired end-state, like extending the ip persistence
from a flat file to perhaps a database connection, and have a way to define
ip pools within that same mechanism. That would allow not only multiple
processes to operate off the same pool, but multiple processes across
multiple physical endpoints. Then all you'd need is a way to handle
routing the correct IP to the correct process, and I would humbly suggest
that adding support for some sort of routing protocol within openvpn
(probably rip or ospf) would be an *excellent* way of solving this problem.
Granted, this solution won't be for everyone, but for some of us it would
be ideal.

*that's* a crowdfunding campaign I'd throw some cash at.

-Joe
Les Mikesell
2014-08-05 20:21:39 UTC
Post by Gert Doering
Post by Jason Haar
Post by Gert Doering
This is a complex problem. You need a programmer that understands
parallel processes or threads, network, security, and is willing to
spend quite a bit of personal time on it - implementation, code review,
testing.
I think it can be hacked into place (with the right choice of OS of course)
I've effectively "multi-processor"-ed openvpn by running multiple copies
on different ports, and then using iptables to round-robin new
connections onto those backend services.
Yes, this can be done (and this is what OpenVPN AS does "under the hood",
with slightly more magic regarding the distribution of incoming connections).
It will scale better than just one OpenVPN process, but is still not ideal
from a load distribution perspective, and - as you point out - needs help
from a client-connect script to get IP address assignment right.
I don't know enough about the rekeying process to know if this is
feasible, but it seems like there should be a way to use something
like apache's prefork model to spin off some number of processes to do
the cpu-intensive parts without a lot of change to the base code or
the complications of making everything thread-safe. And let the OS
spread the processes over different CPUs.
--
Les Mikesell
***@gmail.com
Gert Doering
2014-08-06 09:34:53 UTC
Hi,
Post by Les Mikesell
I don't know enough about the rekeying process to know it this is
feasible, but it seems like there should be a way to use something
like apache's prefork model to spin off some number of processes to do
the cpu-intensive parts without a lot of change to the base code or
the complications of making everything thread-safe. And let the OS
spread the processes over different CPUs.
Each worker needs to know about SSL state, replay protection, IP routing
information, etc. - so it won't be much easier than careful usage of
threads for stuff like "here's a packet, go decrypt and hand back to
me for routing" or "here's a packet, go encrypt and stuff down *that*
socket".

gert
David Sommerseth
2014-08-06 12:12:12 UTC
Post by Gert Doering
Hi,
Post by Les Mikesell
I don't know enough about the rekeying process to know it this
is feasible, but it seems like there should be a way to use
something like apache's prefork model to spin off some number of
processes to do the cpu-intensive parts without a lot of change
to the base code or the complications of making everything
thread-safe. And let the OS spread the processes over different
CPUs.
Each worker needs to know about SSL state, replay protection, IP
routing information, etc. - so it won't be much easier than careful
usage of threads for stuff like "here's a packet, go decrypt and
hand back to me for routing" or "here's a packet, go encrypt and
stuff down *that* socket".
Just thinking aloud now, without many filters enabled. (read: uh-oh!)

Encryption and decryption using symmetric keys is really fast.
What is CPU intensive is when asymmetric encryption comes into play,
with the key exchanges and other negotiations etc. The data channel
(the network payload tunnelled over the VPN) uses symmetric
encryption, and all new connections and re-keying processes use
asymmetric encryption to agree upon a new symmetric encryption key.

With this in mind, it does make sense to split out the asymmetric
encryption phases to a separate core, which can allow other symmetric
encrypted traffic to flow more freely.

But this is just splitting stuff into 2 threads. No modern
computer really has that *few* CPU cores. (Even cellphones
seem to have at least 4 cores these days.) So even though the
benefit of using 2 threads will be noticed, it can be done better.

What *if* there are 3 "thread groups"? One of these groups is a
single thread which is an SSL state manager. It keeps track of all
keys being used, and which state each client is in. Then there is a
"thread group" for symmetric encryption work, which basically does
the real tunnelling and takes care of the network traffic flow. But
it receives the keying material from the SSL state manager thread.
And then the last "thread group" is the one taking care of asymmetric
encryption and the key negotiations.

On an 8 core box, it could scale out like this:

1 SSL state manager
4 Asymmetric encryption threads
3 Symmetric encryption threads

A 4 core box would then be similar, just 1, 2 and 1 threads.
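One way to read those two data points as a rule, purely as an assumption: one manager thread, half of the cores for asymmetric work, and whatever is left for symmetric work. The function split_threads and the rule itself are an extrapolation, not something taken from OpenVPN.

```python
def split_threads(cores):
    """Split cores into the three thread groups (assumed rule, see above)."""
    if cores < 3:
        raise ValueError("need at least 3 cores for three thread groups")
    manager = 1
    asymmetric = cores // 2
    symmetric = cores - manager - asymmetric
    return {
        "ssl_state_manager": manager,
        "asymmetric": asymmetric,
        "symmetric": symmetric,
    }


if __name__ == "__main__":
    for cores in (4, 8):
        print(cores, split_threads(cores))
```

This reproduces the 1/4/3 and 1/2/1 splits; other core counts are guesswork.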

But! This is going to be a h*** of a lot of work. And almost
everything regarding the event management/scheduler and SSL code in
OpenVPN will be completely rewritten. In addition, it'll be a lot of
fun with the plug-ins and script support.

The advantage, how I see it, is primarily with SSL manager
process/thread. The SSL manager thread can be completely locked down
and only be accessible via a kind of internal API. *IF* this manager
thread can be a separate process, it can also be possible to lock it
down further (running as a different user than the other threads, on
Linux SELinux can further restrict its possibilities). In addition
this move can enable clustering support, where you can more seamlessly
move clients from one physical OpenVPN server to another one. The SSL
manager can exchange information through a local multicast network
with the other cluster member's SSL manager. However, one of the real
tricky things here is: How to tackle plug-ins and scripts when a
client moves from one box to another one?

Anyhow, clustering is far outside the scope of threading in OpenVPN.
But with a good thread design, this can more easily be added later on.

<end_of_brain_dump/>


--
kind regards,

David Sommerseth
Les Mikesell
2014-08-06 12:57:54 UTC
On Wed, Aug 6, 2014 at 7:12 AM, David Sommerseth
Post by David Sommerseth
With this in mind, it does make sense to split out the asymmetric
encryption phases to a separate core, which can allow other symmetric
encrypted traffic to flow more freely.
But, this is just splitting stuff into 2 threads. Any modern
computers doesn't really have that *few* CPU cores. (Even cellphones
seems to have at least 4 cores these days). So even though the
benefit of using 2 threads will be noticed, it can be done better.
Every time I've seen a project that wasn't written to be thread-safe
in the first place converted to use threads, it seems like it takes
about 10 years for all of the bugs to be shaken out.
Post by David Sommerseth
What *if* there are 3 "thread groups"? One of these groups is a
single thread which is a SSL state manager. It keeps tracks of all
keys being used, and which state each client is in. Then there is a
"thread group" with symmetric encryption work, which basically does
the real tunnelling and takes care of the network traffic flow. But
it receives the keying material from the SSL state manager thread.
And then last "thread group" is the one taking care of asymmetric
encryption and the key negotiations.
I agree that threads could be more efficient, but I think there would
be low-hanging fruit from just forking a pool of worker processes
connected with sockets and having the main process hand off the slow
part of the rekeying jobs instead of backing up the main loop.
Post by David Sommerseth
But! This is going to be a h*** of a lot of work. And almost
everything regarding the event management/scheduler and SSL code in
OpenVPN will be completely rewritten. In addition, it'll be a lot of
fun with the plug-ins and script support.
I think you'd just have to add some plumbing to the existing code to
talk to the forked instances. And with no worries about accidentally
shared variables.
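A toy sketch of that prefork idea, for Unix-like systems only: the main process forks a few workers, each connected by a socket pair, and hands them jobs round-robin. slow_rekey_job is a stand-in for the expensive part (it is not real crypto), and all names here are made up for illustration; nothing is taken from the OpenVPN code base.

```python
import os
import socket
import struct


def slow_rekey_job(n):
    """Stand-in for an expensive key-negotiation step (not real crypto)."""
    return pow(3, n, 1_000_003)


def recv_exact(sock, size):
    """Read exactly size bytes, or return None if the peer closed."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def worker_loop(sock):
    """Child side: read 4-byte job numbers, answer with 8-byte results."""
    while True:
        data = recv_exact(sock, 4)
        if data is None:
            break
        (n,) = struct.unpack("!I", data)
        sock.sendall(struct.pack("!Q", slow_rekey_job(n)))


def prefork(num_workers):
    """Fork the workers up front; return a list of (pid, parent socket)."""
    workers = []
    for _ in range(num_workers):
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            parent_end.close()
            for _, inherited in workers:
                inherited.close()  # drop sockets of earlier workers
            try:
                worker_loop(child_end)
            finally:
                os._exit(0)  # never fall back into the parent's code
        child_end.close()
        workers.append((pid, parent_end))
    return workers


def run_jobs(jobs, num_workers=2):
    """Main-process side: dispatch round-robin, collect results in order."""
    workers = prefork(num_workers)
    results = []
    try:
        for i, n in enumerate(jobs):
            workers[i % num_workers][1].sendall(struct.pack("!I", n))
        for i in range(len(jobs)):
            reply = recv_exact(workers[i % num_workers][1], 8)
            results.append(struct.unpack("!Q", reply)[0])
    finally:
        for pid, sock in workers:
            sock.close()
            os.waitpid(pid, 0)
    return results


if __name__ == "__main__":
    print(run_jobs([10, 20, 30, 40]))
```

Nothing is shared between the processes except the sockets, so there are no shared variables to protect. The real difficulty is the state each worker would need, which this toy job does not have.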
--
Les Mikesell
***@gmail.com
Gert Doering
2014-08-06 13:20:46 UTC
Hi,
Post by David Sommerseth
Just thinking aloud now, without many filters enabled. (read: uh-oh!)
:)
Post by David Sommerseth
The encryption and decryption using symmetric keys are really fast.
fast, but used very very often...
Post by David Sommerseth
What is CPU intensive is when asymmetric encryption comes into play,
with the key exchanges and other negotiations etc.
slow, but used much more seldom... assuming VPN clients that stay
connected for a reasonable amount of time, and transfer "enough" data.
Post by David Sommerseth
With this in mind, it does make sense to split out the asymmetric
encryption phases to a separate core, which can allow other symmetric
encrypted traffic to flow more freely.
But, this is just splitting stuff into 2 threads. Any modern
computers doesn't really have that *few* CPU cores. (Even cellphones
seems to have at least 4 cores these days). So even though the
benefit of using 2 threads will be noticed, it can be done better.
My rough idea was something like

- one control thread doing the IP routing and "client connection maintenance"
- one group of threads to do "decrypt incoming packet"
- one group of threads to do "encrypt outgoing packet and send away"

and of course the "control thread" could split off the asymmetric crypto
to yet another thread... this should be sufficient to at least utilize
a few more cores, until we get to bottlenecks in the central thread.
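The shape of that design, sketched with queues under loud assumptions: toy_cipher is a placeholder XOR and not real encryption, one worker group stands in for both the decrypt and encrypt groups, and the calling thread plays the control thread. In CPython the GIL keeps pure-Python work like this from actually running in parallel, so this only shows the hand-off structure.

```python
import queue
import threading

TOY_KEY = 0x5A  # placeholder key for the toy cipher below


def toy_cipher(data):
    """XOR placeholder standing in for real encrypt/decrypt work."""
    return bytes(b ^ TOY_KEY for b in data)


def crypto_worker(inbox, outbox):
    """Worker thread: take (seq, payload), hand back (seq, processed)."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more packets
            break
        seq, payload = item
        outbox.put((seq, toy_cipher(payload)))


def run_pipeline(packets, num_workers=3):
    """Control-thread role: hand packets out, then put results in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=crypto_worker, args=(inbox, outbox))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    for seq, payload in enumerate(packets):
        inbox.put((seq, payload))
    for _ in workers:
        inbox.put(None)
    for worker in workers:
        worker.join()
    results = [None] * len(packets)
    while not outbox.empty():
        seq, data = outbox.get()
        results[seq] = data
    return results


if __name__ == "__main__":
    print(run_pipeline([b"ping", b"pong"]))
```

The sequence numbers are there because workers may finish out of order; the control thread has to restore packet order (or at least keep replay protection happy) before routing.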
Post by David Sommerseth
What *if* there are 3 "thread groups"? One of these groups is a
single thread which is a SSL state manager. It keeps tracks of all
keys being used, and which state each client is in. Then there is a
"thread group" with symmetric encryption work, which basically does
the real tunnelling and takes care of the network traffic flow. But
it receives the keying material from the SSL state manager thread.
And then last "thread group" is the one taking care of asymmetric
encryption and the key negotiations.
Yep. Something like that :-)

Nice design, now it just needs to be done...
Post by David Sommerseth
But! This is going to be a h*** of a lot of work. And almost
everything regarding the event management/scheduler and SSL code in
OpenVPN will be completely rewritten. In addition, it'll be a lot of
fun with the plug-ins and script support.
The advantage, how I see it, is primarily with SSL manager
process/thread. The SSL manager thread can be completely locked down
and only be accessible via a kind of internal API. *IF* this manager
thread can be a separate process, it can also be possible to lock it
down further (running as a different user than the other threads, on
Linux SELinux can further restrict its possibilities). In addition
this move can enable clustering support, where you can more seamlessly
move clients from one physical OpenVPN server to another one. The SSL
manager can exchange information through a local multicast network
with the other cluster member's SSL manager. However, one of the real
tricky things here is: How to tackle plug-ins and scripts when a
client moves from one box to another one?
Hehe, something left for 2.7, I'd say :-)

gert
Steffan Karger
2014-08-06 13:53:34 UTC
Hi,
Post by Gert Doering
Post by David Sommerseth
The encryption and decryption using symmetric keys are really fast.
fast, but used very very often...
Yes, and less difficult to separate. That is why I would tackle this
one first too.
Post by Gert Doering
My rough idea was something like
- one control thread doing the IP routing and "client connection maintenance"
- one group of threads to do "decrypt incoming packet"
- one group of threads to do "encrypt outgoing packet and send away"
and of course the "control thread" could split off the asymmetric crypto
to yet another thread... this should be sufficient to at least utilize
a few more cores, until we get to bottlenecks in the central thread.
More or less what I had in mind too. If possible, I would try to do
just the control/data decision in a thread different from the
connection maintenance. That would prevent the data channels from
choking up during connection attempts.
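A sketch of what that control/data decision could look like for UDP, going by my understanding of the wire format: the top five bits of the first byte are the opcode and the low three bits the key id, with P_DATA_V1 = 6 and P_DATA_V2 = 9 marking data channel packets. The function classify is illustrative only and does no validation beyond that.

```python
P_DATA_V1 = 6  # data channel packet
P_DATA_V2 = 9  # data channel packet carrying a peer-id


def classify(packet):
    """Return 'data' or 'control' based on the opcode in the first byte."""
    if not packet:
        raise ValueError("empty packet")
    opcode = packet[0] >> 3  # low 3 bits are the key id
    return "data" if opcode in (P_DATA_V1, P_DATA_V2) else "control"


if __name__ == "__main__":
    print(classify(bytes([P_DATA_V1 << 3])))  # data
    print(classify(bytes([4 << 3])))          # control (opcode 4)
```

A dispatcher thread doing only this check could pass data packets straight to the crypto workers and queue everything else for connection maintenance.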

-Steffan
David Sommerseth
2014-08-06 18:38:07 UTC
Post by Gert Doering
Post by David Sommerseth
What is CPU intensive is when asymmetric encryption comes into
play, with the key exchanges and other negotiations etc.
slow, but used much more seldom... assuming VPN clients that stay
connected for a reasonable amount of time, and transfer "enough" data.
True ... until you restart a busy server. Then you'll get a busy
peak, and unless the --reneg-* options are disabled, you'll have these
peaks fairly regularly.

Which actually makes me ponder even more, regarding the SSL state
manager. If OpenVPN is killed with a "restart" signal, could it
encrypt the saved state and dump to file (keying material could be the
server --key, or another explicit key for this feature). When it is
started again, it will read and decrypt this file and continue without
re-init of all SSL clients .... but it may actually fail, especially
for TCP, depending on if there are any tight relations to the client
ports.

/me should stop thinking so much


--
kind regards,

David Sommerseth
Simon Deziel
2014-08-06 19:02:02 UTC
Post by David Sommerseth
Post by Gert Doering
Post by David Sommerseth
What is CPU intensive is when asymmetric encryption comes into
play, with the key exchanges and other negotiations etc.
slow, but used much more seldom... assuming VPN clients that stay
connected for a reasonable amount of time, and transfer "enough" data.
True ... until you restart a busy server. Then you'll get a busy
peak, and unless --reneg-* options is disabled, you'll have these
peaks fairly regularly.
I haven't looked at the code but I wonder if there is a random margin
added to or subtracted from reneg-sec to avoid all clients renegotiating at
the exact same time in that specific scenario?

This feature is well explained here [1] (IPsec implementation), see
"rekeyfuzz".
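Roughly what I mean, as a hypothetical sketch and not a description of what OpenVPN does: each client's renegotiation interval is shortened by a random share of a margin, so clients that all connected at the same moment drift apart. The name fuzzed_reneg_seconds and the fuzz_percent parameter are made up; the fuzz is only subtracted so the configured interval is never exceeded.

```python
import random


def fuzzed_reneg_seconds(reneg_sec, fuzz_percent, rng=random):
    """Shorten reneg_sec by a random amount of up to fuzz_percent percent."""
    margin = reneg_sec * fuzz_percent / 100
    return reneg_sec - rng.uniform(0, margin)


if __name__ == "__main__":
    # Ten clients that all reconnected at once after a server restart.
    for _ in range(10):
        print(round(fuzzed_reneg_seconds(3600, 10)))
```

With 3600 seconds and 10 percent, renegotiations land somewhere between 3240 and 3600 seconds after connect instead of all at 3600.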
Post by David Sommerseth
Which actually makes me ponder even more, regarding the SSL state
manager. If OpenVPN is killed with a "restart" signal, could it
encrypt the saved state and dump to file (keying material could be the
server --key, or another explicit key for this feature). When it is
started again, it will read and decrypt this file and continue without
re-init of all SSL clients .... but it may actually fail, especially
for TCP, depending on if there are any tight relations to the client
ports.
I like that idea and maybe the TCP case could be addressed by TCP repair
[2] on Linux.

Regards,
Simon

1: https://wiki.strongswan.org/projects/strongswan/wiki/ExpiryRekey
2: https://lwn.net/Articles/495304/
Joe Patterson
2014-08-06 20:06:58 UTC
I still maintain that it would be much simpler and more useful to put less
effort into making a multi-threaded process, and more effort into making it
easier for multiple processes to coordinate amongst one another. That gets
the advantage of more easily being able to allocate multiple clients across
a large number of cores. The only disadvantage I see is that it does
prevent a single client from being able to be serviced by multiple cores,
but if I'm not wrong that's going to be a problem whether it's threaded or
multi-process, unless you're using ecb or ctr ciphers (and I don't see any
ctr ciphers in my openssl, and wouldn't suggest using an ecb one), so I
suspect that problem will be endemic.
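To illustrate why a counter-style mode is the exception, here is a toy construction that is NOT a vetted cipher and must not be used for anything real: each keystream block depends only on key, nonce and block counter, so the blocks of a single stream can be handed to different workers in any order. SHA-256 stands in for a block cipher, and all names are made up.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 32  # bytes of keystream per counter value (SHA-256 digest size)


def keystream_block(key, nonce, counter):
    """Keystream for one block; depends only on key, nonce and counter."""
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()


def xor_block(job):
    """Process one block independently of all the others."""
    key, nonce, counter, chunk = job
    stream = keystream_block(key, nonce, counter)
    return bytes(a ^ b for a, b in zip(chunk, stream))


def ctr_like(key, nonce, data, workers=4):
    """Encrypt or decrypt data block by block using a pool of workers."""
    chunks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    jobs = [(key, nonce, i, chunk) for i, chunk in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(xor_block, jobs))


if __name__ == "__main__":
    secret = ctr_like(b"key", b"nonce", b"x" * 100)
    print(ctr_like(b"key", b"nonce", secret) == b"x" * 100)
```

With a chained mode, encrypting block N needs the ciphertext of block N-1 first, so this kind of fan-out is not available for one client's stream.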

-Joe
Post by Gert Doering
Hi,
Post by David Sommerseth
Just thinking aloud now, without many filters enabled. (read: uh-oh!)
:)
Post by David Sommerseth
The encryption and decryption using symmetric keys are really fast.
fast, but used very very often...
Post by David Sommerseth
What is CPU intensive is when asymmetric encryption comes into play,
with the key exchanges and other negotiations etc.
slow, but used much more seldom... assuming VPN clients that stay
connected for a reasonable amount of time, and transfer "enough" data.
Post by David Sommerseth
With this in mind, it does make sense to split out the asymmetric
encryption phases to a separate core, which can allow other symmetric
encrypted traffic to flow more freely.
But, this is just splitting stuff into 2 threads. Modern
computers don't really have that *few* CPU cores. (Even cellphones
seem to have at least 4 cores these days). So even though the
benefit of using 2 threads will be noticed, it can be done better.
My rough idea was something like
- one control thread doing the IP routing and "client connection maintenance"
- one group of threads to do "decrypt incoming packet"
- one group of threads to do "encrypt outgoing packet and send away"
and of course the "control thread" could split off the asymmetric crypto
to yet another thread... this should be sufficient to at least utilize
a few more cores, until we get to bottlenecks in the central thread.
Post by David Sommerseth
What *if* there are 3 "thread groups"? One of these groups is a
single thread which is a SSL state manager. It keeps tracks of all
keys being used, and which state each client is in. Then there is a
"thread group" with symmetric encryption work, which basically does
the real tunnelling and takes care of the network traffic flow. But
it receives the keying material from the SSL state manager thread.
And then last "thread group" is the one taking care of asymmetric
encryption and the key negotiations.
Yep. Something like that :-)
Nice design, now it just needs to be done...
Post by David Sommerseth
But! This is going to be a h*** of a lot of work. And almost
everything regarding the event management/scheduler and SSL code in
OpenVPN will be completely rewritten. In addition, it'll be a lot of
fun with the plug-ins and script support.
The advantage, how I see it, is primarily with SSL manager
process/thread. The SSL manager thread can be completely locked down
and only be accessible via a kind of internal API. *IF* this manager
thread can be a separate process, it can also be possible to lock it
down further (running as a different user than the other threads, on
Linux SELinux can further restrict its possibilities). In addition
this move can enable clustering support, where you can more seamlessly
move clients from one physical OpenVPN server to another one. The SSL
manager can exchange information through a local multicast network
with the other cluster member's SSL manager. However, one of the real
tricky things here is: How to tackle plug-ins and scripts when a
client moves from one box to another one?
Hehe, something left for 2.7, I'd say :-)
gert
--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany
fax: +49-89-35655025
Gert Doering
2014-08-06 20:09:27 UTC
Permalink
Hi,
Post by Joe Patterson
I still maintain that it would be much simpler and more useful to put less
effort into making a multi-threaded process, and more effort into making it
easier for multiple processes to coordinate amongst one another.
Feel free to convince us with running code :-)

gert
--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany ***@greenie.muc.de
fax: +49-89-35655025 ***@net.informatik.tu-muenchen.de
Steffan Karger
2014-08-06 20:20:40 UTC
Permalink
Hi Joe,
Post by Joe Patterson
The only disadvantage I see is that it does
prevent a single client from being able to be serviced by multiple cores,
but if I'm not wrong that's going to be a problem whether it's threaded or
multi-process, unless you're using ecb or ctr ciphers (and I don't see any
ctr ciphers in my openssl, and wouldn't suggest using an ecb one), so I
suspect that problem will be endemic.
For the typical road-warrior scenario (one server, many clients) you
are probably right that a single user won't benefit much from
data-channel threading. However, in a high-capacity site-to-site link,
many concurrent connections by many users are handled by a single
connection instance. In this case threading would enable processing
multiple network packets concurrently. Though that does not speed up
processing of a single packet, it will speed up the total connection
almost linearly with the number of cores.
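
A rough sketch of what "processing multiple packets concurrently" could look like, assuming each packet can be handled independently of the others. This is not OpenVPN code: the key, the `protect` function and the use of an HMAC tag as the per-packet work are stand-ins chosen for illustration. Note also that CPython's global interpreter lock limits how much real parallelism threads give for this kind of work, so the sketch shows the structure rather than the speed-up.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

# Hypothetical session key shared by all workers (illustration only).
KEY = b"hypothetical-session-key"


def protect(indexed_packet):
    """Per-packet work: append an HMAC tag over sequence number + payload."""
    seq, payload = indexed_packet
    tag = hmac.new(KEY, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()
    return seq, payload + tag


def process_concurrently(packets, workers=4):
    """Hand packets of one connection to a pool of workers.

    pool.map() returns results in input order, so packet order is kept
    even though individual packets may finish at different times.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(protect, enumerate(packets)))


if __name__ == "__main__":
    packets = [bytes([i]) * 100 for i in range(8)]
    for seq, out in process_concurrently(packets):
        print(seq, len(out))
```

No single packet gets faster, which matches the point above: the gain is in the total throughput of the connection.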

-Steffan
David Sommerseth
2014-08-06 20:36:14 UTC
Permalink
Post by Steffan Karger
Hi Joe,
On Wed, Aug 6, 2014 at 10:06 PM, Joe Patterson
The only disadvantage I see is that it does prevent a single
client from being able to be serviced by multiple cores, but if
I'm not wrong that's going to be a problem whether it's threaded
or multi-process, unless you're using ecb or ctr ciphers (and I
don't see any ctr ciphers in my openssl, and wouldn't suggest
using an ecb one), so I suspect that problem will be endemic.
For the typical road-warrior scenario (one server, many clients)
you are probably right that a single user won't benefit much from
data-channel threading. However, in a high-capacity site-to-site
link, many concurrent connections by many users are handled by a
single connection instance. In this case threading would enable
processing multiple network packets concurrently. Though that does
not speed up processing of a single packet, it will speed up the
total connection almost linear to the number of cores.
+1

Even though it might look easier to implement a solution similar to
Apache's prefork model, I'm not convinced that approach will be easier
to implement in OpenVPN's context. Plus, if you start adding more
processes than cores, the result will be worse.

Yes, splitting up the tasks OpenVPN does over multiple
threads/processes is a harder task. But I feel quite confident that's
the approach which will scale best, also because of work going on in
the Linux kernel to make the TUN/TAP driver able to handle multiple
rx/tx queues. Throughput is currently also limited by
what that driver is able to process, and that's one big
bottleneck right now, especially the TCP checksum calculations [1].

[1]
<https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux#a10Gigabitnetworks>

So when I propose several threads doing the network rx/tx processing,
it is to prepare OpenVPN for the day the TUN/TAP driver fully supports
multiple queues, where the kernel pins each queue to a separate CPU
core.
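
For readers who have not seen the multi-queue interface, a minimal sketch of how a program could open several queues on one tun device on Linux. It assumes a kernel with `IFF_MULTI_QUEUE` support, needs root (or CAP_NET_ADMIN) to actually run, and is not how OpenVPN does it today; `build_ifreq` and `open_queues` are names made up here. The constants are the ones from `linux/if_tun.h`.

```python
import fcntl
import os
import struct

# Constants from <linux/if_tun.h>.
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001
IFF_MULTI_QUEUE = 0x0100
IFF_NO_PI = 0x1000


def build_ifreq(name, flags):
    """Pack a struct ifreq: 16-byte interface name, 16-bit flags, padding."""
    return struct.pack("16sH22x", name.encode(), flags)


def open_queues(name, count):
    """Open `count` file descriptors attached to the same tun device.

    Each descriptor is one rx/tx queue; a worker thread or process could
    then read and write on its own descriptor.
    """
    flags = IFF_TUN | IFF_NO_PI | IFF_MULTI_QUEUE
    fds = []
    try:
        for _ in range(count):
            fd = os.open("/dev/net/tun", os.O_RDWR)
            fds.append(fd)
            fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, flags))
    except OSError:
        # Do not leak descriptors if the device cannot be set up.
        for fd in fds:
            os.close(fd)
        raise
    return fds
```

Calling `open_queues("tun0", 4)` would give four descriptors, one per intended worker.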


--
kind regards,

David Sommerseth
Les Mikesell
2014-08-06 20:52:29 UTC
Permalink
On Wed, Aug 6, 2014 at 3:36 PM, David Sommerseth
Post by David Sommerseth
Post by Steffan Karger
For the typical road-warrior scenario (one server, many clients)
you are probably right that a single user won't benefit much from
data-channel threading. However, in a high-capacity site-to-site
link, many concurrent connections by many users are handled by a
single connection instance. In this case threading would enable
processing multiple network packets concurrently. Though that does
not speed up processing of a single packet, it will speed up the
total connection almost linear to the number of cores.
+1
Don't you really want to throw this case at hardware ssl offload?
Post by David Sommerseth
Even though it might look easier to implement a solution similar to
Apache's prefork model, I'm not convinced that approach will be easier
to implement in OpenVPN's context. Plus, if you start adding more
processes than cores, the result will be worse.
Maybe - you are the code expert here, but doesn't this mean you either
have to start from scratch or find every possible thread contention in
the code? Or lock to the point that you essentially serialize anyway?
Post by David Sommerseth
Yes, splitting up the tasks OpenVPN does over multiple
threads/processes is a harder task. But I feel quite confident that's
the approach which will scale best.
Given that scaling beyond one CPU hasn't been a priority at all so
far, does the 'theoretical best' approach justify the debugging
complexity?
--
Les Mikesell
***@gmail.com
David Sommerseth
2014-08-07 09:56:32 UTC
Permalink
Post by Les Mikesell
On Wed, Aug 6, 2014 at 3:36 PM, David Sommerseth
Post by David Sommerseth
Post by Steffan Karger
For the typical road-warrior scenario (one server, many
clients) you are probably right that a single user won't
benefit much from data-channel threading. However, in a
high-capacity site-to-site link, many concurrent connections by
many users are handled by a single connection instance. In this
case threading would enable processing multiple network packets
concurrently. Though that does not speed up processing of a
single packet, it will speed up the total connection almost
linear to the number of cores.
+1
Don't you really want to throw this case at hardware ssl offload?
Yes, but key exchange accelerators (e.g. RSA accelerators)
aren't that common. More often these days you have CPUs with AES-NI
instruction support but, afaik, they only cover AES symmetric
encryption and decryption operations.
Post by Les Mikesell
Post by David Sommerseth
Even though it might look easier to implement a solution similar
to Apache's prefork model, I'm not convinced that approach will
be easier to implement in OpenVPN's context. Plus, if you start
adding more processes than cores, the result will be worse.
Maybe - you are the code expert here, but doesn't this mean you
either have to start from scratch or find every possible thread
contention in the code? Or lock to the point that you essentially
serialize anyway?
Not necessarily completely from scratch. But there will definitely
be some challenges, of course, with mutex locking when threads want to
access the same memory regions.

However, that is most likely less intrusive and complex than
having to rewrite the event handler which ensures that
each client gets its "time slice" in OpenVPN. OpenVPN's event
handling is the only thing which lets OpenVPN handle multiple clients at the
same time, inside the same process/thread.

The OpenVPN implementation is also quite different from your apache
pre-fork suggestion, where the connection to a web server is closed
after having served a simple request with limited amount of data. If
you have 100 clients downloading 500MB of data, I'd bet you'll easily
see the need to increase the number of threads/processes in Apache to
avoid new connections being rejected. Likewise, if you have 100
clients connecting 10 times retrieving data in parallel, this also is
a stressful moment for the web-server. This doesn't happen in
OpenVPN, because each client gets a "time slice" before OpenVPN serves
others again.

In addition, Apache can give quite good throughput, but the latency
can be more unpredictable, which is normally not an issue when you
download data. And if it is latency sensitive, the client has often
implemented some kind of buffering. Such buffering is quite
impossible to do on a VPN connection. But by giving each client a
slice of processing time, the latency gets more predictable and the
throughput is spread more evenly among the clients.
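
A toy illustration of the "time slice" idea, to show why latency stays predictable when every client is served in turn. This is a simplified round-robin loop, not OpenVPN's actual scheduler; `serve_round_robin` and `slice_packets` are made-up names.

```python
from collections import deque


def serve_round_robin(queues, slice_packets=1):
    """Serve clients in turn, at most `slice_packets` packets per turn.

    `queues` maps a client name to its list of waiting packets. The return
    value is the order in which (client, packet) pairs were served.
    """
    pending = deque((client, deque(pkts)) for client, pkts in queues.items())
    order = []
    while pending:
        client, pkts = pending.popleft()
        for _ in range(min(slice_packets, len(pkts))):
            order.append((client, pkts.popleft()))
        if pkts:
            # Client still has work: back to the end of the line.
            pending.append((client, pkts))
    return order


if __name__ == "__main__":
    print(serve_round_robin({"bulk": [1, 2, 3], "ssh": [10]}))
```

The "ssh" client with one small packet is served right after the first "bulk" packet instead of waiting for the whole download to finish.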
Post by Les Mikesell
Post by David Sommerseth
Yes, splitting up the tasks OpenVPN does over multiple
threads/processes is a harder task. But I feel quite confident
that's the approach which will scale best.
Given that scaling beyond one CPU hasn't been a priority at all so
far, does the 'theoretical best' approach justify the debugging
complexity?
Implementing a pre-forked model would actually be
far more complex than the alternative. In addition, there is a
need for an SSL state manager, which keeps track of the SSL keying
material for each client, no matter which approach is used. OpenVPN
already has such a state manager, but it's fairly simple because
it only needs to process requests one client at a time.


--
kind regards,

David Sommerseth
Les Mikesell
2014-08-07 14:15:34 UTC
Permalink
On Thu, Aug 7, 2014 at 4:56 AM, David Sommerseth
Post by David Sommerseth
However, that is most likely less intrusive and complex than to
basically needing to re-write the event handler which schedules that
each client gets their "time slice" in OpenVPN. OpenVPN's event
handling is the only thing which makes OpenVPN tackle more clients at the
same time, inside the same process/thread.
I don't see why you'd need to change that at all. Let the parent
process continue to handle all of the client connections, and just add
a socket to the child process(es) into the event loop. Then instead
of recomputing the keys in the parent, send that work over the socket
to the child, which, being a fork, already has the same event handler.
I think the only extra complexity would be having to track 'work
pending' connection states until the child returns the completed work.
Post by David Sommerseth
The OpenVPN implementation is also quite different from your apache
pre-fork suggestion, where the connection to a web server is closed
after having served a simple request with limited amount of data.
Agreed - I wouldn't have the child processes accept any connections.
The similarity would only be in managing a pool of worker processes.
But for simplicity consider just one forked child where you hand off
rekeying. You'd probably really want a pool of connections to
workers, but even one would double the CPU power available and avoids
the complexity of balancing a pool.
Post by David Sommerseth
Likewise, if you have 100
clients connecting 10 times retrieving data in parallel, this also is
a stressful moment for the web-server. This doesn't happen in
OpenVPN, because each client gets a "time slice" before OpenVPN serves
others again.
Right, but if you keep that same logic but fork the process and push
the packets that involve a lot of CPU work to a different instance you
get time slices out of multiple CPUs instead of just one.
Post by David Sommerseth
The complexity of implementing a pre-forked model would actually be
far more complex than the alternative. And in addition, there is a
need for a SSL state manager, which keeps tracks of the SSL keying
material for each client, no matter which approach is used. OpenVPN
does already have such a state manager, but it's fairly simple because
it only needs to process each request one client by one client.
A forked copy would automatically have the same management code...
And you'd have the option of either passing any needed state over the
socket between processes or using explicitly shared memory and some
sort of lock to arbitrate access- which you'd need with threads
anyway. There would be some extra overhead in passing things over
the sockets, but you might have 2 to 32x the CPU power to do it.
--
Les Mikesell
***@gmail.com
Jan Just Keijser
2014-08-07 15:24:06 UTC
Permalink
Post by Les Mikesell
On Thu, Aug 7, 2014 at 4:56 AM, David Sommerseth
Post by David Sommerseth
However, that is most likely less intrusive and complex than to
basically needing to re-write the event handler which schedules that
each client gets their "time slice" in OpenVPN. OpenVPN's event
handling is the only thing which makes OpenVPN tackle more clients at the
same time, inside the same process/thread.
I don't see why you'd need to change that at all. Let the parent
process continue to handle all of the client connections, and just add
a socket to the child process(es) into the event loop. Then instead
of recomputing the keys in the parent, send that work over the socket
to the child, which, being a fork, already has the same event handler.
I think the only extra complexity would be having to track 'work
pending' connection states until the child returns the completed work.
If I were to add multi-core support to OpenVPN I would start with the
Apache httpd 1.3 or 2.x code base (1.3 is simpler as it does not include
apache's MPM stuff). Httpd + mod_ssl has already solved the issue of
accepting multiple client connections and should/will have similar
issues with key renegotiation.

I would also opt for function handlers/pointers per connection - that
way you could serve both udp+tcp from a single server instance - the
client connection entry element would then contain pointers to the right
handlers for tcp, udp and possibly even tun or tap.
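
A small sketch of the handler-per-connection idea. The names (`Connection`, `frame_udp`, `frame_tcp`, `HANDLERS`) are invented for this example, and the 2-byte length prefix for TCP is used here only to give the two handlers visibly different behaviour.

```python
from dataclasses import dataclass
from typing import Callable


def frame_udp(payload: bytes) -> bytes:
    """UDP: the datagram boundary already delimits the packet."""
    return payload


def frame_tcp(payload: bytes) -> bytes:
    """TCP: a stream needs an explicit length prefix per packet."""
    return len(payload).to_bytes(2, "big") + payload


# One table of handlers; each connection entry points at the right one.
HANDLERS = {"udp": frame_udp, "tcp": frame_tcp}


@dataclass
class Connection:
    peer: str
    frame: Callable[[bytes], bytes]


def accept(peer: str, transport: str) -> Connection:
    """Create a connection entry with the handler for its transport."""
    return Connection(peer, HANDLERS[transport])


def send_all(connections, payload: bytes):
    """The server loop never checks the transport, it just calls the handler."""
    return [conn.frame(payload) for conn in connections]


if __name__ == "__main__":
    conns = [accept("198.51.100.7", "udp"), accept("203.0.113.9", "tcp")]
    print(send_all(conns, b"hi"))
```

Adding another transport or device type then means adding one more entry to the table, not another branch in the main loop.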

JM2CW,

JJK
Post by Les Mikesell
Post by David Sommerseth
The OpenVPN implementation is also quite different from your apache
pre-fork suggestion, where the connection to a web server is closed
after having served a simple request with limited amount of data.
Agreed - I wouldn't have the child processes accept any connections.
The similarity would only be in managing a pool of worker processes.
But for simplicity consider just one forked child where you hand off
rekeying. You'd probably really want a pool of connections to
workers, but even one would double the CPU power available and avoids
the complexity of balancing a pool.
Post by David Sommerseth
Likewise, if you have 100
clients connecting 10 times retrieving data in parallel, this also is
a stressful moment for the web-server. This doesn't happen in
OpenVPN, because each client gets a "time slice" before OpenVPN serves
others again.
Right, but if you keep that same logic but fork the process and push
the packets that involve a lot of CPU work to a different instance you
get time slices out of multiple CPUs instead of just one.
Post by David Sommerseth
The complexity of implementing a pre-forked model would actually be
far more complex than the alternative. And in addition, there is a
need for a SSL state manager, which keeps tracks of the SSL keying
material for each client, no matter which approach is used. OpenVPN
does already have such a state manager, but it's fairly simple because
it only needs to process each request one client by one client.
A forked copy would automatically have the same management code...
And you'd have the option of either passing any needed state over the
socket between processes or using explicitly shared memory and some
sort of lock to arbitrate access- which you'd need with threads
anyway. There would be some extra overhead in passing things over
the sockets, but you might have 2 to 32x the CPU power to do it.
Jason Haar
2014-08-07 23:23:29 UTC
Permalink
Post by Jan Just Keijser
I would also opt for function handlers/pointers per connection - that
way you could serve both udp+tcp from a single server instance
Yes - having one server instance managing both udp and tcp AND being
able to handle multiple ports should be part of any rewrite. We have
found there are tonnes of different firewall variables in (client-end)
networks we've come across - so currently have several openvpn instances
running on the same server to maximize success rates. Having all that
handled by one instance would be much simpler (with threading or forking
- don't care - not a programmer ;-)

If we're asking for ponies, can I also have one that can do some form of
latency test first (in the case of DNS resolving to multiple server IPs)
so that clients go to the "fastest" server? I'd love to have a single
client config that would give users the best performance by default (by
taking them to the openvpn server closest to their current location).
Within our Cisco VPN environment - where the GUI shows users all our VPN
gateways - users (if left to their own devices) will typically choose the
FIRST one and then stick to it - even if they are travelling to other
countries. We have gateways all over the world and users typically don't
use the optimum one - they use the one that "worked last time". And then
they complain how slow VOIP is over it ;-)
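
Something like this is what I mean, sketched from the client side. It is not an existing OpenVPN feature: it simply times a TCP connect to each address and picks the quickest, so it would only work as written against TCP listeners, and every function name is made up. The probe is passed in as a parameter so a different measurement could be swapped in.

```python
import socket
import time


def tcp_connect_time(address, port, timeout=2.0):
    """Seconds needed to open a TCP connection, or infinity on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return float("inf")


def resolve_all(hostname, port):
    """All distinct IP addresses the name resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def fastest_server(addresses, port, probe=tcp_connect_time):
    """Return the address with the lowest probe time, or None if none answer."""
    if not addresses:
        return None
    best_time, best_addr = min((probe(addr, port), addr) for addr in addresses)
    return best_addr if best_time != float("inf") else None
```

A client wrapper could call `fastest_server(resolve_all(hostname, 1194), 1194)` before connecting and hand the winner to openvpn as the remote address.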

In the words of the immortal Devo: "Freedom from choice: is what you want" ;-)
--
Cheers

Jason Haar
Corporate Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
Jason Haar
2014-08-06 22:37:42 UTC
Permalink
Post by David Sommerseth
What is CPU intensive is when asymmetric encryption comes into play,
with the key exchanges and other negotiations etc.
I sooo have to agree with that. Back in the day I could notice even with
only TWO clients how openvpn would completely HANG during key
renegotiation! ie I'd be SSH-ed into some work server via openvpn,
happily typing away, the second client would connect and WHAM! total
freeze for 5+ seconds.

Which is why I changed our reneg-sec from 3600 to 36000 (ie ten hours).
If we had 100 simultaneous clients, I'd even think of increasing that
yet again. The theoretical risk of someone actually brute forcing a key
in that time window is still nearly infinitely less than the actual
impact of key renegotiation on openvpn
--
Cheers

Jason Haar
Corporate Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
David Sommerseth
2014-08-07 09:36:51 UTC
Permalink
Post by Jason Haar
Post by David Sommerseth
What is CPU intensive is when asymmetric encryption comes into
play, with the key exchanges and other negotiations etc.
I sooo have to agree with that. Back in the day I could notice even
with only TWO clients how openvpn would completely HANG during key
renegotiation! ie I'd be SSH-ed into some work server via openvpn,
happily typing away, the second client would connect and WHAM!
total freeze for 5+ seconds.
Which is why I changed our reneg-sec from 3600 to 36000 (ie ten
hours). If we had 100 simultaneous clients, I'd even think of
increasing that yet again. The theoretical risk of someone actually
brute forcing a key in that time window is still nearly infinitely
less than the actual impact of key renegotiation on openvpn
If --reneg-sec is an issue, I'd probably recommend turning it off
completely (set it to 0) and enabling --reneg-bytes and/or
--reneg-pkts instead. It is hopefully less likely that many clients
transfer the same amount of data over the tunnel in approximately the
same time window.
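
A quick back-of-the-envelope sketch of why a byte-based trigger spreads renegotiations out. The threshold and the per-client throughput figures are invented for the example; only the option names --reneg-sec and --reneg-bytes come from OpenVPN.

```python
def seconds_until_rekey(reneg_bytes, throughput_bytes_per_s):
    """Time until a client has moved `reneg_bytes` at a steady rate."""
    if throughput_bytes_per_s <= 0:
        return float("inf")
    return reneg_bytes / throughput_bytes_per_s


# Hypothetical threshold, as if set with --reneg-bytes (and --reneg-sec 0).
RENEG_BYTES = 1_000_000_000

# Hypothetical average throughput per client, in bytes per second.
clients = {"idle-ssh": 2_000, "voip": 12_000, "bulk": 5_000_000}

schedule = {
    name: seconds_until_rekey(RENEG_BYTES, rate)
    for name, rate in clients.items()
}

if __name__ == "__main__":
    for name, seconds in sorted(schedule.items(), key=lambda item: item[1]):
        print(f"{name}: rekey after about {seconds:.0f} s")
```

Three clients that connected at the same moment end up renegotiating at three very different times, because each one reaches the threshold at its own pace.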


--
kind regards,

David Sommerseth