This repository was archived by the owner on Aug 2, 2023. It is now read-only.

Multi-core support #33

Closed
3 tasks done
achimnol opened this issue Apr 27, 2017 · 0 comments

achimnol commented Apr 27, 2017

For scalability, the gateway server must become multi-core enabled.

@achimnol achimnol self-assigned this Apr 27, 2017
achimnol added a commit that referenced this issue Apr 27, 2017
 * Multi-process mode is not enabled yet.
achimnol added a commit that referenced this issue Jun 2, 2017
Still, the dispatching mechanism is a hand-brewed one...
achimnol added a commit that referenced this issue Aug 7, 2017
 * aiozmq.rpc uses PUSH/PULL ZMQ sockets - we can put a routing proxy
   so that it distributes incoming event messages to multiple manager
   worker processes (instead of complicated external MQ servers!).
achimnol added a commit that referenced this issue Aug 7, 2017
 * For multiprocessing, it uses multiprocessing.Manager().dict().
   This may incur a little performance overhead but simplifies
   maintenance a lot.
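
As a rough illustration of the approach described in the commit above, the sketch below shares one dictionary among several worker processes through `multiprocessing.Manager().dict()`. The worker function, the key names, and the explicit "fork" start method (POSIX only, chosen so the snippet runs without a `__main__` guard) are assumptions for illustration, not the project's actual code.

```python
import multiprocessing as mp
import os


def record_worker(shared, worker_idx):
    # Every access goes through the manager's server process (one IPC
    # round trip per operation), which is where the overhead comes from.
    shared[f"worker-{worker_idx}"] = os.getpid()


def run_demo(num_workers=3):
    # Assumption: the "fork" start method (POSIX only) keeps the sketch short.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        shared = manager.dict()
        procs = [
            ctx.Process(target=record_worker, args=(shared, idx))
            for idx in range(num_workers)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        # Copy into a plain dict before the manager shuts down.
        return shared.copy()
```

Calling `run_demo()` returns a plain dict with one entry per worker. The proxy object looks like an ordinary dict to the workers, which is the maintainability benefit the commit message refers to.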
achimnol added a commit that referenced this issue Aug 7, 2017
achimnol added a commit that referenced this issue Aug 23, 2017
 * Now manager + gateway run on multiple CPU cores with sane
   transaction semantics.  Thanks to aiotools!

   - It no longer depends on Redis as a pub-sub broker nor a database.
     All communications are done via ZeroMQ with no centralized queue
     server.

   - Redis is used only for per-keypair rate-limiting.

 * Now the manager searches for an available agent to spawn new
   containers based on available memory / CPU / GPU capacity units.
   No more hard-coded instance types!

 * The db schema is now prepared for multi-container kernel sessions.

   - User-facing APIs now use "session ID" which is directed to the
     master container of the given session.

   - Each container has a unique "kernel ID" and is managed individually.

 * Replace asyncpg + asyncpgsa with aiopg for better SQLAlchemy
   support (especially custom type decorators).

 * TODOs

   - Stabilize accounting of used/available resource units.

   - Some parts still confuse session IDs and kernel IDs...
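
The capacity-based agent search mentioned in the commit above could look roughly like the following. This is a minimal sketch: the `ResourceSlot` and `Agent` classes, their field names, and the first-fit policy are all hypothetical, since the commit message only says that memory, CPU, and GPU capacity units are compared.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ResourceSlot:
    """Capacity units for one agent or one container request (hypothetical)."""
    mem: float
    cpu: float
    gpu: float

    def fits_in(self, available: "ResourceSlot") -> bool:
        # A request fits only if every resource dimension fits.
        return (
            self.mem <= available.mem
            and self.cpu <= available.cpu
            and self.gpu <= available.gpu
        )


@dataclass
class Agent:
    agent_id: str
    available: ResourceSlot


def pick_agent(agents: Sequence[Agent], requested: ResourceSlot) -> Optional[Agent]:
    # Assumption: first-fit. The real manager may rank candidates differently.
    for agent in agents:
        if requested.fits_in(agent.available):
            return agent
    return None
```

Because the request is expressed in capacity units rather than a named instance type, any agent with enough free memory, CPU, and GPU can host the container.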
achimnol added a commit that referenced this issue Aug 29, 2017
 * The "agent-lost" log is no longer repeated (it is emitted only once!)
 * Moved agent-specific logic in gateway/events.py to gateway/kernel.py
 * Synchronize correctly when updating/reading the "last seen" dictionary shared by all worker processes
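
One way to synchronize updates to a "last seen" dictionary shared by worker processes is to guard each read-modify-write with a manager-provided lock. The sketch below assumes the dictionary is a `multiprocessing.Manager().dict()`. The function names, the agent ID, the timestamps, and the "fork" start method (POSIX only) are illustrative assumptions rather than the code from the commit.

```python
import multiprocessing as mp


def report_heartbeat(last_seen, lock, agent_id, timestamp):
    # The read-compare-write below is several separate proxy calls, so it
    # needs a lock to stay atomic across worker processes.
    with lock:
        previous = last_seen.get(agent_id, 0.0)
        if timestamp > previous:
            last_seen[agent_id] = timestamp


def run_demo():
    # Assumption: "fork" start method, so no __main__ guard is needed here.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        last_seen = manager.dict()
        lock = manager.Lock()
        # Heartbeats arrive out of order from different worker processes.
        timestamps = [3.0, 1.0, 5.0, 2.0]
        procs = [
            ctx.Process(
                target=report_heartbeat,
                args=(last_seen, lock, "agent-1", ts),
            )
            for ts in timestamps
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        return last_seen.copy()
```

Whatever order the processes run in, the newest timestamp is the one that remains. Without the lock, a stale heartbeat could overwrite a newer one between the read and the write.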
achimnol added a commit that referenced this issue Sep 18, 2017
 * Agent status checker coroutine now only runs on the first CPU core
 * Fix SQLAlchemy query in registry's enumerate_instance() method
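
Restricting a periodic task to a single worker can be sketched with a per-worker index check. The example assumes each worker receives a zero-based process index (called `pidx` here). The coroutine names are hypothetical, and the workers run sequentially in one process only to keep the sketch short.

```python
import asyncio


async def check_agent_status(ticks, interval=0.01, rounds=3):
    # Stand-in for the periodic agent status check.
    for _ in range(rounds):
        ticks.append("checked")
        await asyncio.sleep(interval)


async def worker_main(pidx, ticks):
    checker = None
    if pidx == 0:
        # Only the first worker starts the checker, so agents are not
        # checked once per CPU core.
        checker = asyncio.create_task(check_agent_status(ticks))
    # ... every worker would serve requests here ...
    await asyncio.sleep(0)
    if checker is not None:
        await checker


def run_demo(num_workers=3):
    # Assumption: real workers are separate processes; they run one after
    # another here purely for demonstration.
    results = {}
    for pidx in range(num_workers):
        ticks = []
        asyncio.run(worker_main(pidx, ticks))
        results[pidx] = len(ticks)
    return results
```

Only the worker with index 0 records any status checks. The other workers skip the checker entirely.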