
Apple Silicon standalone build cannot connect to mlagents-learn #5474


Closed
TV4Fun opened this issue Jul 24, 2021 · 7 comments
Labels
bug Issue describes a potential bug in ml-agents.

Comments

@TV4Fun

TV4Fun commented Jul 24, 2021

Describe the bug
The standard checkout of ml-agents includes prebuilt macOS x64 libraries that an Apple Silicon build can't use, for example com.unity.ml-agents/Plugins/ProtoBuffer/runtimes/osx/native/libgrpc_csharp_ext.x64.bundle. As a result, a standalone Apple Silicon executable fails to load the native gRPC library, can't connect to the trainer, and falls back to inference, while mlagents-learn eventually times out waiting for the environment.
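
One way to confirm which CPU architectures a prebuilt native library contains is to read its Mach-O header. The sketch below is a minimal, hedged example: the function name `macho_architectures` is hypothetical, it only recognizes x86_64 and arm64 by name, and it assumes the `.bundle` is a single Mach-O file rather than a directory.

```python
import struct

# Constants from Apple's <mach-o/loader.h> and <mach-o/fat.h>.
MH_MAGIC_64 = 0xFEEDFACF   # thin 64-bit Mach-O (stored little-endian on x86_64/arm64)
FAT_MAGIC = 0xCAFEBABE     # universal binary, 32-bit offsets (stored big-endian)
FAT_MAGIC_64 = 0xCAFEBABF  # universal binary, 64-bit offsets (stored big-endian)
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_architectures(data: bytes) -> list:
    """Return the architecture names found in the leading bytes of a Mach-O file."""
    if len(data) < 8:
        return []
    (magic_be,) = struct.unpack_from(">I", data, 0)
    if magic_be in (FAT_MAGIC, FAT_MAGIC_64):
        # Universal binary: a count followed by one fat_arch entry per slice.
        (count,) = struct.unpack_from(">I", data, 4)
        entry_size = 20 if magic_be == FAT_MAGIC else 32
        archs = []
        for i in range(count):
            offset = 8 + i * entry_size
            if offset + entry_size > len(data):
                break  # header truncated; report what was readable
            (cputype,) = struct.unpack_from(">I", data, offset)
            archs.append(CPU_NAMES.get(cputype, hex(cputype)))
        return archs
    (magic_le,) = struct.unpack_from("<I", data, 0)
    if magic_le == MH_MAGIC_64:
        # Thin binary: cputype follows the magic number.
        (cputype,) = struct.unpack_from("<I", data, 4)
        return [CPU_NAMES.get(cputype, hex(cputype))]
    return []
```

Reading the first few kilobytes of the library with `open(path, "rb").read(4096)` and passing them to this function should be enough. A result without `"arm64"` would be consistent with the load failure described here.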

To Reproduce
Steps to reproduce the behavior:

  1. Check out ml-agents from GitHub and build
  2. Attach your ml-agents build to a Unity project
  3. Build the project as a standalone Apple Silicon executable
  4. Run mlagents-learn --env=<PATH_TO_APP>

Console logs / stack traces
Player-0.log:

Mono path[0] = '/Users/jcroteau/New Unity Project/Standalone.app/Contents/Resources/Data/Managed'
Mono config path = '/Users/jcroteau/New Unity Project/Standalone.app/Contents/MonoBleedingEdge/etc'
Found 1 interfaces on host : 0) 192.168.254.27
Multi-casting "[IP] 192.168.254.27 [Port] 55195 [Flags] 2 [Guid] 2162196733 [EditorId] 1012424985 [Version] 1048832 [Id] OSXPlayer(1,Nomad.home) [Debug] 1 [PackageName] OSXPlayer [ProjectName] New Unity Project" to [225.0.0.222:54997]...
Starting managed debugger on port 56733
Using monoOptions --debugger-agent=transport=dt_socket,embedding=1,server=y,suspend=n,address=0.0.0.0:56733
Initialize engine version: 2021.2.0b4 (af9ec38f7da3)
[Subsystems] Discovering subsystems at path /Users/jcroteau/New Unity Project/Standalone.app/Contents/Resources/Data/UnitySubsystems
GfxDevice: creating device client; threaded=1; jobified=0
 preferred device: Apple M1 (high power)
Metal devices available: 1
0: Apple M1 (high power)
Using device Apple M1 (high power)
Initializing Metal device caps: Apple M1
Begin MonoManager ReloadAssembly
- Completed reload, in  0.045 seconds
Metal RecreateSurface[0x11240f150]: surface size 2880x1800
UnloadTime: 3.480250 ms
Unexpected exception when trying to initialize communication: System.IO.IOException: Error loading native library "/Users/jcroteau/New Unity Project/Standalone.app/Contents/Resources/Data/Managed/../../../Plugins/libgrpc_csharp_ext.x64.bundle"
  at Grpc.Core.Internal.UnmanagedLibrary..ctor (System.String[] libraryPathAlternatives) [0x00063] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.Internal.NativeExtension.Load () [0x000d7] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.Internal.NativeExtension..ctor () [0x00006] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.Internal.NativeExtension.Get () [0x00022] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.Internal.NativeMethods.Get () [0x00000] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.GrpcEnvironment.GrpcNativeInit () [0x00000] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.GrpcEnvironment..ctor () [0x0001e] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.GrpcEnvironment.AddRef () [0x00028] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.Channel..ctor (System.String target, Grpc.Core.ChannelCredentials credentials, System.Collections.Generic.IEnumerable`1[T] options) [0x00051] in <2f154ad39ec14cfea604815989d96352>:0 
  at Grpc.Core.Channel..ctor (System.String target, Grpc.Core.ChannelCredentials credentials) [0x00000] in <2f154ad39ec14cfea604815989d96352>:0 
  at Unity.MLAgents.RpcCommunicator.Initialize (System.Int32 port, Unity.MLAgents.CommunicatorObjects.UnityOutputProto unityOutput, Unity.MLAgents.CommunicatorObjects.UnityInputProto& unityInput) [0x00008] in /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs:220 
  at Unity.MLAgents.RpcCommunicator.Initialize (Unity.MLAgents.CommunicatorInitParameters initParameters, Unity.MLAgents.UnityRLInitParameters& initParametersOut) [0x00041] in /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs:111 
UnityEngine.StackTraceUtility:ExtractStackTrace () (at /Users/bokken/buildslave/unity/build/Runtime/Export/Scripting/StackTrace.cs:37)
UnityEngine.DebugLogHandler:LogFormat (UnityEngine.LogType,UnityEngine.Object,string,object[])
UnityEngine.Logger:Log (UnityEngine.LogType,object)
UnityEngine.Debug:Log (object)
Unity.MLAgents.RpcCommunicator:Initialize (Unity.MLAgents.CommunicatorInitParameters,Unity.MLAgents.UnityRLInitParameters&) (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs:140)
Unity.MLAgents.Academy:InitializeEnvironment () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:445)
Unity.MLAgents.Academy:LazyInitialize () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:279)
Unity.MLAgents.Academy:.ctor () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:248)
Unity.MLAgents.Academy/<>c:<.cctor>b__82_0 () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:117)
System.Lazy`1<Unity.MLAgents.Academy>:ViaFactory (System.Threading.LazyThreadSafetyMode)
System.Lazy`1<Unity.MLAgents.Academy>:ExecutionAndPublication (System.LazyHelper,bool)
System.Lazy`1<Unity.MLAgents.Academy>:CreateValue ()
System.Lazy`1<Unity.MLAgents.Academy>:get_Value ()
Unity.MLAgents.Academy:get_Instance () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:132)
Unity.MLAgents.DecisionRequester:Awake () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/DecisionRequester.cs:57)

(Filename: /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs Line: 140)

Couldn't connect to trainer on port 5005 using API version 1.5.0. Will perform inference instead.
UnityEngine.StackTraceUtility:ExtractStackTrace () (at /Users/bokken/buildslave/unity/build/Runtime/Export/Scripting/StackTrace.cs:37)
UnityEngine.DebugLogHandler:LogFormat (UnityEngine.LogType,UnityEngine.Object,string,object[])
UnityEngine.Logger:Log (UnityEngine.LogType,object)
UnityEngine.Debug:Log (object)
Unity.MLAgents.Academy:InitializeEnvironment () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:459)
Unity.MLAgents.Academy:LazyInitialize () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:279)
Unity.MLAgents.Academy:.ctor () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:248)
Unity.MLAgents.Academy/<>c:<.cctor>b__82_0 () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:117)
System.Lazy`1<Unity.MLAgents.Academy>:ViaFactory (System.Threading.LazyThreadSafetyMode)
System.Lazy`1<Unity.MLAgents.Academy>:ExecutionAndPublication (System.LazyHelper,bool)
System.Lazy`1<Unity.MLAgents.Academy>:CreateValue ()
System.Lazy`1<Unity.MLAgents.Academy>:get_Value ()
Unity.MLAgents.Academy:get_Instance () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs:132)
Unity.MLAgents.DecisionRequester:Awake () (at /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/DecisionRequester.cs:57)

(Filename: /Users/jcroteau/ml-agents/com.unity.ml-agents/Runtime/Academy.cs Line: 459)
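
The two messages that matter in the player log above are the native library load error and the fallback to inference. As a rough sketch, a script could pull them out of a `Player-0.log` like this. The function name and the returned dictionary keys are illustrative assumptions, and the patterns are taken only from the log lines quoted in this issue.

```python
import re

# Patterns copied from the messages in the Player-0.log above.
NATIVE_LOAD_ERROR = re.compile(r'Error loading native library "([^"]+)"')
TRAINER_FALLBACK = "Couldn't connect to trainer on port"


def summarize_player_log(log_text: str) -> dict:
    """Report which native libraries failed to load and whether the player fell back to inference."""
    return {
        "failed_libraries": NATIVE_LOAD_ERROR.findall(log_text),
        "fell_back_to_inference": TRAINER_FALLBACK in log_text,
    }
```

If `failed_libraries` lists a path ending in `libgrpc_csharp_ext.x64.bundle` and `fell_back_to_inference` is true, the player most likely hit the same failure as in this report.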

Output of mlagents-learn --env=Standalone.app --debug:



                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓


 Version information:
  ml-agents: 0.28.0.dev0,
  ml-agents-envs: 0.28.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 1.10.0a0+gite856a45
2021-07-24 13:29:24 DEBUG [learn.py:220] Configuration for this run:
2021-07-24 13:29:24 DEBUG [learn.py:221] {
    "default_settings": null,
    "behaviors": {
        "KinematicMoveToGoal": {
            "trainer_type": "ppo",
            "hyperparameters": {
                "batch_size": 1000,
                "buffer_size": 10000,
                "learning_rate": 0.0003,
                "beta": 0.005,
                "epsilon": 0.2,
                "lambd": 0.95,
                "num_epoch": 5,
                "learning_rate_schedule": "linear"
            },
            "network_settings": {
                "normalize": false,
                "hidden_units": 256,
                "num_layers": 2,
                "vis_encode_type": "simple",
                "memory": null,
                "goal_conditioning_type": "hyper"
            },
            "reward_signals": {
                "extrinsic": {
                    "gamma": 0.95,
                    "strength": 1.0,
                    "network_settings": {
                        "normalize": false,
                        "hidden_units": 128,
                        "num_layers": 2,
                        "vis_encode_type": "simple",
                        "memory": null,
                        "goal_conditioning_type": "hyper"
                    }
                }
            },
            "init_path": null,
            "keep_checkpoints": 5,
            "checkpoint_interval": 500000,
            "max_steps": 5000000,
            "time_horizon": 64,
            "summary_freq": 10000,
            "threaded": true,
            "self_play": null,
            "behavioral_cloning": null
        }
    },
    "env_settings": {
        "env_path": "Standalone.app",
        "env_args": null,
        "base_port": 5005,
        "num_envs": 1,
        "seed": -1
    },
    "engine_settings": {
        "width": 84,
        "height": 84,
        "quality_level": 5,
        "time_scale": 20,
        "target_frame_rate": -1,
        "capture_frame_rate": 60,
        "no_graphics": false
    },
    "environment_parameters": null,
    "checkpoint_settings": {
        "run_id": "KinematicBiggerPenalties",
        "initialize_from": null,
        "load_model": false,
        "resume": true,
        "force": false,
        "train_model": false,
        "inference": false,
        "results_dir": "results"
    },
    "torch_settings": {
        "device": null
    },
    "debug": true
}
2021-07-24 13:29:24 DEBUG [learn.py:245] run_seed set to 5998
2021-07-24 13:29:24 DEBUG [torch.py:58] default Torch device: cpu
2021-07-24 13:29:24 DEBUG [stats_writer.py:60] Initializing StatsWriter plugins: default
2021-07-24 13:29:24 DEBUG [stats_writer.py:63] Found 3 StatsWriters for plugin default
2021-07-24 13:29:24 DEBUG [env_utils.py:33] The true file name is Standalone
2021-07-24 13:29:24 DEBUG [env_utils.py:105] The launch string is /Users/jcroteau/New Unity Project/Standalone.app/Contents/MacOS/New Unity Project
2021-07-24 13:29:24 DEBUG [env_utils.py:106] Running with args ['--mlagents-port', '5005', '-logFile', '/Users/jcroteau/New Unity Project/results/KinematicBiggerPenalties/run_logs/Player-0.log']
[UnityMemory] Configuration Parameters - Can be set up in boot.config
    "memorysetup-bucket-allocator-granularity=16"
    "memorysetup-bucket-allocator-bucket-count=8"
    "memorysetup-bucket-allocator-block-size=4194304"
    "memorysetup-bucket-allocator-block-count=1"
    "memorysetup-main-allocator-block-size=16777216"
    "memorysetup-thread-allocator-block-size=16777216"
    "memorysetup-gfx-main-allocator-block-size=16777216"
    "memorysetup-gfx-thread-allocator-block-size=16777216"
    "memorysetup-cache-allocator-block-size=4194304"
    "memorysetup-typetree-allocator-block-size=2097152"
    "memorysetup-profiler-bucket-allocator-granularity=16"
    "memorysetup-profiler-bucket-allocator-bucket-count=8"
    "memorysetup-profiler-bucket-allocator-block-size=4194304"
    "memorysetup-profiler-bucket-allocator-block-count=1"
    "memorysetup-profiler-allocator-block-size=16777216"
    "memorysetup-profiler-editor-allocator-block-size=1048576"
    "memorysetup-temp-allocator-size-main=4194304"
    "memorysetup-job-temp-allocator-block-size=2097152"
    "memorysetup-job-temp-allocator-block-size-background=1048576"
    "memorysetup-job-temp-allocator-reduction-small-platforms=262144"
    "memorysetup-temp-allocator-size-background-worker=32768"
    "memorysetup-temp-allocator-size-job-worker=262144"
    "memorysetup-temp-allocator-size-preload-manager=262144"
    "memorysetup-temp-allocator-size-nav-mesh-worker=65536"
    "memorysetup-temp-allocator-size-audio-worker=65536"
    "memorysetup-temp-allocator-size-cloud-worker=32768"
    "memorysetup-temp-allocator-size-gfx=262144"
2021-07-24 13:30:24 WARNING [environment.py:431] Environment timed out shutting down. Killing...
2021-07-24 13:30:24 DEBUG [subprocess_env_manager.py:220] UnityEnvironment worker 0: environment stopping.
2021-07-24 13:30:24 DEBUG [subprocess_env_manager.py:234] UnityEnvironment worker 0 closing.
2021-07-24 13:30:24 DEBUG [subprocess_env_manager.py:237] UnityEnvironment worker 0 done.
2021-07-24 13:30:24 DEBUG [trainer_controller.py:81] Saved Model
2021-07-24 13:30:24 DEBUG [subprocess_env_manager.py:368] SubprocessEnvManager closing.
2021-07-24 13:30:24 DEBUG [subprocess_env_manager.py:107] UnityEnvWorker 0 got exception trying to close.
Traceback (most recent call last):
  File "/Users/jcroteau/miniforge3/envs/pytorch-metal/bin/mlagents-learn", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/learn.py", line 250, in main
    run_cli(parse_command_line())
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/learn.py", line 246, in run_cli
    run_training(run_seed, options)
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/learn.py", line 125, in run_training
    tc.start_learning(env_manager)
  File "/Users/jcroteau/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "/Users/jcroteau/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 105, in _reset_env
    env_manager.reset(config=new_config)
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 68, in reset
    self.first_step_infos = self._reset_env(config)
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 333, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "/Users/jcroteau/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 98, in recv
    raise env_exception
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
	 The environment does not need user interaction to launch
	 The Agents' Behavior Parameters > Behavior Type is set to "Default"
	 The environment and the Python interface have compatible versions.
	 If you're running on a headless server without graphics support, turn off display by either passing --no-graphics option or build your Unity executable as server build.

Environment (please complete the following information):

  • Unity Version: 2021.2.0b4
  • OS + version: macOS 11.5
  • ML-Agents version: latest main branch from source
  • Torch version: 1.10.0a0+gite856a45 (latest main built from source)

TV4Fun added the bug label Jul 24, 2021

@TV4Fun
Author

TV4Fun commented Jul 24, 2021

Note: This also happens when running inside the Apple Silicon build of Unity Editor. The only way I'm able to train is to use the Intel build.

@TV4Fun
Author

TV4Fun commented Jul 25, 2021

I'm trying to get this to work. The default gRPC Unity plugin doesn't include an Arm64 build, but I was able to build my own. The Unity editor and player aren't finding the native library to include with the app, but if I build a standalone player and manually add the Apple Silicon dylib I built, it works and shows a pretty substantial performance improvement. If anyone can help me get the editor to find the right Arm64 file, I would greatly appreciate it.
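
The manual step could be scripted roughly as follows. This is only a sketch under stated assumptions: the comment does not say where the dylib was placed or what it was named, so the `Contents/Plugins` destination is inferred from the path in the Player-0.log error, and `install_native_plugin` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def install_native_plugin(app_path, built_library, target_name):
    """Copy a locally built native library into a standalone .app bundle.

    Assumption: the player probes <app>/Contents/Plugins, as the path in the
    load error suggests. target_name is whatever file name the loader expects.
    """
    plugins_dir = Path(app_path) / "Contents" / "Plugins"
    plugins_dir.mkdir(parents=True, exist_ok=True)
    destination = plugins_dir / target_name
    shutil.copy2(built_library, destination)
    return destination
```

A call such as `install_native_plugin("Standalone.app", "path/to/arm64/library", "libgrpc_csharp_ext.x64.bundle")` would overwrite the x64 library at the location named in the error message. Whether the loader accepts an Arm64 binary under that name is an assumption to verify rather than something this thread confirms.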

@ervteng
Contributor

ervteng commented Jul 27, 2021

cc: @surfnerd who might have some experience with Apple Silicon

@surfnerd
Contributor

I have a PR I am working on that adds the arm64 arch to the native plugin. It should be ready this week.

@surfnerd
Contributor

See #5283

@TV4Fun
Author

TV4Fun commented Oct 24, 2021

Confirmed that this now works correctly with Unity Arm64 on the latest main branch, so closing this issue.

@TV4Fun TV4Fun closed this as completed Oct 24, 2021
@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2021