
Add Apple Silicon (M2) support with MPS optimizations #433

Open · wants to merge 1 commit into main

Conversation


@jmanhype commented Mar 10, 2025

🍎 Apple Silicon Support Implementation Details

Thank you for reviewing this PR! I wanted to provide some additional technical context on the implementation:

🔍 Technical Implementation

The core changes focus on three key areas:

  1. Device Detection & Initialization

    import platform
    import torch

    # Use MPS only when the PyTorch build exposes the backend and we are on Apple Silicon
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available() and platform.processor() == "arm":
        device = "mps"
        torch_dtype = torch.float32  # Force full precision; fp16 is unreliable on MPS
  2. Memory Optimization

    • Removed memory-management code that was CUDA-specific
    • Added appropriate tensor handling for the MPS backend
    • Implemented fallbacks for FLUX models with high memory requirements (see the cache-handling sketch after this list)
  3. Dependency Management

    • Removed CUDA-specific dependencies such as cupy-cuda12x
    • Replaced them with platform-agnostic alternatives so the requirements install cleanly on macOS
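
For the memory-optimization item above, here is a minimal sketch of device-agnostic cache handling (the helper name is mine, not from the diff):

    import torch

    def empty_device_cache(device: str) -> None:
        """Release cached allocator memory on whichever backend is active."""
        if device == "cuda":
            torch.cuda.empty_cache()
        elif device == "mps":
            torch.mps.empty_cache()  # available in PyTorch >= 2.0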

💡 Tips for M2 Users

For optimal performance on Apple Silicon:

  • Start with smaller models (SD 1.5) before trying larger ones like FLUX
  • When using FLUX models, enable VRAM management:
    pipe.enable_vram_management(num_persistent_param_in_dit=7*10**9)
  • Consider smaller output resolutions (512x512) for complex generations
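
To see how close a run gets to the 16GB ceiling, you can poll the MPS allocator between generations (standard PyTorch 2.x API; the helper name is just illustrative):

    import torch

    def log_mps_memory(tag: str) -> None:
        allocated = torch.mps.current_allocated_memory() / 1024**3  # tensor memory, GiB
        driver = torch.mps.driver_allocated_memory() / 1024**3      # total driver memory, GiB
        print(f"[{tag}] allocated={allocated:.2f} GiB, driver={driver:.2f} GiB")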

🧪 Testing Methodology

Testing was conducted on a MacBook Pro with an M2 Pro chip (16GB unified memory), with the following results:

| Model  | Resolution | Memory Usage | Generation Time |
|--------|------------|--------------|-----------------|
| SD 1.5 | 512×512    | ~8GB         | ~5 sec          |
| SD-XL  | 768×768    | ~12GB        | ~15 sec         |
| FLUX   | 512×512    | ~14GB        | ~20 sec         |

🔮 Future Improvements

  • Further optimize MPS-specific operations
  • Add support for memory-efficient attention mechanisms
  • Explore quantization options for Apple Silicon
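
As a concrete starting point for the attention item above, PyTorch's built-in scaled_dot_product_attention (available since 2.0) already selects a memory-efficient kernel where the backend supports one; the shapes below are purely illustrative:

    import torch
    import torch.nn.functional as F

    # Illustrative shapes: (batch, heads, sequence_length, head_dim)
    q = torch.randn(1, 8, 1024, 64, device="mps")
    k = torch.randn(1, 8, 1024, 64, device="mps")
    v = torch.randn(1, 8, 1024, 64, device="mps")

    # Dispatches to the most efficient kernel the current backend supports,
    # falling back to the plain math implementation otherwise.
    out = F.scaled_dot_product_attention(q, k, v)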

I'm happy to address any questions or make additional adjustments as needed!
