Using AI Models in Xcode/iOS

I am having issues running Llama 3.1 8B CoreML / Stable Diffusion models with Xcode on an iPad (M3), such as:

AI: “the only accessible CoreML package (andmev/Llama-3.1-8B-Instruct-CoreML) still hits ios18.constexpr_blockwise_shift_scale → std::bad_cast inside Apple’s BNNS compiler, which ultimately throws the EXC_BAD_ACCESS you’re seeing before we can catch it. Until we can install an int8/fp16 flavor (or re-export the model ourselves), the app will keep crashing during the Core ML load phase.”

  1. Does anyone know of, or have links to, a working Llama (text) / Stable Diffusion (image) model that already works well with Xcode/iOS, and the steps to install it and make it work?

Thank you so much for your help


I’m not familiar with iOS…


I’ll go step by step:

  1. Why your current Llama-3.1-8B CoreML model is crashing.
  2. Concrete, known-working llama-family (text) and Stable Diffusion (image) models for iOS, with install steps.
  3. How to systematically find iOS/Xcode-friendly models on Hugging Face.

1. Why andmev/Llama-3.1-8B-Instruct-CoreML crashes on iPad

You are using:

  • andmev/Llama-3.1-8B-Instruct-CoreML, which contains a single large llama_3.1_coreml.mlpackage (~4.5 GB). (Hugging Face)

Your runtime error:

ios18.constexpr_blockwise_shift_scale → std::bad_cast inside BNNS, then EXC_BAD_ACCESS while loading the Core ML model.

What this means, in plain terms:

  • Core ML 8 / iOS 18 added new “constexpr_*” ops (like constexpr_blockwise_shift_scale) to support advanced quantization schemes (4-bit blockwise, palettization, etc.) and large-model optimizations. These are used at compile time by the Core ML “ML Program” backend before BNNS/ANE kernels are created.
  • The andmev model is a converted/quantized Llama-3.1-8B. From the size and packaging, it is highly likely it uses such compression / blockwise-quantized weights so that 8B parameters fit in ~4.5GB. (Hugging Face)
  • On your device / OS build, BNNS is hitting an internal bug while compiling or running that op; it throws std::bad_cast in Apple’s code, and you only see the resulting EXC_BAD_ACCESS. There’s nothing you can “fix” from Swift.

So: you’re not doing anything wrong in Xcode. That particular CoreML export is hitting a relatively fresh, iOS-18-only code path that is still fragile.

The practical consequence:

  • You should not treat andmev/Llama-3.1-8B-Instruct-CoreML as a reference of “how CoreML LLMs behave”.

  • Start from models that:

    • Are smaller,
    • Use simpler (fp16 / int8) quantization,
    • And are explicitly documented to work with iOS 18 / Swift.

2. Working models for Xcode/iOS, and how to install them

I’ll split this into:

  • Text LLMs (llama-family-ish)
  • Stable Diffusion (image generation)

2.1 Text: good CoreML LLMs that actually work on Apple devices

2.1.1 TinyLlama CoreML – small, simple, good for smoke tests

Hugging Face: TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML (Hugging Face)

  • What it is:

    • Core ML conversion of TinyLlama/TinyLlama-1.1B-Chat-v0.3, a 1.1B-parameter chat model. (Hugging Face)
    • Purpose-built “Core ML” repo: .mlpackage files optimized for Apple Silicon (Mac + iOS).
  • Why it’s a good start:

    • Small size → far less likely to run into memory problems or weird corner cases.
    • Good “first LLM” to verify that your Xcode + CoreML + iPad pipeline is correct.

Install / use steps (Mac → Xcode → iPad)

  1. Download the CoreML package (on your Mac)

    Using huggingface_hub CLI:

    pip install -U "huggingface_hub[cli]"
    
    huggingface-cli download \
      TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML \
      --include "*.mlpackage/*" \
      --local-dir tinyllama_coreml \
      --local-dir-use-symlinks False
    

    You should see at least one .mlpackage folder in tinyllama_coreml/. (Hugging Face)

  2. Add .mlpackage to Xcode

    • Drag the .mlpackage folder into your Xcode project (e.g. into a Models/ group).

    • In the dialog:

      • Check “Copy items if needed”.
      • Ensure your iOS target is selected under “Add to targets”.
  3. Load the model in Swift

    Note that Xcode compiles a .mlpackage into a .mlmodelc inside your app bundle at build time (the original .mlpackage is not copied as-is), so look up the compiled model and load it dynamically:

    import CoreML

    final class TinyLlamaRunner {
        private let model: MLModel

        init() throws {
            // Xcode compiles "<name>.mlpackage" to "<name>.mlmodelc" when it
            // builds the app, so that is the resource present in the bundle.
            let url = Bundle.main.url(
                forResource: "TinyLlama-1.1B-Chat-v0.3-CoreML",
                withExtension: "mlmodelc"
            )!
            let config = MLModelConfiguration()
            config.computeUnits = .all  // CPU + GPU + ANE

            self.model = try MLModel(contentsOf: url, configuration: config)
        }

        func generateLogits(inputIds: [Int32]) throws -> MLMultiArray {
            let inputArray = try MLMultiArray(
                shape: [NSNumber(value: inputIds.count)],
                dataType: .int32
            )
            for (i, id) in inputIds.enumerated() {
                inputArray[i] = NSNumber(value: id)
            }

            // Feature names depend on the model; check FeatureDescriptions.json
            // in the package (or dump the schema at runtime, see below)
            let input = try MLDictionaryFeatureProvider(
                dictionary: ["input_ids": inputArray]
            )
            let out = try model.prediction(from: input)

            return out.featureValue(for: "logits")!.multiArrayValue!
        }
    }
    

    You can inspect Data/com.apple.CoreML/FeatureDescriptions.json inside the .mlpackage to confirm input/output names and shapes.
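
    If you'd rather not open the package by hand, you can also print the same schema at runtime with standard Core ML APIs:

    // Dump the model's declared inputs/outputs so you can confirm the
    // feature names ("input_ids", "logits", ...) before wiring prediction.
    func dumpSchema(of model: MLModel) {
        let description = model.modelDescription
        for (name, feature) in description.inputDescriptionsByName {
            print("input :", name, "->", feature)
        }
        for (name, feature) in description.outputDescriptionsByName {
            print("output:", name, "->", feature)
        }
    }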

  4. Add tokenization / decoding

    • Do tokenization with HF's tokenizers on the device (e.g. via the Tokenizers module of huggingface/swift-transformers, as sketched below, or by calling out to a tiny Python server), or
    • Pre-tokenize on your Mac for experiments, just to validate that the Core ML model runs and returns sane logits.
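
    A minimal on-device tokenization sketch, assuming the Tokenizers module from huggingface/swift-transformers (the exact API can shift between releases, so verify it against the version you pin):

    import Tokenizers  // Swift package: huggingface/swift-transformers

    // Assumption: AutoTokenizer fetches the tokenizer files for the base
    // model from the Hub, so the first call needs network access.
    func encodePrompt(_ prompt: String) async throws -> [Int32] {
        let tokenizer = try await AutoTokenizer.from(
            pretrained: "TinyLlama/TinyLlama-1.1B-Chat-v0.3"
        )
        return tokenizer.encode(text: prompt).map(Int32.init)
    }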

Once this works on your iPad M3, you know the basic Core ML runtime is healthy.


2.1.2 Apple Mistral CoreML – strong 7B, officially maintained

Hugging Face: apple/mistral-coreml (Hugging Face)

  • What it is:

    • Apple’s official Core ML export of mistralai/Mistral-7B-Instruct-v0.3 in fp16 and int4 forms (e.g. StatefulMistral7BInstructInt4.mlpackage). (Hugging Face)
    • Documentation explains how to download .mlpackage folders via huggingface-cli. (Hugging Face)
  • Why it’s special:

    • This is the model Apple uses in many “Swift Transformers / iOS18” examples.
    • The huggingface/swift-transformers “preview” branch and the huggingface/swift-chat sample app show exactly how to wire it up in Swift on macOS 15 / iOS 18. (Hugging Face)

Install / use steps

  1. Download a .mlpackage

    Example from the model card: (Hugging Face)

    pip install -U "huggingface_hub[cli]"
    
    huggingface-cli download \
      apple/mistral-coreml \
      --local-dir models \
      --local-dir-use-symlinks False \
      --include "StatefulMistral7BInstructInt4.mlpackage/*"
    
  2. Add the .mlpackage to Xcode exactly as for TinyLlama.

  3. Use Swift Transformers or the sample chat app

    • swift-transformers (preview branch) has ready-made code to:

      • Load the Core ML Mistral .mlpackage.
      • Handle tokenization, KV cache, and streaming generation.
    • swift-chat is a demo chat app showing end-to-end usage; a condensed sketch follows. (Hugging Face)
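
    A condensed sketch of that wiring, assuming the preview-branch API that swift-chat builds on (LanguageModel, GenerationConfig); treat the names as a guide and verify them against the branch you pin:

    import CoreML
    import Generation  // from huggingface/swift-transformers
    import Models

    // Minimal sketch, not a drop-in implementation: swift-chat layers
    // tokenization, KV-cache handling, and streaming on top of this.
    func runMistral(compiledModelURL: URL, prompt: String) async throws -> String {
        // loadCompiled expects a compiled .mlmodelc; if you start from the
        // .mlpackage, compile it once with MLModel.compileModel(at:).
        let model = try LanguageModel.loadCompiled(
            url: compiledModelURL,
            computeUnits: .cpuAndGPU
        )
        var config = GenerationConfig(maxNewTokens: 64)
        config.doSample = false  // greedy decoding for reproducible smoke tests
        return try await model.generate(config: config, prompt: prompt)
    }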

This is the “serious” baseline text model I’d pick for iOS once TinyLlama is working.


2.1.3 Smaller Llama 3.x CoreML variants

If you want to stay closer to Llama instead of Mistral, look at smaller CoreML-ready Llama 3.2 models, for example:

  • Model: smpanaro/Llama-3.2-1B-Instruct-CoreML (part of an “Apple Neural Engine LLMs” collection, optimized for ANE). (Hugging Face)

This repo splits the model into multiple .mlmodelc components (embedding, main, head). They are more work to integrate (you must stitch the stages together in Swift, as sketched below), but they are explicitly designed for Core ML on Apple devices and are much lighter than 8B.
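
A hypothetical stitching skeleton (the component file names here are illustrative; check the repo's Files tab for the real ones and for each stage's input/output names):

    import CoreML

    // Illustrative only: real component names and feature IO differ per repo.
    final class ChunkedLlama {
        private let embedding: MLModel
        private let backbone: MLModel
        private let lmHead: MLModel

        init(modelDir: URL, configuration: MLModelConfiguration) throws {
            func load(_ name: String) throws -> MLModel {
                try MLModel(
                    contentsOf: modelDir.appendingPathComponent(name),
                    configuration: configuration
                )
            }
            embedding = try load("embedding.mlmodelc")  // token ids -> embeddings
            backbone  = try load("main.mlmodelc")       // transformer blocks
            lmHead    = try load("head.mlmodelc")       // hidden states -> logits
        }

        // A forward pass calls prediction(from:) on each stage in order,
        // feeding each stage's output features into the next stage's inputs.
    }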


2.2 Stable Diffusion: CoreML models that work in iOS apps

For image generation, you are much better served by Apple’s official CoreML Stable Diffusion models.

2.2.1 apple/coreml-stable-diffusion-v1-5

Hugging Face: apple/coreml-stable-diffusion-v1-5 (Hugging Face)

  • Contains several variants: original, split_einsum, each with compiled and packages subfolders.
  • The compiled variants are ready-to-use .mlpackage combinations (UNet, VAE, text encoders) for Apple’s Swift Stable Diffusion pipeline. (Hugging Face)
  • Apple’s blog post on “Stable Diffusion with Core ML on Apple Silicon” and the apple/ml-stable-diffusion GitHub repo show exactly how to use these in Swift on macOS and iOS. (Hugging Face)

Install / use steps

  1. Download a compiled model

    For example, from the model’s README / Discussions: use huggingface-cli and include a compiled folder. (Hugging Face)

    huggingface-cli download \
      apple/coreml-stable-diffusion-v1-5 \
      --local-dir sd15_coreml \
      --local-dir-use-symlinks False \
      --include "original/compiled/*"
    

    (Make sure git-lfs is installed if you clone, as discussions point out that small pointer files are otherwise downloaded instead of full weights. (Hugging Face))

  2. Add the downloaded folder to Xcode

    • Drag the entire original/compiled folder (renamed to something like sd15_original_compiled) into your Xcode project, choosing “Create folder references” so it ships in the bundle as a single directory.
    • Ensure it is part of your iOS target’s resources.
  3. Add the Swift pipeline

    • Add apple/ml-stable-diffusion as a Swift Package dependency in Xcode. (Hugging Face)
    • Use the StableDiffusionPipeline type provided there.
  4. Minimal Swift usage

    Rough sketch based on Apple's examples (the generation API has evolved across ml-stable-diffusion releases, so check the README of the version you pin):

    import StableDiffusion
    import CoreML

    final class SDRunner {
        private let pipeline: StableDiffusionPipeline

        init() throws {
            // Folder you added to the bundle
            let url = Bundle.main.url(
                forResource: "sd15_original_compiled",
                withExtension: nil
            )!
            let config = MLModelConfiguration()
            config.computeUnits = .all

            pipeline = try StableDiffusionPipeline(
                resourcesAt: url,
                configuration: config,
                disableSafety: true,  // or false, if you add the safety model
                reduceMemory: true    // recommended on memory-constrained iOS devices
            )
            // Load/compile all sub-models up front so the first generation
            // call doesn't pay the warm-up cost.
            try pipeline.loadResources()
        }

        func generate(prompt: String) throws -> CGImage {
            var sdConfig = StableDiffusionPipeline.Configuration(prompt: prompt)
            sdConfig.imageCount = 1
            sdConfig.stepCount = 25
            sdConfig.seed = 42

            // generateImages returns [CGImage?]; entries can be nil when the
            // safety checker filters an image.
            let images = try pipeline.generateImages(configuration: sdConfig) { _ in
                true  // return false here to cancel generation early
            }
            guard let image = images.compactMap({ $0 }).first else {
                throw NSError(domain: "SD", code: -1)
            }
            return image
        }
    }
    

    This pipeline has been tested across macOS and iOS, and is the safest way to get SD working in a real iOS app.
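
    A possible call site (illustrative; show(_:) is a hypothetical UI helper). Generation takes tens of seconds on-device, so keep it off the main thread:

    // Kick off generation from e.g. a button action.
    Task.detached(priority: .userInitiated) {
        do {
            let runner = try SDRunner()
            let image = try runner.generate(prompt: "an astronaut riding a horse")
            await MainActor.run {
                show(image)  // hypothetical: display the CGImage in your UI
            }
        } catch {
            print("SD generation failed:", error)
        }
    }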


3. How to find “Xcode/iOS-friendly” models on Hugging Face

Here is a practical approach to finding models on Hugging Face that already work well with Xcode/iOS.

3.1 Use the model search filters

On the Hugging Face “Models” page:

  1. In the Library filter, select coreml.

    • URL looks like: https://huggingface.co/models?library=coreml (Hugging Face)
  2. Optionally filter by:

    • Task: “Text Generation” for LLMs, “Text-to-Image” for SD.
    • Organization: apple for Apple official models, pcuenq, TKDKid1000, etc. for known CoreML contributors. (Hugging Face)

The “Core ML” library tag means:

  • The model is either directly a .mlmodel/.mlpackage, or
  • There is explicit code / metadata on the card about converting to Core ML.

3.2 Look for CoreML-specific keywords on the model card

On a Hugging Face model page, scan for:

  • “CoreML” or “Core ML” in:

    • The Tags (e.g. Core ML, Apple, ANE). (Hugging Face)
    • The Model card text: phrases like “This repository contains Core ML model files for …” or “converted to Core ML”. (Hugging Face)
  • Concrete file names:

    • .mlpackage, .mlmodelc, .mlmodel under Files and versions. (Hugging Face)

These tell you the model is already packaged in the format Xcode/CoreML can load directly.
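
Once you have spotted such a repo, you can even fetch it from inside an app instead of bundling it, using the Hub module that ships with huggingface/swift-transformers. A hedged sketch (verify the snapshot signature against the version you pin):

    import Hub  // module from huggingface/swift-transformers

    // Downloads the repo files matching the glob into the app's container
    // and returns the local directory URL.
    func downloadMistralPackage() async throws -> URL {
        let api = HubApi()
        return try await api.snapshot(
            from: "apple/mistral-coreml",
            matching: "StatefulMistral7BInstructInt4.mlpackage/*"
        )
    }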

3.3 Prefer repos with usage instructions or linked Swift code

Good signs on a model page:

  • A “Usage” or “CoreML” section that explicitly mentions:

    • MLModel(contentsOf:configuration:) usage, or
    • StableDiffusionPipeline, Swift Transformers, etc. (Hugging Face)
  • Links to GitHub repositories or sample iOS/macOS apps:

    • apple/mistral-coreml → huggingface/swift-transformers, swift-chat. (Hugging Face)
    • apple/coreml-stable-diffusion-* → apple/ml-stable-diffusion, swift-coreml-diffusers. (Hugging Face)

If a model card just says “here is a .mlpackage” with no Swift instructions, it can still work, but you will have to explore the inputs/outputs yourself. Models with sample Swift code are much easier to get working quickly.

3.4 Prefer official or widely-used orgs

For iOS/Xcode, these orgs are good bets:

  • apple (e.g. apple/coreml-stable-diffusion-*, apple/mistral-coreml) (Hugging Face)

  • Well-known CoreML experimenters:

    • TKDKid1000 (TinyLlama CoreML) (Hugging Face)
    • pcuenq (CoreML SD packs) (Hugging Face)
    • Authors whose cards clearly mention testing on Apple Silicon devices.

3.5 Avoid bleeding-edge conversions for now

Models like andmev/Llama-3.1-8B-Instruct-CoreML are valuable experiments, but:

  • They may use the newest ML Program + quantization features (4-bit blockwise, palettization) that rely heavily on constexpr_* ops.
  • iOS 18 support for those code paths is still maturing, which is exactly why you see constexpr_blockwise_shift_scale related crashes.

Until Apple documents that a specific large Llama 3.1 CoreML model is tested on iOS, you’re safer with:

  • Smaller Llama 3.2 CoreML exports (1B/3B), or
  • Officially supported CoreML conversions (Mistral 7B, SD models).

4. Putting it all together

To directly answer your questions:

  1. Working Llama (text) / Stable Diffusion (image) models for Xcode/iOS + install steps

    • Text (LLM):

      • TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML – small .mlpackage that loads on Apple Silicon, good to smoke-test iOS integration. (Hugging Face)

        • Download via huggingface-cli, add .mlpackage to Xcode, load with MLModel(contentsOf:configuration:).
      • apple/mistral-coreml – official CoreML Mistral-7B with documented integration via swift-transformers and swift-chat. (Hugging Face)

    • Stable Diffusion:

      • apple/coreml-stable-diffusion-v1-5 (and v1-4, 2-1-base) – official CoreML SD repos. (Hugging Face)

        • Download original/compiled with huggingface-cli, add to Xcode, then use apple/ml-stable-diffusion’s StableDiffusionPipeline in Swift.
      • Optionally pcuenq/coreml-stable-diffusion-v1-5 – community assembled CoreML SD packs tested with Apple’s pipeline. (Hugging Face)

    These are the combinations that are actually used in existing macOS/iOS sample apps, so they are “known good”.

  2. How to find such models on Hugging Face

    • Use the model search with the coreml library filter. (Hugging Face)

    • Look for:

      • Tags: Core ML, Apple, ANE,
      • Files: .mlpackage, .mlmodelc,
      • Model cards that explicitly mention iOS, Xcode, Swift or Apple Silicon support.
    • Prefer models:

      • From the apple org or from authors with clear CoreML focus,
      • That include sample Swift usage or link to sample apps (swift-transformers, swift-coreml-diffusers, ml-stable-diffusion). (Hugging Face)
    • Avoid very new / large conversions that depend on advanced 4-bit ML Program features unless you are explicitly experimenting (like andmev/Llama-3.1-8B-Instruct-CoreML), because they can hit BNNS bugs such as your constexpr_blockwise_shift_scale crash.


Short bullet summary

  • Your crash with andmev/Llama-3.1-8B-Instruct-CoreML is due to a CoreML/BNNS bug in an advanced quantization op (constexpr_blockwise_shift_scale) on iOS 18, not a mistake in your Xcode code. (Hugging Face)

  • For text LLMs that work well in iOS apps:

    • Start with TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML (small, simple). (Hugging Face)
    • Move to apple/mistral-coreml for a full-strength 7B model with official Swift tooling via swift-transformers and swift-chat. (Hugging Face)
  • For Stable Diffusion:

    • Use apple/coreml-stable-diffusion-v1-5 (and siblings) with Apple’s ml-stable-diffusion Swift pipeline. (Hugging Face)
  • To find more models:

    • Filter HF models by library=coreml, look for .mlpackage / .mlmodelc, and prioritize repos with Apple / CoreML tags and Swift usage examples. (Hugging Face)

Thanks so much. I appreciate the effort.


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.