Since you mentioned you’re not familiar with iOS, I’ll go step by step:
- Why your current Llama-3.1-8B CoreML model is crashing.
- Concrete, known-working llama-family (text) and Stable Diffusion (image) models for iOS, with install steps.
- How to systematically find iOS/Xcode-friendly models on Hugging Face.
1. Why andmev/Llama-3.1-8B-Instruct-CoreML crashes on iPad
You are using andmev/Llama-3.1-8B-Instruct-CoreML, which contains a single large llama_3.1_coreml.mlpackage (~4.5 GB). (Hugging Face)
Your runtime error:
ios18.constexpr_blockwise_shift_scale → std::bad_cast inside BNNS, then EXC_BAD_ACCESS while loading the Core ML model.
What this means, in plain terms:
- Core ML 8 / iOS 18 added new “constexpr_*” ops (like constexpr_blockwise_shift_scale) to support advanced quantization schemes (4-bit blockwise, palettization, etc.) and large-model optimizations. These are used at compile time by the Core ML “ML Program” backend before BNNS/ANE kernels are created.
- The andmev model is a converted/quantized Llama-3.1-8B. From the size and packaging, it very likely uses such blockwise-quantized weights so that 8B parameters fit in ~4.5 GB. (Hugging Face)
- On your device / OS build, BNNS hits an internal bug while compiling or running that op; it throws std::bad_cast in Apple’s code, and you only see the resulting EXC_BAD_ACCESS. There’s nothing you can “fix” from Swift.
So: you’re not doing anything wrong in Xcode. That particular CoreML export is hitting a relatively fresh, iOS-18-only code path that is still fragile.
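If you want evidence that the crash is in the backend and not your code, you can try loading the same package under different compute-unit settings. A minimal sketch (note: a hard EXC_BAD_ACCESS still kills the process rather than throwing, so run this under the Xcode debugger):

import CoreML

// Try progressively more restricted compute units. If even .cpuOnly fails,
// the bug is in op compilation itself and no configuration will help.
func probeModel(at url: URL) {
    let allUnits: [MLComputeUnits] = [.all, .cpuAndGPU, .cpuOnly]
    for units in allUnits {
        let config = MLModelConfiguration()
        config.computeUnits = units
        do {
            _ = try MLModel(contentsOf: url, configuration: config)
            print("Loaded OK with computeUnits rawValue \(units.rawValue)")
        } catch {
            print("Failed with computeUnits rawValue \(units.rawValue): \(error)")
        }
    }
}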
The practical consequence: don’t try to work around this export from your app; pick a Core ML model that is known to work on iOS instead (next section).
2. Working models for Xcode/iOS, and how to install them
I’ll split this into:
- Text LLMs (llama-family-ish)
- Stable Diffusion (image generation)
2.1 Text: good CoreML LLMs that actually work on Apple devices
2.1.1 TinyLlama CoreML – small, simple, good for smoke tests
Hugging Face: TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML (Hugging Face)
- What it is:
  - Core ML conversion of TinyLlama/TinyLlama-1.1B-Chat-v0.3, a 1.1B-parameter chat model. (Hugging Face)
  - Purpose-built Core ML repo: .mlpackage files optimized for Apple Silicon (Mac + iOS).
- Why it’s a good start:
  - Small size → far less likely to run into memory limits or weird corner cases.
  - Good “first LLM” to verify that your Xcode + CoreML + iPad pipeline is correct.
Install / use steps (Mac → Xcode → iPad)
- Download the CoreML package (on your Mac), using the huggingface_hub CLI:
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML \
--include "*.mlpackage/*" \
--local-dir tinyllama_coreml \
--local-dir-use-symlinks False
You should see at least one .mlpackage folder in tinyllama_coreml/. (Hugging Face)
- Add the .mlpackage to Xcode: drag the .mlpackage folder into your project navigator, check “Copy items if needed”, and make sure it belongs to your iOS app target. Xcode compiles it into an .mlmodelc at build time.
- Load the model in Swift. Because this is an ML Program package, you typically load it dynamically (note that the bundled resource is the compiled .mlmodelc, not the raw .mlpackage):
import CoreML

final class TinyLlamaRunner {
    private let model: MLModel

    init() throws {
        // Xcode compiles the bundled .mlpackage into a .mlmodelc resource.
        // Adjust the resource name to match what ends up in your bundle.
        guard let url = Bundle.main.url(
            forResource: "TinyLlama-1.1B-Chat-v0.3",
            withExtension: "mlmodelc"
        ) else {
            throw CocoaError(.fileNoSuchFile)
        }
        let config = MLModelConfiguration()
        config.computeUnits = .all // CPU + GPU + ANE
        self.model = try MLModel(contentsOf: url, configuration: config)
    }

    func generateLogits(inputIds: [Int32]) throws -> MLMultiArray {
        let inputArray = try MLMultiArray(
            shape: [NSNumber(value: inputIds.count)],
            dataType: .int32
        )
        for (i, id) in inputIds.enumerated() {
            inputArray[i] = NSNumber(value: id)
        }
        // Names and shapes depend on the model; check
        // FeatureDescriptions.json in the package to confirm them.
        let input = try MLDictionaryFeatureProvider(
            dictionary: ["input_ids": inputArray]
        )
        let out = try model.prediction(from: input)
        return out.featureValue(for: "logits")!.multiArrayValue!
    }
}
You can inspect Data/com.apple.CoreML/FeatureDescriptions.json inside the .mlpackage to confirm input/output names and shapes.
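You can also dump the same information at runtime from a loaded model, using the standard MLModelDescription API:

import CoreML

// Print input/output feature names and descriptions for a loaded model.
func dumpFeatures(of model: MLModel) {
    for (name, feature) in model.modelDescription.inputDescriptionsByName {
        print("input  \(name): \(feature)")
    }
    for (name, feature) in model.modelDescription.outputDescriptionsByName {
        print("output \(name): \(feature)")
    }
}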
- Add tokenization / decoding:
  - Do tokenization with HF’s tokenizers on the device (using a Swift wrapper such as the one shipped in swift-transformers, or calling into a tiny Python server), or
  - Pre-tokenize on your Mac for experiments, just to validate that the Core ML model runs and returns sane logits.
Once this works on your iPad M3, you know the basic Core ML runtime is healthy.
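Once logits come back, a minimal greedy-decode loop can serve as the sanity check. This is a sketch with hypothetical helper names; it assumes the logits array is laid out as [seqLen, vocabSize] with Float32 values, which you should verify against the model’s output description:

// Greedy decoding: repeatedly append the argmax token of the last position.
// `runner` is the TinyLlamaRunner above; `eosId` is the tokenizer's EOS id.
func greedyDecode(runner: TinyLlamaRunner, promptIds: [Int32], eosId: Int32, maxNewTokens: Int = 32) throws -> [Int32] {
    var ids = promptIds
    for _ in 0..<maxNewTokens {
        let logits = try runner.generateLogits(inputIds: ids)
        let vocabSize = logits.shape.last!.intValue
        let lastRow = (logits.count / vocabSize - 1) * vocabSize
        // Argmax over the last position's vocabulary scores.
        var best: Int32 = 0
        var bestScore = -Float.infinity
        for v in 0..<vocabSize {
            let score = logits[lastRow + v].floatValue
            if score > bestScore { bestScore = score; best = Int32(v) }
        }
        if best == eosId { break }
        ids.append(best)
    }
    return ids
}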
2.1.2 Apple Mistral CoreML – strong 7B, officially maintained
Hugging Face: apple/mistral-coreml (Hugging Face)
- What it is:
  - Apple’s official Core ML export of mistralai/Mistral-7B-Instruct-v0.3 in fp16 and int4 forms (e.g. StatefulMistral7BInstructInt4.mlpackage). (Hugging Face)
  - The model card explains how to download .mlpackage folders via huggingface-cli. (Hugging Face)
- Why it’s special:
  - This is the model Apple uses in many “Swift Transformers / iOS 18” examples.
  - The huggingface/swift-transformers “preview” branch and the huggingface/swift-chat sample app show exactly how to wire it up in Swift on macOS 15 / iOS 18. (Hugging Face)
Install / use steps
- Download a .mlpackage. Example from the model card: (Hugging Face)
pip install -U "huggingface_hub[cli]"
huggingface-cli download \
apple/mistral-coreml \
--local-dir models \
--local-dir-use-symlinks False \
--include "StatefulMistral7BInstructInt4.mlpackage/*"
- Add the .mlpackage to Xcode exactly as for TinyLlama.
- Use Swift Transformers or the sample chat app: clone huggingface/swift-chat (or add swift-transformers as a Swift Package dependency) and point it at the downloaded package; see the state-handling sketch below.
This is the “serious” baseline text model I’d pick for iOS once TinyLlama is working.
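Under the hood, the “Stateful” in the package name means the KV cache lives in Core ML state. On iOS 18 / macOS 15 you drive such a model with the MLState API; a minimal sketch, with a hypothetical input name ("inputIds") that you should verify against the package’s feature descriptions:

import CoreML

// Stateful prediction: create the state once, pass it to every call,
// and Core ML carries the KV cache across prediction steps.
func predictNextLogits(model: MLModel, state: MLState, inputIds: MLMultiArray) throws -> MLFeatureProvider {
    // "inputIds" is a placeholder name; check the model description.
    let input = try MLDictionaryFeatureProvider(dictionary: ["inputIds": inputIds])
    return try model.prediction(from: input, using: state)
}

// Usage: create `let state = model.makeState()` once, then call per token step.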
2.1.3 Smaller Llama 3.x CoreML variants
If you want to stay closer to Llama instead of Mistral, look at smaller CoreML-ready Llama 3.2 models, for example:
- smpanaro/Llama-3.2-1B-Instruct-CoreML (part of an “Apple Neural Engine LLMs” collection, optimized for ANE). (Hugging Face)
These split the model into multiple .mlmodelc components (embedding, main, head). They are more work to integrate (you must stitch them together in Swift, roughly as sketched below), but they are explicitly designed for Core ML on Apple devices and are much lighter than 8B.
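A rough sketch of what the stitching looks like. All file and feature names here are placeholders, not the repo’s actual names; take the real ones from the repo’s files and model descriptions:

import CoreML

// Chain three compiled components: embedding -> main transformer -> lm head.
func predictChained(inputIds: MLMultiArray) throws -> MLMultiArray {
    let config = MLModelConfiguration()
    config.computeUnits = .all
    let load = { (name: String) throws -> MLModel in
        let url = Bundle.main.url(forResource: name, withExtension: "mlmodelc")!
        return try MLModel(contentsOf: url, configuration: config)
    }
    let embed = try load("llama_embedding") // hypothetical resource names
    let main  = try load("llama_main")
    let head  = try load("llama_head")

    // Each stage's output feeds the next stage's input.
    let e = try embed.prediction(from: MLDictionaryFeatureProvider(dictionary: ["input_ids": inputIds]))
    let hiddenIn = e.featureValue(for: "hidden_states")!.multiArrayValue!
    let m = try main.prediction(from: MLDictionaryFeatureProvider(dictionary: ["hidden_states": hiddenIn]))
    let hiddenOut = m.featureValue(for: "hidden_states")!.multiArrayValue!
    let h = try head.prediction(from: MLDictionaryFeatureProvider(dictionary: ["hidden_states": hiddenOut]))
    return h.featureValue(for: "logits")!.multiArrayValue!
}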
2.2 Stable Diffusion: CoreML models that work in iOS apps
For image generation, you are much better served by Apple’s official CoreML Stable Diffusion models.
2.2.1 apple/coreml-stable-diffusion-v1-5
Hugging Face: apple/coreml-stable-diffusion-v1-5 (Hugging Face)
- Contains several variants: original and split_einsum, each with compiled and packages subfolders.
- The compiled variants are ready-to-use compiled models (.mlmodelc: UNet, VAE, text encoder) for Apple’s Swift Stable Diffusion pipeline. (Hugging Face)
- Apple’s blog post “Stable Diffusion with Core ML on Apple Silicon” and the apple/ml-stable-diffusion GitHub repo show exactly how to use these in Swift on macOS and iOS. (Hugging Face)
Install / use steps
- Download a compiled model. For example, from the model’s README / Discussions: use huggingface-cli and include a compiled folder. (Hugging Face)
huggingface-cli download \
apple/coreml-stable-diffusion-v1-5 \
--local-dir sd15_coreml \
--local-dir-use-symlinks False \
--include "original/compiled/*"
(Make sure git-lfs is installed if you clone, as discussions point out that small pointer files are otherwise downloaded instead of full weights. (Hugging Face))
- Add the downloaded folder to Xcode:
  - Drag the entire original/compiled folder (or rename it to something like sd15_original_compiled) into your Xcode project as a folder reference, so its internal structure stays intact.
  - Ensure it is part of your iOS target’s resources.
- Add the Swift pipeline:
  - Add apple/ml-stable-diffusion as a Swift Package dependency in Xcode. (Hugging Face)
  - Use the StableDiffusionPipeline type provided there.
- Minimal Swift usage. A rough sketch based on Apple’s examples (the exact generate API differs between versions of the package, so check the repo’s README):
import StableDiffusion
import CoreML

final class SDRunner {
    private let pipeline: StableDiffusionPipeline

    init() throws {
        // Folder you added to the bundle (as a folder reference)
        guard let url = Bundle.main.url(
            forResource: "sd15_original_compiled",
            withExtension: nil
        ) else {
            throw CocoaError(.fileNoSuchFile)
        }
        let config = MLModelConfiguration()
        config.computeUnits = .all
        pipeline = try StableDiffusionPipeline(
            resourcesAt: url,
            configuration: config,
            disableSafety: true // or false, if you add the safety model
        )
    }

    func generate(prompt: String) throws -> CGImage {
        // generateImages returns one optional CGImage per requested image.
        let images = try pipeline.generateImages(
            prompt: prompt,
            imageCount: 1,
            stepCount: 25,
            seed: 42
        )
        guard let image = images.first ?? nil else {
            throw NSError(domain: "SD", code: -1)
        }
        return image
    }
}
This pipeline has been tested across macOS and iOS, and is the safest way to get SD working in a real iOS app.
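To show the generated image in an app, wrap the returned CGImage (plain UIKit; nothing here is specific to the model):

import UIKit

// Convert the pipeline's CGImage output into a UIImage for display.
func makeUIImage(with runner: SDRunner, prompt: String) throws -> UIImage {
    let cgImage = try runner.generate(prompt: prompt)
    return UIImage(cgImage: cgImage)
}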
3. How to find “Xcode/iOS-friendly” models on Hugging Face
Here is a practical approach to finding models on Hugging Face that already work well with Xcode/iOS.
3.1 Use the model search filters
On the Hugging Face “Models” page:
- In the Library filter, select coreml. The URL looks like: https://huggingface.co/models?library=coreml (Hugging Face)
- Optionally filter by:
  - Task: “Text Generation” for LLMs, “Text-to-Image” for SD.
  - Organization: apple for Apple official models; pcuenq, TKDKid1000, etc. for known CoreML contributors. (Hugging Face)
The “Core ML” library tag means:
- The model is either directly a .mlmodel/.mlpackage, or
- There is explicit code / metadata on the card about converting to Core ML.
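You can also query the same filter programmatically through the Hub’s public REST API (https://huggingface.co/api/models). A small Swift sketch, assuming the JSON fields shown here:

import Foundation

// Minimal shape of a Hub API model entry; only the fields we read.
struct HubModel: Decodable {
    let modelId: String
    let downloads: Int?
}

// Fetch Core ML-tagged text-generation models, most-downloaded first.
func fetchCoreMLModels() async throws -> [HubModel] {
    var components = URLComponents(string: "https://huggingface.co/api/models")!
    components.queryItems = [
        URLQueryItem(name: "filter", value: "coreml"),
        URLQueryItem(name: "pipeline_tag", value: "text-generation"),
        URLQueryItem(name: "sort", value: "downloads"),
        URLQueryItem(name: "limit", value: "20"),
    ]
    let (data, _) = try await URLSession.shared.data(from: components.url!)
    return try JSONDecoder().decode([HubModel].self, from: data)
}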
3.2 Look for CoreML-specific keywords on the model card
On a Hugging Face model page, scan for:
- Tags such as Core ML, Apple, or ANE,
- .mlpackage or .mlmodelc files in the “Files and versions” tab,
- Explicit mentions of iOS, Xcode, Swift, or Apple Silicon.
These tell you the model is already packaged in a format Xcode/CoreML can load directly.
3.3 Prefer repos with usage instructions or linked Swift code
Good signs on a model page:
- Sample Swift code, or links to sample apps such as swift-transformers, swift-chat, swift-coreml-diffusers, or ml-stable-diffusion,
- Documented input/output names and shapes,
- Notes that the model was tested on specific Apple devices.
If a model card just says “here is a .mlpackage” with no Swift instructions, it can still work, but you will have to explore the inputs/outputs yourself. Models with sample Swift code are much easier to get working quickly.
3.4 Prefer official or widely-used orgs
For iOS/Xcode, these orgs are good bets:
- apple (e.g. apple/coreml-stable-diffusion-*, apple/mistral-coreml) (Hugging Face)
- Well-known CoreML experimenters: TKDKid1000 (TinyLlama CoreML), pcuenq (CoreML SD packs) (Hugging Face)
- Authors whose cards clearly mention testing on Apple Silicon devices.
3.5 Avoid bleeding-edge conversions for now
Models like andmev/Llama-3.1-8B-Instruct-CoreML are valuable experiments, but:
- They may use the newest ML Program + quantization features (4-bit blockwise, palettization) that rely heavily on constexpr_* ops.
- iOS 18 support for those code paths is still maturing, which is exactly why you see constexpr_blockwise_shift_scale-related crashes.
Until Apple documents that a specific large Llama 3.1 CoreML model is tested on iOS, you’re safer with:
- Smaller Llama 3.2 CoreML exports (1B/3B), or
- Officially supported CoreML conversions (Mistral 7B, SD models).
4. Putting it all together
To directly answer your questions:
- Working Llama (text) / Stable Diffusion (image) models for Xcode/iOS + install steps:
  - Text (LLM):
    - TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML – small .mlpackage that loads on Apple Silicon, good to smoke-test iOS integration. (Hugging Face) Download via huggingface-cli, add the .mlpackage to Xcode, load with MLModel(contentsOf:configuration:).
    - apple/mistral-coreml – official CoreML Mistral-7B with documented integration via swift-transformers and swift-chat. (Hugging Face)
  - Stable Diffusion:
    - apple/coreml-stable-diffusion-v1-5 (and v1-4, 2-1-base) – official CoreML SD repos. (Hugging Face) Download original/compiled with huggingface-cli, add to Xcode, then use apple/ml-stable-diffusion’s StableDiffusionPipeline in Swift.
    - Optionally pcuenq/coreml-stable-diffusion-v1-5 – community-assembled CoreML SD packs tested with Apple’s pipeline. (Hugging Face)
These are the combinations that are actually used in existing macOS/iOS sample apps, so they are “known good”.
- How to find such models on Hugging Face:
  - Use the model search with the coreml library filter. (Hugging Face)
  - Look for:
    - Tags: Core ML, Apple, ANE,
    - Files: .mlpackage, .mlmodelc,
    - Model cards that explicitly mention iOS, Xcode, Swift, or Apple Silicon support.
  - Prefer models:
    - From the apple org or from authors with a clear CoreML focus,
    - That include sample Swift usage or link to sample apps (swift-transformers, swift-coreml-diffusers, ml-stable-diffusion). (Hugging Face)
  - Avoid very new / large conversions that depend on advanced 4-bit ML Program features unless you are explicitly experimenting (like andmev/Llama-3.1-8B-Instruct-CoreML), because they can hit BNNS bugs such as your constexpr_blockwise_shift_scale crash.
Short bullet summary
- Your crash with andmev/Llama-3.1-8B-Instruct-CoreML is due to a CoreML/BNNS bug in an advanced quantization op (constexpr_blockwise_shift_scale) on iOS 18, not a mistake in your Xcode code. (Hugging Face)
- For text LLMs that work well in iOS apps:
  - Start with TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML (small, simple). (Hugging Face)
  - Move to apple/mistral-coreml for a full-strength 7B model with official Swift tooling via swift-transformers and swift-chat. (Hugging Face)
- For Stable Diffusion:
  - Use apple/coreml-stable-diffusion-v1-5 (and siblings) with Apple’s ml-stable-diffusion Swift pipeline. (Hugging Face)
- To find more models:
  - Filter HF models by library=coreml, look for .mlpackage / .mlmodelc, and prioritize repos with Apple / CoreML tags and Swift usage examples. (Hugging Face)