Actual iOS usage in app
Hi everyone.
I tried to use this model in a Flutter app via MediaPipe inference. It works on Android but crashes on iOS because of memory issues (Cannot allocate memory [type.googleapis.com/mediapipe.StatusList]).
I used iPhone 16 Pro Max (8GB RAM).
As I understand it, the problem is that the model is too big to fit within the memory constraints iOS imposes on apps.
I used the gemma-3n-E2B-it-int4.task file, which is 3.14 GB. And as far as I can see, gemma-3n-E2B-it-int4.litertlm is even bigger.
Could anyone suggest how to use this model on iOS?
Hello, did you find a solution?
2) How do you ship the model: does it go into the app through the app package, or do you install it later? Are we allowed to make a user download the model from us?
3) Optional: can you share your Flutter code?
Hi @HeyGoodEnough, I use the model from here: https://huggingface.co/google/gemma-3n-E2B-it-litert-preview/tree/main (in .task format), but I'm thinking about switching to .litertlm; I'm not sure whether it works properly, though. Worth a try.
The iOS solution, as I understand it, is to use memory entitlements on iOS, which requires the $100/year Apple Developer subscription. For now I still use it on Android only.
Hey @Serjio42, apologies for the delayed response.
You are correct that this is a memory issue, but it's important to clarify exactly what's happening on iOS.
Even on devices with 8GB of RAM, iOS enforces strict per-app memory limits via the Jetsam daemon. A 3.14 GB gemma-3n-E2B-it-int4.task model is already very close to the practical per-process limit. Once you include runtime allocations, the app exceeds its memory budget and is forcefully terminated with Cannot allocate memory [type.googleapis.com/mediapipe.StatusList].
You can try a few things to optimise this:
- Do not load the model into Dart memory. This duplicates the entire 3+ GB model in the Dart heap before it even reaches the native layer. Instead, pass only the file path to the native layer and let MediaPipe memory-map the model directly from storage.
- Reduce runtime memory usage. The KV cache grows with context length. Set a conservative maxTokens in LlmInferenceOptions to reduce your peak allocation.
- Add iOS memory entitlements. Ensure you have added com.apple.developer.kernel.increased-memory-limit and com.apple.developer.kernel.extended-virtual-addressing to your iOS entitlements file. Note that these are requests, not guarantees; iOS may still terminate the process under heavy memory pressure.
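To make the first two points concrete, here is a minimal Swift sketch of the native side, based on the MediaPipe Tasks GenAI iOS API. Treat the exact type and parameter names (LlmInference, LlmInference.Options, maxTokens) as assumptions to check against your installed MediaPipe version; the key ideas are that only the file path crosses the Flutter/native boundary and that the context window is kept small:

```swift
import MediaPipeTasksGenAI

// Hypothetical helper: the Flutter side sends only the model's file path
// (e.g. over a MethodChannel), so the 3+ GB file is never copied into the
// Dart heap. MediaPipe opens and memory-maps the file from storage itself.
func makeLlmInference(modelPath: String) throws -> LlmInference {
    let options = LlmInference.Options(modelPath: modelPath)
    // A conservative context window limits KV-cache growth and peak memory.
    options.maxTokens = 512
    return try LlmInference(options: options)
}
```

The important design choice is that the Dart layer never calls anything like File.readAsBytes on the model; it only hands over a path string.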
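The two entitlement keys mentioned above live in the app's .entitlements file (added in Xcode under Signing & Capabilities). A minimal sketch of that plist fragment:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Requests a higher Jetsam memory limit; honoured only on supported devices. -->
    <key>com.apple.developer.kernel.increased-memory-limit</key>
    <true/>
    <!-- Requests an extended virtual address space, useful for large memory-mapped models. -->
    <key>com.apple.developer.kernel.extended-virtual-addressing</key>
    <true/>
</dict>
</plist>
```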
Realistically, a 3+ GB model is at the absolute edge of what iOS reliably allows in a single app process. If you continue to see crashes even after the above optimisations, the practical solution is to use a smaller quantised model variant for iOS.
This is not a MediaPipe bug; it is the expected iOS memory-constraint behaviour when deploying very large on-device LLMs.
Thanks