By Dominic Bristow

Apple's MM1: the game changer you haven't heard of




Apple’s MM1: an absolute game changer for teachers. Never heard of it? I’m not surprised.


Apple has quietly released details of the work it has been doing on its own Large Language Model (LLM) via the ‘backdoor’: in this case, arXiv.org, an open-access repository of research papers in physics, mathematics and computer science, among other disciplines, the last of which explains most of its surge in popularity over the past year or so. The paper from Apple’s researchers arrived without fanfare last Thursday, and outlines work on a family of models currently named MM1.


So why have Apple released information about their work in this way, what is the result of the work itself, and what are the implications for education?


*


arXiv?


Let’s start with the first question: why use arXiv (as in ‘archive’) to announce this work? Indeed, why would Apple announce it at all, given their history as meticulous purveyors of ground-breaking proprietary technologies? What follows is pure speculation on my part: in a ‘catfight’ of increasing ferocity between the proprietary AI teams and those in the open-source camp, it makes a lot of sense for Apple to keep one foot either side of the line. With this paper they have released a huge amount of IP in the technical details of how they trained their new models, but they are hugely unlikely to release the models themselves; publishing a model’s ‘weights’ is what defines it as open source. It’s as if they’ve published the recipe without specifying the ingredients, and will only serve the meal in their own restaurant.


What game are they changing?


Apple have been quiet in the AI race so far, only putting their head above the parapet to acquire DarwinAI earlier this year: a Canadian company specialising in vision technology and in deploying and optimising AI systems on small devices. I wasn’t surprised when I first heard about that acquisition, but it made even more sense when I read the arXiv paper. The paper outlines both the technical process and the performance statistics for a new family of models that represent a new state of the art in particular areas. Apple are characteristically nonchalant, even magnanimous, about the quality and performance of the models, but it is clear what their focus was in training them. Dive through all the detail about how they cooked it up, and it’s clear that the image processing and reasoning capabilities are as good as or better than the current state-of-the-art GPT-4V at a comparable number of parameters (the ‘size’, ‘knowledge’ or, slightly more loosely, ‘power’ of the model). But the real ground that’s been broken is that they retain this image performance when they shrink the model down to a tenth of that size, at 3B parameters. This comparatively tiny 3B model is small enough to run on mobile devices. Put that alongside the DarwinAI acquisition and Apple’s status as now the world’s best-selling smartphone maker, and you can see the dots joining up before your eyes.


So why does it matter? And for education?


The direct implication is simple. An AI model running locally means that, unlike logging in to OpenAI or any other website to use GPT-4 or a similar model, there are no subscription fees, no data charges and, indeed, no need for signal at all. Your phone has a powerful AI system in it that you can talk to, talk to other people through, and organise your work and life with. And the fact that all of this seems to be focused on image processing means all the magic of multimodal language models in your iPhone.
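MM1 itself has not been released, but you can already get a feel for what ‘no subscription, no signal’ means from today’s open models. Here is a minimal sketch using the llama-cpp-python library, one common way to run a quantised model entirely offline; the model file name and prompt are illustrative assumptions of mine, not anything Apple has shipped.

```python
# A stand-in for the on-device experience MM1 points towards. MM1's weights
# are not public, so this loads any locally downloaded GGUF model file;
# the file name and prompt below are illustrative assumptions.
from llama_cpp import Llama

# Load a quantised model from local disk: no account, no API key, no signal.
llm = Llama(model_path="local-model.gguf", n_ctx=2048, verbose=False)

output = llm(
    "Explain fractional distillation at the molecular level in two sentences.",
    max_tokens=128,
)

print(output["choices"][0]["text"])  # generated entirely on this device
```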


As for education, I’m no soothsayer, but some of what this will make possible represents an inevitable future. When I speak with people in person about what we are doing at Stylus, I use an example question loaded up on my phone using OpenAI’s models. Booting up our GPT, LearnCycleBot (‘GPT’ here being one of the customisable chatbots you can build and use through OpenAI’s GPT-4 environment, as Hannah has discussed here), I ask the attendees the question on the screen: “who can draw me a fractional distillation column and describe how it is working, with reference to the material entering the column and its behaviour at molecular level?” It’s always a joy to see the senior leaders I’m speaking to grimace at the idea of having to remember their GCSE science, but that’s nothing compared to the expression you see when the response they’ve hurriedly (and usually horribly inaccurately) completed in biro on the plain paper in front of them is marked on the spot by the very same technology. It’s a real penny-drop moment for most people. Some are gobsmacked. Some simply nod repeatedly, silently smiling in quiet understanding of the possibilities.
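For anyone curious what sits under the bonnet of that demo, here is a minimal sketch of the same marking trick done directly against OpenAI’s API with GPT-4’s vision capability. The prompt wording, file name and marking criteria are my own illustrative inventions, not what LearnCycleBot actually runs.

```python
# A hedged sketch: send a photo of a hand-drawn answer to GPT-4's vision
# model and ask for on-the-spot marking. Prompt text, file name and the
# marking criteria below are illustrative assumptions, not Stylus's own.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Encode the photographed student response as base64 for the API.
with open("student_answer.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a GCSE chemistry examiner. Mark the drawing and "
                "description of a fractional distillation column, commenting "
                "on labelling, the temperature gradient and molecular behaviour."
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please mark this hand-drawn response."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        },
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)  # the model's marking and feedback
```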


A model even more powerful than this expensive, state-of-the-art one, running for free from the electronic brick in everyone’s pocket with no subscription and no reception required, has profound implications for the support of teachers in classrooms everywhere, and for the time-saving possibilities within the beloved but burdensome job we all know. There’s a lot we’d have our phones do if we could, to free us for the irreplaceably human function of teaching a class full of children. We embrace this technology at Stylus, and offer a way to harness the potential of these models so that consistent, high-quality marking and feedback can be delivered at enormous scale. If you have a model running on your iPhone, you will be able to get reasoned feedback on a student response right there in your classroom. But it will not mark the work against an exam specification, detailed marking notes and previous student exemplars, nor will it automatically store the outcome of its insight or feedback, without an environment that sorts and stores the work and the assessment decisions as part of a cumulative representation of everything we can know about that student’s work.
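To make that last point concrete, here is one hypothetical way to picture such an environment. Every class and field below is an illustrative assumption about the kind of structure required, not a description of Stylus’s actual schema.

```python
# A hypothetical sketch of the 'rest of the machine': structures that tie a
# model's raw feedback to a specification and accumulate it per student.
# All names and fields are illustrative assumptions, not Stylus's data model.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MarkedResponse:
    """One piece of work plus the assessment decision made on it."""
    question_id: str          # e.g. the exam-board question reference
    spec_points: list[str]    # specification points the question assesses
    image_path: str           # photo of the handwritten/drawn response
    marks_awarded: int
    marks_available: int
    feedback: str             # model-generated, teacher-moderated feedback
    marked_on: date


@dataclass
class StudentRecord:
    """Cumulative representation of everything we know about one student."""
    student_id: str
    responses: list[MarkedResponse] = field(default_factory=list)

    def coverage(self) -> dict[str, float]:
        """Average attainment per specification point across all work."""
        totals: dict[str, list[float]] = {}
        for r in self.responses:
            score = r.marks_awarded / r.marks_available
            for point in r.spec_points:
                totals.setdefault(point, []).append(score)
        return {p: sum(s) / len(s) for p, s in totals.items()}
```

The cumulative part is the point: each new piece of marked work sharpens the picture of where a student stands against the specification, which is precisely what a bare model on a phone cannot do on its own.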


An engine on its own goes nowhere


These models are the engine that makes a new type of autonomous assessment machine possible. Apple have taken a huge stride forward with MM1 in taking the potential of vision-processing technology and making it accessible to all. But there are many other parts to the machine in education, which, to me, is why Apple are happy to describe in such exquisite detail the work they’ve done to get a model so small to perform so well. Those other parts are the rest of the machine: they make the engine useful, rather than leaving it simply spinning its axle in mid-air. And that’s why I like to start conversations with a badly drawn fractional distillation column in biro on paper. The technology is there now, and it’s amazing to see. But how do you get it into your school systems? How do you automate the insight into a piece of student work, with consistent and detailed feedback that matches your exam specifications? Well, in our paradigm, that’s where Stylus comes in.


But don’t take my word for it: find out for yourself with a free trial of our AI-powered marking and teacher-moderated feedback.
