Educational accuracy

Through close collaboration with a Singapore university's teaching team, Tutello engineered its platform to achieve 97.5% accuracy on the team's own rigorous evaluation.

Consistent accuracy is hard. In the AI industry there is very often a yawning gap between the demo and real-life results. It's relatively easy to set up a demo that looks amazing: everything lined up, fed with preset, well-tested examples. Once human beings start testing the tool with everyday, awkward inputs, the shine of the demo quickly fades.

What is accuracy?

In an educational environment, accuracy goes beyond factual correctness. There may be several "correct" answers, and which of them matters depends on the context of the course. How did you get to the answer? There are often different paths, and the teaching team may have picked a particular one because it works well with the rest of the curriculum. Showing a student a different path may cause confusion.

How are the answer, and the route to it, presented? Teaching is rich in variety, culture and nuance. No two universities, faculties or professors present concepts to students in the same way. If a platform is truly an extension of a tutor, it must mimic that tutor's style.

The challenge

These accuracy problems were among the challenges the Tutello team faced when collaborating with the teaching team of a foundational Maths and Statistics course, with a 3,000-strong cohort, at a large university in Singapore. The teaching team set a very high bar for Tutello to clear before the platform could be rolled out. To get there, we redefined the way our platform responds to student input.

Content import pipeline

Content meant for human consumption isn't always in the best format for LLMs to make sense of. Instead of pouring raw text into a generic database, our import pipeline transforms content into an intermediate format that works better with LLMs. We divide, relate and organise the content so that we can return a paragraph, a page, a section or a chapter, whatever the content researcher requires. The importer can connect to LMS APIs, syncing every night so that new content added by teaching staff is available with no action required. The scan also respects content that is withheld from students.
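To make that concrete, here is a minimal sketch of the kind of hierarchical structure such a pipeline might produce. The names (ContentNode, widen) and the four-level hierarchy are illustrative assumptions, not our actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContentNode:
    """One node in the imported content tree."""
    node_id: str
    level: str                      # e.g. "chapter", "section", "page", "paragraph"
    text: str = ""
    parent: Optional["ContentNode"] = None
    children: list["ContentNode"] = field(default_factory=list)

    def add(self, child: "ContentNode") -> "ContentNode":
        child.parent = self
        self.children.append(child)
        return child

def widen(node: ContentNode, level: str) -> ContentNode:
    """Walk up the tree until we reach the requested granularity,
    so a paragraph-level hit can be returned as its section or chapter."""
    while node.parent is not None and node.level != level:
        node = node.parent
    return node

# A hit on a single paragraph can be widened on demand:
chapter = ContentNode("ch3", "chapter")
section = chapter.add(ContentNode("ch3.2", "section"))
para = section.add(ContentNode("ch3.2.p4", "paragraph", "Bayes' theorem states..."))
assert widen(para, "section") is section
```

The point of the tree over a flat chunk store is that every retrieval hit keeps its place in the document, so the surrounding context is always one pointer away.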

Content researcher

Prior to this project, we ran a simple vector search over the content and returned the top results, chunks of text sliced from documents, as context for the LLM. The new content researcher is a far more capable and intelligent component. It uses several different search methods to root out the most relevant content, then rates and ranks the results. If the results aren't adequate for the query, it adapts the search parameters until it is confident it has the best ones. The researcher then makes sure it has a wide enough context: a concept in one paragraph might depend on an exception stated on the previous page.
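As an illustration of that rate-and-retry loop, here is a minimal sketch that fuses two search methods with reciprocal rank fusion and widens the search when the rated results fall short. The callables (vector_search, keyword_search, rate) are stand-ins for whatever retrieval and scoring back-ends are in play, and the thresholds are arbitrary:

```python
from collections import defaultdict
from typing import Callable

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)

def research(query: str,
             vector_search: Callable[[str, int], list[str]],
             keyword_search: Callable[[str, int], list[str]],
             rate: Callable[[str, str], float],
             threshold: float = 0.7,
             max_rounds: int = 3) -> list[str]:
    """Search, rate the fused results, and widen the net until the
    results are good enough to hand to the answer generator."""
    top_k = 5
    rated: list[tuple[str, float]] = []
    for _ in range(max_rounds):
        fused = rrf([vector_search(query, top_k), keyword_search(query, top_k)])
        rated = [(doc_id, rate(query, doc_id)) for doc_id in fused]
        good = [doc_id for doc_id, score in rated if score >= threshold]
        if good:
            return good
        top_k *= 2          # nothing rated well enough: widen and retry
    return [doc_id for doc_id, _ in sorted(rated, key=lambda p: p[1], reverse=True)]
```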

Reference images

Many concepts are best communicated by diagram, chart or image. As well as text, Tutello can return images from your content that are relevant to a student query.
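One plausible mechanism for this, and it is only an assumption about how such retrieval might work, is to embed image captions or alt text in the same vector space as the query and return the closest matches above a similarity floor:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relevant_images(query_vec: np.ndarray,
                    image_index: list[tuple[str, np.ndarray]],
                    min_score: float = 0.3,
                    top_n: int = 3) -> list[str]:
    """image_index pairs an image path with the embedding of its caption.
    Return the best matches, but only those that clear a relevance floor,
    so unrelated diagrams never pad out an answer."""
    scored = sorted(((path, cosine(query_vec, vec)) for path, vec in image_index),
                    key=lambda p: p[1], reverse=True)
    return [path for path, score in scored[:top_n] if score >= min_score]
```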

Code generation and execution

Some concepts and examples, particularly in STEM subjects, require complex calculations to illustrate the teaching point. LLMs can attempt these calculations, but arithmetic is not their strong point. They are, however, very good at writing code to do it. When a question calls for this kind of example, Tutello writes code on the fly, executes it in an entirely separate sandboxed environment, then uses the results – the calculation itself, tables or charts – as part of the response to the student.
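Stripped of the isolation layer, the flow looks something like the sketch below. A real sandbox needs much more than this (containers, resource limits, no network access); the helper name run_generated_code is hypothetical:

```python
import os
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout_s: float = 5.0) -> str:
    """Run LLM-generated Python in a separate interpreter process.
    Isolation here is only a separate process plus a timeout; a real
    deployment would wrap this in a proper sandbox."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],   # -I: isolated mode, ignores user site dirs
            capture_output=True, text=True, timeout=timeout_s,
        )
        if result.returncode != 0:
            return "error: " + result.stderr.strip()
        return result.stdout.strip()
    except subprocess.TimeoutExpired:
        return "error: execution timed out"
    finally:
        os.unlink(path)

print(run_generated_code("print(sum(range(1, 101)))"))   # -> 5050
```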

Answer generation

Our platform tunes a base prompt from example questions and answers; our teams then refine it further until the AI's responses are in tune with your own.
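In practice this often amounts to few-shot prompting: the tuned base prompt is combined with approved question-and-answer pairs before the student's question is appended. A minimal sketch, with an assumed prompt layout:

```python
def build_prompt(base_prompt: str,
                 examples: list[tuple[str, str]],
                 question: str) -> str:
    """Assemble the final prompt from the tuned base prompt, the
    teaching team's exemplar Q&A pairs, and the student's question."""
    shots = "\n\n".join(
        f"Student question:\n{q}\n\nModel answer:\n{a}" for q, a in examples
    )
    return (f"{base_prompt}\n\n{shots}\n\n"
            f"Student question:\n{question}\n\nModel answer:\n")
```

Because the exemplars come from the teaching team, the style of the generated answers tracks the style of the course rather than the model's defaults.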

Post checking

Just to be sure, we then run a cross-check on the draft response. The post-check finds any calculations in the text and verifies them deterministically, making corrections as it goes.
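A simplified illustration of that deterministic cross-check: pull "expression = result" claims out of the draft, re-evaluate the expression safely, and flag mismatches. The regex and the supported operators here are deliberately narrow assumptions:

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression without running arbitrary code."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def check_calculations(text: str, tolerance: float = 1e-9) -> list[str]:
    """Find 'expression = result' claims in a draft and flag any that
    don't match a deterministic re-computation."""
    errors = []
    for expr, claimed in re.findall(r"([\d\s+\-*/().]+?)=\s*(-?\d+(?:\.\d+)?)", text):
        try:
            actual = safe_eval(expr.strip())
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue
        if abs(actual - float(claimed)) > tolerance:
            errors.append(f"{expr.strip()} = {claimed} (recomputed: {actual})")
    return errors

draft = "So the mean is (2 + 4 + 6) / 3 = 5.0."
print(check_calculations(draft))   # -> ['(2 + 4 + 6) / 3 = 5.0 (recomputed: 4.0)']
```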

The result

The teaching team had a rigorous testing system. Before the project, we scored poorly. After it, we scored 97.5% in their ranking on the most difficult challenges, spanning nearly 200 questions. This was well above expectations and well clear of generic LLM reference testing. None of this could have been achieved without the close collaboration of the teaching team themselves. The platform is now in action, helping 3,000 students and their teaching team tackle a difficult course.

We are looking for more close collaborations like this. If you have a challenging project, please get in touch.