Teneo vs. Google Dialogflow, IBM Watson Assistant and Microsoft LUIS
Our R&D team often runs programmes to help the next generation of conversational AI experts develop their skills. Recently they assigned two students from Sweden’s prestigious KTH Royal Institute of Technology to test how easily and quickly a series of tasks could be carried out in three competing products compared to Teneo.
Max Kihlborg and Adam Lilja were asked to evaluate conversational AI platforms against a set of criteria: ease of use, efficiency, the experience of working in the software, and the results to expect from each platform.
They were to complete the tasks set out in the exercises and evaluate them based on some of Nielsen’s heuristics:
- Consistency and standards
- Recognition rather than recall
- Flexibility and efficiency of use
Not Everyone Made the Final Cut
Unsurprisingly, we did come out on top. 😉 The exercises contained some situations we knew our competitors couldn’t handle. But since they cover basic tasks that we think enterprises would frequently use, we felt it was justified to include them.
Of the three competing products chosen (Google Dialogflow, Microsoft LUIS and IBM Watson Assistant), LUIS was removed from the test early on. It functions only as an intent classifier, used to determine what the customer wants. Since this is only a small part of the functionality required of a conversational AI development platform, it was decided that it would be unfair to leave LUIS in the test.
Observations Made in Completing the Tasks
Google Dialogflow
The report noted that while Dialogflow has an intuitive way of marking entities in training examples, it isn’t easy to access the same entity from different words. So if, for example, you wanted to match a city to a corresponding food speciality, you’d have your work cut out.
Our intrepid testers used synonyms as a workaround, but ran into problems again when trying to access those synonyms once they were created.
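To make the goal concrete, here is a minimal sketch (plain Python, not Dialogflow configuration) of the kind of lookup the testers were after: normalise a raw entity mention via its synonyms to a canonical city, then map that city to a speciality. All names and city/speciality data are illustrative.

```python
from typing import Optional

# Hypothetical synonym table: each surface form maps to a canonical city.
CITY_SYNONYMS = {
    "stockholm": "stockholm",
    "sthlm": "stockholm",       # synonym normalised to the canonical value
    "gothenburg": "gothenburg",
    "göteborg": "gothenburg",
}

# Hypothetical mapping from canonical city to food speciality.
SPECIALITIES = {
    "stockholm": "meatballs",
    "gothenburg": "seafood",
}

def speciality_for(city_mention: str) -> Optional[str]:
    """Resolve a raw mention to its canonical city, then look up the speciality."""
    canonical = CITY_SYNONYMS.get(city_mention.strip().lower())
    return SPECIALITIES.get(canonical) if canonical else None

print(speciality_for("Sthlm"))     # meatballs
print(speciality_for("Göteborg"))  # seafood
```

Trivial in a general-purpose language; the testers’ complaint was that the platform offered no comparably direct way to reach a matched entity’s canonical value from its synonyms.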
Issues were also encountered when passing values and parameters between different intents. In their report, Kihlborg and Lilja noted that there was no easy way to say, “Hey, after this flow, if the user asks about this or that I want to go to this flow, and I want to retain these specific pieces of information the user has already provided”.
In general, Dialogflow performed well at distinguishing between similar intents such as “What’s the weather like in Stockholm” and “Will it rain today in Stockholm”, recognizing that one is about the weather in general and the other about a specific weather condition. However, in another test it failed to recognize the difference between singular and plural forms such as “ticket” and “tickets”.
Furthermore, although it coped with the conversation being interrupted and changed to a different topic, it couldn’t handle returning to the previous conversation.
The report’s overall conclusion was: “Performs well at simple tasks but falls short when facing tasks that require interaction between user intents”.
IBM Watson Assistant
Similar issues were found with Watson Assistant. It could not always differentiate between origin and destination in a flight-booking scenario.
This became apparent when using the test question “I want to book a flight to Stockholm”, to which the bot would respond with “to where?” instead of “from where?”.
The report noted that while “The user interface is easy to understand… the problem may be the lack of functionality. For this example we would’ve needed to create two entities, origin and destination instead of only using the beta-sys.location to be able to distinguish where the user wants to go and where it wants to depart from”.
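The distinction the testers needed can be illustrated outside any platform: separate the two roles by the preposition that precedes each location, which is roughly what two distinct entities give you. This is a hedged sketch in plain Python, not Watson Assistant configuration, and the pattern-matching approach is ours, not the report’s.

```python
import re

def extract_flight_slots(utterance: str) -> dict:
    """Assign a capitalised place name to origin or destination based on
    whether it follows 'from' or 'to'. Illustrative only."""
    slots = {}
    to_match = re.search(r"\bto\s+([A-ZÅÄÖ]\w+)", utterance)
    from_match = re.search(r"\bfrom\s+([A-ZÅÄÖ]\w+)", utterance)
    if to_match:
        slots["destination"] = to_match.group(1)
    if from_match:
        slots["origin"] = from_match.group(1)
    return slots

print(extract_flight_slots("I want to book a flight to Stockholm"))
# {'destination': 'Stockholm'}
```

With “Stockholm” resolved to the destination slot, the sensible follow-up prompt is for the still-empty origin slot, i.e. “from where?” rather than “to where?”.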
Slot-filling was also not user-friendly. Enabled through a “customized” section of the platform, which seemed hidden compared with other functionality, it took some time for the testers to understand.
Kihlborg and Lilja also highlighted that it took a long time for “Watson to train when updating intents, entities etc. which adds up when updating intents multiple times in a session”.
Furthermore, adding user examples was time-consuming because they had to be appended one at a time rather than in bulk.
Other time-saving functionality, such as standard pre-built intents and the ability to create an intent with multiple rows at once, was also missing. Watson Assistant was also unable to handle an interruption during a request for information; instead, the chatbot simply re-prompted with the same original query.
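The interruption behaviour being tested can be sketched as a slot-filling loop that, rather than blindly re-prompting, first checks each user turn against other intents and resumes the pending slot afterwards. This is a rough, platform-independent sketch; the keyword-based `classify` stand-in and all names are our own invention for illustration.

```python
def classify(utterance):
    # Stand-in intent classifier: crude keyword matching, for illustration only.
    if "weather" in utterance.lower():
        return "weather"
    return "answer"

def fill_slots(required, inputs):
    """Consume user inputs to fill slots in order, diverting to other
    intents mid-dialogue and resuming the same slot on the next turn."""
    slots = {}
    pending = list(required)
    for text in inputs:
        if classify(text) != "answer":
            # Handle the interrupting intent here, then fall through to
            # re-ask the *same* pending slot on the next user turn.
            continue
        slots[pending.pop(0)] = text
        if not pending:
            break
    return slots

filled = fill_slots(["origin", "destination"],
                    ["Stockholm", "what's the weather?", "London"])
print(filled)  # {'origin': 'Stockholm', 'destination': 'London'}
```

The point is the `continue` branch: the interruption is answered without discarding the slots already collected, which is the behaviour the testers found missing.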
The report concludes that although “not especially intuitive it held up during the more advanced tasks”.
Teneo
At first sight, Kihlborg and Lilja found Teneo a little overwhelming. Small laptop screens weren’t really suited to developing flows and there were a large number of buttons on the dashboard.
But once they understood how to use Teneo, they felt it made sense to have all the functionality easily accessible and that the layout enabled them to have a better overview of the whole project.
“Teneo has a great visual representation of each flow and it is very easy to follow the end user’s path. Also, when chatting with the bot, you receive inputs on which flows were triggered”.
The report says that the process of writing a condition was not very intuitive and that the testers needed to refer to the Teneo documentation for the correct syntax. However, it did acknowledge that Teneo was a powerful tool for tasks such as identifying certain entities in a specific order.
Kihlborg and Lilja found it very easy to add a link to an existing flow and choose which variables to send along the way. This included setting the “destination/origin” variables as variables from the flow about to be exited.
Handling follow-up questions and interruptions was also easy. “When wanting to be able to interrupt a flow, this functionality was simply added by making a node revisitable”.
The report concludes that “Teneo is harder to grasp early on, but after a while the pieces fall in place and developing complex chatbots suddenly becomes simple”.
Number of Steps Versus Time Spent
“The results indicate that on average, Teneo required the fewest steps per task to completion, followed by Watson Assistant and lastly Dialogflow”.
Kihlborg and Lilja argue that because the actual difference in the number of steps is less than 10%, this is not a valid metric.
Instead, they look to the time spent on tasks. Between creating workarounds and waiting for training updates to complete, the time taken for some tasks increased considerably in both Dialogflow and Watson Assistant.
In addition, the report highlighted that this could be further exacerbated by slow internet connections, since both programmes are web-based, whereas Teneo runs locally as a desktop application.
The Overall Results
One of the key conclusions of the report was that “the platforms differ quite widely as tasks become more complicated or complex”.
Teneo was the only platform in which the testers could complete all ten tasks. Kihlborg and Lilja were only able to complete six tasks in Watson Assistant and five in Dialogflow.
Despite the slow start, Teneo was given the highest average rating for ease of understanding the user interface, followed by Watson Assistant and lastly Dialogflow.
“In the beginning Dialogflow and Watson Assistant outperformed Teneo. But as the tasks grew more complicated and complex Teneo rose in rating whilst the others fell”.
The Next Step
Let me stress a point that I made at the start of this article: I’m not trying to pass this off as a fully independent analysis with a statistically valid sample size. The students did all they could to remove bias, but inevitably I’m sure some crept in. Nonetheless, the research more than hints at the strengths of Teneo as a robust, enterprise-strength conversational AI platform.
Indeed, this is the finding of at least three further independent analyses carried out by partners and prospects; unfortunately, all of these are currently protected by NDAs and cannot be published at the moment.
So, what next? Well, we’re exploring how we might extend this research in cooperation with an independent body, such as a university, to build a more statistically robust, unbiased analysis.
In the meantime, however, the best way for you to assess the various platforms is to try them for yourself, and you can put Teneo through its paces in the developer environment available at www.teneo.ai.
Try Before You Buy.
Build. Deploy. Analyze.
Sign up to get your own developer sandbox of Teneo containing all the tools needed to build and manage advanced conversational solutions.