Britons are still more polite – but for how long?
There are numerous stereotypes about the British: their love of tea, the “stiff upper lip”, and of course their politeness. But does this politeness extend to intelligent virtual assistants, such as Artificial Solutions’ “our natural language app”? The counterpart to this stereotype is that of the rude Yankee. With our natural language app as our test lab, can we see whether there really is a difference between the supposedly well-behaved Brits and their allegedly more abrupt English-speaking cousins across the Atlantic? The question might seem light-hearted, but a common problem in all decision making is the reliance on intuition (and sometimes stereotypes) when data might provide firmer ground.
The first step in this case is to determine how politeness can be measured. There is no single measure, but a very basic indicator of politeness in English is the use of please. Armed with one month’s worth of our natural language app logs, some 286,331 utterances from users in 171 countries, we can see how often British and American natural language app users include please in their inputs. The logs are easily retrieved from the Teneo Analytics API, which is purpose-built for searching and retrieving conversational data. Thanks to the open architecture of the API, an integration library makes it easy to analyse and visualise the results in the statistical platform R.
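The measurement itself is simple enough to sketch. The analysis described here was done in R on logs retrieved from the Teneo Analytics API; the Python below is only an illustrative stand-in, and the sample records, the country codes, and the function names are invented for the example rather than taken from the real logs or the API. It assumes that "contains please" means the whole word, matched case-insensitively, so that "pleased" does not count.

```python
import re

# Hypothetical log records (country code, user input), invented for illustration.
SAMPLE_LOGS = [
    ("GB", "Please set an alarm for 7"),
    ("GB", "what's the weather"),
    ("US", "call mom"),
    ("US", "Play some music please"),
    ("US", "Pleased to meet you"),
]

# Whole-word, case-insensitive match (an assumption about the indicator).
PLEASE = re.compile(r"\bplease\b", re.IGNORECASE)


def contains_please(utterance):
    """Return True if the utterance contains the word 'please'."""
    return PLEASE.search(utterance) is not None


def please_proportion(logs):
    """Map each country to the share of its inputs that contain 'please'."""
    totals = {}
    hits = {}
    for country, utterance in logs:
        totals[country] = totals.get(country, 0) + 1
        if contains_please(utterance):
            hits[country] = hits.get(country, 0) + 1
    return {country: hits.get(country, 0) / totals[country] for country in totals}


if __name__ == "__main__":
    for country, share in please_proportion(SAMPLE_LOGS).items():
        print(country, round(share, 3))
```

Counting inputs rather than occurrences keeps the figure a proportion between 0 and 1, which is what the comparison across countries relies on.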
The figure below compares the proportion of our natural language app user inputs containing please for the UK, the USA, and other countries. The black dots mark the usage of please by users in the UK and USA, whereas the average of all countries is represented by the dotted horizontal line. The vertical axis is the proportion of inputs with please in them, so points higher up the plot mean higher usage of please.
At first sight, it seems that the stereotypes hold water: the British use please around three times as often as the Americans. The vertical lines extending from the black dots in the figure indicate confidence intervals, ranges of reasonable uncertainty around the numbers. As the plot shows, the difference between the two groups is far greater than the uncertainty, since there is no overlap between the confidence intervals. Likewise, both are quite different from the overall average usage of please based on users from all countries.
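To make the confidence-interval comparison concrete, here is a minimal sketch in Python. The article does not say which interval method was used, so the Wilson score interval at the 95% level is an assumption, and the counts in the example are made up purely to exercise the code; they are not the figures from the logs.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical counts: inputs containing 'please' out of all inputs.
    group_a = wilson_interval(300, 10000)
    group_b = wilson_interval(100, 10000)
    print(group_a, group_b, intervals_overlap(group_a, group_b))
```

Non-overlapping intervals are a conservative sign of a real difference. The reverse does not hold: two intervals can overlap slightly even when a direct test of the difference would still be significant.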
But geographical location is only one indicator of how people speak. Another important factor is age. If we take age groups into consideration, do we still find the same pattern? The Teneo Analytics API makes it easy to retrieve not only linguistic data, but also metadata such as the user’s age alongside the inputs. The figure below shows the proportion of inputs containing please by age group, for UK and US users.
Clearly, the figure tells a different story when age is taken into consideration. The difference between UK and US users is immediately visible for the age groups from 50 and up. However, for users below the age of 50, there are no systematic or consistent differences in the usage of please between our natural language app users in the UK and USA.
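The breakdown by age amounts to grouping on two keys instead of one. The sketch below is again Python rather than the R used for the analysis, with invented records and hypothetical function names. The ten-year bands are an assumption; the article does not state how the age groups were defined.

```python
from collections import defaultdict

# Hypothetical records (country code, user age, input contains 'please'),
# invented for illustration.
SAMPLE_RECORDS = [
    ("GB", 55, True),
    ("GB", 58, False),
    ("US", 52, False),
    ("GB", 23, False),
    ("US", 27, False),
    ("US", 24, True),
]


def age_band(age, width=10):
    """Label an age with its band, e.g. 53 -> '50-59' for ten-year bands."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def proportion_by_country_and_age(records):
    """Map (country, age band) to the share of inputs containing 'please'."""
    counts = defaultdict(lambda: [0, 0])  # [inputs with please, all inputs]
    for country, age, has_please in records:
        key = (country, age_band(age))
        counts[key][0] += int(has_please)
        counts[key][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


if __name__ == "__main__":
    for key, share in sorted(proportion_by_country_and_age(SAMPLE_RECORDS).items()):
        print(key, share)
```

Splitting the data this way shrinks each group, so the uncertainty around every proportion grows; with real logs, each cell would need its own confidence interval before any two are compared.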
How are we to interpret these results? One possible interpretation is that a generational shift in the use of please is underway in the UK. However, another possibility is that in conversation with other humans, these British under-50s are still using please at rates similar to their elders, but that they modify their style when speaking to a virtual assistant. Unlike the common stereotype, our natural language app log data, based on how the users actually speak to the bot, are both more precise and more interesting. Above all, they illustrate the importance of consulting data instead of relying on intuition alone.