Bench Talk for Design Engineers | The Official Blog of Mouser Electronics


Transferring Human Knowledge to AI

Michael Matuschek

Sentiment Analysis: The Case for Context- and Culture-Sensitive AI

 

(Source: hoelixDE/Shutterstock.com)

In 2020, more than 4.4 billion internet users were producing a staggering amount of data through social-media posts, reviews, recommendations, and similar interactions. The insight gathered from this data is invaluable in guiding businesses and innovators through product development, marketing, and customer support. However, extracting that insight is challenging: opinion-oriented, customer-provided data is difficult for machines to interpret because of the complexities of human language and cultural context. Tools such as natural language processing (NLP) and machine learning (ML) enable computers to derive meaning from human language. Building on these tools, sentiment analysis, an advancing research area in artificial intelligence (AI), helps machines understand unstructured, customer-provided data and classify opinions as positive, negative, or neutral.

Language Complexities in Sentiment Analysis

To understand sentiment analysis in NLP, let’s look at this simple statement from a restaurant review: “The soup was good.” An analysis of the sentiment requires three actions:

  • Identify whether a statement, sentence, or the whole text contains an opinion.
  • Understand whether the opinion is positive, negative, or neutral (called polarity).
  • Identify the target of the opinion.

In this instance, the sentiment is unambiguously positive and concerns a particular food served at the restaurant. However, other examples are less straightforward, such as the seemingly similar statement “The beer is cold.” Many would consider this opinion positive because they like their beer cold, but cold can have a negative polarity in other contexts. For example, “The coffee is cold” uses an identical sentence structure and adjective, but many people would consider cold coffee to be negative.
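
The way polarity depends on the opinion target can be sketched with a toy lookup table. The Python sketch below is illustrative only: the lexicon entries, the function name `polarity_for`, and the simple “The X is/was Y” pattern are assumptions made for this example and do not come from any library or from a real data set.

```python
import re

# Hypothetical toy lexicon: polarity depends on the (target, adjective) pair,
# with a fallback keyed on the adjective alone.
TARGET_LEXICON = {
    ("beer", "cold"): "positive",
    ("coffee", "cold"): "negative",
}
DEFAULT_LEXICON = {"good": "positive", "bad": "negative"}

# Matches only simple statements of the form "The <target> is/was <adjective>."
PATTERN = re.compile(r"^the (\w+) (?:is|was) (\w+)\.?$", re.IGNORECASE)


def polarity_for(statement):
    """Return (target, polarity) for a simple statement, or None if no opinion is found."""
    match = PATTERN.match(statement.strip())
    if match is None:
        return None  # Action 1: this toy pattern detected no opinion
    target, adjective = match.group(1).lower(), match.group(2).lower()
    # Action 2: look up polarity, preferring the target-specific entry
    polarity = TARGET_LEXICON.get(
        (target, adjective), DEFAULT_LEXICON.get(adjective, "neutral")
    )
    # Action 3: report the target together with the polarity
    return target, polarity


for text in ["The soup was good.", "The beer is cold.", "The coffee is cold."]:
    print(text, "->", polarity_for(text))
```

A hand-written table like this does not scale beyond a few examples, which is one reason practical systems learn such associations from annotated data instead.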

Other language complexities create additional challenges, such as sentences that contain multiple sentiments. For example: “The food was good, but the soup was cold.” Here, we have a positive sentiment about the food and a second sentiment about the soup that is negative or ambiguous, depending on the customer’s preference for soup temperature. Similarly, “The soup was hot, but the beer was cold” would express two positive sentiments for most people but remains ambiguous because individual customers’ preferences may differ.
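
One simple way to handle a sentence with several sentiments is to split it into clauses and label each clause separately. The following Python sketch assumes a hypothetical toy lexicon and a naive split on the conjunction “but”; the function name `clause_sentiments` and the “ambiguous” label are inventions for this illustration, not an established method.

```python
import re

# Hypothetical toy lexicon; "cold" and "hot" are marked ambiguous because
# their polarity depends on the target and on the customer's preference.
LEXICON = {
    "good": "positive",
    "great": "positive",
    "bad": "negative",
    "cold": "ambiguous",
    "hot": "ambiguous",
}


def clause_sentiments(sentence):
    """Split a sentence on 'but' and label each clause separately."""
    text = sentence.strip().rstrip(".")
    clauses = re.split(r",?\s+but\s+", text, flags=re.IGNORECASE)
    results = []
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause.lower())
        # Use the first lexicon word found in the clause; default to neutral
        label = next((LEXICON[w] for w in words if w in LEXICON), "neutral")
        results.append((clause, label))
    return results


print(clause_sentiments("The food was good, but the soup was cold."))
```

Resolving the “ambiguous” label would require additional context, such as the opinion target or information about the reviewer.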

Modifiers further blur the line between polarities. For instance, consider the opinion: “The staff were almost too friendly.” Irony, sarcasm, and figures of speech make it even harder to identify sentiment correctly. Examples such as “We waited for more than an hour, really great service!” tend to be rare in training data and extremely difficult to encode manually in a systematic manner.

Cultural Variables in Sentiment Analysis

Assigning polarity to opinions becomes even more challenging when considering personal, cultural, or circumstantial preferences. For example, consider analyzing customer reviews for a ryokan, a traditional Japanese guest house that is typically fancy and expensive but features a common bathing area rather than private bathrooms. Categorizing the absence or presence of something as positive or negative often seems straightforward, as in “There was dirt in the shower” or “There was a pool for the kids.” However, the ryokan example demonstrates how accounting for cultural variables and personal preferences is essential to obtaining usable insights from data. In Japan, guests generally consider shared bathing areas a positive attribute. By contrast, most European travelers would view them negatively, particularly at an expensive hotel. This example involves just one feature and two cultures; real-world data typically covers many more of both.

Addressing Language and Cultural Variables in NLP

In NLP, sentiments can be analyzed at the whole-document level and at the paragraph and sentence levels, with results often then aggregated. Although whole-document analysis is useful, paragraph- and sentence-level analysis can yield more granular and correspondingly more accurate results (such as identifying sentiment about a particular product feature in addition to the product as a whole). The challenge comes in developing a lexicon, the set of rules that machines use to classify sentiments as positive, negative, or neutral. As a starting point, many free tools and resources come trained on public data. For instance, software libraries such as Natural Language Toolkit (NLTK), spaCy, and TextBlob provide sentiment models or support training and retraining with user data. If you prefer not to code, cloud offerings such as Google Cloud Platform or Microsoft Azure enable you to get started with sentiment analysis immediately: Simply paste the text to be analyzed into a browser and build your application from there.
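
To make the idea of sentence-level analysis with aggregation concrete, here is a minimal Python sketch that uses only the standard library. The tiny lexicon, its scores, and the function names (`sentence_score`, `label`, `analyze`) are assumptions for illustration; a lexicon from one of the libraries mentioned above would be far larger and would handle negation and modifiers.

```python
import re

# Hypothetical toy lexicon with numeric scores; a real lexicon would be far
# larger and tuned or retrained on domain data.
LEXICON = {"good": 1, "great": 1, "friendly": 1, "dirt": -1, "slow": -1, "rude": -1}


def sentence_score(sentence):
    """Sum the lexicon scores of the words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(document):
    """Return per-sentence labels plus an aggregated document-level label."""
    # Naive sentence split on whitespace that follows ., !, or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scores = [sentence_score(s) for s in sentences]
    per_sentence = [(s, label(score)) for s, score in zip(sentences, scores)]
    return per_sentence, label(sum(scores))


review = "The staff were friendly. The soup was good. There was dirt in the shower."
per_sentence, overall = analyze(review)
for sentence, sentiment in per_sentence:
    print(sentiment, "|", sentence)
print("document:", overall)
```

The document-level label alone would hide the complaint about the shower, which is the kind of detail that sentence-level analysis keeps visible.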

Beyond prototyping, data sets and ML models should address language and culture complexities. This means:

  • For planning: Find structured approaches to discovering variables and useful insights. For instance, analyze your data for underlying languages and cultures, tone, sources, author demographics, and then consult linguists to interpret those elements. Further improve your approach by interviewing people who belong to the author group to get a precise understanding of nuance and context.
  • For training data: Identify examples needed to address variables and include human-provided annotations. It might also mean revisiting knowledge bases such as dictionaries, adding more training data for the particular problem, or in some cases, removing problematic or misleading examples from your data if they do more harm than good.
  • For modeling: Find a method of representing sentences in a mathematically processable way. For example, word embeddings, which represent arbitrary text as numerical vectors, are useful for mapping words used in context to corresponding positive, negative, or neutral sentiments. Ideally, data analysis would be based explicitly or implicitly on individual customers’ preferences; however, this analysis is cumbersome and, in many cases, not possible if a user is not identifiable. A more accessible approach is to analyze data according to region and language. Then, model cultural differences with separate training examples.
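
The last point, modeling cultural differences with separate training examples, can be sketched as one small model per region. The Python example below is a deliberately simplified stand-in: it uses plain word counts rather than word embeddings, and the training examples, region codes, and function names (`train`, `predict`) are hypothetical and invented for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labeled examples grouped by region; invented for illustration.
TRAINING = {
    "JP": [
        ("shared bathing area", "positive"),
        ("dirt in the shower", "negative"),
    ],
    "EU": [
        ("shared bathing area", "negative"),
        ("dirt in the shower", "negative"),
        ("private bathroom", "positive"),
    ],
}


def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        for word in text.lower().split():
            counts[word][label] += 1
    return counts


# One model per region, each trained only on that region's examples
MODELS = {region: train(examples) for region, examples in TRAINING.items()}


def predict(region, text):
    """Vote over the per-word label counts from the region's model."""
    votes = Counter()
    for word in text.lower().split():
        votes.update(MODELS[region].get(word, {}))
    if not votes:
        return "neutral"  # No known words in the text
    return votes.most_common(1)[0][0]


review = "The hotel has a shared bathing area"
for region in MODELS:
    print(region, "->", predict(region, review))
```

With these invented examples, the same review is labeled differently depending on the region whose model is applied, mirroring the ryokan example above.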

Conclusion

Customer-provided data from social-media posts, reviews, recommendations, and the like provides invaluable insights for businesses and innovators. Complexities in natural language and culture make it difficult for AI-driven machines to understand customer opinions. However, sentiment analysis can help ensure that these aspects are captured and reflected in insights. You can get started by using freely available tools and resources, but addressing complexities in language and culture is challenging and requires significant planning, data preparation, and modeling. Raising awareness of these complexities is an excellent first step toward gaining useful insights and a highly valuable way to better understand your customers and their needs.





Michael Matuschek is a Senior Data Scientist from Düsseldorf, Germany. He holds a Master’s Degree in Computer Science and a PhD in Computational Linguistics. He has worked on diverse Natural Language Processing projects across different industries as well as in academia. Topics covered include Sentiment Analysis for reviews, client email classification, and ontology enrichment.

