Data science-2013-heekim

A Unified Music Recommender System Using
Users’ Listening Habits and Semantics of Tags
Hyon Hee Kim
Department of Statistics and Information Science,
Dongduk Women’s University

Outline
• Motivation & Objectives
• Overview of the System
• Generation of User Profiles
• A Unified Music Recommendation
• Performance Evaluation
• Related Work
• Conclusions and Future Work

Motivation (1/3)
• In a Social Music Site
– Music recommendation is essential.
– Music recommendation differs from recommendation of other products
• Explicit information : Rating system
• Implicit information : the number of plays
• Listening habits-based User Profiling
– Cold Start Problem
• New users with little information
• New items with only a few ratings
– Data Sparsity Problem
• Available data is very small compared to the number of music items

Motivation (2/3)
• Collaborative Tagging
– A tool for users to represent their preferences about web resources
– Users add freely chosen keywords to web resources
– Tag data can be used for user profiling in personalized recommender systems
• Tag-based User Profiling
– Tags are easily added, even without listening to the music
– Tags are semantically meaningful
[Figure: example tags: classic rock, british, pop, rock]

Motivation (3/3)
• In the case of last.fm
• Factual Tags
– 85% of tags
– genre, region, instrumentation
• Emotional Tags
– 10% of tags
– opinion, sentiment, mood
• Personal Tags
– 5% of tags
– to organize, to browse, etc.

Objectives
• A Novel Approach to Music Recommendation
– Combining listening habits and semantics of tags
• Using a Tag Ontology and an Emotion Ontology
– UniTag: Resolving semantic ambiguity of tags
– UniEmotion: Assigning weighted values to the emotional tags
→ Semantically Enhanced Music Recommendation

Outline
• Tag-based User Profiling
– Preprocessing of tags
– Algorithms for generating user profiles
– Preliminary experimental results
• Related Work

Preprocessing of Tags (1/3)
• Tags have no predefined terms or term hierarchies
• Problems of tag data
– Synonymy
• Different words represent the same meaning
• E.g., hiphop, hip-hop, hip hop/ R & B, Rhythm and Blues, Blues
– Polysemy
• A single word contains multiple meanings
• E.g., French => French rock, French pop, French artist
– Spelling variants
• misspelling
• Foreign language

Preprocessing of Tags (2/3)
• Tag Ontology
– Tags, users, items
• UniTag Ontology
– uniTag:Users
• uniTag:userID, uniTag:hasAdded, uniTag:hasAddedTo
– uniTag:Items
• uniTag:itemID
– uniTag:Tags
• uniTag:tagID, uniTag:tagName, uniTag:RTag, uniTag:subTag, uniTag:classifiedAs, uniTag:isKindOf, uniTag:istheSameAs, uniTag:tagVariation
• uniTag:Rtags {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}

Preprocessing of Tags (3/3)
• Rules for reasoning about prefixes
– French rock, progressive rock, post rock => rock
Tag(?t) ^ tagPrefix(?t, ?p) ^ Prefix(?p) ^ subTag(?t, ?s) ^ Rtags(?s) -> classifiedAs(?t, ?s)
• Rules for reasoning with expert knowledge
– Soul => rhythm and blues, and rhythm and blues => blues; therefore Soul => blues
Tag(?t) ^ isKindOf(?t, ?A) ^ isKindOf(?A, ?B) -> isKindOf(?t, ?B)
• Rules for reasoning about synonyms
– Hip-hop, hiphop => hip hop
Tag(?t) ^ tagVariation(?t, ?R) ^ istheSameAs(?t, ?s) -> tagVariation(?s, ?R)
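
The three rule families above can be approximated outside an ontology reasoner. The following is a minimal Python sketch, not the UniTag implementation: the lookup tables `SAME_AS` and `IS_KIND_OF` are hypothetical stand-ins for ontology facts, and spelling variants are mapped onto the representative spelling "hiphop" rather than "hip hop" so that they match the Rtags list.

```python
# Representative tags, as listed for uniTag:Rtags in the slides.
RTAGS = {"rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"}

# Hypothetical facts standing in for istheSameAs and isKindOf assertions.
SAME_AS = {"hip-hop": "hiphop", "hip hop": "hiphop"}
IS_KIND_OF = {"soul": "rhythm and blues", "rhythm and blues": "blues"}


def find_rtag(tag):
    """Return the representative tag for `tag`, or None if no rule applies."""
    t = tag.strip().lower()
    t = SAME_AS.get(t, t)                      # synonym rule
    seen = set()
    while t in IS_KIND_OF and t not in seen:   # transitive isKindOf rule
        seen.add(t)
        t = IS_KIND_OF[t]
    if t in RTAGS:
        return t
    words = t.split()
    last = words[-1] if words else ""
    return last if last in RTAGS else None     # prefix rule: "French rock" -> "rock"


examples = {tag: find_rtag(tag) for tag in ["French rock", "Soul", "Hip-Hop", "favourites"]}
```

A reasoner would derive the same classifications from the rules directly; the dictionaries here only make the chain of inferences visible.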

Algorithm for Generating User Profiles (1/2)
Algorithm 1. Generation of a Tag-based Profile
Input: set of representative tags Tr, set of a user’s tags Tu
Output: set of frequencies for each representative tag of the user FTr
var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
var tagFrequency[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
var RTag = null
while ∃ next tag t in Tu do
    RTag = FindRTag(t)
    for each index i of RTags do
        if RTag == RTags[i] then
            tagFrequency[i] = tagFrequency[i] + 1
endwhile

       rock  hiphop  electronic  metal  jazz  rap  funk  folk  blues  reggae
user1     6       2           2      3     2    4     3     1      1       1
user2     5       0           0      0     0    0     0     0      1       0
user3     2       2           1      1     1    1     2     0      0       1
user4    10       1           0      1     2    0     2     3      3       1
user5     1       4           0      0     0    4     1     0      0       0
Table 1. An example of tag-based profiles
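
As a rough illustration of Algorithm 1, the sketch below counts a user's tags per representative tag. `last_word_rtag` is a hypothetical, much simplified stand-in for FindRTag (it only looks at the last word of a tag), and the sample tags are made up for the example.

```python
from collections import Counter

RTAGS = ["rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"]


def tag_profile(user_tags, find_rtag):
    """Count how many of a user's tags map to each representative tag."""
    counts = Counter()
    for tag in user_tags:
        rtag = find_rtag(tag)
        if rtag in RTAGS:          # tags with no representative tag are skipped
            counts[rtag] += 1
    return [counts[r] for r in RTAGS]


def last_word_rtag(tag):
    """Hypothetical stand-in for FindRTag: use the last word of the tag."""
    words = tag.lower().split()
    return words[-1] if words and words[-1] in RTAGS else None


profile = tag_profile(["French rock", "post rock", "jazz", "favourites"], last_word_rtag)
```

The returned list follows the column order of Table 1, so each user contributes one row of the profile matrix.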

Algorithm for Generating User Profiles (2/2)
Algorithm 2. Generation of a Track-based Profile
Input: set of tracks of a user TRu, set of representative tags Tr
Output: set of numbers of a user’s tracks for each representative musical genre Tn
var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
var numTrack[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
var RTrack = null
while ∃ next track t in TRu do
    RTrack = FindGenre(t)
    for each index i of RTags do
        if RTrack == RTags[i] then
            numTrack[i] = numTrack[i] + 1
endwhile

       rock  hiphop  electronic  metal  jazz  rap  funk  folk  blues  reggae
User1    65     176           5      4     0  168     0     3      0       0
User2   411       8          11    109     3    5     8     1      0       0
User3   157       7          11     10     6    2     1    39      4       2
User4   257      20           9     18     2    5     0     9      0       0
User5   110     277          15      8     6   85    10     3      2       7
Table 2. An example of track-based profiles

Preliminary Experimental Results (1/3)
• Data set of 1,000 users from Last.fm
– Users, tags, music items
• Standardization
– To offset unusually large preference counts
• K-Means clustering algorithm
– Canopy Clustering
– 6 centroid points and 6 clusters
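
The slides do not say which standardization was used. One common reading is a per-column z-score, sketched below on the tag-based profiles of Table 1; the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std dev."""
    columns = list(zip(*rows))
    centers = [mean(c) for c in columns]
    spreads = [pstdev(c) for c in columns]   # assumption: population std dev
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, centers, spreads)]
        for row in rows
    ]


# Tag-based profiles from Table 1 (one row per user, one column per genre).
profiles = [
    [6, 2, 2, 3, 2, 4, 3, 1, 1, 1],
    [5, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [2, 2, 1, 1, 1, 1, 2, 0, 0, 1],
    [10, 1, 0, 1, 2, 0, 2, 3, 3, 1],
    [1, 4, 0, 0, 0, 4, 1, 0, 0, 0],
]
z = standardize(profiles)
```

After this step every genre column has mean 0, so a genre with generally high counts (such as rock) no longer dominates the distance computation used for clustering.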

Preliminary Experimental Results (2/3)
              X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
Cluster1   0.241   1.472   0.626   0.130   1.267   1.621   2.168   0.274   1.078   0.381
Cluster2   2.171   0.032   0.517   3.052   0.011  -0.030   0.328   1.533   1.245   0.162
Cluster3  -0.206  -0.273  -0.517  -0.178  -0.180  -0.294  -0.233  -0.171  -0.204  -0.136
Cluster4  -0.341   0.660  -0.459  -0.284  -0.208   1.178  -0.179  -0.321  -0.166   0.273
Cluster5  -0.074  -0.155   1.320  -0.230  -0.115  -0.261  -0.209  -0.070  -0.172  -0.071
Cluster6   2.815   7.640   5.168  -0.136   9.254   6.135   7.000   4.286   4.421   5.254
Table 3. Values of Centers of Tag-based Profiles

              X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
Cluster1  -0.411   0.495   0.406  -0.338   1.565   0.131   1.632  -0.135   0.147   0.812
Cluster2   0.200  -0.444   0.007  -0.341   0.907  -0.468  -0.288   2.617   1.097   0.020
Cluster3  -0.897   1.651  -0.539  -0.442  -0.213   1.836   0.059  -0.507  -0.415   0.034
Cluster4   1.925  -0.590  -0.404   0.852  -0.264  -0.491   0.655  -0.002   2.850  -0.108
Cluster5   0.914  -0.557  -0.216   0.794  -0.296  -0.511  -0.297   0.014  -0.157  -0.147
Cluster6  -0.472  -0.327   0.380  -0.373  -0.184  -0.371  -0.241  -0.205  -0.300  -0.093
Table 4. Values of Centers of Track-based Profiles
• Clustering Validity
– Inter-cluster distances
– Distances between all pairs of centroids using cosine distance measure
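
A minimal sketch of the validity measure, assuming cosine distance is defined as 1 minus cosine similarity and that no centroid is the zero vector. With 6 centroids there are 15 pairs. The three short vectors at the end are made-up examples, not centroids from the tables.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def inter_cluster_distances(centroids):
    """Cosine distance for every unordered pair of centroids."""
    return [cosine_distance(a, b) for a, b in combinations(centroids, 2)]


# Illustrative centroids only.
distances = inter_cluster_distances([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mean_distance = sum(distances) / len(distances)
```

Larger mean inter-cluster distances indicate centroids that point in more clearly different directions, i.e. better separated clusters.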

Preliminary Experimental Results (3/3)
– T-test
• Mean of inter-cluster distances of tag-based profiles
• Mean of inter-cluster distances of track-based profiles
                      N   Mean    Std Dev  t     p-value
Tag-based profiles    15  0.8325  0.6834   2.55  0.0165
Track-based profiles  15  0.3785  0.0885
Table 5. T-test result for the means of inter-cluster distances
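
The t value in Table 5 can be reproduced from the summary statistics alone. The slides do not state which variant of the t-test was used; the sketch below assumes the pooled-variance two-sample test, which for equal group sizes gives the same statistic as Welch's test.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err


# Values taken from Table 5.
t = two_sample_t(0.8325, 0.6834, 15, 0.3785, 0.0885, 15)
print(round(t, 2))  # prints 2.55
```

The difference of means is 0.454 and the standard error is about 0.178, which gives the reported t of 2.55.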

Outline
• Generation of User Profiles
– UniEmotion Ontology
– Generation of User Profiles
– Music Recommendation Algorithm
• Related Work

UniEmotion Ontology (1/5)
[Plutchik’s model]

UniEmotion Ontology (2/5)
• Definition of the intensity of emotional tags
• SentiWordNet, http://sentiwordnet.isti.cnr.it/
– Example scores (P: positive, O: objective, N: negative)
• P: 0.625, O: 0.25, N: 0.125
• P: 0.375, O: 0.625, N: 0
• P: 1.0, O: 0, N: 0

UniEmotion Ontology (3/5)
• Intensity of emotional tags
– Strong
• Positive value >= 0.75 or negative value >= 0.75
– Middle
• 0.25 <= positive value < 0.75 or
• 0.25 <= negative value < 0.75
– Weak
• Positive value < 0.25 and negative value < 0.25

UniEmotion Ontology (4/5)
• Assigning the weights to the tags
– Factual tags: 1
– Positive tags
• Strong: 2.5
• Middle: 2
• Weak: 1.5
– Negative tags
• Strong: -2.5
• Middle: -2
• Weak: -1.5
• Final score of an item = sum of the weights of its tags
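
The intensity thresholds and weights can be combined into a small scoring routine. This is a sketch under assumptions: a tag's kind ("factual", "positive", "negative") is taken as given, as if already decided by the ontology; a score of exactly 0.75 is treated as strong; and the function names are illustrative.

```python
WEIGHTS = {"strong": 2.5, "middle": 2.0, "weak": 1.5}


def intensity(pos, neg):
    """Map SentiWordNet positive/negative scores to an intensity label."""
    strongest = max(pos, neg)
    if strongest >= 0.75:
        return "strong"
    if strongest >= 0.25:
        return "middle"
    return "weak"


def tag_weight(kind, pos=0.0, neg=0.0):
    """Weight of one tag; `kind` is 'factual', 'positive' or 'negative'."""
    if kind == "factual":
        return 1.0
    weight = WEIGHTS[intensity(pos, neg)]
    return weight if kind == "positive" else -weight


def item_score(tags):
    """Final score of an item: the sum of the weights of its tags."""
    return sum(tag_weight(*tag) for tag in tags)


# One factual tag, one middle positive tag, one strong negative tag.
score = item_score([("factual",), ("positive", 0.625, 0.125), ("negative", 0.0, 0.8)])
```

Here the score is 1 + 2 - 2.5 = 0.5, showing how a single strongly negative tag can offset several mildly favourable ones.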

UniEmotion Ontology (5/5)
• Two classes
– UniEmotion:Positive
• Emotional tags belonging to the positive emotional categories
• trust, surprise, anticipation, and happiness
– UniEmotion:Negative
• Emotional tags belonging to the negative emotional categories
• disgust, anger, fear, and sadness
• Two properties
– UniEmotion:Intensity
• Specifying the intensity of tags
– UniEmotion:Weight
• Specifying the weight of tags

Generation of User Profiles (1/2)
1. Listening habits-based User Profiles
– U1 = {u1, u2, …, um}, I1 = {i1, i2, …, in}
– <u, i, n>
• n: number of plays
2. Tag score-based User Profiles
– U2 = {u1, u2, …, um}, I2 = {i1, i2, …, in}
– <u, i, s>
• s: score of tags assigned by the UniEmotion ontology
3. Hybrid User Profiles
– U3 = {u1, u2, …, um}, I3 = I1 ∩ I2
– <u, i, m>
• m = α * n + (1 − α) * s; α = 0.5
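
A minimal sketch of the hybrid profile for a single user, assuming each profile is stored as a mapping from item to value. The slides do not say whether n and s are rescaled before being combined, so this sketch combines them as given; the item names are made up.

```python
def hybrid_profile(plays, tag_scores, alpha=0.5):
    """Combine play counts and tag scores over the items present in both."""
    shared_items = plays.keys() & tag_scores.keys()   # I3 = I1 ∩ I2
    return {
        item: alpha * plays[item] + (1 - alpha) * tag_scores[item]
        for item in shared_items
    }


# Hypothetical data for one user.
plays = {"artist_a": 10, "artist_b": 4}
tag_scores = {"artist_a": 2.0, "artist_c": 1.0}
hybrid = hybrid_profile(plays, tag_scores)
```

Only `artist_a` appears in both profiles, so the hybrid profile contains that single item with m = 0.5 * 10 + 0.5 * 2.0 = 6.0.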

Generation of User Profiles (2/2)
[Figure: examples of (1) listening habits-based, (2) tag score-based, and (3) hybrid user profiles]

Music Recommendation Algorithm (1/2)
• Finding Similar Users
– Pearson Correlation Similarity
• Calculating scores of items
– Considering the similar users’ ratings
• Recommending top n items
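
Pearson correlation similarity between two users can be sketched as follows. Computing it only over the items both users have a value for, and returning 0 when it is undefined, are assumptions of this sketch rather than details given in the slides.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two users' preferences over their shared items."""
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        return 0.0                      # undefined with fewer than two shared items
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0                      # undefined when one user is constant
    return cov / sqrt(var_x * var_y)


# Hypothetical preference profiles.
u1 = {"a": 1.0, "b": 2.0, "c": 3.0}
u2 = {"a": 2.0, "b": 4.0, "c": 6.0, "d": 1.0}
similarity = pearson(u1, u2)
```

Because the measure is centred on each user's mean, two users whose preferences rise and fall together are similar even if one of them has much higher values overall.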

Music Recommendation Algorithm (2/2)
Input: a set of user profiles UP
Output: a set of recommended items RI
1. For all yi ∈ U
   Compute a similarity s between x and yi.
2. Sort by similarity
3. Select top n neighbors
4.
5. For all
   Compute a similarity t between x and
   For all
      preference += t * pref
6. Rank by preference
7. Select top n items
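
The listing above can be read as a standard user-based collaborative filtering loop. The sketch below assumes that step 5 iterates over the selected neighbors and over each neighbor's items, and that only items the target user does not yet have are scored. The similarity function is passed in; a Jaccard overlap is used in the example purely as a stand-in for the Pearson similarity named in the slides, and all data is hypothetical.

```python
def recommend(target, profiles, similarity, n_neighbors=2, n_items=3):
    """Recommend items for `target` from its most similar users' preferences."""
    x = profiles[target]
    # Steps 1-3: similarity to every other user, sorted, top n neighbors.
    scored = sorted(
        ((similarity(x, p), user) for user, p in profiles.items() if user != target),
        reverse=True,
    )
    neighbors = scored[:n_neighbors]
    # Step 5: accumulate similarity-weighted preferences for unseen items.
    preference = {}
    for t, user in neighbors:
        for item, pref in profiles[user].items():
            if item not in x:
                preference[item] = preference.get(item, 0.0) + t * pref
    # Steps 6-7: rank by preference and keep the top items.
    ranked = sorted(preference, key=lambda item: (-preference[item], item))
    return ranked[:n_items]


def jaccard(a, b):
    """Stand-in similarity: shared items divided by all items of both users."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


profiles = {
    "x": {"a": 5, "b": 3},
    "y1": {"a": 4, "b": 2, "c": 5},
    "y2": {"a": 1, "d": 4},
    "y3": {"e": 2},
}
recommended = recommend("x", profiles, jaccard)
```

In this example `y1` and `y2` become the neighbors, and item `c` ranks above `d` because it comes from the more similar user with a higher preference.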

Performance Evaluation
• Implementation Environment: Apache Web Server
– User database: MySQL 5.0
– Listening habits collector, tag score generator: PHP
– Recommendation Engine: Apache Mahout
– UniTag and UniEmotion ontologies: JDK 6.0
• Experimental Data
– Information on 1,000 users from last.fm [http://mir.dcs.gla.ac.uk/]
– Containing 18,700 artists and 12,600 tags
– 70% training data, 30% test data

Performance Evaluation
• Evaluation Model
– Recommended items
• Items which users are interested in (True Positive, TP)
• Items which users are not interested in (False Positive, FP)
– Items which are not recommended
• Items which users are interested in (False Negative, FN)
• Items which users are not interested in (True Negative, TN)
– Precision P = TP / (TP + FP)
• # of correct recommendations / # of all recommended items
– Recall R = TP / (TP + FN)
• # of correct recommendations / # of preferred items
– F-measure F = 2 * P * R / (P + R)
• Harmonic mean of precision and recall
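
The three measures can be computed directly from a recommendation list and the set of items the user is actually interested in. This is a minimal sketch; returning 0 when a denominator is empty is an assumption, and the item names are made up.

```python
def evaluate(recommended, relevant):
    """Return (precision, recall, F-measure) for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)     # recommended and of interest
    fp = len(recommended - relevant)     # recommended but not of interest
    fn = len(relevant - recommended)     # of interest but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Four recommended items, three items of interest, two in common.
p, r, f = evaluate(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here TP = 2, FP = 2 and FN = 1, so P = 2/4, R = 2/3 and F = 4/7.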

Experimental Results (1/3)
• Precisions
[Figure axes: number of similar users; number of recommended items]
A: Listening habits-based approach
B: Tag-based approach
C: Hybrid approach

Experimental Results (2/3)
• Recalls
C: Hybrid approach

Experimental Results (3/3)
• F-measure
C: Hybrid approach

Statistical Validation
• One-way ANOVA across three groups
– Method 1: listening habits-based approach
– Method 2: tag-based approach
– Method 3: hybrid approach
• Tukey Multiple Comparison Test
– Asymmetric distributions
• Log transformation
– Groups that differ significantly are marked with different letters
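
The F statistic of a one-way ANOVA on log-transformed values can be sketched as below. The precision values are hypothetical and only illustrate the procedure; they are not the experimental data. The Tukey comparison is not implemented here.

```python
from math import log


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical precision values per method, log-transformed before the test.
precisions = {
    "method1": [0.020, 0.015, 0.025, 0.018],
    "method2": [0.022, 0.012, 0.030, 0.016],
    "method3": [0.050, 0.090, 0.040, 0.110],
}
logged = [[log(v) for v in values] for values in precisions.values()]
f_statistic = anova_f(logged)
```

The log transformation is applied first because, as noted above, the raw distributions are asymmetric, while ANOVA assumes roughly symmetric, similarly spread groups.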

Method                  1              2              3              F
Mean of log(precision)  -3.962 B       -4.036 B       -2.879 A       34.27***
Mean precision (SD)     0.020 (0.006)  0.020 (0.009)  0.068 (0.040)
N                       24             24             24
<Table 1. Test for precision> ***: p < 0.001

Method                  1              2              3              F
Mean of log(recall)     -3.285 B       -4.099 C       -2.635 A       26.80***
Mean recall (SD)        0.044 (0.023)  0.019 (0.010)  0.093 (0.056)
N                       24             24             24
<Table 2. Test for recall> ***: p < 0.001

Method                  1              2              3              F
Mean of log(F-measure)  -3.748 B       -4.117 C       -2.894 A       41.31***
Mean F-measure (SD)     0.024 (0.006)  0.018 (0.008)  0.06 (0.034)
N                       24             24             24
<Table 3. Test for F-measure> ***: p < 0.001

Related Work
• MusicBox
– A personalized music recommender system based on social tags
– Third-order tensor model
– The method improves the recommendation quality
• Foafing the Music
– Collecting music information in a semantic web environment
– User information, music information, concert information
– Recommendation of similar music items
• OntoEmotions
– An ontology of emotional categories covering the basic emotions
– Armeteo art portal
– New relations can be inferred by reasoning on the ontology of emotions

Conclusions
• Solution to Cold Start Problem
– It takes time to collect users’ listening habits.
– Adding tags is easy
– Tags act like word of mouth
• Performance Enhancement
– Precision, Recall, F-measure
– The hybrid approach outperforms both the listening habits-based and the tag-based approaches

Future Work
• Elaborating UniEmotion Ontology
– Emerging Internet slang
• Item Selection
– Product Network Analysis Considering Tags
– Analyzing short descriptions
