Using Evolving Agents to Critique Subjective Music Compositions

Chuen-Tsai Sun, Ji-Lung Hsieh

Department of Computer Science

National Chiao Tung University

1001 Ta Hsueh Road, Hsinchu, Taiwan, China

{ctsun, gis91572}@cis.nctu.edu.tw

Abstract

The authors describe a recommender model that uses intermediate agents to evaluate a large body of subjective data according to a set of rules and make recommendations to users. After users score the recommended items, agents adapt their own selection rules via interactive evolutionary computing to fit user tastes, even when user preferences undergo a rapid change. The model can be applied to such tasks as critiquing large numbers of music or written compositions. In this paper we use musical selections to illustrate how agents make recommendations and report the results of several experiments designed to test the model's ability to adapt to rapidly changing conditions yet still make appropriate decisions and recommendations.

1. Introduction

Since the birth of the Netscape web browser in 1994, millions of Internet surfers have spent countless hours searching for current news, research data, and entertainment, especially music. Users of Apple's Music Store can choose from 2,000,000 songs for downloading. Having to deal with so many choices can feel like a daunting task to Internet users, who could benefit from efficient recommender systems that filter out low-interest items [1-3].

Some of the most popular Internet services present statistical data to point users to items that they might be interested in. News websites place stories that attract the broadest interest on their main pages, and commercial product stores such as amazon.com use billboards to list current book sales figures and to make recommendations that match collected data on user behaviors. However, these statistical methods are less useful for making music, image, or other artistic product recommendations to users whose subjective preferences can cross many genres. Music selections are often made based on mood or time of day [4, 5].

Chung-Yuan Huang

Department of Computer Science and Information Engineering

Chang Gung University

259 Wen Hwa 1st Road, Taoyuan, Taiwan, China

Two classical approaches to personalized recommender systems are content-based filtering and collaborative filtering. Content-based filtering methods analyze item content and recommend items similar to those the user has shown interest in previously [1, 6], while collaborative filtering methods let groups of users with common interests share the information they have accessed [7-9]. Common design challenges of previous approaches include:

1. When recommended items differ greatly from the user's preferences, the user can still access or select only those system-recommended items and cannot reach potentially good items that never appear in the recommended set. This problem can possibly be solved with an appropriate feedback mechanism [7].

2. In a collaborative filtering approach, new items may not be selected due to sparse rating histories.

3. User preferences may change over time or according to the moment, situation, or mood [4, 5].

4. Because the body of subjective compositions is large, the considerable time required to form suitable recommendations needs to be reduced [4, 5].

In light of these challenges, we have created a music recommender system model designed to reduce agent training time through user feedback. Model design consists of three steps: a) content-based filtering methods are used to extract item features, b) a group of agents makes item recommendations, and c) an evolution mechanism is used to make adjustments according to the subjective emotions and changing tastes of users.

2. Related Research

2.1. Recommender Systems

The two major components of recommender systems are items and users. Many current systems use algorithms to make recommendations regarding music [3, 9, 10], images, books [11], movies [12, 13], news, and homepages [7, 14, 15]. Depending on the system, the algorithm uses a pre-defined profile or user rating history to make its choices. Most user-based recommender systems focus on grouping users with similar interests [7-9], although some do try to match the preferences of single users according to their rating histories [1, 6].

Recommender systems use multiple mapping techniques to connect the item and user layers, which requires accurate and appropriate pre-processing and presentation of items for comparison and matching. Item representations can consist of keyword-based profiles provided by content providers or formatted feature descriptions extracted by information retrieval techniques. Accordingly, item feature descriptions in recommender systems can be keyword- or content-based. Features for items such as movies or books are hard to extract because movies are composed of various kinds of media [6] and content analysis of books runs into the difficulties of natural language processing. Their keyword-based profiles are therefore often determined by content providers. However, current image and audio processing techniques now allow for programmed extraction of content-based features, represented by factors that include tempo and pitch distribution for music and chroma and luminance distribution for images.

Previous recommender systems can be classified in terms of content-based filtering versus collaborative filtering. Standard content-based filtering focuses on classifying and comparing item content without sharing recommendations with others identified as having the same preferences. Collaborative filtering focuses on how users are clustered into groups according to their preferences. To avoid drawbacks associated with keyword-based searching (commonly used for online movie or book store databases), other designers emphasize content-based filtering focusing on such features as energy level, volume, tempo, rhythm, chords, and average pitch differences. Many music recommender system designers acknowledge drawbacks in standard collaborative filtering approaches; for instance, they cannot recommend two similar items if one of them is unrated. To address the shortcomings of both approaches, some systems use content features for user classification and other systems identify groups of users with similar tastes [7, 16].

To address challenges tied to human emotion or mood and to solve the sparsity problem of collaborative filtering, some music and image retrieval system designers use interactive evolutionary computing (IEC) to evaluate item fitness according to user parameters [4, 5]. We adopted IEC for our proposed model, which uses agent evolutionary training for item recommendations. The results of our system tests indicate that trained agents are capable of choosing songs that match both user taste and emotion.

2.2. Interactive Evolutionary Computing

A genetic algorithm (GA) is an artificial intelligence technique for searching for solutions to optimization problems. According to GA construction rules, the structure of an individual's chromosome is designed according to the specific problem, and genes are randomly generated once the system is initialized. Subsequent GA procedures include 1) using a fitness function to evaluate the performance of various problem solutions, 2) selecting multiple individuals from the current population, 3) modifying the selected individuals by mutation and crossover operators, and 4) deciding which individuals should be preserved or discarded for the next run (discarded solutions are replaced by new ones built from preserved genes). A GA repeats this evolutionary procedure until an optimal solution emerges. The challenge for music recommendation is defining a fitness function that accurately represents subjective human judgment. Only then can such a system be used to make judgments in art, engineering, and education [4, 5].
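The four-step procedure described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy fitness function, the population size, the "keep the better half" selection rule, and the mutation scale are all hypothetical choices made only so the loop runs.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=50, seed=0):
    """Minimal GA loop: evaluate, select, recombine and mutate, replace."""
    rng = random.Random(seed)
    # Genes are generated randomly when the system is initialized.
    pop = [[rng.uniform(0.0, 1.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # 1) Evaluate every individual with the fitness function.
        ranked = sorted(pop, key=fitness, reverse=True)
        # 2) Select the better half as parents (assumed selection rule).
        parents = ranked[: pop_size // 2]
        # 3) Build offspring by one-point crossover plus real-valued mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_genes)
            child[i] += rng.gauss(0.0, 0.1)
            children.append(child)
        # 4) Preserve the parents; discarded individuals are replaced by offspring.
        pop = parents + children
    return max(pop, key=fitness)

def toy_fitness(genes):
    """Hypothetical objective: every gene should be close to 0.5."""
    return -sum((g - 0.5) ** 2 for g in genes)

best = evolve(toy_fitness, n_genes=4)
```

In the recommender setting there is no closed-form `toy_fitness`; the difficulty noted above is precisely that the fitness value has to come from a human listener.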

Interactive evolutionary computing (IEC), an optimization method, meets the need for such a fitness function by involving human preferences. IEC is a GA technique in which chromosome fitness is measured by a human user [18]. The main factors affecting IEC evaluation are human emotion and fatigue. Since users cannot make consistently fair judgments across evaluation runs, results will change on different occasions according to the user's emotional state at any particular moment. Furthermore, since users may fail to adequately process large populations due to fatigue, reaching goals with smaller population sizes within fewer generations is an important consideration. Finally, the potential for fluctuating human evaluations can result in inconsistencies across different generations [19].

3. Using Evolutionary Agents for a Music Recommender System

3.1. Model Description

In our model, intermediate agents select music compositions according to their chromosomes and recommend them to the user. The system's six function blocks (track selector, feature extractor, database, recommendation agent module, evolution manager, and user interface) are shown in Figure 1.

Figure 1. Six model components including track selector, feature extractor, database, recommendation agent module, evolution manager, and user interface

A representation component consists of the track selector, feature extractor, and database function blocks, all of which are responsible for forming item feature profiles. This component translates the conceptual properties of music items into useful information with specific values and stores it in a database for later use. In other words, this is a pre-processing component. Previous recommender systems established direct connections between user tastes and item features. In contrast, we use trainable agents to make this connection automatically, based on a detailed item analysis. The track selector is responsible for translating each music composition into a textual file, while the feature extractor is responsible for calculating several statistical feature measurements (such as pitch entropy, pitch density, and mean pitch value for all tracks, described in Section 4). Finally, the database function block stores these statistical features for later use.

An evolution component includes a recommendation agent module and an evolution manager. The former is responsible for building agent selection rules according to music features extracted by the representation component, while the latter constructs an evolution model based on IEC and applies a GA model to train the evolutionary agents. In our proposed model, user evaluations serve as the engine for agent adaptation (Fig. 2).

A central part of this component is the recommendation agent module, which consists of the agent design and the algorithm for selecting items. The first step for standard GAs is chromosome encoding, that is, designing an agent's chromosomal structure based on item feature representations. In our proposed model, each agent has one chromosome whose genes encode the agent's preference for each item feature, so chromosome length is determined by the number of item features.

Each feature needs two genes, one for the mean and one for the range. Taking the three agents' chromosomes listed in Figure 3 as an example, f1_mean and f1_range represent the first agent's preference for the tempo feature: the first agent prefers tempos between 30 and 40 beats per minute. The first agent will therefore select songs with a tempo of 35 ± 5 beats per minute and a velocity of 60 ± 10. A gene value can also be "Don't care". We perform real-number mutation on each mean and range value, and one-point crossover on selected pairs of agents' chromosomes.
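The mean/range encoding and the matching rule can be illustrated with a small sketch. The class and function names, the use of `None` for "Don't care", the mutation scale `sigma`, and the three sample songs are assumptions introduced for illustration; only the gene values for the first two agents come from Figure 3.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Gene:
    mean: Optional[float]      # None encodes "Don't care" (assumed encoding)
    range_: float = 0.0        # half-width of the preferred interval

    def accepts(self, value):
        """True if the feature value lies within mean ± range."""
        return self.mean is None or abs(value - self.mean) <= self.range_

def select_items(chromosome, items):
    """Return the items whose every feature is accepted by the matching gene."""
    return [item for item in items
            if all(g.accepts(v) for g, v in zip(chromosome, item["features"]))]

def one_point_crossover(a, b, rng):
    """Swap the tails of two chromosomes at one random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(gene, rng, sigma=2.0):
    """Real-number mutation of mean and range; 'Don't care' genes are left alone."""
    if gene.mean is None:
        return gene
    return Gene(gene.mean + rng.gauss(0.0, sigma),
                max(0.0, gene.range_ + rng.gauss(0.0, sigma)))

# Agents 1 and 2 from Figure 3: (tempo mean, range), (velocity mean, range).
agent1 = [Gene(35, 5), Gene(60, 10)]
agent2 = [Gene(60, 3), Gene(95, 4)]

# Hypothetical songs described by (tempo, velocity).
songs = [
    {"title": "song_a", "features": (33, 65)},
    {"title": "song_b", "features": (80, 65)},
    {"title": "song_c", "features": (40, 50)},
]
picked = [s["title"] for s in select_items(agent1, songs)]
child1, child2 = one_point_crossover(agent1, agent2, random.Random(0))
```

Here `song_b` is rejected because its tempo of 80 lies outside the 30 to 40 interval, while the other two songs satisfy both genes. With only two genes per chromosome the cut point is always 1, so `child1` carries agent 1's tempo gene and agent 2's velocity gene.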

Figure 2. Evolution component, including agent recommendation module and evolution manager. Flow shown: initialization; each agent selects items from the music database by matching its genes; the user grades the music items; GA selection; new agents are generated by crossover and mutation for the next generation.

The evolution manager in our model is responsible for the selection mechanism that preserves valuable genes for generating more effective offspring. The common procedure is to select good agents to serve as the parent population, create new individuals by mixing parental genes, and replace eliminated agents. However, when dealing with subjective evaluations, changes in a person's preferences can result in a lack of stability across runs. Accordingly, the best agents from previous rounds may receive low grades because the user's preferences have changed, and therefore be discarded prematurely. As a solution, we propose the idea of agent fame values that are established according to previous behaviors. The higher the value, the greater the possibility that an agent will survive. The system's selection method determines which agents are discarded or recombined according to weighted fame values and local grades in each round, with total scores being summed with an agent's fame value in subsequent rounds.
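One possible reading of the fame mechanism is sketched below. The text does not give the weights or the exact update rule, so the equal weighting (`fame_weight=0.5`), the choice to carry the round score forward as the new fame value, and all names here are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    fame: float = 0.0

def round_score(agent, local_grade, fame_weight=0.5):
    """Weighted mix of accumulated fame and this round's grade (assumed form)."""
    return fame_weight * agent.fame + (1.0 - fame_weight) * local_grade

def run_round(agents, local_grades, n_discard=1, fame_weight=0.5):
    """Score every agent, carry the score forward as fame, split the ranking."""
    for agent in agents:
        agent.fame = round_score(agent, local_grades[agent.name], fame_weight)
    ranked = sorted(agents, key=lambda a: a.fame, reverse=True)
    keep = len(ranked) - n_discard
    return ranked[:keep], ranked[keep:]

veteran = Agent("veteran", fame=8.0)     # did well in earlier rounds
newcomer = Agent("newcomer", fame=0.0)   # no history yet

# The user's mood shifts: this round the veteran is graded lower than the newcomer.
survivors, discarded = run_round(
    [veteran, newcomer], {"veteran": 3.0, "newcomer": 5.0})
```

Judged on the local grade alone, the veteran (3.0) would lose to the newcomer (5.0). With fame included, the veteran scores 5.5 against 2.5 and survives the round, which is the premature-discard protection described above.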

Another important GA design issue is deciding when to stop agent evolution. System convergence is generally determined via learning curves, but in a subjective system this task (that is, deciding when an agent's training is complete) is especially difficult in light of potential changes in user preference and emotion. Our solution is based on the observation that the stature of judges in a music or art competition [...] make in previous competitions. In our system, agent fame value varies in each round. The system monitors agent values to determine which ones reach a pre-defined threshold; those agents are placed in a "V.I.P. pool." Pool agents cannot be replaced, but they can share their genes with other agents. Once a sufficient number of stable V.I.P. agents are established, the system terminates the evolution process. For example, if an agent has a fame value of six points and the system's pre-defined threshold is six points, the agent will be placed in the V.I.P. pool. This mechanism exists solely to preserve potentially good agents.
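The pool rule and the stopping condition can be written in a few lines. This is a sketch only: the function names and the sample fame values are hypothetical, the threshold of six follows the example in the text, and the required pool size of four is taken from the stopping rule reported in Section 4.

```python
def update_vip_pool(fame_by_agent, vip_pool, threshold=6.0):
    """Add every agent whose fame reaches the threshold to the pool (a set)."""
    for agent_id, fame in fame_by_agent.items():
        if fame >= threshold:
            vip_pool.add(agent_id)
    return vip_pool

def should_stop(vip_pool, required=4):
    """Evolution ends once enough agents have been promoted."""
    return len(vip_pool) >= required

vip = set()
update_vip_pool({1: 6.0, 2: 5.9, 3: 7.5}, vip)   # agents 1 and 3 are promoted
finished = should_stop(vip)                       # two V.I.P. agents: keep evolving
```

Because the pool is a set that only grows, a promoted agent stays protected even if its grades drop in a later round.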

CHROMOSOME

Agent ID   f1_mean   f1_range   f2_mean   f2_range   ...
1          35        5          60        10         ...
2          60        3          95        4          ...
3          83        5          120       10         ...

Figure 3. Agent chromosome. Each gene represents a mean or range value of a music feature. Whole chromosomes represent selection rules for agents to follow when choosing favorite items. The chromosome in this figure encodes two music features.

A user component consists of an interface for evaluating agent recommendations based on standards such as technicality, melody, style, and originality. The user interface is also responsible for arranging agents according to specific application purposes. For example, to find the joint preferences of two different users, the user interface component initializes and arranges two sets of agents, one for each user.

An agent selects items of interest from the database according to its selection rules and makes appropriate recommendations to the user, who evaluates items via the interface. Evaluations are immediately dispatched to the agent, whose evolution is controlled according to performance and GA operations (e.g., crossover, mutation, and selection). The evolution manager is responsible for a convergence test whose results are used to halt evolution according to agent performance.

3.2. Applications

We designed our model so that the chromosomes of surviving agents contain selection rules that can represent user profiles. User profiles formed by agent chromosomes can in turn be compared among multiple users. Taken together, distributed agents can be utilized for three kinds of applications:

1. Users can train sample groups of agents. The agent evaluation function can be altered to reflect the sum of several user profiles, thus representing the tastes of multiple users. However, true system convergence will be difficult to achieve due to disagreements among user opinions. As in the case of scoring entries in art or music competitions, extremely high and low scores can result in total scoring bias.

2. Users can train their own agents and share profiles. According to this method (which is similar to collaborative filtering), the system compares user profiles formed by the agents' chromosomes and identifies those that are most similar. Collaborative recommendations can be implemented via partial exchanges among agents.

3. Users can train their own agents while verifying the items selected by other users' agents. In the art or music competition scenario, users can train their own agents before verifying the agents of other users to achieve partial agreement. Pools of agents from all users will therefore represent a consensus. If one user's choice is rejected by the majority of other users following verification, that user will be encouraged to perform some agent re-training or face the possibility that the agent in question will be eliminated from the pool. For this usage, the user interface is responsible for arranging and exchanging the agents between different users.

4. Experiments

Our experimental procedures can be divided into two phases:

1. Training phase. Each user was allotted six agents for the purpose of selecting music items: two songs per agent per generation (12 songs per generation). Since subjective distinctions such as "good or bad music" are hard to capture with a single grading standard, users gave multiple scores to each song according to different standards. Each agent received two sets of scores from the user, with three scores in each set representing melody, style, and originality. The chromosome of any agent receiving high grades from a user six times in a row was placed in the system's V.I.P. pool; the chromosome was then used to produce a new chromosome in the next generation. This procedure was repeated until the system determined that evolutionary convergence had occurred. The system stopped at the user's request or when the V.I.P. pool contained four agents, whichever came first.

2. Validation phase. This phase consisted of a demonstration test for verifying that system-recommended songs matched the user's tastes. Experimental groups consisted of 20 songs chosen by 6 trained agents; control groups consisted of 20 songs chosen by 6 random agents. User evaluations confirmed or refuted agent capabilities. Users were not told which selections belonged to the respective groups.

4.1. Model Implementations

Musical items were stored and played in polyphonic MIDI format in our system, because note data can be extracted more easily from MIDI files than from audio wave data [1]. The track selector translates each MIDI file into a textual format; the beginning of one textual feature file is listed in Table 1 as an example. Polyphonic items consist of one track for melody and additional tracks for accompanying instruments or vocals. The melody track (considered the representative track) contains the most semantics. Since the main melody track contains more distinct notes with different pitches than the other tracks, it was identified by pitch density analysis and used for feature extraction. According to previous research [3], this method is capable of achieving an 83 percent correctness rate. Track pitch density is defined as Pitch density = NP / AP, where NP is the number of distinct pitches on the track and AP is the number of all possible distinct pitches in the MIDI standard. After computing the pitch densities of all tracks of a target music object, the track with the highest density was identified as the representative polyphonic track.
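The representative-track rule translates directly into code. The sketch below assumes AP = 128, since MIDI note numbers run from 0 to 127; the function names, the track labels, and the note lists are hypothetical sample data.

```python
MIDI_PITCH_COUNT = 128  # assumed AP: MIDI note numbers run from 0 to 127

def pitch_density(pitches):
    """Pitch density = NP / AP for one track's list of MIDI note numbers."""
    return len(set(pitches)) / MIDI_PITCH_COUNT

def representative_track(tracks):
    """Return the name of the track with the highest pitch density."""
    return max(tracks, key=lambda name: pitch_density(tracks[name]))

# Hypothetical two-track item: a melody line and a repetitive bass line.
tracks = {
    "T3": [60, 62, 64, 65, 67, 69, 71, 72],
    "T4": [36, 36, 43, 36, 43, 36],
}
main_track = representative_track(tracks)
```

Track T3 uses eight distinct pitches (density 8/128) against two for T4, so T3 is chosen as the melody track.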

Table 1. Part of textual MIDI feature file

Unit   Length   At       Time    Track   Channel   Note   Velocity
314    53       1162ms   197ms   T4      C4        d2     68
319    50       1181ms   185ms   T3      C3        d4     71
321    48       1188ms   178ms   T3      C3        b3     74

The purpose of the feature extractor is to extract features from the perceptual properties of musical items and transform them into distinct data. We focused on seven features for our proposed system; new item features can be added when possible.

1. Tempo, defined as the average note length value derived from MIDI files.

2. Volume, defined as the average value of note velocities derived from MIDI files.

3. Pitch entropy, defined as $\mathrm{PitchEntropy} = -\sum_{j} P_j \log P_j$ with $P_j = N_j / T$, where $N_j$ is the total number of notes with the corresponding pitch $j$ on the main track and $T$ is the total number of main track notes.

4. Pitch density, as defined earlier in this section.

5. Mean pitch value for all tracks.

6. Pitch value standard deviation. Large standard deviations indicate a user preference for musical complexity.

7. Number of channels, reflecting a preference for solo performers, small ensembles, or large bands/orchestras.
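The seven features listed above can be computed from a list of notes roughly as follows. This is a simplified sketch: the `(pitch, velocity, length)` tuple layout, the base-2 logarithm, the population (rather than sample) standard deviation, AP = 128, and the use of a single note list for every feature (the text computes mean pitch over all tracks) are assumptions, and the four notes are made-up data.

```python
import math
from collections import Counter
from statistics import mean, pstdev

MIDI_PITCH_COUNT = 128  # assumed AP: MIDI note numbers run from 0 to 127

def pitch_entropy(pitches):
    """-sum_j P_j * log2(P_j) with P_j = N_j / T (log base is an assumption)."""
    total = len(pitches)
    counts = Counter(pitches)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def extract_features(notes, n_channels):
    """notes: (pitch, velocity, length) tuples; returns the seven features."""
    pitches = [pitch for pitch, _, _ in notes]
    return {
        "tempo": mean(length for _, _, length in notes),        # avg note length
        "volume": mean(velocity for _, velocity, _ in notes),   # avg velocity
        "pitch_entropy": pitch_entropy(pitches),
        "pitch_density": len(set(pitches)) / MIDI_PITCH_COUNT,
        "mean_pitch": mean(pitches),
        "pitch_std": pstdev(pitches),
        "n_channels": n_channels,
    }

# Hypothetical main-track notes: pitch 60 occurs twice, 62 and 64 once each.
notes = [(60, 68, 200), (60, 71, 180), (62, 74, 220), (64, 70, 200)]
features = extract_features(notes, n_channels=2)
```

For these notes the pitch probabilities are 1/2, 1/4, and 1/4, which gives a pitch entropy of 1.5 bits; a track that repeats a single pitch would have an entropy of zero.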

Genes in standard GA systems are initialized randomly. However, in our proposed system randomly initialized agents will probably fail to find items that match their genetic information because the distribution of extracted features is unbalanced. We therefore suggest pre-analyzing the feature value distribution and using the data to initialize agent chromosomes. By doing so, it is possible to avoid initial agent preferences that are so unusual that they cannot possibly locate preferred items. Furthermore, this procedure prevents noise and speeds up agent evolution. Here we use tempo as an example of music feature pre-analysis. Since the average tempo for all songs in our database was approximately 80 beats per minute (Fig. 4), a random choice of tempo between 35 and 40 beats per minute resulted in eventual agent replacement or elimination and a longer convergence time for the entire system. For this reason, average values in our system were limited: 60 percent of all initial tempo ranges deviated between 1 and -1 and 80 percent between 2 and -2. This led to a speeding up of the agent evolution process.
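One simple way to realize distribution-aware initialization is to draw each initial gene mean from the feature values actually present in the database. This is a substitute technique chosen for illustration, not the authors' stated rule (the 60 and 80 percent limits above are not reproduced here); the function names, the range choices, and the synthetic tempo data are all assumptions.

```python
import random

def init_gene_from_data(values, rng, range_choices=(1, 2, 3, 5)):
    """Pick the gene mean from observed values so the interval is never empty."""
    gene_mean = rng.choice(values)
    return gene_mean, rng.choice(range_choices)

def count_matches(gene, values):
    """Number of database items inside the gene's mean ± range interval."""
    gene_mean, half_width = gene
    return sum(1 for v in values if abs(v - gene_mean) <= half_width)

rng = random.Random(42)
# Synthetic stand-in for the tempo column of the database (centered near 80 BPM).
tempos = [rng.gauss(80, 12) for _ in range(1036)]
genes = [init_gene_from_data(tempos, rng) for _ in range(6)]
```

Because every gene mean is itself an observed tempo, each initial agent is guaranteed to match at least one song, and common tempos are drawn more often than rare ones.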

4.2. Recommendation Quality

Recommendation quality is measured in terms of precision rate and weighted grade. Precision rate is defined as Precision rate = Ns / N, where Ns is the number of successful samples and N is the total number of music items. Weighted grade is defined as $\mathrm{Weighted\ grade} = \frac{1}{N}\sum_{i=1}^{N} M_i$, where $M_i$ represents the grade of music item $i$ and N is the total number of music items. Users were given six levels to choose from for evaluating chosen items.
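Both measures are simple averages and can be sketched as below. The text does not say what counts as a "successful" sample, so the success threshold is an explicit assumption, as are the function names and the five sample grades.

```python
def precision_rate(grades, success_threshold):
    """Ns / N, counting a grade at or above the threshold as a success (assumed)."""
    successes = sum(1 for g in grades if g >= success_threshold)
    return successes / len(grades)

def weighted_grade(grades):
    """Sum of item grades M_i divided by the number of items N."""
    return sum(grades) / len(grades)

# Hypothetical user grades for five recommended songs.
grades = [8, 6, 9, 4, 7]
precision = precision_rate(grades, success_threshold=6)
average = weighted_grade(grades)
```

With these grades, four of five songs reach the threshold, giving a precision rate of 0.8 and a weighted grade of 6.8.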

Users were asked to evaluate experimental and control group selections. The experimental group consisted of songs recommended by agents that the users had trained, and the control group consisted of songs chosen by random agents. After users completed their tests, the system calculated precision rates and weighted grades. The songs recommended by the trained agents had an average precision rate of 84 percent and an average weighted grade of 7.38, compared to 58.33 percent and 5.54 for songs recommended by the random agents.

Figure 4. Statistical curve for tempo distribution in our sample of 1,036 MIDI files (horizontal axis: beats per minute)

4.3. Convergence Test

GA-based models commonly perform large numbers of iterations before arriving at convergence. In order to trace learning progress, we let users perform one demonstration test after every round; results are shown in Figure 5. Curve A reflects a steady increase in effectiveness and convergence after eight rounds. Curve B reflects a lack of progress for agents that make random selections without training.

Figure 5. Convergence test and evolution generations of 10 users. Curve A (experimental group, trained agents) represents the average fitness value of 60 agents belonging to 10 users; Curve B represents the control group (random agents). Horizontal axis: generation (1 to 10).

In addition to recommendation quality and convergence tests, we attempted to identify clear differences between experimental and control group music selections by extracting their respective features. As shown in Figure 6, obvious differences were noted in terms of tempo and entropy, indicating that the trained agents converged on unique preferences and did not blindly select items. Taking one user's experimental result as an example, the user's preferred tempo is quite different from the average tempo in the control group.

Figure 6. One user's results example

5. Conclusion

Our proposed recommendation model can evaluate a large body of subjective data via a cooperative process involving both system agents and human users. Those users train groups of agents to find items that match their preferences, and then provide ongoing feedback on agent selections for purposes of further training. Agent training entails IEC methods and agent fame values to address the issue of change in human emotions. The agent fame value concept is also used as a convergence condition to promote agent population diversity and to propagate useful genes. Model flexibility was expressed in terms of replacing or altering functional blocks such as the user interface, which allows for use by multiple users. We suggest that with refinement and modifications, our model has potential for use by referees to critique large numbers of subjective compositions (in such areas as art, music, and engineering) and to make recommendations for images by extracting features (e.g., brightness, contrast, or RGB value) and encoding the information into agent chromosomes.

References

[1] Kazuhiro, I., Yoshinori, H., Shogo, N.: Content-Based Filtering System for Music Data. 2004 Symposium on Applications and the Internet-Workshops. Tokyo, Japan. (2004) 480

[2] Ben Schafer, J., Konstan, J.A., Riedl, J.: E-Commerce Recommendation Applications. Data Mining and Knowledge Discovery, Vol. 5. (2001) 115-153

[3] Chen, H.C., Chen, A.L.P.: A Music Recommendation System Based on Music and User Grouping. Journal of Intelligent Information Systems, Vol. 24. (2005) 113-132

[4] Cho, S.B.: Emotional Image and Musical Information Retrieval with Interactive Genetic Algorithm. Proceedings of the IEEE, Vol. 92. (2004) 702-711

[5] Cho, S.B., Lee, J.Y.: A Human-Oriented Image Retrieval System using Interactive Genetic Algorithm. IEEE Transactions on Systems, Man and Cybernetics, Part A, Vol. 32. (2002) 452-458

[6] Li, Q., Myaeng, S.H., Guan, D.H., Kim, B.M.: A Probabilistic Model for Music Recommendation Considering Audio Features. In: Information Retrieval Technology, Vol. 3689. (2005) 72-83

[7] Balabanovic, M., Shoham, Y.: Fab: Content-based, Collaborative Recommendation. Communications of the ACM, Vol. 40. (1997) 66-72

[8] Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: Applying Collaborative Filtering to Usenet News. Communications of the ACM, Vol. 40. (1997) 77-87

[9] Shardanand, U., Maes, P.: Social Information Filtering: Algorithms for Automating "Word of Mouth". In: Katz, L.R., Mack, R., Marks, L., Rosson, M.B., Nielsen, J. (eds.): Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Denver, Colorado, United States. (1995) 210-217

[10] Kuo, F.F., Shan, M.K.: A Personalized Music Filtering System Based on Melody Style Classification. In: Proceedings of Second IEEE International Conference on Data Mining, Maebashi City, Gumma Prefecture, Japan. (2002) 649-652

[11] Mooney, R.J., Roy, L.: Content-Based Book Recommending using Learning for Text Categorization. In: Nurnberg, P.J., Hicks, D.L., Furuta, R. (eds.): Proceedings of the Fifth ACM Conference on Digital Libraries, San Antonio, Texas, United States. (2000) 195-204

[12] Fisk, D.: An Application of Social Filtering to Movie Recommendation. BT Technology Journal, Vol. 14. (1996) 124-132

[13] Mukherjee, R., Sajja, E., Sen, S.: A Movie Recommendation System - An Application of Voting Theory in User Modeling. User Modeling and User-Adapted Interaction, Vol. 13. (2003) 5-33

[14] Chaffee, J., Gauch, S.: Personal Ontologies for Web Navigation. In: Proceedings of the Ninth International Conference on Information and Knowledge Management, McLean, Virginia, United States. (2000) 227-234

[15] Chiang, J.H., Chen, Y.C.: An Intelligent News Recommender Agent for Filtering and Categorizing Large Volumes of Text Corpus. International Journal of Intelligent Systems, Vol. 19. (2004) 201-216

[16] Pazzani, M.J.: A Framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review, Vol. 13. (1999) 393-408

[17] Holland, J.H.: Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press. (1975)

[18] Takagi, H.: Interactive Evolutionary Computation: Fusion of the Capabilities of EC Optimization and Human Evaluation. Proceedings of the IEEE, Vol. 89. (2001) 1275-1296

[19] Maes, P.: Agents that Reduce Work and Information Overload. Communications of the ACM, Vol. 37. (1994) 31-40