Accountability for performance or de facto performance for accountability?

Recent discussions of the relationship between accountability and performance mainly concentrate on the issue of “accountability for performance,” especially driven by the wave of NPM reforms that shift the accountability focus from processes and compliance with rules and input to output and results (Lægreid, 2014).

However, most observations and studies question and challenge the “promise of performance” held out by accountability (Halachmi, 2002, 2005; Dubnick, 2005; de Lancer Julnes, 2008; Pollitt, 2011; Radin, 2011; Ossege, 2012). For one thing, some argue that accountability per se does not encourage performance improvement. As Halachmi (2002: 371) notes, “whereas accountability is about staying within the four corners of the contract, performance is about getting outside the box…” For another, measuring performance can in practice have negative effects on performance. Measurement stimulates gaming and distorted behavior among officials (Smith, 1995; Hood, 2006; Radnor, 2008). The measures then fail to reflect real performance and may even induce perverse learning, i.e., the “performance paradox” (van Thiel and Leeuw, 2002).

To be sure, the effectiveness of performance measurement depends heavily on the measurability of performance. However, the measurability of public sector performance is highly problematic (de Bruijn, 2001). What can be measured is usually not the results themselves but surrogates for the results (Dubnick and Frederickson, 2011), so implementers are directed toward achieving what gets measured rather than the valued goals.

Performance measurement itself is promoted with learning and improvement as a core purpose (Behn, 2003; van Dooren et al., 2015), which echoes the call for “accountability for performance.” Although a few practices can trigger learning (e.g., benchmarking) (van Dooren et al., 2015), this ideal purpose is not well attained in most cases. Learning and improvement require flexibility, risk taking and creativity in management, but performance measurement, usually acting as an external accountability setting, serves steering and control more than improvement. While the NPM reform advocates more managerial flexibility and autonomy, a greater degree of managerial accountability is imposed on government officials through performance control. This tension results in indirect top-down manipulation rather than learning and improvement (Sanderson, 2001; Lægreid, 2014). As Mintzberg (1996: 81) notes, “the performance model decentralizes in order to centralize; it loosens up in order to tighten up. And tightening up comes at the expense of flexibility, creativity and individual initiative.” The promise of “accountability for performance” seems to be a myth and looks closer to “performance for accountability” in reality.

Despite the negative impact of accountability arrangements, however, loosening the grip of accountability is not expected to bring improved performance (Aucoin and Heintzman, 2000). As a compromise, van Dooren et al. (2015) as well as de Bruijn (2001) suggest adopting a “soft” use of performance information, which allows more room for interpretation, rather than a “hard” use that tightly couples performance information with judgment.

Insight from implementation literature

The tension between “performance for accountability” and “accountability for performance” to some extent parallels the debate between top-down and bottom-up approaches in implementation studies in the 1980s (see Sabatier, 1986). The former takes the attainment of formal policy targets set by the central policy-maker as the criterion of successful implementation, whereas the latter regards implementation as a process of negotiation and bargaining among different actors, in which meeting the central targets does not necessarily represent successful implementation. Compromising the targets is allowed and even welcomed if doing so can solve problems or is conducive to desirable outcomes. Fudge and Barrett (1981: 258), as bottom-uppers, contend that “if implementation is seen as ‘getting something done’, then performance rather than conformance is the main objective” [italics added]. Lipsky (1980), another bottom-upper, even urges building mechanisms that make street-level bureaucrats accountable to the public so as to improve implementation. In this sense, performance for the bottom-uppers is commensurate with the idea of “accountability for performance,” whereas conformance is equivalent to “performance for accountability.” Interestingly, the NPM’s call for direct accountability to customers and users of services seemingly responds to Lipsky’s viewpoint noted above.

To be sure, conformance with the formal policy targets should not simply be equated with poor performance when the central policymakers have more legitimacy to set policy goals and design policies. Matland (1995) tried to synthesize the top-down and the bottom-up perspectives by introducing two contextual variables into the analysis: degree of policy ambiguity and degree of policy conflict. If the goals and means of a policy are clear, he suggested, the top-down perspective is superior, and with a low degree of policy conflict it is clearly appropriate; otherwise, the bottom-up perspective prevails. Hence, different degrees of discretion should be assigned to implementers in accordance with the degree of policy ambiguity.

This attempt at synthesis gives us insights into the accountability-performance problem. Whether conformance should be considered good or poor performance is contingent. The degree of implementation control can be adjusted to different policy domains or issues through various management tools. In fact, the public management movement since the 1990s, alongside the concurrent NPM reforms, has been addressing the implementation issue while shifting away from the program-based implementation research of the 1970s and 1980s (Kettl, 1993). Some public policy scholars have recently started to address the implementation issue from the perspective of public management, considering it the “missing link” in previous implementation research (O’Toole and Meier, 2011). Implementation research is increasingly becoming the study of how to manage the performance of policies and programs.

Performance management, to a large extent, serves as a vehicle for realizing such implementation control. The degree of discretion allowed by policymakers can be adjusted through various designs of performance indicators. Somewhat in line with the advocacy of the “soft” use of performance information, some indicators can be designed to encourage entrepreneurial and innovative behaviors and to allow for interpretative performance information collected from the evaluation. Furthermore, the setting of performance indicators can be opened to participation by policy stakeholders, especially policy implementers, to allow for negotiation and bargaining before implementation.

This paper examines such an effort in the performance evaluation of social welfare services of local governments in Taiwan since 2001. The evaluation is a comprehensive annual exercise conducted by several central government agencies in charge of social welfare services and budget. Before each exercise, performance criteria and indicators are reviewed and revised in light of previous experience, the opinions of stakeholders and changes in social welfare policy. It is worth examining how performance is managed when multiple stakeholders are engaged in setting performance indicators, and what impact this has had in Taiwan.

The Background and Context of Performance Evaluation of Social Welfare Services in Taiwan

In the 1990s, Taiwan underwent a peaceful transformation from a party-state authoritarian regime to a democracy. This transformation coincided with the global wave of NPM reforms, which induced NPM-oriented administrative reforms alongside political reforms in Taiwan (Tang, 2004). These include various decentralization reforms in which the autonomy of local governments was enhanced, especially with the enactment of the Local Government Act in 1999. As for social welfare policies, before 2000 all social welfare programs were directly controlled by the central government, which directly funded the programs executed by local governments and non-government social service providers. Since 2000, social welfare services have become the domain of local self-government in accordance with the Local Government Act.

While local governments are now allowed to compile their own budgets for social welfare provision and can develop their own local policies, major social welfare policies are still formulated at the level of central policy-making bodies (i.e., central government and national legislature) and, more importantly, the fiscal resources remain centralized, so local governments rely heavily on budgetary allocations from the central government for most public services (Kuo and So, 2013). In order to realize local self-government, the central government shifted to allow more flexibility in managing local public finance. Instead of the previous program-based grants, a formula-based “general grant” is now set up to cover the education, social welfare, and infrastructure expenditures of local governments. In addition, a state-sponsored public welfare lottery was launched in 1999 and has become a supplementary source of social welfare funding; 50% of its financial surplus is earmarked for local social welfare services.

Even though the norm of local self-government was firmly accepted, the local governments were in turn held accountable for their performance, including welfare service provision and budgetary allocation. Under pressure from the national legislature and social pressure groups, which worried that welfare service provision would deteriorate without central monitoring, the central government set up an annual evaluation mechanism to oversee and steer social welfare policy implementation by local governments (especially mandatory nation-wide welfare services). The mechanism aims to ensure the adequacy of budgetary allocation and that local governments keep up the standard of service. The result of each evaluation is adopted as one of the bases for marginal adjustment of the general grant, so the evaluation exercise becomes an important factor swaying local public finance. In this sense, the context of this case closely resembles the spirit of the NPM-styled performance movement.

Mechanism of Evaluation Exercise

The performance evaluation exercise started in 2001. The performance scrutiny takes the form of a written-report review and an on-site visit. In light of the burden of such a nation-wide annual exercise, however, since 2005 the on-site visit has been carried out only in alternate years. The scope of the evaluation covers four dimensions: 1) performance of social welfare programs; 2) budgetary allocation and compilation, and the execution of earmarked funding for mandatory social welfare services; 3) self-financing social welfare services; 4) other related programs. The evaluation exercise is jointly conducted by various central government agencies. The Ministry of Health and Welfare (MHW) takes charge of evaluating service provision in dimensions 1, 3 and 4; the Directorate General of Budget, Accounting and Statistics takes charge of fiscal performance in dimensions 2, 3 and 4. In addition, the fiscal performance of the fund drawn from the surplus of the public welfare lottery is evaluated separately by the National Treasury Administration.

The exercise is further divided into 10 groups to cover various social welfare aspects (see Figure 1). Hence, the local governments need to prepare separate reports for the 10 groups. Community development and voluntary service require only a self-evaluation. External evaluation applies to the other 8 groups, and accordingly 8 evaluation teams are formed, each composed of officials from the central government agencies in charge of the policy, scholars, and delegates from social welfare organizations. They examine the self-evaluation report and rate every item in accordance with the performance rating criteria. The evaluation of fiscal performance, for which only a written-report review is arranged, accounts for 20% of the total score; service performance, for which an on-site visit is arranged, accounts for 80%. The 10 aspects also account for varying proportions of the total score (see Figure 1). For the on-site visit, the 8 evaluation teams are sent to each local government for one day to scrutinize its self-evaluation reports and, if necessary, to question the officer-in-charge. Each item is rated by the team members in accordance with the rating criteria. The performance scores of all local governments are released publicly at the end. Local governments whose aggregate scores fall below 80 (out of a full score of 100) have their general grant cut.
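The weighting and threshold described above can be illustrated with a minimal sketch. It assumes that the fiscal and service components are each first normalized to a 0-100 scale and then combined linearly; the source states only the 20%/80% shares and the 80-point threshold, so the function names and the normalization are hypothetical.

```python
# Hypothetical sketch: names and the 0-100 normalization are assumptions,
# not the evaluation's documented formula.
FISCAL_WEIGHT = 0.20         # fiscal performance (written-report review only)
SERVICE_WEIGHT = 0.80        # service performance (with on-site visit)
GRANT_CUT_THRESHOLD = 80.0   # aggregate scores below this incur a grant cut


def aggregate_score(fiscal: float, service: float) -> float:
    """Combine two component scores, each assumed to lie on a 0-100 scale."""
    for name, value in (("fiscal", fiscal), ("service", service)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} score must be within 0-100, got {value}")
    return FISCAL_WEIGHT * fiscal + SERVICE_WEIGHT * service


def faces_grant_cut(fiscal: float, service: float) -> bool:
    """True when the aggregate score falls below the 80-point threshold."""
    return aggregate_score(fiscal, service) < GRANT_CUT_THRESHOLD
```

Under these assumptions, a locality with a strong fiscal score of 90 but a service score of 75 would still fall short of the threshold, which shows how heavily the service component weighs in the aggregate.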

To prepare for the evaluation, every local government is informed of the detailed requirements, including performance indicators, one year before the exercise. The local governments then perform their duties accordingly, and register and update all performance information for future review. The setting of the performance indicators and criteria is not determined solely by the central government agencies. Review committees are formed to formulate the indicators for each aspect. Except for the public welfare lottery, each committee is composed of delegates from the central government agencies, scholars, and delegates from social welfare organizations and from all 22 local governments (currently 6 special municipal governments and 16 county-level governments; the latter are relatively poor, especially those on outlying islands). They make resolutions on the adjustment of indicator setting for each exercise by majority rule. For the public welfare lottery, the same function is performed by a separate supervisory committee that is also composed of government delegates, scholars, and delegates from social welfare organizations, but government delegates cannot account for more than half of the members. Hence, only 5 delegates from the local governments are allowed.

Figure 1. Weight of 10 Aspects of Performance Evaluation

The formulation of performance indicators looks quite participatory in the sense that all players can shape the mechanism of the evaluation: local governments as implementers, scholars as independent experts, social welfare organizations as representatives of clients, and central government agencies as policy makers. This holds especially for the delegates from local governments, who form the majority in most of the committees.

All players have been learning from each round of the evaluation exercise, the feedback of which would be collectively reflected in the new set of performance indicators. What is the impact of this decision-making mechanism? Is it leading to more accountability for performance or performance for accountability?

Methods of this Study

This paper examines the evolution of the performance indicators to answer the above questions. As the MHW only provides the performance-indicator sets for 2009, 2011, 2013 and 2015, the years in which on-site visits were arranged, this study is confined to this period. In addition, because of the uniqueness of the aspects of voluntary service and community development in this evaluation, this study does not include them. The analysis classifies all indicators of each exercise into different categories so as to identify the pattern of change. The interpretation of the change is substantiated by interviews with various stakeholders in the exercise, including officers-in-charge from the central government agencies and local departments of social welfare, non-official evaluators, and managers of social welfare organizations.

The formats of the list of indicators varied between years and between different evaluation aspects, but they usually contain at least three columns of information: evaluation item, indicator, and rating criteria.

The list of each aspect is divided into numerous evaluation items; each item contains a number of indicators that list concrete job targets or requirements; then the third column specifies rating criteria of each indicator.

According to a preliminary overview, all indicators can be divided into 10 categories in light of their target requirements and rating criteria (see Table 1). Then all indicators are classified into these categories in accordance with a “four-eyes principle” to increase the reliability of the coding process.

It can be noted that almost all types of indicators are quantitative in nature, except the judgment type, which leaves room for evaluators to interpret the performance. The absolute type requires full compliance with the target; otherwise, no score is awarded and a score deduction may even be incurred. Other types of quantitative indicators allow various degrees of compliance or flexibility in implementation. In addition, the inter-locality type takes a relative approach, measuring performance across different local governments, which seems to spur competition between them. The null type means that no score is awarded for some performances. This type of indicator serves two functions. One is to announce new indicators in advance so that the local governments can try them out; the other is to signal to the local governments that some extra local practices, such as subsidies for nursery service and maternity allowances, are not credited in the evaluation (see Table 1). Furthermore, most aspects contain a couple of items for particular performances that incur extra credits or discredits. The former include, for example, providing extra non-mandatory and innovative services; the latter include some non-mandatory cash allowances to specific populations such as the disabled and the elderly, and implementation failure.

Table 1. Categories of Indicators

Columns: Category; Rating criteria; Example.

Absolute
Rating criteria: Only 100% accomplishment of targets scores; otherwise, no score or even negative score.
Example: The coverage of service for the solitary elderly: (1) 100%: 0.5; (2) less than 100%: 0.

Quantity
Rating criteria: More amount attained, higher score accorded.
Example: Types of service promotion for assisting the elderly: (1) 3 types: 1; (2) 2 types: 0.75; (3) 1 type: 0.5; (4) no promotion: 0.

Percentile
Rating criteria: Rating in accordance with percentage or proportion of achievement.
Example: Budget implementation rate for the recreational center for the elderly: (1) above 80%: 1 […]

Increment
Rating criteria: […]
Example: Training courses for senior citizens: (1) yearly growth in the number of classes: 1; (2) yearly stagnation in the number of classes: 0.5 […]

[…]
Rating criteria: […] multiple targets or criteria.
Example: Is a panel for promoting the welfare for the elderly established? (1) Yes, and its member composition accords with the requirement of the Senior Citizens Welfare Act: 0.5; (2) yes, but its member composition does not accord with the requirement of the Senior Citizens Welfare Act: 0; (3) no: -0.5.

Inter-locality
Rating criteria: Rating in accordance with the performance compared with other localities.
Example: Cases of overpaying low-income allowance for the elderly. Ordering the localities by a formula (no. of overpaying cases / no. of applicants of allowance): (1) top one-third: 0.5; (2) middle one-third: 0.25; (3) lower one-third: 0.

Timeliness
Rating criteria: Rating in accordance with how far a target is timely achieved.
Example: On-line reporting the result of the project of friendly caring for the elderly: (1) in due time: 1 […]

[…]
Rating criteria: […]
Example: Liaison briefing of home-care service: (1) 4 times in a year: 1 […]

[…]
Rating criteria: […]
Example: […] professional assessment, intervention plan, resource connection, etc. Maximum 1 point for each record.

Null
Rating criteria: Zero score for some performances.
Example: Subsidy for nursery service, maternity allowance.
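
To make the contrast between rating rules concrete, the following minimal sketch encodes three of the categories in Table 1 using the example thresholds listed there. The function names are hypothetical, and two points are assumptions rather than facts from the source: that more than three promotion types scores the same as three, and that a lower overpayment ratio ranks a locality higher.

```python
# Hypothetical sketch of three rating rules from Table 1; function names,
# tie handling and the ranking direction are assumptions for illustration.

def rate_absolute(coverage_pct: float) -> float:
    """Absolute type: only full (100%) accomplishment scores."""
    return 0.5 if coverage_pct >= 100.0 else 0.0


def rate_quantity(promotion_types: int) -> float:
    """Quantity type: more types of service promotion earn a higher score."""
    if promotion_types >= 3:  # assumption: more than 3 types scores like 3
        return 1.0
    return {2: 0.75, 1: 0.5}.get(promotion_types, 0.0)


def rate_inter_locality(overpay_ratio: dict[str, float]) -> dict[str, float]:
    """Inter-locality type: rank localities and score them by thirds.

    Each value is (no. of overpaying cases / no. of applicants).
    Assumes a lower ratio ranks higher.
    """
    ranked = sorted(overpay_ratio, key=overpay_ratio.get)
    n = len(ranked)
    scores = {}
    for position, locality in enumerate(ranked):
        if position * 3 < n:          # top one-third
            scores[locality] = 0.5
        elif position * 3 < 2 * n:    # middle one-third
            scores[locality] = 0.25
        else:                         # lower one-third
            scores[locality] = 0.0
    return scores
```

The sketch highlights the difference in discretion: the absolute rule rewards nothing short of full compliance, the quantity rule grades partial attainment, and the inter-locality rule makes a locality's score depend on how others perform.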

Transformation of Performance Indicators and Dynamics of Change

Statistical Analysis

According to our statistical analysis, the number of indicators expanded from 609 in 2009 to 742 in 2015 (from 630 to 789 if indicators for extra credit and discredit are included). The rating criteria were very detailed and minute. The minimum score for one indicator could be as low as 0.1. The full score for most aspects was more than 100, and the maximum was 501. The evaluation procedure looked like a checklist rating, verifying the attainment of objective targets. There were some indicators (judgment-typed) requiring subjective assessment by evaluators, such as service innovation, but their share shrank from about 25% in 2009 to less than 15% in 2015. These indicators were concentrated in two aspects: 1) child and adolescence, and 2) control of domestic violence and sexual assault and harassment. That means the evaluators in other aspects exercised less discretion and their assessment was more objective. What the evaluators were required to do was to verify the data and information and to rate these items in accordance with rigid criteria. The inter-locality-typed indicators accounted for no more than 3%, which indicates that inter-local competition played a minor role in the evaluation (see Table 1).
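
As a rough arithmetic check on the scale of this expansion, the growth rates implied by the indicator counts can be computed as below. This is only an illustrative calculation on the figures reported in the text; the helper name is hypothetical.

```python
def percent_growth(start: int, end: int) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100


# Indicator counts for 2009 and 2015 as reported in the text.
core = percent_growth(609, 742)         # excluding extra credit/discredit items
with_extras = percent_growth(630, 789)  # including extra credit/discredit items

print(f"{core:.1f}% {with_extras:.1f}%")  # prints "21.8% 25.2%"
```

On these figures, the indicator set grew by roughly a fifth to a quarter over the six years, and somewhat faster once the extra credit and discredit items are counted.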

Table 2. Statistics of the Amount of Indicators and Classification

Columns: year; indicators by category (Absolute, Quantity, Percentile, Increment, […])
