Urbanization, Education and the Growth Backlog of Africa

Human capital accumulation and urbanization play a decisive role in the analysis of growth and development. Stylized facts reveal a positive association between human capital accumulation, urbanization and growth, both over time and across countries. While Africa has the fastest-growing human capital and urban population, it is the region experiencing the slowest economic growth. This paper argues that the adjustment costs resulting from rapid urbanization can explain this paradox. In other words, the low or negative social return to education in the short run might be due to transitory adjustment or urbanization costs. We build a simple growth model with two sectors, calibrate its parameters and then use it to simulate the African trajectory of human capital, urban population and GDP per capita. While we predict greater growth rates in future decades (the backlog), convergence with high-income countries will be limited. The current levels of GDP per capita could have been reached 15 years earlier if it were not for the adjustment costs.


Introduction
The United Nations member states and other international organizations agreed to pursue eight human development goals in the Millennium Declaration of 2008 (United Nations, 2008). Among the top priorities were 100% enrolment in primary education, reduced illiteracy rates and an end to gender discrimination in access to education. In addition, the World Bank committed to promoting sustainable cities and towns that would fulfil the promise of development for their inhabitants. Data show that from 1955 to 2000 the proportion of college graduates grew by 6.2 percent per year in sub-Saharan Africa, followed by 3.8 percent in South Asia. Over the same period, the urban population grew at an average annual rate of 2.1 percent in sub-Saharan Africa, followed by 1.9 percent in South Asia. However, the annual GDP growth rate was only 0.5 percent in sub-Saharan Africa, compared with 1.8 percent in the Middle East and North Africa, 1.5 percent in Latin America and the Caribbean and 2.4 percent in South Asia. Sub-Saharan Africa is the region with the fastest increase in human capital and urban population in the world, but it has the slowest economic growth. It is crucial to understand why educational investment and urbanization have not delivered on their promise of higher economic development in sub-Saharan Africa.
Numerous pessimistic answers have been suggested, such as a lack of institutional structure, political instability or low levels of investment in infrastructure, all of which imply that education investments in Africa have so far been deadweight losses. In contrast, this paper argues that the temporary adjustment costs of urbanization could be what is delaying growth in sub-Saharan Africa. We argue that although investment in education has not generated growth in the short run, it should not be seen as a deadweight loss. Countries that have experienced an increase in human capital but less growth than expected will grow once urbanization and human capital accumulation are attuned. In sub-Saharan Africa, rapid urbanization and human capital development have not yet delivered high economic growth rates because of temporary urbanization costs. However, economic growth will materialize when the growth rates of urban population and human capital slow down. We call this the "African growth backlog".
Our paper aims to build a growth model with endogenous human capital accumulation and urbanization, and then to forecast the evolution of economic growth in sub-Saharan African countries. As in Lucas (2009), we present a structural macroeconometric assessment of the relationship between human capital accumulation, urbanization and growth, rather than a model with microfoundations. We then confront the theory with data and calibrate the parameters of our model using panel regressions. This allows us to analyse the dynamics of the model and to make predictions for growth and convergence. The model also provides a basis for forecasting the evolution of human capital, urban population and per capita GDP in coming decades. Adjustment costs are assumed to delay economic growth but not to affect long-run outcomes.
Since Lucas (1988) and Azariadis and Drazen (1990), human capital disparities have played a central role in growth and development analysis. Historical analyses confirm that the transition from economic stagnation to growth is first preceded (Cipolla, 1969) and then accompanied (Maddison, 1995) by enormous increases in literacy and the average level of schooling. Lucas (1988) discussed the principal effects that cities and urban development have on national economic growth. Bertinelli and Zou (2008) argue that "urbanization plays a non-negligible role in speeding human capital accumulation. [...] Closeness between people favors interactions, which may be at the root of spillovers from human capital. In return, incentives to invest in education are reinforced, leading hence to higher levels of education." We build on the model of Lucas (2009) to study the link between human capital and urbanization. Lucas (2009) built a model with two sectors (rural and urban), and assumed that more educated workers reside only in the urban sector. Countries with a large share of their population working in a traditional rural sector have a low ability to absorb technology from leading economies. This implies that migration away from traditional agriculture is crucial for growth, and that countries with a low initial endowment of human capital, or a high proportion of rural workers, will have a late take-off. Moreover, Lucas (2009) focused on the reasons for cross-country spillovers and argued that migration from rural to urban areas seems to be an important factor for convergence.
As the literature on economic growth suggests, the nexus between urbanization, human capital investment and development is an important source of take-off for countries. However, the process of urbanization also has its disadvantages. Keiser et al. (2004) studied the impact that rapid urbanization in sub-Saharan African countries has had on malaria control and concluded that "rapid and unprecedented urbanization, going hand-in-hand with often declining economies, might have profound implications for the epidemiology and control of malaria, as the relative disease burden increases among urban dwellers." Todaro (2000) argues that rapid urbanization causes a "prolific growth of huge slums and shantytowns" in African urban areas. According to the African Development Bank's figures for 2012, 65% of the dwellings in sub-Saharan Africa are in slums, which is the highest proportion in the world. In our model, the calibrated adjustment costs are positively correlated with the proportion of urban people living in slum dwellings in sub-Saharan African countries. This positive relation suggests that the adjustment costs due to rapid urbanization capture poor urban planning, since the proliferation of slum dwellers is related to higher adjustment costs that, in our model, lower per capita GDP growth in the short run. The simulation and counterfactual exercises predict a current delay of 15 years before expected per capita GDP levels are reached because of the transitory urbanization costs in sub-Saharan Africa.
Forecasting exercises show that countries in each region converge towards a region-specific steady state. In other words, only conditional convergence can be obtained in the long run, and developing countries do not catch up with the leading region (high-income countries). Besides this, we find sub-Saharan Africa to be a distinct case because higher levels of educational investment in that region have been unable to deliver growth in the short run. Aside from the optimistic prediction of the African backlog, our analysis also anticipates a pessimistic result: no long-run convergence in income levels between sub-Saharan Africa and the leading region. As the rate of urbanization and the proportion of college graduates increase, their growth will eventually decelerate; if development policies become less generous after the target period of the Millennium Declaration (1990-2015), this could happen sooner rather than later. In the latter case, our calibrated model predicts that GDP per capita in sub-Saharan Africa could multiply by 1.5 within approximately 30 years. In another scenario, where progress in schooling continues until 2030, the African take-off will be delayed, but long-run GDP per capita will be multiplied by 2 in comparison with the current level.
The remainder of the paper is organized as follows. Section 2 provides some stylized facts on urbanization, education and development. The model and its calibration are described in Section 3. Section 4 presents the explanations for Africa's growth backlog. Section 5 concludes the paper.

Stylized facts and the African Paradox
Historically, Cipolla (1969) documented that the spread of literacy started between the 17th and 19th centuries, 5,000 years after the first rudimentary appearance of writing. Before that period, the arts of reading and writing remained the monopoly of the elite. It was mainly in the 19th century that the advance of literacy and the development of education occurred in the West, a process invariably connected with urbanization, the emergence of public schools, and the Industrial Revolution.
Causation is obviously difficult to establish. On the one hand, urbanization facilitated access to schooling and the Industrial Revolution generated demand for skills and human capital. On the other hand, education increased workers' capacity to adapt to new technologies and to innovate. Figures 1(a) and 1(b) illustrate the historical associations between education, GDP per capita, and urbanization. In Figure 1(a), we use data on the average years of education and GDP per capita from 1870 to 2000 for a number of highly industrialized countries (the US, the UK, Japan, France and Italy), along with India and Chile. Figure 1(a) shows that, apart from Italy, an increase in years of schooling preceded the increase in income. In addition, Figure 1(a) suggests that the accumulation of human capital stimulates GDP per capita, and that the growth of GDP per capita becomes much higher when countries have better educational prospects. In Figure 1(b), because of data limitations, we can only use data from 1870 to 2000 for the US and from 1950 to 2000 for Japan, France, Italy, India and Chile. Figure 1(b) shows that urbanization increased at a faster pace than education in the early stages of development. This conclusion, however, cannot be applied to developing countries such as India. As the US was a pioneer in the processes of urbanization and adult literacy, it is also used as the leading region in the calibration analysis. Factors specific to the US, such as population density and religion (e.g. Protestantism), may also have facilitated the take-off.
The same patterns occur in a cross-country perspective based on recent data. Figure 1(c) shows the association between the proportion of college graduates in the population aged 25 and over, and the level of GDP per capita in 2000 (both in logs). In our sample of 177 countries, the coefficient of correlation is close to 0.60.
Turning to the association between the urbanization rate (proportion of population living in urban areas) and the proportion of college graduates, Figure 1(d) shows a coefficient of correlation that amounts to 0.40.
However, a quick look at the data also reveals that the links between urbanization, education and growth are far more complex. Figure 2 examines convergence in income and human capital. In Figure 2(b), we regress the average annual growth rate over 1975-2000 of the proportion of college graduates in the adult population on its 1975 level (in logs). The slope is negative and the speed of human capital convergence is equal to one percent per year. In Figure 2(a), the same exercise is carried out for GDP per capita. There is no sign of convergence: the 1975-2000 growth rate is independent of the level observed in 1975. We could add that urbanization (the proportion of population in urban areas) tripled in sub-Saharan Africa between 1950 and 2000 (from 11 to 33 percent), whereas it multiplied by 1.4 in high-income countries (from 53 to 73 percent), 1.9 in South-Central Asia (from 16 to 30 percent), and 1.1 in Latin America (from 73 to 80 percent).
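The convergence regression described for Figure 2 can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the helper names `ols_slope` and `beta_convergence` and the sample shares are hypothetical, and the regression is a plain bivariate OLS of average annual log growth on the log of the initial level.

```python
import math

def ols_slope(x, y):
    """Return (intercept, slope) of a bivariate OLS regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def beta_convergence(initial, final, years):
    """Regress average annual log growth on the log of the initial level."""
    log0 = [math.log(v) for v in initial]
    growth = [(math.log(f) - math.log(i)) / years
              for i, f in zip(initial, final)]
    return ols_slope(log0, growth)

# Hypothetical shares of college graduates (NOT the paper's data).
share_1975 = [0.01, 0.02, 0.05, 0.10, 0.20]
share_2000 = [0.03, 0.05, 0.09, 0.15, 0.25]

intercept, slope = beta_convergence(share_1975, share_2000, years=25)
# A negative slope means units that started lower grew faster (convergence).
print(f"slope = {slope:.4f}")
```

A slope close to zero, as the text reports for GDP per capita, would instead indicate that growth is unrelated to the initial level.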
We also look at the stylized facts on development and the urban-to-rural production ratio. To construct this ratio, we divide the total GDP per capita (constant 2000 US$) in the urban sector in 2000 by the total GDP per capita (constant 2000 US$) in the rural sector in the same year. The total GDP per capita in each sector is calculated as the product of the GDP per capita in that sector and the value added from that sector. Data on agricultural value added (as % of GDP) are obtained from WDI (2008). First, we check the association between the urban-to-rural production ratio and the level of GDP per capita. As we can observe in Figure 3(a), the slope is positive; hence, rich countries have a higher ratio of urban-to-rural production than poor countries do. Moreover, in Figure 3(b), we regress the average annual growth rate of the urban-to-rural production ratio between 1975 and 2000 on its 1975 level (in logs). The negative slope indicates convergence of urban-to-rural production ratios among countries between 1975 and 2000.
The stylized facts presented above demonstrate that economic development is connected with urbanization and education. However, Figures 2 and 3 show that there is no convergence in GDP, even though we observe a convergence pattern in the urban-to-rural production ratio and in tertiary-level educational investment, which has a positive impact on GDP. It is a paradox that, although sub-Saharan Africa (henceforth referred to as SSA) recorded the greatest progress in schooling and urbanization in the second half of the past century, it is still the region with the lowest GDP per capita growth rates. As shown in Figure 4, between 1955 and 2000 the proportion of college graduates grew by 6.2 percent a year in SSA, compared with 3.2 percent in the Middle East and Northern Africa (MENA), 3.1 percent in Latin America and the Caribbean (LAC) and 3.8 percent in South Asia (ASIA). Over the same period, the urban-to-total-population ratio increased by 2.1 percent a year in SSA, compared with 1.3 percent in MENA, 1.1 percent in LAC and 1.9 percent in ASIA. However, the annual GDP growth rate was 0.5 percent in SSA, whereas it was 1.8 percent in MENA, 1.5 percent in LAC and 2.4 percent in ASIA.
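To give a sense of the magnitudes behind this paradox, the annual rates quoted above for SSA can be compounded over 1955-2000. The sketch below is illustrative arithmetic only; it assumes each rate is constant and compounds annually, which the text does not state, and `cumulative_factor` is a hypothetical helper.

```python
def cumulative_factor(annual_rate_percent, years):
    """Factor by which a quantity multiplies after compounding for `years`."""
    return (1 + annual_rate_percent / 100) ** years

YEARS = 2000 - 1955

# Annual growth rates for SSA as quoted in the text (percent per year).
ssa_rates = {
    "proportion of college graduates": 6.2,
    "urban-to-total-population ratio": 2.1,
    "GDP": 0.5,
}

for name, rate in ssa_rates.items():
    factor = cumulative_factor(rate, YEARS)
    print(f"{name}: multiplied by {factor:.2f} over {YEARS} years")
```

Under this constant-rate assumption, the contrast between a human capital indicator that multiplies many times over and an income measure that barely moves is what the paper calls the African paradox.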

Figure 3: (a) Development and Production Ratio; (b) Convergence in Production Ratio.
Notes: In Figure 3(a), we plot GDP per capita in 2000 (in logs) against the urban-to-rural production ratio in 2000 (in logs). In Figure 3(b), we plot the growth of the production ratio between 1975 and 2000 as a function of the log of its initial 1975 value (beta-convergence analysis). We also provide the linear fitted curve. A negative slope (resp. positive slope) reflects convergence (resp. divergence).

Pritchett (2001) documented the negative association between educational investment and output growth rates and asked: "Where has all the education gone?". Other empirical studies question the existence and the magnitude of the causal impact of education on development. How can this be reconciled with the strong cross-country and historical associations between literacy, schooling and development? There are some traditional explanations for the African growth paradox. One is that the quality of education in SSA has not kept pace with the quantity of investment, so that educational investment cannot be translated into productivity gains. Manuelli and Seshadri (2007) showed that effective human capital has a strong impact on economic performance when corrected for differences in the quality of education. Jones (2008) showed that, despite increases in educational attainment, technology adoption is slower when knowledge traps are at work: poor countries, given the coordination cost imposed by a "specialist" economy, invest too much in "generalist" education and not enough in "specialist" education. In the same vein, Vandenbussche et al. (2006) showed that different types of human capital are needed at the various stages of development. Pritchett (2001) argued that the institutional environment in poor countries has been perverse and that the accumulated human capital has been applied to activities that served to reduce economic growth.
Furthermore, Easterly and Levine (1997) underlined the importance of public policies in the growth process and argued that "Africa's growth tragedy is associated with low schooling, political instability, underdeveloped financial systems, distorted foreign exchange markets, high government deficits, and insufficient infrastructure." These explanations are quite pessimistic because they lead to the conclusion that investments in Africa to date have been nothing more than deadweight losses. There is, however, a more optimistic and appealing explanation for the African situation. This paper focuses on the nexus between human capital accumulation and urbanization, and argues that the low or negative social return to education observed in the short run might be due to transitory adjustment or urbanization costs. Highly-skilled workers mainly operate in cities, and more education increases labour demand for low-skilled employees in urban areas. Urbanization makes access to schooling easier and increases schooling levels, which in turn accentuates the urbanization process (i.e. a virtuous circle), but at the same time it generates adjustment and congestion costs. As argued by Henderson (2003), the population shift from rural to urban areas is a transitory process which can be socially and economically traumatic. Cities that constantly increase in size, along with growing numbers of businesses and urban employment demands, require enormous public infrastructure investments (in particular to address health, safety, commuting, and congestion costs), which affect the quality of urban life. Moreover, rapid urbanization has often occurred in the face of low or negative economic growth over some decades, and over- or under-concentration can be very costly in terms of productivity growth.

Model
In this section, we describe a theoretical model of endogenous human capital formation, urbanization and development. Then, we calibrate its general parameters using panel data econometric techniques. Finally, the dynamics of human capital accumulation, urbanization and GDP per capita for high-income countries and developing regions are depicted.

Theory
We consider an economy with two sectors, urban and rural, producing a single homogeneous good. The price of the homogeneous good is the numeraire. The population is made up of two types of individual, the highly educated and the less educated. The proportion of highly-educated workers at time $t$ is denoted by $h_t$. We assume that highly-skilled people work in the urban sector, whereas the remaining $1 - h_t$ less-educated workers can choose between the two sectors. Hence, less-educated workers work either in the agricultural sector in rural areas or in low-skilled jobs in urban areas.
Human capital accumulation. Since we focus on the role of urbanization, we formalize human capital accumulation as a predetermined process, Equation (1), in which $H_t$ is the proportion of highly-educated workers in the leading countries, $\beta$ is the speed of convergence towards the long-run equilibrium, $\varphi(u_t)$ is an increasing function of the variable $u_t$, which measures the degree of urban concentration of less-educated workers, and $a_t$ is a scale factor representing the quality and quantity of education infrastructure in the country. (Although $\sigma$-convergence expresses the equitableness of the distribution of income across economies, we preferred to use a $\beta$-convergence model since $\beta$-convergence is, first, a required condition for $\sigma$-convergence and, second, the main concern of the empirical literature on growth.) Looking at the urban concentration of the less-educated workers is equivalent, in our model, to looking at the urban concentration of people in the country overall, because all the highly-skilled workers are assumed to work in the urban sector. We consider the flow of ideas among people in two dimensions. On the one hand, we take into account the transmission of knowledge from leading countries to developing countries and, on the other hand, we consider the concentration of people in urban areas as a means to transfer knowledge among people outside of the traditional agricultural sector. Equation (1) is compatible with the stylized facts sketched out above: there are convergent forces guiding the dynamics of human capital, and urbanization facilitates access to schooling. Lucas (2009) uses a similar hypothesis, although in his model GDP per capita is proportional to human capital, which he uses to explain the dynamics of GDP per capita.
Entrepreneurs' behaviour. At any time $t$, highly-skilled workers, in proportion $h_t$, are entrepreneurs operating in the urban area. Each of them hires $\ell_t$ less-educated workers to produce $y_{ut} = A_{ut}\ell_t^{\alpha}$ units of output, where $A_{ut}$ is total factor productivity in the urban sector and $\alpha \in [0, 1]$ governs the decreasing marginal productivity of labour in the urban sector. The labour market for less-educated workers is competitive and the urban wage rate equals the marginal labour productivity.
Moreover, when the proportion of entrepreneurs increases, each entrepreneur incurs a congestion cost per firm, $c_t$, which is proportional to the change $h_t - h_{t-1}$ in the number of firms, divided by the number $h_t$ of entrepreneurs.
Note that the introduction of $A_{ut}$ in the cost function reflects the higher costs associated with producing in a more productive and specialized urban sector. In addition, $q$ is the adjustment scale factor in urban areas.
The profit function of each entrepreneur is $\pi_{u,t} = y_{ut} - w_{u,t}\ell_t - c_t$, provided that $\pi_{u,t} \geq 0$; otherwise entrepreneurs have no incentive to produce. Here $w_{u,t}$ is the urban wage of less-educated workers and $\ell_t$ is the number of them hired per entrepreneur. Congestion costs are thus deducted from earnings to obtain the profit per firm. These congestion costs can be interpreted either as the opportunity cost of the time spent training immigrants from rural areas, or as the cost of a reduction in market share due to an increase in the number of firms in the cities.
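Maximizing $\pi_{u,t}$ with respect to $\ell_t$, the number of less-educated workers hired per entrepreneur, determines the wage rate of low-skilled workers in the urban sector: $w_{u,t} = \alpha A_{ut}\ell_t^{\alpha-1}$. (2)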
Then, the profit rate in the urban sector is $\pi_{u,t} = (1-\alpha)A_{ut}\ell_t^{\alpha} - c_t$. We can observe that in the steady state (where $c_t = 0$), income inequality between highly-educated and less-educated workers is $\pi_{u,t}/w_{u,t} = (1-\alpha)\ell_t/\alpha$.
Less-educated workers' location decisions. The less educated can work either in the urban sector, as described above, or in the rural sector, where the productivity of each worker is $w_{ft} = A_{ft}$. The assumption that marginal productivity does not decrease in the rural sector is rooted in the idea of surplus labour or surplus land. Sen (1966) also argues that output does not decrease if workers migrate from rural to urban areas, owing to the disguised unemployment found in the rural sector. Moreover, we assume that less-educated workers are free to move between sectors and are allocated so as to equalize net wages. Thus, the equilibrium number of less-educated employees per entrepreneur is $\ell_t = (\alpha A_{ut}/A_{ft})^{1/(1-\alpha)}$. (3)
Note that the number of less-educated workers is bounded by $1 - h_t$. Hence, if the total demand $D_t = h_t\ell_t$ for workers in the urban sector is higher than $1 - h_t$, the equilibrium number $\ell_t$ of less-educated workers per entrepreneur is $(1 - h_t)/h_t$, wages in the urban sector are given by (2), and the rural sector disappears. This case arises if the wage for low-skilled workers in the urban sector is higher than that in the rural sector. We disregard this case because it requires no rural population, which only happens in a limited number of countries, such as Singapore or Hong Kong, which are small in land area and highly industrialized.
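The urban labour market described above can be sketched numerically. The Python functions below are an illustration under stated assumptions rather than the authors' implementation: output per entrepreneur is taken to be `A_u * ell**alpha`, the urban wage is the marginal product of labour, the interior allocation equates that wage to rural productivity `A_f` with no further wedge, and the congestion cost is assumed to take the form `q * A_u * (h - h_prev) / h` when the number of firms rises and zero otherwise. All function names and parameter values are hypothetical.

```python
def urban_wage(A_u, ell, alpha):
    """Marginal product of less-educated labour: d(A_u * ell**alpha)/d(ell)."""
    return alpha * A_u * ell ** (alpha - 1)

def congestion_cost(q, A_u, h, h_prev):
    """Assumed form: proportional to the rise in firms, per entrepreneur."""
    return q * A_u * max(h - h_prev, 0.0) / h

def profit(A_u, ell, alpha, congestion=0.0):
    """Output minus the wage bill minus the congestion cost."""
    return A_u * ell ** alpha - urban_wage(A_u, ell, alpha) * ell - congestion

def interior_workers_per_firm(A_u, A_f, alpha):
    """ell at which the urban wage equals rural productivity A_f."""
    return (alpha * A_u / A_f) ** (1 / (1 - alpha))

def workers_per_firm(A_u, A_f, alpha, h):
    """Equilibrium ell, capped by the less-educated workers available per firm."""
    return min(interior_workers_per_firm(A_u, A_f, alpha), (1 - h) / h)

# Illustrative values only (not calibrated).
alpha, A_u, A_f, h = 0.6, 2.0, 1.0, 0.1
ell = workers_per_firm(A_u, A_f, alpha, h)
wage = urban_wage(A_u, ell, alpha)
print(f"ell = {ell:.4f}, urban wage = {wage:.4f}, rural wage = {A_f:.4f}")
print(f"profit/wage without congestion = {profit(A_u, ell, alpha) / wage:.4f}")
```

With no congestion cost, the profit-to-wage ratio in this sketch equals `(1 - alpha) * ell / alpha`, so anything that lowers the number of workers per entrepreneur also lowers measured inequality between the two groups.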
From the previous expressions, we can obtain income per capita and, in Equation (4), the degree of urban concentration of less-educated workers, $u_t$. Knowing that the majority of rural-urban migrants in sub-Saharan Africa have little education (Todaro, 2000), $u_t$ can be regarded as a good measure of urbanization. Besides, since our model assumes that all highly-skilled workers live in urban areas, this urbanization measure reflects the urban concentration of the total population in a country.
Endogenous growth. The levels of total factor productivity in the urban and rural sectors are endogenous. Following Lucas (1988), Azariadis and Drazen (1990), or Benhabib and Spiegel (2005), we assume that these are determined by the proportion of highly-educated workers in the country, $A_{ut} = A_u(h_t)$ and $A_{ft} = A_f(h_t)$, with $\partial \ell_t/\partial h_t < 0$ if the derivative of $A_f$ with respect to $h_t$ is higher than the derivative of $A_u$. It then follows that income inequality between highly-skilled and less-educated workers ($\pi_{u,t}/w_{u,t}$) decreases with the proportion of highly-skilled workers.
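The two aggregates discussed in this passage can be illustrated with a short sketch. It is an assumption-laden reading of the model, not the paper's own equations: urban concentration is taken to be the share of less-educated workers employed in cities, `h * ell / (1 - h)`, and income per capita is taken to be the sum of entrepreneurs' profits, urban wages and rural output, with population normalized to one. Function names and numbers are hypothetical.

```python
def urban_concentration(h, ell):
    """Assumed measure: share of less-educated workers employed in cities."""
    return h * ell / (1 - h)

def income_per_capita(h, ell, A_u, A_f, alpha, congestion=0.0):
    """Assumed aggregate: profits + urban wages + rural output (population = 1)."""
    wage = alpha * A_u * ell ** (alpha - 1)                # urban wage, less educated
    profit = A_u * ell ** alpha - wage * ell - congestion  # per entrepreneur
    urban_low = h * ell                                    # less educated in cities
    rural = 1 - h - urban_low                              # workers left in rural sector
    return h * profit + urban_low * wage + rural * A_f

# Illustrative values only (not calibrated).
h, ell, A_u, A_f, alpha = 0.1, 1.5, 2.0, 1.0, 0.6
print(f"u = {urban_concentration(h, ell):.4f}")
print(f"y without congestion = {income_per_capita(h, ell, A_u, A_f, alpha):.4f}")
print(f"y with congestion    = {income_per_capita(h, ell, A_u, A_f, alpha, 0.5):.4f}")
```

In this sketch a positive congestion cost lowers income per capita one for one with the number of entrepreneurs, which is the channel through which rapid growth in $h_t$ can depress measured income during the transition.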
Dynamics. Plugging (4) into (1) yields the law of motion of the economy, in which $a_t$ denotes the efficiency of the education system, together with the associated level of income per capita along the transition path. In summary, a rise in $h_t$ increases the levels of total factor productivity $A_{ut}$ and $A_{ft}$ and increases the number of entrepreneurs in the urban sector, but it reduces the number of employees per entrepreneur $\ell_t$ and induces congestion costs $c_t$.

Calibration
In this section, we confront the theory with data to calibrate the parameters of Equations (1) and (5) using panel data. Then, other parameters are calibrated so as to match certain data moments.
Human capital accumulation. First, we look at the impact urbanization has on human capital accumulation. To this end, we estimate an empirical convergence model in line with a logarithmic transformation of Equation (1). In the estimated specification, $H_t$ stands for the proportion of highly-skilled workers in the leader economy (in our case, the US), $h_{i,t}$ is the percentage of highly-skilled workers in the resident population of country $i$ at time $t$, $u_{i,t}$ is the proportion of less-educated workers living in cities in country $i$ at time $t$, as defined in (4), $a_0$ is the general intercept, $a_i$ are country fixed effects, $a_t$ are time fixed effects which capture common time-dependent shocks, $\beta$ is a parameter that captures the speed of convergence to the long-run equilibrium level (the higher it is, the faster the human capital level of country $i$ converges to the human capital level of the US), and $\delta$ measures the effect of urbanization on human capital accumulation. Note that we assume the functional form $\varphi(u_{i,t}) = \exp(u_{i,t})$ because the econometric calibration exercise revealed that the log-linear form of $\varphi(u_t)$ fits the data better; therefore, we only present the regression outcomes with $\varphi(u_t) = \exp(u_t)$.
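One way to read the log-transformed convergence model is sketched below. The exact estimating equation is not reproduced in the text, so the law of motion used here is an assumption: the change in $\ln h$ equals an intercept `a`, plus `beta` times the log gap to the leader, plus `delta` times urban concentration `u` (which is what $\varphi(u) = \exp(u)$ implies after taking logs). All names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def next_log_h(log_h, log_H, u, a, beta, delta):
    """One step of the assumed log-linear rule for the skilled share."""
    return log_h + a + beta * (log_H - log_h) + delta * u

def steady_state_log_h(log_H, u, a, beta, delta):
    """Fixed point of next_log_h when the leader's H and u are held constant."""
    return log_H + (a + delta * u) / beta

def simulate(log_h0, log_H, u, a, beta, delta, periods):
    """Iterate the rule and return the whole path of log h."""
    path = [log_h0]
    for _ in range(periods):
        path.append(next_log_h(path[-1], log_H, u, a, beta, delta))
    return path

# Illustrative parameters only (not the estimates in Table 1).
params = dict(log_H=math.log(0.5), u=0.3, a=-0.8, beta=0.5, delta=0.4)
path = simulate(math.log(0.02), periods=40, **params)
target = steady_state_log_h(**params)
print(f"h after 40 periods = {math.exp(path[-1]):.4f}")
print(f"country-specific steady state = {math.exp(target):.4f}")
```

Under this assumed rule the gap to the steady state shrinks by a factor of `1 - beta` each period, and a higher `u` or `a` raises the country-specific steady state relative to the leader, which is the sense in which convergence is conditional.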
We use data from Docquier, Lowell and Marfouk (2009) (henceforth referred to as DLM (2009)) for the highly-skilled workers $H_t$ and $h_{i,t}$. The proportion of highly-skilled workers corresponds to those with tertiary education plus 25 percent of the total secondary-educated population, as a percentage of total population. DLM (2009) construct human capital indicators from De La Fuente and Domenech (2002) for OECD countries and from Barro and Lee (2001) for non-OECD countries. In addition, for countries where Barro and Lee measures are missing, they predict the proportion of educated workers using Cohen-Soto's measures (see Cohen and Soto, 2007). The urbanization variable $u_{i,t}$ is defined as a function of $h_{i,t}$ and $\ell_{i,t}$. Low-skilled labour in urban areas is calculated as the urban population minus the highly-skilled population. The data on the urban population are obtained from the WDI (2008), where urban population is defined as "the midyear population of areas defined as urban in each country and reported to the United Nations". Data are available at five-year intervals between 1975 and 2000 and cover 136 countries.
To supplement the stylized facts depicted in Figure 2 and establish whether there is absolute or conditional convergence in human capital, the model without urbanization effects is estimated in the first step. Table 1(a) presents the results. In the model without fixed effects, we obtain a significant convergence rate of eight percent a year to a common steady state (column 1). This is in line with Figure 2(b). However, this simple regression suffers from unobserved heterogeneity. To solve this problem we add country and time fixed effects.
Islam (1995) points to the importance of capturing unobserved individual effects in studying convergence (since they are positively correlated with the initial level of human capital) and argues that ignoring them will result in a biased convergence coefficient $\beta$. In the fourth column of Table 1(a), we show that the convergence speed increases from eight to fifty percent once unobserved heterogeneity is dealt with, i.e., country fixed effects are significant. This means that the model generates conditional convergence: each country converges to a specific steady state, which increases with the leader's level of human capital. Besides this, the second and third columns of Table 1(a) show that this jump in convergence speed is due to country fixed effects rather than time fixed effects.
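The bias from ignoring country effects can be illustrated on simulated data. The sketch below is not the paper's estimation: it generates a synthetic panel in which growth equals a common intercept, a country effect, `beta` times the log gap to a fixed leader, and noise, and then compares a pooled OLS slope with a within (country-demeaned) slope. The data-generating process, the function names and all numbers are assumptions made for the illustration.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def demean(values):
    """Subtract the country-specific mean (the within transformation)."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def simulate_panel(n_countries, n_periods, beta, seed=0):
    """Synthetic panel: growth = -0.5 + a_i + beta * gap + noise."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_countries):
        a_i = rng.gauss(0.0, 0.3)                       # country fixed effect
        gap = (0.5 - a_i) / beta + rng.gauss(0.0, 0.2)  # start near own steady state
        gaps, growths = [], []
        for _ in range(n_periods):
            growth = -0.5 + a_i + beta * gap + rng.gauss(0.0, 0.05)
            gaps.append(gap)
            growths.append(growth)
            gap -= growth                               # leader's level held fixed
        panel.append((gaps, growths))
    return panel

def pooled_beta(panel):
    x = [v for gaps, _ in panel for v in gaps]
    y = [v for _, growths in panel for v in growths]
    return slope(x, y)

def within_beta(panel):
    x = [v for gaps, _ in panel for v in demean(gaps)]
    y = [v for _, growths in panel for v in demean(growths)]
    return slope(x, y)

panel = simulate_panel(n_countries=200, n_periods=5, beta=0.5)
print(f"pooled estimate: {pooled_beta(panel):.3f}")
print(f"within estimate: {within_beta(panel):.3f} (true beta = 0.5)")
```

In this synthetic setting the pooled slope is far below the true value because countries with a high fixed effect sit closer to the leader, while the within estimator is much closer to it. With only a few periods, the within estimate of a dynamic model is itself not exact, so the sketch is meant to convey the direction of the bias rather than its size.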
In the second step, we introduce the urbanization effect. Amongst the specifications with linear, logarithmic or exponential functional forms we tried, we found that the specification that is linear in $u_{i,t}$, as defined in (4), provides the best fit. The results in Table 1(b) show that there is a significant convergence rate of eleven percent without fixed effects, which rises to above fifty percent when we control for country and year fixed effects. We use the parameters from the second column of Table 1(b) in the simulation exercises, and we can argue that there is conditional convergence, even after controlling for the impact of urbanization, which is significantly positive.
Urban employment. To explain urban employment in (5), we need to specify and estimate the TFP function in the urban and rural sectors. We assume functional forms for TFP in the two sectors; substituted into (5), these give the estimating equation (8), where $\rho_0 = \alpha A_{u0}$, $\rho_i$ are the country fixed effects, and $\rho_t$ stands for time fixed effects.
To estimate (8), we use data on the proportion $h_t$ of highly-educated workers and the number $\ell_t$ of less-educated workers in cities for each country in our dataset from 1975 to 2000. Table 2 shows the results. First, the result of a cross-country regression on 2000 data is presented, and then the panel estimates with fixed effects are provided. Both the cross-country and the panel estimates show that as the number of entrepreneurs in cities increases, the number of less-educated workers per entrepreneur decreases, which is consistent with the model. Moreover, the negative coefficient on $h_t$ reported in Table 2 implies that rural productivity is more responsive than urban productivity to the proportion of highly-skilled workers. This result is in line with the evidence presented by Caselli (2005), Restuccia, Yang and Zhu (2008), and Gollin, Lagakos and Waugh (2014a, 2014b), among others. They find that labour productivity differences between rich and poor countries are much larger in the agricultural sector than in non-agricultural activities. The lower responsiveness of the urban (mostly non-agricultural) sector thus reflects the fact that productivity differences between poor countries, with few highly-educated individuals, and rich countries, with many, are smaller in that sector than in the rural (mostly agricultural) sector. Hence, our estimate captures more general effects that explain the agricultural productivity gap, such as the selection of low-ability individuals into the agricultural sector in low-income countries (Lagakos and Waugh, 2013) or barriers to technology adoption in low-income countries (Restuccia, Yang and Zhu, 2008). For the simulation exercise, we preferred to use the parameter values from the cross-section data to reduce the number of parameters.
In the next section, we show that using the fixed effects from high-income countries has no impact on the performance of sub-Saharan Africa in the long run.
Table 2 (surviving values): -0.606, -1.248.
Table 2 notes: * p<0.05; ** p<0.01; *** p<0.001. a Regional intercepts are obtained by subtracting ln(h 2000 ) from ln l t .
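As an illustration of what controlling for country and year fixed effects involves, the following minimal Python sketch computes a two-way within estimator on a balanced panel. It is not the estimation code behind Table 2: the data are synthetic, and the function names, panel dimensions and the slope value are hypothetical choices made for the example.

```python
import random

def demean_two_way(panel):
    """Remove country (row) and year (column) means from a balanced panel."""
    n, t = len(panel), len(panel[0])
    row_mean = [sum(row) / t for row in panel]
    col_mean = [sum(panel[i][j] for i in range(n)) / n for j in range(t)]
    grand_mean = sum(row_mean) / n
    return [[panel[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(t)] for i in range(n)]

def within_slope(y, x):
    """OLS slope of y on x after two-way demeaning (within estimator)."""
    y_w, x_w = demean_two_way(y), demean_two_way(x)
    num = sum(a * b for row_y, row_x in zip(y_w, x_w)
              for a, b in zip(row_y, row_x))
    den = sum(b * b for row_x in x_w for b in row_x)
    return num / den

# Synthetic balanced panel: 20 hypothetical countries, 6 five-year periods.
random.seed(0)
n_countries, n_years = 20, 6
true_slope = -0.5  # hypothetical value, not an estimate from Table 2
country_fe = [random.gauss(0, 1) for _ in range(n_countries)]
year_fe = [random.gauss(0, 1) for _ in range(n_years)]
x = [[random.gauss(0, 1) for _ in range(n_years)] for _ in range(n_countries)]
# Outcome built from both fixed effects plus the slope term, without noise.
y = [[country_fe[i] + year_fe[j] + true_slope * x[i][j]
      for j in range(n_years)] for i in range(n_countries)]

estimate = within_slope(y, x)
print(f"within estimate: {estimate:.6f}")
```

Because the synthetic outcome contains no noise, the demeaning removes both sets of fixed effects exactly and the slope is recovered up to floating-point error. With real data the estimate would of course carry sampling error.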
GDP per capita. The remaining parameters used in the numerical simulation do not come from panel data estimation. As a benchmark for the labour share α of income in the urban sector, we choose 0.6, following the value reported by Caselli (2005). In Figure A.3 (see Appendix) we also show how the results change with α equal to 0.8, because Gollin (2002) estimated that labour's share lies between 0.65 and 0.8 in most countries. Increasing α reduces the difference between the data and the model between 1970 and 1990, but increases the difference between 1995 and 2000, where we better match the data on urban population. The value of α mainly affects long-run outcomes, which are higher with the lower α, while the short-run results on delayed growth due to rapid urbanization are not modified. Next, we set the elasticity z of TFP to human capital to 0.3, following the estimate of Mankiw, Romer and Weil (1992). Using different specifications to estimate the effect of human capital on growth, and measures of the human capital stock such as years of schooling or the working-age population in secondary school, Mankiw, Romer and Weil (1992), Benhabib and Spiegel (1994), and Knowles and Owen (1995) obtained estimates of the elasticity between 0.1 and 0.3. In Figure A.4 (see Appendix) we show how the results change for these values. The effect of reducing z from 0.3 to 0.1 is equivalent to increasing α. Finally, for every country or region i, we calibrate the initial urban TFP A i u0 and the adjustment scale parameter q i . We choose the parameter values that match the level of GDP per capita observed in 1975 and in 2000. The calibrated values of the adjustment scale parameter q for the different regions are (SSA, LAC, MENA, ASIA) = (13, 2.5, 7.6, 1.3).
These values are larger for SSA than for the other regions, which implies that the effect of human capital on technological progress and economic growth takes longer to materialize in SSA. In Figure 5, we also plot the estimated scale parameter q together with the proportion of the urban population living in slums. A slum household is defined as a group of individuals living under the same roof and lacking one or more of the following: access to improved water, access to improved sanitation, sufficient living area, and durability of housing. The positive relation between the two suggests that q captures poor urban planning, since rapid urbanization is associated with higher urbanization costs that lower per capita GDP growth in the short run. Tables 3, 4, and 5 in the Appendix summarize the parameter values obtained from the estimation techniques and used in the numerical exercises carried out in the remainder of the paper.
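To give a sense of what the range of elasticities z between 0.1 and 0.3 means in practice, the short Python sketch below computes the TFP multiplier implied by a doubling of human capital. It assumes a constant-elasticity relation in which TFP is proportional to h raised to the power z; this functional form and the function name are assumptions made for the illustration and may differ from the TFP function used in the model.

```python
def tfp_multiplier(h_ratio, z):
    """TFP multiplier when human capital is multiplied by h_ratio,
    assuming TFP is proportional to h**z (an illustrative assumption)."""
    return h_ratio ** z

# Doubling human capital under the two ends of the range cited in the text.
for z in (0.3, 0.1):
    print(f"z = {z}: TFP multiplied by {tfp_multiplier(2.0, z):.3f}")
```

Under this assumption, doubling human capital raises TFP by roughly 23 percent when z = 0.3 and by roughly 7 percent when z = 0.1.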

Human capital dynamics per region
Once all parameters are calibrated, they can be included in the dynamic equation (6). Figure 6(a) depicts the dynamics of human capital in each developing region and in high-income countries. Regional fixed effects are weighted averages of the country fixed effects in each region. Given these region-specific parameters, the steady state differs across regions. The long-run human capital stock equals 0.32 in high-income countries, 0.25 in LAC, 0.22 in MENA, 0.13 in South Asia and 0.09 in sub-Saharan Africa. These steady states are locally stable. Unbounded growth could be obtained if countries started with a proportion of college graduates above 60 percent. When we compare these simulation results with the observed human capital levels in each region, LAC and MENA appear to be the farthest from their steady-state levels. Moreover, as will be discussed in greater detail in Section 4, SSA needs about 20 years to reach its steady-state human capital level. Furthermore, the model predicts no absolute convergence among regions, since the efficiency of the education system differs across regions and is an important factor in human capital dynamics. This is illustrated in Figure 6(b) for SSA, a counterfactual exercise in which we replace the sub-Saharan fixed effects with those observed in high-income countries. The simulation exercise counterf 2 shows that substituting the value for ρ i in equation (8) does not modify the steady state very much, whereas substituting the value for a i in equation (7) has a major impact on human capital accumulation. The counterfactual steady state in the simulation exercise counterf 1 would be almost identical to that obtained in high-income countries. This shows that the technological function of human capital formation plays a key role in determining the long-run performance of developing countries.
The different steady-state values of human capital imply different levels of income in the long run, i.e. there is no absolute convergence of income across regions. The calibrated parameter values used in the simulations predict that these regions will not catch up with high-income regions unless they modify country-specific characteristics, such as institutions or education quality, towards those observed in high-income countries. However, as will be made clearer in the next section, this does not imply that investment in education in SSA will not deliver economic growth.
Note: In Figure 6(a), we plot the proportion of high-skilled workers in t + 5 as a function of the proportion in t, using the fixed effects obtained for the different regions. An intersection with the 45-degree line is a long-run steady state. In Figure 6(b), we focus on sub-Saharan Africa and compare the dynamics obtained with regional fixed effects (SSA), with high-income fixed effects for the human capital equation (counterf 1 ) and with the high-income fixed effect for the urban-to-rural productivity ratio (counterf 2 ). We also represent the high-income dynamics for comparison.
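The reading of Figure 6 as a map from the share of high-skilled workers in t to the share in t + 5 can be illustrated with a toy example. The Python sketch below iterates a hypothetical quadratic map that is constructed to have a locally stable steady state at 0.09 (the SSA value reported above) and an unstable threshold at 0.60. It is not the paper's equation (6): the functional form, the `speed` parameter and the function names are assumptions chosen only to reproduce the qualitative picture.

```python
def next_share(h, h_star=0.09, h_threshold=0.60, speed=1.0):
    """Toy five-year map h_{t+5} = f(h_t); NOT the paper's equation (6).

    f crosses the 45-degree line at h_star (stable) and h_threshold (unstable).
    """
    return h + speed * (h_star - h) * (h_threshold - h)

def simulate(h0, periods):
    """Iterate the map for a number of five-year periods, starting at h0."""
    path = [h0]
    for _ in range(periods):
        path.append(next_share(path[-1]))
    return path

below = simulate(0.02, 40)    # starts below the steady state and rises to it
between = simulate(0.30, 40)  # starts between the two crossings and falls back
above = simulate(0.65, 3)     # starts above the threshold and keeps growing

print(round(below[-1], 4), round(between[-1], 4), round(above[-1], 4))
```

Paths starting on either side of the lower crossing converge to it, which is what local stability means here, while a path starting above the threshold moves away from it, mirroring the statement that unbounded growth requires an initial share above 60 percent.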

The growth backlog of Africa
In this section, we discuss and interpret the results of the dynamic simulations for sub-Saharan Africa. In the first sub-section, the observed levels of human capital in 1975 are used as the starting point, and the model simulates human capital, urbanization and GDP per capita from 1975 to 2060. In the second sub-section, we compare the simulations with and without urbanization costs. Figure 7 presents the simulated transition paths of human capital (Fig. 7a), urbanization (Fig. 7b) and GDP per capita (Fig. 7c). [...] will reach thirty-two percent in the pessimistic scenario. Those long-run values will be attained around 2020. As education and urbanization growth rates decline, the economic take-off takes place and GDP per capita will be multiplied by 1.5 in the long run, as shown in Figure 7(c). It takes about 30 years for GDP per capita to reach its long-run value. As in Lucas (2009), growth in human capital leads to higher income in the long term but induces temporary costs in the medium term, which is why the steady states of GDP per capita, human capital and urbanization are not reached at the same time.

Predictions for sub-Saharan Africa
In the more optimistic scenario, depicted by the lines with circles, human capital and the urbanization rate continue to increase until 2030. The take-off for Africa is then delayed by a few decades, but the long-run effect is stronger, since GDP per capita is multiplied by 2. It is also worth noting that, in this scenario, the increase in human capital is larger than the increase in urbanization. It can be argued that temporary urbanization costs are, to some extent, outweighed by the positive externality of urbanization on human capital, which in turn leads to higher GDP per capita growth in the long run. According to this theory, educational investment in Africa cannot be seen as a deadweight loss, and continued progress in education should be promoted in the region to ensure higher growth in the future.
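The long-run multiplication factors reported for the two scenarios can be translated into average annual growth rates once a horizon is chosen. The Python sketch below does this conversion. The 30-year horizon for the factor of 1.5 follows the text; the 50-year horizon used for the factor of 2 is a purely illustrative assumption, since the text only says that the take-off is delayed by a few decades.

```python
def annualized_growth(factor, years):
    """Constant annual growth rate that multiplies a level by `factor`
    over `years` years (compound growth)."""
    return factor ** (1.0 / years) - 1.0

pessimistic = annualized_growth(1.5, 30)  # horizon taken from the text
optimistic = annualized_growth(2.0, 50)   # horizon is an assumption

print(f"factor 1.5 over 30 years: {100 * pessimistic:.2f}% per year")
print(f"factor 2.0 over 50 years: {100 * optimistic:.2f}% per year")
```

Both cases correspond to average growth of roughly 1.4 percent per year over the respective horizons, which gives an order of magnitude rather than a model prediction for any particular year.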

Different adjustment cost specifications
In this section, we perform a robustness analysis on different adjustment cost specifications. More precisely, we consider the following two alternative specifications to the one presented above: [...] These alternative specifications take into account the change in the number of entrepreneurs in the urban sector not only in the current period, but also in previous periods. To perform a numerical exercise with these specifications, we recalibrate the parameter q i to match the level of GDP per capita in 2000. Figure 8 plots the evolution of GDP per capita under the different adjustment-cost functions. First, note that the long-run levels of GDP are not affected by the choice of specification. Next, convergence to the long-run steady state is slower as more inertia is introduced into the cost function: greater inertia lengthens the time needed for the economy to benefit from the new entrepreneurs, and therefore the time before the growth effects become apparent. Furthermore, introducing more inertia matches the data better but does not modify the main conclusions.
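The role of inertia can be illustrated with a stylized example. In the Python sketch below, the adjustment cost is taken to be the scale parameter q times the average change in the share of entrepreneurs over the current period and a number of previous periods. This functional form, the path of the entrepreneur share and the function names are hypothetical and are not the specifications used for Figure 8; the value q = 13 is the SSA value reported earlier and is held fixed here for simplicity, whereas the paper recalibrates q for each specification.

```python
def changes(path):
    """Period-to-period changes, with a zero change in the first period."""
    return [0.0] + [b - a for a, b in zip(path, path[1:])]

def adjustment_cost(path, q, lags=0):
    """Hypothetical cost: q times the average change over the current
    period and `lags` previous periods (lags=0 uses the current change only)."""
    d = changes(path)
    costs = []
    for t in range(len(d)):
        window = d[max(0, t - lags): t + 1]
        costs.append(q * sum(window) / (lags + 1))
    return costs

# Illustrative share of entrepreneurs: rises, then stays at its long-run value.
share = [0.02, 0.04, 0.07, 0.09, 0.09, 0.09, 0.09, 0.09]

no_inertia = adjustment_cost(share, q=13, lags=0)
with_inertia = adjustment_cost(share, q=13, lags=2)

for t, (c0, c2) in enumerate(zip(no_inertia, with_inertia)):
    print(t, round(c0, 3), round(c2, 3))
```

In this toy setting the cost without inertia drops to zero as soon as the share stops changing, whereas the cost with two lags remains positive for two further periods. Both are zero once the share has been constant long enough, consistent with the long-run level being unaffected by the specification.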

The counterfactual with no adjustment cost
In the previous sections, we have emphasized the role of adjustment costs in explaining why GDP per capita in sub-Saharan countries is not at its expected level despite the growth in human capital accumulation and urbanization rates. In this section, we perform the following exercise: What would have happened without adjustment costs? Would the results change in the long term? Figure 9(a) shows that, in the absence of urbanization costs, the rise in educational attainment would have generated a direct increase in GDP per capita after 1975, and sub-Saharan Africa would have been much richer in 2000 than was actually observed. Furthermore, although the long-run level would not be modified, the transition would be smoother than with adjustment costs. The long-run level is not modified because adjustment costs in the steady state are assumed to be 0. Therefore, this graph reinforces our conclusion that the adjustment cost of urbanization is one of the main obstacles to take-off in sub-Saharan Africa.
Figure 9: (a) Predicted GDP per capita with and without urbanization costs. (b) GDP per capita with respect to the US.
As a final exercise, we compare the evolution of GDP per capita in SSA with the level in the US in the year 2000 (see Figure 9(b)). In the absence of urbanization costs, the transition towards the long-run level of income would have been faster. Urbanization costs delay the attainment of current per capita GDP levels by about 15 years.

Conclusion
This paper analyses how urbanization influences economic development and growth through the impact it has on human capital accumulation. We build a theoretical model of endogenous human capital accumulation and calibrate it to account for the effects of urbanization on human capital dynamics.
The theoretical model includes urbanization in human capital dynamics, so that urbanization externalities are taken into account in the convergence process. In addition, the negative externalities of urbanization, namely congestion costs, are considered in the process generating economic growth. In the model, a rise in human capital increases the level of productivity but also induces congestion costs that may delay the positive effects that the increase in human capital should have. The positive externality associated with a higher level of human capital accumulation can therefore be regarded as carrying a latent growth potential rather than as a deadweight loss. In other words, countries that have experienced a rise in human capital but have not experienced as much growth as expected will, according to our model, grow when urbanization and human capital accumulation adjust. Moreover, the theoretical analysis reveals a negative relation between income inequality (defined as the income ratio between high- and low-skilled workers) and the level of development. Note that we considered a constant labour force, so our model does not allow for changes in population size.
The calibration exercise for the human capital dynamics per region indicates that there is conditional convergence across regions (high-income countries, sub-Saharan Africa, the Middle East and North Africa, Latin America and the Caribbean, and South Asia). Hence, there is no absolute convergence: each region converges to a different steady state, and poor countries do not achieve the income levels of rich countries. Moreover, we find that the technological function of human capital formation plays a key role in determining the long-run performance of developing countries. Furthermore, the counterfactual analysis provides an optimistic view of Africa's growth tragedy: it shows that educational investments in sub-Saharan Africa have not generated economic growth because of the temporary adjustment costs generated by urbanization. We perform an exercise in which these urbanization costs are absent and show that the rise in educational investments would then trigger a direct increase in GDP. Hence, the calibration exercise reinforces the idea of an African growth backlog.
Notes: a values are weighted averages of the country fixed effects from the estimation of the human capital equation (7). ρ values (in logs) are from the estimation of equation (8), Table 2. In Figure 6(b) we substitute a SSA with a H and keep ρ SSA in counterf 1 , whereas in counterf 2 we substitute ρ SSA with ρ H and keep a SSA .