### Study area

The Yellow River, China’s second-longest river, originates from the northern foothills of the Bayan Har Mountains on the Qinghai–Tibet Plateau. With a total length of 5,464 km, it flows through nine provinces (autonomous regions) and covers a basin area of 7.95 × 10^{5} km^{2}. In the “Report on Ecological Protection and High-Quality Development of the YRB” released in 2021, it is mentioned that the high-quality development of the YRB aims to establish a spatial pattern of “one axis, three zones, five river sections, and seven major urban agglomerations”^{42}. The seven major urban agglomerations include the Lanzhou-Xining Urban Agglomeration (LXUA), Ningxia Along the Yellow River Urban Agglomeration (NXUA), Guanzhong Plain Urban Agglomeration (GPUA), Hohhot-Baotou-Ordos-Yulin Urban Agglomeration (HBUA), Central Shanxi Urban Agglomeration (CSUA), Central Plains Urban Agglomeration (CPUA), and Shandong Peninsula Urban Agglomeration (SPUA). These urban agglomerations include most of the major cities in the basin, accommodating approximately 63% of the population and contributing to 70% of the GDP in the basin. The study area of this paper is determined based on the development planning documents of the seven major urban agglomerations in the YRB, encompassing 615 county-level units (Fig. 1). It is worth noting that in the development planning of urban agglomerations, Yuncheng City belongs to both the GPUA and the CPUA, while Heze City and Liaocheng City belong to both the SPUA and the CPUA. For analytical convenience, considering the closeness of regional economic ties and administrative factors, Yuncheng City is categorized under the GPUA, and Heze City and Liaocheng City are categorized under the SPUA in the subsequent analysis. In addition, the CPUA includes Xingtai City and Handan City in Hebei Province, as well as five cities in Anhui Province: Suzhou, Fuyang, Huaibei, Bengbu, and Bozhou.

### Data source and processing

The administrative vector data for the 615 county-level units in the study area were obtained from the 1:1,000,000 China National Vector Dataset released by the National Geomatics Center of China ( and extracted using ArcGIS 10.6. The data for main functional zones were sourced from the “Main Functional Zone Planning” of each province ( Land use data were obtained from the Recourse and Environment Science Data Center of the Chinese Academy of Sciences ( and after reclassification using ArcGIS 10.6, six land use types were derived: farmland, forestland, grassland, water area, construction land, and unused land. Socio-economic and energy consumption data were sourced from various provincial and municipal statistical yearbooks, urban yearbooks, county statistical yearbooks, and the “China Energy Statistical Yearbook” ( Carbon emission coefficient data were sourced from the “IPCC National Greenhouse Gas Inventory Guidelines”.

The DMSP-OLS and NPP-VIIRS nighttime light data used in this study can be downloaded from the National Oceanic and Atmospheric Administration of the United States ( The original data from both sources did not meet the research requirements. Therefore, following the approach in relevant literature^{43,44}, we applied the invariant region method to correct these two types of nighttime light data. Firstly, the DMSP/OLS data underwent image reprojection, resampling, and clipping, as well as mutual correction, saturation correction, and continuity correction. Secondly, the NPP/VIIRS data images were reprojected, resampled, clipped, subjected to noise processing, and synthesized into annual data. Thirdly, a fitting relationship was established for the DMSP/OLS and NPP/VIIRS nighttime light data for continuous correction of DN values. In the end, a consistent set of remote sensing light data for the YRB urban agglomerations from 2000 to 2020 was obtained.

### Research methods

#### Carbon budget calculation methods

Carbon budget estimation of land use mainly involves two aspects: the role of forestland, grassland, water area and unused land in carbon absorption, and the role of farmland and construction land in carbon emissions^{36}. In this study, the carbon emissions from farmland, forestland, grassland, water area and unused land are calculated using the direct carbon emission coefficient method. The calculation formula is as follows:

$$E_k = \sum e_i = \sum S_i \times Q_i $$

(1)

where *E*_{k} represents the direct carbon emissions, *e*_{i} is the carbon emissions of land use type *i*, *S*_{i} is the area of land use type *i*, and *Q*_{i} is the carbon emission coefficient of land type *i*. According to relevant studies^{44,45,46}, the carbon emission coefficients for farmland, forestland, grassland, water area and unused land are 0.422, − 5.770, − 0.021, − 0.253, and − 0.005 t·hm^{2}, respectively.

The carbon emissions from construction land are calculated based on the indirect carbon emission coefficient method from the “IPCC National Greenhouse Gas Inventory Guidelines”^{13}. The formula is as follows:

$$E_\textc = \sum E_i \times b_i \times \theta_i $$

(2)

where *E*_{c} represents the carbon emissions from construction land in each province, *E*_{i} represents the consumption of various types of energy, *b*_{i} denotes the conversion coefficient to standard coal for each type of energy, and *θ*_{i} represents the carbon emission coefficient for each type of energy. References^{47,48} were consulted for selecting the standard coal conversion factors and carbon emission coefficients, as shown in Table 1.

#### Fitting estimation of county-level carbon emissions from energy consumption based on nighttime light data

Due to the lack of county-level data in China’s energy consumption statistics, this study refers to relevant research^{44,49,50}. An estimation of county-level energy consumption carbon emissions is achieved by constructing a fitting equation using provincial-level data on energy consumption carbon emissions and nighttime light data. Considering the accuracy issue in downscaling simulation, we choose to use a linear equation without intercept for fitting^{51}. The calculation formula is as follows:

$$Y_\textij = \alpha \times \textx_{\textij}$$

(3)

where *Y*_{ij} is the simulated value of energy consumption carbon emissions for region *i* in year *j*, *α* is the regression coefficient, and *x*_{ij} is the total sum of brightness values of all stable nighttime light image pixels for region *i* in year *j* after correction. The fitting equation for nighttime light data and energy consumption carbon emissions for each province is shown in Table 2.

There is a strong linear correlation between nighttime light image brightness and energy consumption carbon emissions. The regression coefficients of the fitting equations for all 10 provinces pass the 1% significance test, and the R^{2} values are all greater than 0.9. The mean elative error (MRE) between the statistical and simulated values of energy consumption carbon emissions for all provinces are less than 25%, with 8 provinces having MRE less than 20%. The fitting results are better than some scholars’ research results^{49,50}. Therefore, the estimated carbon emissions from the fitting equations have high credibility, and the data can meet the research needs.

To minimize the errors introduced by the fitting equations, a provincial-level zero-error correction method is applied to the simulated carbon emission values at the unit pixel scale, referring to relevant research^{50}. The two calculation formulas are as follows:

$$Q_n = \fracA_r(n) A_i(n) $$

(4)

$$A_\textr(n) T_k = A_\texti(n) T_k \times Q_n$$

(5)

where *Q*_{n} is the ratio coefficient between the carbon emission statistical value and the initial simulated value in the *n* year, *A*_{r(n)} is the actual carbon emission amount statistical value in the *n* year, *A*_{i(n)} is the initial simulated value of carbon emission in the *n* year, *A*_{r(n)}*T*_{k} is the corrected carbon emission amount distributed on the *k* grid cell in the *n* year, and *A*_{i(n)}*T*_{k} is the initial simulated carbon emission value distributed on the *k* grid cell.

#### Moran’s I index

Global Moran’s I reflect whether the spatial data of net carbon emissions in the study area’s counties show a trend of clustering or dispersion, as well as the strength and significance of this trend. The calculation formula is as follows:

$$I = \frac{n\sum\nolimits_i = 1^\textn \sum\nolimits_j = 1^n W_ij (x_i – \overlinex)(x_j – \overlinex) }{{\sum\nolimits_i = 1^n \sum\nolimits_j = 1^n W_ij \sum\nolimits_i = 1^n \left( x_i – \overlinex \right)^2 }}$$

(6)

Local Moran’s I reflect the degree of spatial dissimilarity and significance between the net carbon emissions of each county and its surrounding areas. The calculation formula is as follows:

$$I_c = \frac(x_i – \overlinex)\sum\nolimits_j = 1^n W_ij (x_j – \overlinex) {\sum\nolimits_i = 1^n \left( x_i – \overlinex \right)^2 }$$

(7)

where *n* represents the number of counties in the study area, *x*_{i} and *x*_{j} are the net carbon emissions of counties *i* and *j*, \(\overlinex\) is the average net carbon emissions of counties, and *W*_{ij} represents the spatial weight matrix. *I* is Global Moran’s I, with a range of [− 1,1]. If *I* > 0, it indicates positive spatial correlation, and the spatial distribution pattern is clustering; if *I* < 0, it is the opposite. *I*_{c} is Local Moran’s I, and it is categorized into high–high clustering (H–H), high-low clustering (H–L), low–high clustering (L–H), low-low clustering (L–L) using ArcGIS 10.6.

#### Dagum gini coefficient

The Dagum Gini coefficient is employed to analyze the regional differences and sources of net carbon emissions in the urban agglomerations of the YRB. The calculation formula is as follows:

$$G = \frac{\mathop \sum \limits_j = 1^k \mathop \sum \limits_h = 1^k \mathop \sum \limits_i = 1^n_j \mathop \sum \limits_r = 1^n_h Y_ji – Y_hr }2n^2 \overlineY$$

(8)

where *n* is the total number of counties, *k* is the number of urban agglomerations, *n*_{j}/*n*_{h} represents the number of counties in the *j/h* urban agglomeration, *Y*_{ji}/*Y*_{hr} represents the net carbon emissions of the *i/r* county in the *j/h* urban agglomeration, and \(\overlineY\) is defined as the average net carbon emissions for all counties.

The Dagum Gini coefficient can be decomposed into the Gini coefficient within the group (Gw), Gini coefficient between groups (Gb), and Gini coefficient of hyper-variable density (Gt), satisfying G = G_{w} + G_{b} + G_{t}^{52}. In this study, the Gw reflects the disparity in net carbon emissions levels within each urban agglomeration, the Gb reflects the disparity in net carbon emissions levels between urban agglomerations, and the Gt reflects the phenomenon of overlapping carbon emissions in various urban agglomerations, reflecting the relative disparity. The formulas for the three coefficients are as follows:

$$G_w = \sum\nolimits_j = 1^k G_jj P_j S_j$$

(9)

$$G_b = \sum\nolimits_j = 1^k \sum\nolimits_h \ne j G_jh P_j S_h D_jh $$

(10)

$$G_t = \sum\nolimits_j = 1^k \sum\nolimits_h \ne j^ G_jh P_j S_h (1 – D_jh ) $$

(11)

where *G*_{jj} represents the intra-group Gini coefficient of group *j*, *G*_{jh} represents the Gini coefficient between group *j* and group *h*, *P*_{j} is the ratio of the number of counties in group *j* to the total number of counties, *S*_{h} is the ratio of carbon emissions within group *h* to the total carbon emissions, and *D*_{jh} is the degree of difference in carbon emissions between group *j* and group *h*.

#### Carbon compensation zoning based on NRCA index

The Normalized Revealed Comparative Advantage Index (NRCA index) is originally intended to measure the competitiveness of a country’s products or industries in the international market. It is now widely applied in energy utilization, spatial advantage function identification, and other areas. This index was improved by Yu et al.^{53} based on the Revealed Comparative Advantage Index (RCA index) constructed by Balassa^{54}. In this study, it is employed to determine the advantageous attributes for carbon compensation zoning in the urban agglomerations in the YRB. The calculation formula is as follows:

$$NRCA_j^i = {X_j^i \mathord{\left/ \vphantom X_j^i X \right. \kern-0pt} X} – {X_j X^i \mathord{\left/ \vphantom X_j X^i XX \right. \kern-0pt} XX}$$

(12)

where *Xi j* represents the indicator value of attribute *j* in county *i*; *X*_{j} represents the total indicator value of all attributes in county *j*; *X*_{i} represents the total indicator value of all attributes in county *i*; *X* represents the total indicator value of all counties and attributes. If *NRCAi j* > 0, it indicates that county *i* has a comparative advantage in attribute* j*.

Referring to relevant studies^{55,56}, the total carbon emission is chosen as the attribute to determine the total scale of carbon compensation zones. The economic contribution coefficient of carbon emissions (ECC) serves as the socio-economic attribute, the ecological support coefficient of carbon emissions (ESC) is the ecological environment attribute, and the degree of land spatial development is the spatial structure attribute. The formulas for ECC, ESC and the degree of land spatial development are as follows:

$$ECC = \fracG_i G/\fracC_i C$$

(13)

$$ESC = \fracS_i S/\fracC_i C$$

(14)

$$DL_\texti = \frac{CL_\texti }{{T_\texti }} \times 100\%$$

(15)

where *G*_{i} and *G* represent the GDP of each county and the corresponding urban agglomeration, respectively; *C*_{i} and *C* represent the carbon emissions of each county and the corresponding urban agglomeration, respectively; *S*_{i} and *S* represent the carbon absorption of each county and the corresponding urban agglomeration, respectively; DL_{i} is the degree of land spatial development of each county, CL_{i} is the construction land area of each county and T_{i} is the total land area of each county.

Based on the four attribute advantage indices obtained from NRCA, the SOM-K-means model is used for cluster analysis to identify three types of carbon compensation zones: payment zone, balanced zone, and compensated zone^{56}. Then, through the combination with the delineation of the main functional zones, the final types of carbon compensation zones are obtained.

link