Create artificial data

Create artificial data for eleven ids (1-11), ten years (2001-2010), three countries (codes 17, 39, 400) and the continuous variables x and y.

. clear

. qui set obs 601

. gen id   = int((_n-1)/60)+1

. gen hv   = _n-60*(id-1)

. gen year = 2000+mod(hv,10)+10*(mod(hv,10)==0)    

. gen land = int((2000-year+hv)/10)+1 

. qui replace land =  17 if land==1|land==4

. qui replace land =  39 if land==2|land==5

. qui replace land = 400 if land==3|land==6    

. drop hv

. set seed 86937263

. gen x = runiform()

Create an id (11) that has to be ignored because x is missing for all of its observations.

. qui replace id = 11 if _n==301  

. qui replace x  =  . if id==11

Set x to missing in the year 2002 for all ids but two, so that too few distinct ids remain in this year.

. qui replace x = . if year==2002 & id > 2
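The index arithmetic above can be checked outside Stata. The following Python sketch is a hypothetical mirror of the layout only (it draws no random numbers): it reproduces `int()`, `mod()` and the `replace` statements and counts the observations that keep a non-missing x. This is why the sorted listings of x further down end at observation 551.

```python
# Hypothetical Python mirror of the Stata data layout (illustration only, no random numbers).
LAND = {1: 17, 4: 17, 2: 39, 5: 39, 3: 400, 6: 400}

def make_layout(n_obs=601):
    rows = []
    for n in range(1, n_obs + 1):                 # Stata's _n is 1-based
        id_ = (n - 1) // 60 + 1                   # int((_n-1)/60)+1
        hv = n - 60 * (id_ - 1)                   # position 1..60 within the id
        m = hv % 10
        year = 2000 + m + 10 * (m == 0)           # 2001..2010
        block = (2000 - year + hv) // 10 + 1      # blocks of ten positions: 1..6
        rows.append({"n": n, "id": id_, "year": year, "land": LAND[block]})
    return rows

def x_missing(row):
    # id 11 (observations 301 and 601) and the year 2002 for ids 3-10
    return row["id"] == 11 or (row["year"] == 2002 and row["id"] > 2)

rows = make_layout()
rows[300]["id"] = 11                              # replace id = 11 if _n==301
nonmissing = [r for r in rows if not x_missing(r)]
print(len(nonmissing))                                             # 551
print(sorted({r["id"] for r in nonmissing if r["year"] == 2002}))  # [1, 2]
```

The same counts per country (183 for country 17, 184 for country 400) appear as the numbers of observations in the summary table at the end of this section.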

Create a second variable, y, that also takes negative values.

. set seed 69326378

. gen y = rnormal()

. cls

M A X R D S C

The MAXIMUM and MINIMUM are values of single observations and therefore must not be shown. If necessary, the RDSC allows researchers to publish approximate values calculated as averages of a sufficient number of observations (top coding). The following requirements apply: at least five distinct entities have to be covered, and the share of the two largest ones must not exceed 85 percent of the total (dominance criterion).
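The two conditions can be written down as a small check. The Python sketch below only illustrates the rule as stated in this paragraph; it is not the code used by maxrdsc. It assumes non-negative values (so that shares are meaningful) and sums the values per entity before applying the dominance criterion.

```python
from collections import defaultdict

def passes_rdsc_rule(ids, values, min_entities=5, max_share_top2=0.85):
    """Check the rule for one group of observations (illustrative sketch).

    ids    : entity identifier of each observation
    values : non-negative values of the same observations (assumption)
    """
    totals = defaultdict(float)
    for i, v in zip(ids, values):
        totals[i] += v                            # contribution of each entity
    if len(totals) < min_entities:                # at least five distinct entities
        return False
    grand_total = sum(totals.values())
    top2 = sum(sorted(totals.values(), reverse=True)[:2])
    return top2 <= max_share_top2 * grand_total   # dominance criterion
```

Applied to the ten smallest values of x listed below, the check fails for the five and the six smallest observations (three and four distinct ids) and first succeeds with seven.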

. maxrdsc id x, min(12) 

No problems for minimum of x. Average minimum based on  5  distinct ids:  .0074648052769979

The mean of the five smallest observations is smaller than the result given by maxrdsc.

. qui sum x in 1/5

. di "  mean " r(mean)
  mean .00493174

If we list the ten smallest observations, we see that the five smallest values belong to only three distinct ids (8, 9 and 10). To cover at least five distinct ids we have to use the seven smallest observations.

. sort x

. list id x in 1/10, noobs

  ┌───────────────┐
  │ id          x │
  ├───────────────┤
  │  8   .0007817 │
  │  8   .0009793 │
  │ 10   .0048133 │
  │  9   .0061389 │
  │  8   .0119455 │
  ├───────────────┤
  │  7   .0132593 │
  │  1   .0143357 │
  │  4   .0150426 │
  │  1   .0150839 │
  │  7    .015683 │
  └───────────────┘

. qui sum x in 1/7

. di "  mean " r(mean)
  mean .00746481

Please do not include the last four commands in your own output! They are shown here only for clarification.
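The iteration can be sketched as follows: starting with five observations, the next extreme value is added until the rule holds or the limit given in min() or max() is reached. This is a hypothetical Python illustration (function and argument names are made up, and starting at five is an assumption), not the maxrdsc implementation. With the ten smallest values listed above it selects seven observations and reproduces the reported mean of about .0074648.

```python
from collections import defaultdict

def rule_ok(ids, values, min_entities=5, max_share_top2=0.85):
    # at least five entities; two largest entity totals <= 85 percent of the total
    totals = defaultdict(float)
    for i, v in zip(ids, values):
        totals[i] += v
    if len(totals) < min_entities:
        return False
    top2 = sum(sorted(totals.values(), reverse=True)[:2])
    return top2 <= max_share_top2 * sum(totals.values())

def topcoded_extreme(ids, values, limit, largest=False, start=5):
    """Return (k, mean of the k most extreme values), or None if limit is too small.

    Illustrative sketch: limit plays the role of the number given in min()/max().
    """
    pairs = sorted(zip(values, ids), reverse=largest)   # extremes first
    for k in range(start, min(limit, len(pairs)) + 1):
        vals = [v for v, _ in pairs[:k]]
        if rule_ok([i for _, i in pairs[:k]], vals):
            return k, sum(vals) / k
    return None                                         # disclosure problem
```

With `largest=True` and the nine non-missing values from the listing of observations 543/552 further down, the same function picks the five largest values and returns a mean of about .9986101.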

In the next example at most 12 observations are accepted to approximate the maximum. Here the five largest observations already belong to five distinct ids.

. maxrdsc id x, max(12) 

No problems for maximum of x. Average maximum based on  5  distinct ids:  .9986101269721985

. list id year land x in 543/552, noobs

  ┌─────────────────────────────┐
  │ id   year   land          x │
  ├─────────────────────────────┤
  │  9   2005     39   .9878153 │
  │  4   2003    400   .9896339 │
  │  6   2003     39   .9912419 │
  │  6   2004     39   .9931296 │
  │  4   2008    400   .9978754 │
  ├─────────────────────────────┤
  │  8   2010     17    .997907 │
  │  1   2007     17   .9987306 │
  │  6   2006     39   .9992318 │
  │  2   2009     17   .9993058 │
  │  3   2002    400          . │
  └─────────────────────────────┘

. qui sum x in 547/551

. di "  mean " r(mean)    
  mean .99861013

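It is possible to specify minimum and maximum at the same time. Since the minimum requires only seven and the maximum only five observations, it makes no difference here whether the limit is 9 or 12.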

. maxrdsc id x, min(9) max(12)

No problems for maximum of x. Average maximum based on  5  distinct ids:  .9986101269721985
No problems for minimum of x. Average minimum based on  5  distinct ids:  .0074648052769979

. maxrdsc id x, min(12) max(9) 

No problems for maximum of x. Average maximum based on  5  distinct ids:  .9986101269721985
No problems for minimum of x. Average minimum based on  5  distinct ids:  .0074648052769979

In the next example 12 observations are not enough to determine the maximum of variable x for Belgium (country 17), because the 15 largest observations belong to only four distinct ids.

. foreach i in 17 400 {
  2.     display
  3.     display "country:  `i'"
  4.     maxrdsc id x if land==`i', min(12) max(12) 
  5. }

country:  17

D I S C L O S U R E problem: For variable x 12 observations are not sufficient to determine maximum.

country:  400

No problems for maximum of x. Average maximum based on  5  distinct ids:  .9866548180580139
No problems for minimum of x. Average minimum based on  5  distinct ids:  .021380212690149
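Why twelve observations are not enough for Belgium can be seen by counting distinct ids among the largest values. The ids below are read off the listing of country 17 further down in this section (largest value first). The Python helper is hypothetical and only counts entities; it does not check the dominance criterion.

```python
def first_k_with_entities(ids_in_order, n_entities=5):
    """Smallest k such that the first k observations cover n_entities distinct ids."""
    seen = set()
    for k, i in enumerate(ids_in_order, start=1):
        seen.add(i)
        if len(seen) >= n_entities:
            return k
    return None                    # not enough distinct ids at all

# ids of the largest values of x in Belgium (country 17), largest first
belgium_ids = [2, 1, 8, 8, 4, 2, 2, 2, 1, 8, 1, 2, 4, 2, 2, 5]
k = first_k_with_entities(belgium_ids)
print(k)                           # 16
print(k <= 12)                     # False: max(12) is not sufficient
```

The mean of these 16 values is about .9456972, which is the top-coded maximum for country 17 obtained with max(20) below.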

N O I T E R A T E

With the option noiterate, the researcher can require a specific number of observations instead of letting maxrdsc iterate. This may be needed for comparisons with other studies.

. maxrdsc id x, min(10) max(10) noiterate

No problems for maximum of x. Average maximum based on  10  observations:  .9941961109638214
No problems for minimum of x. Average minimum based on  10  observations:  .0098063051176723

. qui sum x in 542/551

. di "  mean " r(mean)        
  mean .99419611

. qui sum x in 1/10

. di "  mean " r(mean)    
  mean .00980631

A B S O L U T E

In some cases the researcher may prefer the maximum or minimum of the absolute values (option absolute).

. maxrdsc id y, min(12) max(12) absolute

No problems for maximum of y. Average maximum based on  5  distinct ids:  2.683389610714383
No problems for minimum of y. Average minimum based on  5  distinct ids:  .0103370569746143

R E P L A C I N G values

To make a graph of an empirical cumulative distribution function, for example, one has to replace the existing values. The researcher can either provide a copy of the variable and specify the option update, or use the option name() and specify only the name of a new variable.
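What such a replacement amounts to can be sketched in Python: the k most extreme values are overwritten by their mean, which creates ties and leaves the sum of the variable unchanged. The function is hypothetical; k_min and k_max stand for the numbers of observations selected by the iteration (for example 7 and 5 in the examples above), and missing values are not handled.

```python
def replace_extremes(values, k_min, k_max):
    """Return a copy in which the k_min smallest and the k_max largest values are
    replaced by the mean of their group (illustrative sketch, no missing values).
    Assumes k_min + k_max <= len(values) so that the two groups do not overlap."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    out = list(values)
    for group in (order[:k_min], order[len(order) - k_max:]):
        if group:                                   # skip empty groups (k = 0)
            mean = sum(values[j] for j in group) / len(group)
            for j in group:
                out[j] = mean
    return out
```

For example, `replace_extremes([5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0], 2, 2)` returns `[5.0, 1.5, 8.5, 3.0, 7.0, 1.5, 8.5]`; the total stays 35.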

. gen y_user = y

. maxrdsc id y_user, min(12) max(9) update  

No problems for maximum of y_user. Average maximum based on  5  distinct ids:  2.460660934448242
No problems for minimum of y_user. Average minimum based on  5  distinct ids:  -2.637131384440831

or

. maxrdsc id y, min(12) max(9) name(y_tc)   /* tc for topcode */

No problems for maximum of y. Average maximum based on  5  distinct ids:  2.460660934448242
No problems for minimum of y. Average minimum based on  5  distinct ids:  -2.637131384440831

. cumul y_tc, gen(y_tc_cum) equal

. line y_tc_cum y_tc, sort 

. graph export cumul.png, replace
(file cumul.png written in PNG format)
Example: cumulative distribution function

Top coding produces ties, so you should specify the option equal with the cumul command.
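The effect of the option equal can be illustrated with an empirical distribution function computed in Python. This is a sketch of the idea, not Stata's implementation: in the first function all observations sharing a top-coded value receive the same cumulative share (the fraction of observations less than or equal to that value); in the second, rank-based version tied observations get different shares depending on their sort order.

```python
def ecdf_equal(values):
    """Fraction of observations <= each value; ties share one value (O(n^2) sketch)."""
    n = len(values)
    return [sum(w <= v for w in values) / n for v in values]

def ecdf_sequential(values):
    """Rank-based cumulative shares; tied observations get different shares."""
    n = len(values)
    order = sorted(range(n), key=lambda j: values[j])   # stable sort keeps input order
    out = [0.0] * n
    for rank, j in enumerate(order, start=1):
        out[j] = rank / n
    return out
```

For the values `[1.5, 1.5, 3.0, 8.5, 8.5]`, `ecdf_equal` gives `[0.4, 0.4, 0.6, 1.0, 1.0]`, while `ecdf_sequential` gives `[0.2, 0.4, 0.6, 0.8, 1.0]`.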

For complete tables the researcher has to use two variables, one for the total and another for the breakdown. Otherwise some of the extreme values of the total may be replaced by averages computed within the subsamples.

. maxrdsc id x, min(20) max(20) name(x_tc) 

No problems for maximum of x. Average maximum based on  5  distinct ids:  .9986101269721985
No problems for minimum of x. Average minimum based on  5  distinct ids:  .0074648052769979

. qui gen x_user = x

. foreach i in 17 400 {
  2.     maxrdsc id x_user if land==`i', min(20) max(20) update
  3. }

No problems for maximum of x_user. Average maximum based on  5  distinct ids:  .945697195827961
No problems for minimum of x_user. Average minimum based on  5  distinct ids:  .0207207249113708

No problems for maximum of x_user. Average maximum based on  5  distinct ids:  .9866548180580139
No problems for minimum of x_user. Average minimum based on  5  distinct ids:  .021380212690149

. sort x

. list id land year x x_tc x_user if x_tc!=x & x!=., noobs

  ┌───────────────────────────────────────────────────┐
  │ id   land   year          x       x_tc     x_user │
  ├───────────────────────────────────────────────────┤
  │  8     17   2007   .0007817   .0074648   .0207207 │
  │  8     39   2008   .0009793   .0074648   .0009793 │
  │ 10     39   2004   .0048133   .0074648   .0048133 │
  │  9     17   2006   .0061389   .0074648   .0207207 │
  │  8    400   2007   .0119455   .0074648   .0213802 │
  ├───────────────────────────────────────────────────┤
  │  7     17   2008   .0132593   .0074648   .0207207 │
  │  1    400   2004   .0143357   .0074648   .0213802 │
  │  4    400   2008   .9978754   .9986101   .9866548 │
  │  8     17   2010    .997907   .9986101   .9456972 │
  │  1     17   2007   .9987306   .9986101   .9456972 │
  ├───────────────────────────────────────────────────┤
  │  6     39   2006   .9992318   .9986101   .9992318 │
  │  2     17   2009   .9993058   .9986101   .9456972 │
  └───────────────────────────────────────────────────┘

. sort land x

. list id land year x x_tc x_user if x_user!=x & x!=., sepby(land) noobs

  ┌───────────────────────────────────────────────────┐
  │ id   land   year          x       x_tc     x_user │
  ├───────────────────────────────────────────────────┤
  │  8     17   2007   .0007817   .0074648   .0207207 │
  │  9     17   2006   .0061389   .0074648   .0207207 │
  │  7     17   2008   .0132593   .0074648   .0207207 │
  │  9     17   2003   .0179612   .0179612   .0207207 │
  │  7     17   2001   .0229569   .0229569   .0207207 │
  │  9     17   2005   .0311553   .0311553   .0207207 │
  │  4     17   2001   .0324234   .0324234   .0207207 │
  │  3     17   2007    .041089    .041089   .0207207 │
  │  5     17   2007   .8989528   .8989528   .9456972 │
  │  2     17   2002   .9068126   .9068126   .9456972 │
  │  2     17   2001   .9111468   .9111468   .9456972 │
  │  4     17   2003   .9113399   .9113399   .9456972 │
  │  2     17   2006   .9246309   .9246309   .9456972 │
  │  1     17   2008   .9270924   .9270924   .9456972 │
  │  8     17   2004   .9375268   .9375268   .9456972 │
  │  1     17   2008    .938831    .938831   .9456972 │
  │  2     17   2009   .9455019   .9455019   .9456972 │
  │  2     17   2010   .9546198   .9546198   .9456972 │
  │  2     17   2004   .9552367   .9552367   .9456972 │
  │  4     17   2001   .9602001   .9602001   .9456972 │
  │  8     17   2006     .96332     .96332   .9456972 │
  │  8     17   2010    .997907   .9986101   .9456972 │
  │  1     17   2007   .9987306   .9986101   .9456972 │
  │  2     17   2009   .9993058   .9986101   .9456972 │
  ├───────────────────────────────────────────────────┤
  │  8    400   2007   .0119455   .0074648   .0213802 │
  │  1    400   2004   .0143357   .0074648   .0213802 │
  │  3    400   2006   .0180618   .0180618   .0213802 │
  │  8    400   2004   .0208572   .0208572   .0213802 │
  │  6    400   2004    .021805    .021805   .0213802 │
  │  3    400   2008   .0254984   .0254984   .0213802 │
  │  9    400   2004   .0371579   .0371579   .0213802 │
  │  5    400   2009   .9780802   .9780802   .9866548 │
  │  2    400   2002   .9813354   .9813354   .9866548 │
  │  8    400   2008   .9859142   .9859142   .9866548 │
  │ 10    400   2004   .9870899   .9870899   .9866548 │
  │  4    400   2003   .9896339   .9896339   .9866548 │
  │  4    400   2008   .9978754   .9986101   .9866548 │
  └───────────────────────────────────────────────────┘

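If you want to show results outside the RDSC, you have to create a table. The following shows one possibility using the stored results r(minval) and r(maxval); the columns are country, number of observations, mean, standard deviation, and the top-coded minimum and maximum.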

. foreach i in 17 400 {
  2.     di ""
  3.     display "country:  `i'"
  4.     maxrdsc id x if land==`i', min(20) max(20)
  5.     local min_`i' = r(minval)
  6.     local max_`i' = r(maxval)
  7.     qui sum x        if land==`i'
  8.     local n_`i'    = r(N)
  9.     local mean_`i' = r(mean)
 10.     local sd_`i'   = r(sd)
 11. }   

country:  17

No problems for maximum of x. Average maximum based on  5  distinct ids:  .945697195827961
No problems for minimum of x. Average minimum based on  5  distinct ids:  .0207207249113708

country:  400

No problems for maximum of x. Average maximum based on  5  distinct ids:  .9866548180580139
No problems for minimum of x. Average minimum based on  5  distinct ids:  .021380212690149

. foreach i in 17 400 {
  2.     di %30s           "`i'"  %10.0f `n_`i'' %18.4f `mean_`i'' %18.4f `sd_`i''   %18.4f `min_`i''  %18.4f `max_`i''
  3. }   
                            17       183            0.5070            0.2845            0.0207            0.9457
                           400       184            0.5106            0.3013            0.0214            0.9867

K e r n e l density

If the researcher is interested in the probability density function, kernel density estimates may be used.

. kdensity y

. graph export kernel.png, replace
(file kernel.png written in PNG format)
Example: kernel density function
