- 1. The Questionnaire
- 2. Sampling Design
- 3. Imputation of missing variables
- 4. Weighting
- 5. The PHF Field Phase
During the PHF's first wave in 2010 and 2011, a net sample of 3,565 randomly selected households was collected. Wealthy households are oversampled on the basis of micro-geographical information. The second survey took place in 2014 and the third in 2017. The intended survey frequency for future waves is three years. The panel structure mimics that of the PSID. All households are re-contacted, and all individuals are tracked. If households split up or individual members leave, the newly formed households are added to the panel. To address panel mortality and to include important new subgroups, such as migrants, refreshment samples will be added at regular or irregular intervals. The first refreshment sample was added in 2014.
The PHF questionnaire consists of nine modules. All data are collected by face-to-face, computer-assisted personal interviews (CAPI). A "financially knowledgeable person" answers questions on household composition and wealth ("household interview"). In addition, each household member aged 16 and above is asked to respond to shorter modules referring to the financial situation of individuals. In wave one, paper versions and an online interface were available for the modules addressed to individual household members, in order to cover situations where individuals could not be interviewed personally. In the second and third waves, these modes of data collection were replaced by computer-assisted telephone interviews (CATI). However, clear preference is given to personal interviews with all household members. CAPI is also the only mode foreseen for the household interview.
1. The Questionnaire
Most questions are aimed at the household as a whole. These questions are to be answered by the household member who knows best about the household's financial situation. Furthermore, the questionnaire contains questions relating to income, old age provision and the occupation of each household member older than 16. These questions are to be answered by each household member individually. The questionnaire consists of the following modules.
- Real assets and their financing
- Other liabilities/credit constraints
- Private businesses and financial assets
- Intergenerational transfers and gifts
- Pensions and insurance policies
Furthermore, the German questionnaire also contains questions on savings behavior and financial literacy. In the second wave a small module on price expectations and questions on potential constraints for real estate purchases were added.
The English-language version of the core questionnaire, which is harmonized across all HFCN surveys, can be accessed at the European Central Bank's HFCN website. The PHF questionnaire is made available in English and in German (see link below).
2. Sampling Design
In order to capture an adequate number of wealthy households in the final sample, the sampling design is characterized by a higher selection probability of wealthy households. This holds for the participants in the first wave and for the refresher sample in the subsequent waves. This is necessary to increase the statistical power for the analysis of the wealth distribution and research focusing on rare assets typically held by wealthy households. The sampling has three stages.
- Identification of wealthy regions based on income tax statistics
- Identification of wealthy street sections within the wealthy regions
- Selection of addresses
The first stage divides municipalities into three strata according to size and proportion of wealthy households.
The second stage is based on a stratification of street sections. In big cities with 100,000 residents and more, street sections are grouped into two categories: street sections in wealthy neighbourhoods and other street sections. Small and middle-sized municipalities with fewer than 100,000 residents are treated as a single unit. The street sections of those municipalities are not categorized. The reason for this is that they have only a small number of street sections, especially within wealthy neighbourhoods, and that small municipalities very often do not provide addresses based on a selection of street sections.
In the third stage, adults (18 years and older) are drawn from a public register. In municipalities with fewer than 100,000 residents, individuals are selected by a systematic random selection process from a list of all registered residents ordered by family name. In cities with 100,000 residents and more, only addresses from the selected street sections are accepted.
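The systematic random selection mentioned above can be illustrated with a minimal sketch. It assumes a register that is already sorted by family name; the function name, the fractional sampling interval and the example register are hypothetical and are not taken from the PHF fieldwork documentation.

```python
import random


def systematic_sample(sorted_units, n, rng=None):
    """Draw n units from an ordered list using a random start and a fixed interval."""
    rng = rng or random.Random()
    total = len(sorted_units)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the length of the list")
    step = total / n                  # sampling interval (may be fractional)
    start = rng.random() * step       # random start within the first interval
    # Take the unit at each position start, start + step, start + 2 * step, ...
    return [sorted_units[min(int(start + i * step), total - 1)] for i in range(n)]


# Hypothetical register of 100 residents, ordered by family name.
register = sorted(f"resident_{i:03d}" for i in range(100))
sample = systematic_sample(register, 10, random.Random(1))
```

Because the interval is at least one unit wide, no resident can be selected twice, and the sample is spread evenly over the ordered list.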
3. Imputation of missing variables
Non-response can be detrimental to the validity of survey data, as it introduces bias into the estimates of the variables of interest. Unit non-response is treated by incorporating non-response weighting. In order to deal with item non-response, missing observations of all major variables will be imputed.
The PHF data are multiply imputed using the method of Rubin (1987). Multiple imputation of wealth survey data was pioneered by Arthur Kennickell at the Survey of Consumer Finances (Board of Governors of the Federal Reserve System). He allowed the PHF team to use his routines, and Cristina Barceló (EFF, Banco de España) provided a well-documented version geared to an HFCS-style survey. The PHF team is extremely grateful to them both.
Each imputed value is simulated by drawing repeatedly from an estimate of the conditional distribution of the data. In contrast to single imputation, the variation across the multiply simulated outcomes reflects the uncertainty caused by the imperfect specification of the imputation model and therefore yields a more efficient estimate. Additionally, users may assess the imputation variation themselves. To facilitate the analysis, guidelines and code for deriving statistics from the imputation implicates are provided to the users of the PHF data.
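As an illustration of how statistics can be derived from several implicates, the following sketch applies the standard combining rules from Rubin (1987): the point estimate is the average over implicates, and the total variance is the average within-implicate variance plus the between-implicate variance inflated by (1 + 1/m). The function name and the example figures are hypothetical; this is not the code distributed with the PHF data.

```python
from statistics import mean, variance


def combine_implicates(estimates, variances):
    """Combine per-implicate estimates and their sampling variances (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two implicates, with one variance per estimate")
    q_bar = mean(estimates)           # combined point estimate
    within = mean(variances)          # average sampling variance within implicates
    between = variance(estimates)     # variance across implicates (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates of mean net wealth from three implicates.
point, total_variance = combine_implicates([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

The between-implicate term is what a single imputation cannot provide: it measures how much the estimate moves when the missing values are redrawn.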
To maintain consistency, careful data editing is needed before imputation. A logical tree is constructed to determine the order of imputation and ensure consistency. Obviously, only missing values due to non-response are to be imputed. Outliers are detected and kept from influencing the estimation of the conditional distributions. The imputation model is developed by examining the pattern of missing values and the economic correlation between variables. A time-consuming development process is needed for the routine in order to achieve convergence and to pass diverse consistency checks.
4. Weighting
In general, simple means from survey data will be biased for various reasons: unequal selection probabilities caused by the complex sample design, unit non-response, under-coverage or over-coverage, and other design inadequacies. A comprehensive weighting mechanism is designed to compensate for these distortions and also to minimise the inefficiency induced by weighting. Furthermore, replicate weights are provided so that users can compute measures of the variability of the estimates. Weights are constructed over multiple stages.
First, design weights are assigned to correct for unequal selection probabilities due to sample design. The underrepresented households will be given greater weights and overrepresented households will be given smaller weights. Second, non-response weights are computed to adjust for the impact of non-responding units. The third stage develops calibrated weights to ensure that weighted estimates accurately represent the population in many important dimensions not captured by the sample design. In order to match the overall marginal distributions, calibration will rely on external information such as the German Microcensus and financial accounts.
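The three stages can be sketched in simplified form. The sketch assumes that design weights are inverse selection probabilities, that non-response is adjusted within weighting cells, and that calibration is a simple post-stratification to known population totals. All function names, cells and figures are hypothetical, and the actual PHF procedure is more elaborate.

```python
from collections import defaultdict


def design_weights(selection_probs):
    """Stage 1: weight each sampled unit by the inverse of its selection probability."""
    return [1.0 / p for p in selection_probs]


def nonresponse_adjust(weights, cells, responded):
    """Stage 2: within each cell, respondents take over the weight of non-respondents."""
    sampled = defaultdict(float)
    answered = defaultdict(float)
    for w, c, r in zip(weights, cells, responded):
        sampled[c] += w
        if r:
            answered[c] += w
    return [w * sampled[c] / answered[c] if r else 0.0
            for w, c, r in zip(weights, cells, responded)]


def poststratify(weights, groups, population_totals):
    """Stage 3: scale weights so that each group sums to its known population total."""
    sums = defaultdict(float)
    for w, g in zip(weights, groups):
        sums[g] += w
    return [w * population_totals[g] / sums[g] if w > 0 else 0.0
            for w, g in zip(weights, groups)]


# Two oversampled "wealthy" units and two "other" units (illustrative values only).
strata = ["wealthy", "wealthy", "other", "other"]
base = design_weights([0.5, 0.5, 0.1, 0.1])
adjusted = nonresponse_adjust(base, strata, [True, False, True, True])
final = poststratify(adjusted, strata, {"wealthy": 6.0, "other": 30.0})
```

In this toy example the oversampled units start with a design weight of 2 against 10 for the others, which is how oversampling is undone at the estimation stage.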
In some settings, weights may incorporate important sample design information that cannot be published for data protection reasons. Therefore, in order to enable researchers to compute correct standard errors, the HFCN has decided to provide replicate weights that incorporate all sample design information and the calibration of the estimated weights. The routine for deriving sampling errors from these replicate samples and weights will be presented. As replicate weights result from a randomised simulation procedure, they will not always lead to the same results as the use of the original weights.
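A minimal sketch of how a sampling error might be derived from replicate weights follows. It assumes bootstrap-style replicates, in which the statistic is recomputed once per replicate weight set and the spread of those estimates is taken as the sampling variance. The divisor (number of replicates minus one) is an assumption, since the correct scaling depends on the replication method described in the data documentation. Function names and figures are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean of the values under one set of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_std_error(values, replicate_weight_sets):
    """Standard error from the spread of the estimate across replicate weight sets."""
    estimates = [weighted_mean(values, w) for w in replicate_weight_sets]
    if len(estimates) < 2:
        raise ValueError("need at least two sets of replicate weights")
    center = sum(estimates) / len(estimates)
    # Assumed bootstrap-style variance: squared deviations divided by R - 1.
    var = sum((e - center) ** 2 for e in estimates) / (len(estimates) - 1)
    return var ** 0.5


# Three households and three hypothetical sets of replicate weights.
wealth = [1.0, 2.0, 3.0]
replicates = [[1.0, 1.0, 1.0], [2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
se = replicate_std_error(wealth, replicates)
```

For multiply imputed data, a sampling variance of this kind would be computed within each implicate and then combined with the between-implicate variance.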
5. The PHF Field Phase
5.1 Wave 1
The PHF field phase for wave one consisted of two major parts, an initial field phase and a "re-launch", in which several fieldwork aspects were redesigned and improved.
The survey began in September 2010 and ended in July 2011. The initial phase lasted 25 weeks, the second an additional 20 weeks. At the beginning of the initial field phase, 212 trained interviewers were employed. Of those, 132 interviewers were retained for the re-launch phase, which started in March 2011. For the whole study the gross sample size was 20,501 addresses, split almost evenly between phase 1 (10,258 addresses) and phase 2 (10,243 addresses).
All households were to be first contacted by a personal visit by the interviewer. In the second phase of the study, this was revised so that after a certain time, households that had not been reached by the interviewers in the field were contacted centrally by the survey agency's CATI interviewers to make appointments for the face-to-face interviews. The re-launch phase introduced a number of additional changes. Financial and non-financial incentives for interviewers were modified. For instance, interviewers were allowed to choose the areas they would work in. They were required to contact each "undecided" household at least once every week, and there was a bonus payment for the most successful interviewers. Furthermore, households in areas characterised by bad housing conditions were given additional incentive payments to boost participation among this subgroup.
Despite all these efforts, the survey yielded a response rate of 18.6%, which is relatively low and reflects the reluctance of German households to participate in surveys in general and also the sensitivity of the survey topic.
5.2 Wave 2
The second wave of the PHF took place between April 2014 and November 2014. The households that had participated in wave 1 were re-interviewed for the first time. Together with a refresher sample of 12,805 addresses, those 3,202 households formed the gross sample for wave 2. Three hundred trained interviewers were employed. As in the second phase of wave one, there were incentives both for the participating households and for the interviewers. In addition, considerable resources were spent on converting soft refusals, contacting households and following households in the panel between the two survey waves.
About 4,500 households participated in the survey, split between panel (2,200) and refresher (2,300) households, leading to a response rate of 70% for the panel, 19.6% for the refresher and 30% overall. The response rate for the refresher sample increased slightly compared to wave one.
5.3 Wave 3
The third wave of the PHF started in April 2017, and data collection ended in November 2017. Households that had participated in wave one or two were re-interviewed. Altogether, the sample contains around 7,500 addresses of refresher households and 5,004 addresses of households that belonged to the net sample of the previous waves. About 5,000 households actually participated in the third survey. As in waves one and two, there were incentives both for the participating households and for the interviewers.