Abstract: The wild bootstrap was originally developed for regression models with heteroskedasticity of unknown form. Over the past 30 years, it has been extended to models estimated by instrumental variables and maximum likelihood and to ones where the error terms are (perhaps multiway) clustered. Like bootstrap methods in general, the wild bootstrap is especially useful when conventional inference methods are unreliable because large-sample assumptions do not hold. For example, there may be few clusters, few treated clusters, or weak instruments. The package boottest can perform a wide variety of wild bootstrap tests, often at remarkable speed. It can also invert these tests to construct confidence sets. As a postestimation command, boottest works after linear estimation commands, including regress, cnsreg, ivregress, ivreg2, areg, and reghdfe, as well as many estimation commands based on maximum likelihood. Although it is designed to perform the wild cluster bootstrap, boottest can also perform the ordinary (nonclustered) version. Wrappers offer classical Wald, score/Lagrange multiplier, and Anderson–Rubin tests, optionally with (multiway) clustering. We review the main ideas of the wild cluster bootstrap, offer tips for use, explain why it is particularly amenable to computational optimization, state the syntax of boottest, artest, scoretest, and waldtest, and present several empirical examples.
摘要:Wild bootstrap 最初是為解決回歸模型中未知形式的異方差問題而發展起來的。在過去 30 年間,它已被拓展到以工具變量法和最大似然法估計的模型,以及擾動項(可能是多維)聚類的模型中。與一般的 bootstrap 方法一樣,wild bootstrap 在大樣本假設不成立、常規推斷方法不可靠時尤其有用,例如聚類數較少、被處理的聚類較少或工具變量較弱的情形。boottest 程序包能夠以相當快的速度執行多種 wild bootstrap 檢驗,並可通過反轉(invert)這些檢驗來構建置信集。作為一個後估計命令,boottest 可在 regress、cnsreg、ivregress、ivreg2、areg、reghdfe 等線性估計命令以及許多基於最大似然估計的命令之後使用。雖然 boottest 是為執行 wild cluster bootstrap 而設計的,它同樣可以執行普通(非聚類)版本。其包裝命令提供了經典的 Wald 檢驗、分數/拉格朗日乘數檢驗以及 Anderson–Rubin 檢驗,並可選擇(多維)聚類。在這篇文章中,我們回顧了 wild cluster bootstrap 的主要思想,提供使用技巧,解釋其為何特別適合計算上的優化,給出 boottest、artest、scoretest 和 waldtest 的語法,最後給出幾個實證示例。
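To make the workflow concrete, here is a minimal sketch of a wild cluster bootstrap test run as a postestimation step. The data set, variable names, and cluster variable (y, x, treat, schoolid) are hypothetical, and only options I believe boottest documents are shown; see help boottest for the full syntax.
    * OLS with cluster-robust errors, then a wild cluster bootstrap test of one coefficient
    regress y x treat, cluster(schoolid)
    boottest treat, reps(9999)    // bootstrap p-value and confidence set for H0: _b[treat] = 0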
Abstract: In this article, I propose a new approach to language interfacing for statistical software by allowing automatic interprocess communication between R and Stata. I advocate interactive language interfacing in statistical software by automatizing data communication. I introduce the rcall package and provide examples of how the R language can be used interactively within Stata or embedded into Stata programs using the proposed approach to interfacing. Moreover, I discuss the pros and cons of object synchronization in language interfacing.
摘要:在這篇文章中,我提出了一種統計軟件語言接口的新方法,即允許 R 與 Stata 之間自動進行進程間通信。我主張通過自動化數據通信,在統計軟件中實現交互式的語言接口。我介紹了 rcall 程序包,並提供了若干示例,說明如何利用所提出的接口方法在 Stata 中交互式地使用 R 語言,或將其嵌入 Stata 程序。此外,我還討論了語言接口中對象同步的利弊。
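A minimal sketch of the interactive use described above, assuming R and the rcall package are installed; the R code itself is purely illustrative.
    rcall clear                    // start a fresh R session
    rcall: x <- rnorm(100)         // run R code from Stata; objects persist across calls
    rcall: print(mean(x))          // results are echoed back to the Stata console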
Abstract: In this article, I underscore the importance of syntax coloring in teaching statistics. I also introduce the statax package, which includes JavaScript and LaTeX programs for highlighting Stata code in HTML and LaTeX documents. Furthermore, I provide examples showing how to implement this package for developing educational materials on the web or for a classroom handout.
摘要:在這篇文章中,我強調了語法高亮在統計學教學中的重要性。我還介紹了 statax 程序包,其中包括用於在 HTML 和 LaTeX 文檔中高亮顯示 Stata 代碼的 JavaScript 和 LaTeX 程序。另外,我提供了幾個例子,展示如何使用該程序包開發網絡教學資源或課堂講義。
Abstract: In this article, I introduce a new command, nehurdle, that collects maximum likelihood estimators for linear, exponential, homoskedastic, and heteroskedastic tobit; truncated hurdle; and type II tobit models that involve explained variables with corner solutions. I review what a corner solution is as well as the assumptions of the mentioned models.
摘要:在這篇文章中,我介紹了一個新命令 nehurdle。該命令匯集了線性、指數、同方差與異方差 tobit 模型、截斷柵欄(truncated hurdle)模型以及第二類 tobit 模型的最大似然估計量,這些模型都涉及具有角點解的被解釋變量。我還回顧了什麼是角點解,以及上述模型的假設條件。
Abstract: The OECD Programme for the International Assessment of Adult Competencies (PIAAC) is currently the only international survey of adult skills. It provides rich data on skills, work and life situations, earnings, and attitudes. To ensure representativeness and high reliability, the study is based on a complex survey design and advanced statistical methods. To obtain correct results from publicly available microdata, one must use special methods that are often too advanced for less experienced researchers. In this article, we present piaactools—a package of three commands that facilitate analysis with PIAAC data. The command piaacdes calculates basic statistics, piaactab computes frequencies of adults at each proficiency level, and piaacreg allows for the use of several regression models with PIAAC data. Output is saved as HTML files that can be opened in most spreadsheets and as Stata matrices that can be further processed in Stata. We also explain how to use these commands and provide examples that can be easily modified for use with different models and variables.
摘要:OECD 組織的國際成人能力評估計劃(PIAAC)是目前唯一針對成人技能的國際調查。它提供了大量關於技能、工作與生活狀況、收入和態度的數據。為確保代表性和高可靠性,該研究採用了複雜的調查設計和先進的統計方法。要從公開的微觀數據中獲得正確的結果,必須使用特殊的方法,而這些方法對經驗不足的研究者來說往往過於複雜。在本文中,我們介紹了 piaactools 程序包,它包含三個命令,便於分析 PIAAC 數據。piaacdes 命令用於計算基本統計量,piaactab 命令用於計算各熟練水平上成年人的頻率,piaacreg 則允許對 PIAAC 數據使用多種回歸模型。輸出保存為可在大多數電子表格軟件中打開的 HTML 文件,以及可在 Stata 中進一步處理的 Stata 矩陣。我們還解釋了如何使用這些命令,並提供了可輕鬆修改以適用於不同模型和變量的示例。
Abstract: In this article, I present the lsemantica command, which implements latent semantic analysis in Stata. Latent semantic analysis is a machine learning algorithm for word and text similarity comparison and uses truncated singular value decomposition to derive the hidden semantic relationships between words and texts. lsemantica provides a simple command for latent semantic analysis as well as complementary commands for text similarity comparison.
摘要:在這篇文章中,我介紹了 lsemantica 命令,該命令在 Stata 中實現了潛在語義分析。潛在語義分析是一種用於單詞和文本相似性比較的機器學習算法,它使用截斷奇異值分解來推導單詞和文本之間隱藏的語義關係。lsemantica 提供了一個用於潛在語義分析的簡單命令,以及用於文本相似性比較的配套命令。
Abstract: Kolenikov (2014, Stata Journal 14: 22–59) introduced the package ipfraking for iterative proportional fitting (raking) weight-calibration procedures for complex survey designs. In this article, I briefly describe the original package and updates to the core program and document additional programs that are used to support the process of creating survey weights in the author’s production code.
摘要:Kolenikov(2014, Stata Journal 14: 22–59)介紹了用於複雜調查設計的迭代比例擬合(raking)權重校準程序包 ipfraking。在本文中,我簡要介紹了原始程序包及其核心程序的更新,並記錄了作者在生產代碼中用於支持創建調查權重過程的其他輔助程序。
Abstract: Survival functions are a common visualization of predictions from the Cox model. However, neither Stata’s stcurve command nor the community-contributed scurve tvc command allows one to estimate confidence intervals. In this article, I discuss how bootstrap confidence intervals can be formed for covariate-adjusted survival functions in the Cox model. The new bsurvci command automates this procedure and allows users to visualize the results. bsurvci enables one to estimate uncertainty around survival functions estimated from Cox models with time-varying coefficients, a capability that was not previously available in Stata. Furthermore, it provides Stata users with an additional option for survival estimates from Cox models with proportional hazards by allowing them to choose between bootstrap confidence intervals using bsurvci and asymptotic confidence intervals from an existing community-contributed command, survci. Because asymptotic confidence intervals make distributional assumptions when constructing confidence intervals, the bootstrap procedure proposed in this article provides a nonparametric alternative.
摘要:生存函數是 Cox 模型預測結果的一種常見可視化方式。然而,無論是 Stata 內置的 stcurve 命令還是社區貢獻的 scurve tvc 命令都無法估計置信區間。在本文中,我將討論如何為 Cox 模型中經協變量調整的生存函數構造自舉置信區間。新的 bsurvci 命令能夠自動完成此過程,並允許用戶將結果可視化。bsurvci 使用戶能夠估計由含時變係數的 Cox 模型得到的生存函數的不確定性,這是 Stata 此前不具備的功能。此外,對於比例風險 Cox 模型的生存估計,它為 Stata 用戶提供了額外的選擇:既可以使用 bsurvci 得到自舉置信區間,也可以使用現有的社區貢獻命令 survci 得到漸近置信區間。由於漸近置信區間在構造時需要分佈假設,本文提出的自舉程序提供了一種非參數的替代方案。
Abstract: Technical analysis is an important part of financial industry, research, and teaching. The methodology has two parts: i) calculation of the individual tools and ii) visual representations. In this article, I provide a community-contributed command, candlechart, to draw the most common technical analysis charts. My intent is to draw these charts similarly to industry examples. The popular candle price chart is combined with charts for volume, moving-average convergence divergence, relative strength index, and Bollinger bands.
摘要:技術(shù)分析是金融業(yè),研究和教學(xué)的重要組成部分。該方法包括兩個(gè)部分:i)各個(gè)工具的計算和ii)視覺(jué)表示。在本文中,我提供了一個(gè)社區貢獻的命令Candlechart,以繪制最常見(jiàn)的技術(shù)分析圖。我的意圖是繪制類(lèi)似于行業(yè)示例的這些圖表。將流行的蠟燭價(jià)格圖表與體積,移動(dòng)平均收斂散度,相對強度指數和布林帶的圖表結合在一起。
Abstract: In this article, we introduce two commands, rdpow and rdsampsi, that conduct power calculations and survey sample selection when using local polynomial estimation and inference methods in regression-discontinuity designs. rdpow conducts power calculations using modern robust bias-corrected local polynomial inference procedures and allows for new hypothetical sample sizes and bandwidth selections, among other features. rdsampsi uses power calculations to compute the minimum sample size required to achieve a desired level of power, given estimated or user-supplied bandwidths, biases, and variances. Together, these commands are useful when devising new experiments or surveys in regression-discontinuity designs, which will later be analyzed using modern local polynomial techniques for estimation, inference, and falsification. Because our commands use the community-contributed (and R) package rdrobust for the underlying bandwidths, biases, and variances estimation, all the options currently available in rdrobust can also be used for power calculations and sample-size selection, including preintervention covariate adjustment, clustered sampling, and many bandwidth selectors. Finally, we also provide companion R functions with the same syntax and capabilities.
摘要:在本文中,我們介紹兩個命令 rdpow 和 rdsampsi,用於在斷點回歸設計中使用局部多項式估計和推斷方法時進行功效計算和調查樣本選擇。rdpow 使用現代的穩健偏差校正局部多項式推斷程序進行功效計算,並允許設定新的假設樣本量和帶寬選擇等。rdsampsi 在給定估計的或用戶提供的帶寬、偏差和方差的情況下,利用功效計算求出達到所需功效水平的最小樣本量。在斷點回歸設計中設計新的實驗或調查、且隨後將使用現代局部多項式技術進行估計、推斷和證偽分析時,這兩個命令配合使用非常有用。由於我們的命令使用社區貢獻的 rdrobust 軟件包(Stata 與 R 版本均有)進行底層帶寬、偏差和方差的估計,rdrobust 中當前可用的所有選項也都可用於功效計算和樣本量選擇,包括干預前協變量調整、聚類抽樣和多種帶寬選擇器。最後,我們還提供了具有相同語法和功能的配套 R 函數。
Abstract: Indicator or dummy variables record whether some condition is true or false in each observation by a value of 1 or 0. Values may also be missing if truth or falsity is not known, and that fact should be flagged. Such indicators may be created on the fly by using factor-variable notation. tabulate also offers one method for automating the generation of indicators. In this column, we discuss in detail how otherwise to best generate such variables directly, with comments here and there on what not to do.
摘要:指示變量(虛擬變量)以 1 或 0 記錄每個觀測中某一條件為真還是為假。如果真假未知,其取值也可能缺失,並且應當標記這一事實。此類指示變量可以使用因子變量表示法即時創建;tabulate 命令也提供了一種自動生成指示變量的方法。在本專欄中,我們詳細討論除此之外如何最好地直接生成這類變量,並不時評述哪些做法不可取。
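For concreteness, a short sketch of the standard approaches the column weighs, using the auto data shipped with Stata:
    sysuse auto, clear
    * Direct generation, keeping missing values missing rather than coding them 0
    generate byte highprice = price > 5000 if !missing(price)
    * One indicator per category via tabulate
    tabulate rep78, generate(rep_)
    * Or avoid creating variables at all with factor-variable notation
    regress mpg i.rep78 weight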
Abstract: In this article, we introduce the qmodel command, which fits parametric models for the conditional quantile function of an outcome variable given covariates. Ordinary quantile regression, implemented in the qreg command, is a popular, simple type of parametric quantile model. It is widely used but known to yield erratic estimates that often lead to uncertain inferences. Parametric quantile models overcome these limitations and extend modeling of conditional quantile functions beyond ordinary quantile regression. These models are flexible and efficient. qmodel can estimate virtually any possible linear or nonlinear parametric model because it allows the user to specify any combination of qmodel-specific built-in functions, standard mathematical and statistical functions, and substitutable expressions. We illustrate the potential of parametric quantile models and the use of the qmodel command and its postestimation commands through real- and simulated-data examples that commonly arise in epidemiological and pharmacological research. In addition, this article may give insight into the close connection that exists between quantile functions and the true mathematical laws that generate data.
摘要:在本文中,我們介紹了 qmodel 命令,該命令用於擬合給定協變量條件下結果變量條件分位數函數的參數模型。qreg 命令實現的普通分位數回歸是一種流行而簡單的參數分位數模型。它被廣泛使用,但已知會產生不穩定的估計,常常導致不確定的推斷結果。參數分位數模型克服了這些局限,並將條件分位數函數的建模範圍拓展到普通分位數回歸之外。這些模型靈活而高效。qmodel 幾乎可以估計任何可能的線性或非線性參數模型,因為它允許用戶任意組合 qmodel 特有的內置函數、標準數學與統計函數以及可替換表達式。我們通過流行病學和藥理學研究中常見的真實與模擬數據示例,說明參數分位數模型的潛力以及 qmodel 命令及其後估計命令的用法。此外,本文還有助於深入理解分位數函數與生成數據的真實數學規律之間的緊密聯繫。
Abstract: Markov chain models and finite mixture models have been widely applied in various strands of the academic literature. Several studies analyzing dynamic processes have combined both modeling approaches to account for unobserved heterogeneity within a population. In this article, we describe mixmcm, a community-contributed command that fits the general class of mixed Markov chain models, accounting for the possibility of both entries into and exits from the population. To account for the possibility of incomplete information within the data (that is, unobserved heterogeneity), the model is fit with maximum likelihood using the expectation-maximization algorithm. mixmcm enables users to fit the mixed Markov chain models parametrically or semiparametrically, depending on the specifications chosen for the transition probabilities and the mixing distribution. mixmcm also allows for endogenous identification of the optimal number of homogeneous chains, that is, unobserved types or “components”. We illustrate mixmcm's usefulness through three examples analyzing farm dynamics using an unbalanced panel of commercial French farms.
摘要:馬爾可夫鏈模型和有限混合模型已廣泛應用於學術文獻的各個分支。一些分析動態過程的研究將這兩種建模方法結合起來,以考慮總體中未觀測到的異質性。在本文中,我們介紹 mixmcm,這是一個社區貢獻的命令,用於擬合一般類別的混合馬爾可夫鏈模型,並考慮了個體進入和退出總體的可能性。為考慮數據中信息不完整(即未觀測異質性)的可能性,該模型通過期望最大化(EM)算法以最大似然法擬合。mixmcm 允許用戶根據轉移概率和混合分佈所選的設定,以參數或半參數方式擬合混合馬爾可夫鏈模型。mixmcm 還允許內生地確定同質鏈的最優數量,即未觀測的類型或“成分”的數量。我們通過三個示例說明 mixmcm 的用途,這些示例利用法國商業農場的非平衡面板數據分析農場動態。
Abstract: In this article, we present a new command, xtspj, that corrects for incidental parameter bias in panel-data models with fixed effects. The correction removes the first-order bias term of the maximum likelihood estimate using the split-panel jackknife method. Two variants are implemented: the jackknifed maximum-likelihood estimate and the jackknifed log-likelihood function (with corresponding maximizer). The model may be nonlinear or dynamic, and the covariates may be predetermined instead of strictly exogenous. xtspj implements the split-panel jackknife for fixed-effects versions of linear, probit, logit, Poisson, exponential, gamma, Weibull, and negbin2 regressions. It also accommodates other models if the user specifies the log-likelihood function (and, possibly but not necessarily, the score function and the Hessian). xtspj is fast and memory efficient, and it allows large datasets. The data may be unbalanced. xtspj can also be used to compute uncorrected maximum-likelihood estimates of fixed-effects models for which no other xt (see [XT] xt) command exists.
摘要:在本文中,我們提供了一個新命令 xtspj,用於校正固定效應面板數據模型中的附帶參數(incidental parameter)偏差。該校正使用分割面板刀切法(split-panel jackknife)去除最大似然估計的一階偏差項。命令實現了兩種變體:刀切法校正的最大似然估計,以及刀切法校正的對數似然函數(及其相應的最大化解)。模型可以是非線性的或動態的,協變量可以是前定的而不必嚴格外生。xtspj 為線性、probit、logit、泊松、指數、伽瑪、威布爾和 negbin2 回歸的固定效應版本實現了分割面板刀切法。如果用戶自行指定對數似然函數(以及可選的得分函數和 Hessian 矩陣),它也可以處理其他模型。xtspj 運行快速、內存佔用低,並且支持大型數據集,數據也可以是非平衡的。xtspj 還可用於計算那些沒有其他 xt 命令(參見 [XT] xt)的固定效應模型的未校正最大似然估計。
Abstract: Researchers who model fractional dependent variables often need to consider whether their data were generated by a two-part process. Two-part models are ideal for modeling two-part processes because they allow us to model the participation and magnitude decisions separately. While community-contributed commands currently facilitate estimation of two-part models, no specialized command exists for fitting two-part models with process dependency. In this article, I describe generalized two-part fractional regression, which allows for dependency between models’ parts. I show how this model can be fit using the community-contributed cmp command (Roodman, 2011, Stata Journal 11: 159–206). I use a data example on the financial leverage of firms to illustrate how cmp can be used to fit generalized two-part fractional regression. Furthermore, I show how to obtain predicted values of the fractional dependent variable and marginal effects that are useful for model interpretation. Finally, I show how to compute model fit statistics and perform the RESET test, which are useful for model evaluation.
摘要:對分數因變量建模的研究人員經常需要考慮其數據是否由兩部分過程生成。兩部分模型非常適合對這類過程建模,因為它們允許我們分別對參與決策和規模決策建模。雖然現有的社區貢獻命令可以估計兩部分模型,但還沒有專門的命令用於擬合各部分之間存在依賴性的兩部分模型。在本文中,我描述了廣義兩部分分數回歸,它允許模型各部分之間存在依賴性。我展示了如何使用社區貢獻的 cmp 命令(Roodman, 2011, Stata Journal 11: 159–206)擬合該模型,並以公司財務槓桿的數據為例說明 cmp 如何用於擬合廣義兩部分分數回歸。此外,我展示了如何獲得分數因變量的預測值以及有助於模型解釋的邊際效應。最後,我展示了如何計算模型擬合統計量並執行 RESET 檢驗,這些對模型評估非常有用。
Abstract: In this article, I review recent developments of the item-count technique (also known as the unmatched-count or list-experiment technique) and introduce a new package, kict, for statistical analysis of the item-count data. This package contains four commands: kict deff performs a diagnostic test to detect the violation of an assumption underlying the item-count technique. kict ls and kict ml perform least-squares estimation and maximum likelihood estimation, respectively. Each encompasses a number of estimators, offering great flexibility for data analysis. kict pfci is a postestimation command for producing confidence intervals with better coverage based on profile likelihood. The development of the item-count technique is still ongoing. I will continue to update the kict package accordingly.
摘要:在本文中,我回顧了項目計數技術(也稱不匹配計數或列表實驗技術)的最新進展,並介紹了用於項目計數數據統計分析的新軟件包 kict。該軟件包包含四個命令:kict deff 執行診斷檢驗,用於檢測項目計數技術的基本假設是否被違反;kict ls 和 kict ml 分別執行最小二乘估計和最大似然估計,各自包含多種估計量,為數據分析提供了極大的靈活性;kict pfci 是一個後估計命令,用於基於剖面似然(profile likelihood)構造覆蓋率更好的置信區間。項目計數技術仍在不斷發展,我也將相應地持續更新 kict 軟件包。
Abstract: Differences-in-differences evaluates the effect of a treatment. In its basic version, a “control group” is untreated at two dates, whereas a “treatment group” becomes fully treated at the second date. However, in many applications of this method, the treatment rate increases more only in the treatment group. In such fuzzy designs, de Chaisemartin and D’Haultfœuille (2018b, Review of Economic Studies 85: 999–1028) propose various estimands that identify local average and quantile treatment effects under different assumptions. They also propose estimands that can be used in applications with a nonbinary treatment, multiple periods, and groups and covariates. In this article, we present the command fuzzydid, which computes the various corresponding estimators. We illustrate the use of the command by revisiting Gentzkow, Shapiro, and Sinkinson (2011, American Economic Review 101: 2980–3018).
摘要:雙重差分法(倍差法)用於評估處理的效果。在其基本版本中,“對照組”在兩個時期都未受處理,而“處理組”在第二個時期受到完全處理。然而,在該方法的許多應用中,處理率只是在處理組中增加得更多。在這種模糊設計中,de Chaisemartin 和 D'Haultfœuille(2018b, Review of Economic Studies 85: 999–1028)提出了多種估計對象(estimands),在不同假設下識別局部平均處理效應和分位數處理效應。他們還提出了可用於非二值處理、多時期、多組別及含協變量情形的估計對象。在本文中,我們介紹 fuzzydid 命令,用於計算相應的各種估計量。我們通過重新考察 Gentzkow、Shapiro 和 Sinkinson(2011, American Economic Review 101: 2980–3018)的研究來說明該命令的使用。
Abstract: Student grade processing using Stata is more reliable than methods like spreadsheets and saves the user time, especially when courses are repeated. In this article, I introduce functions that automate some useful grade calculations: the functions curve grades according to combinations of a target grade mean, maximum, standard deviation, and percentile cutoff; convert between numerical grades and letter grades; and convert between 0–100 grades and 0–4 grades (grade point average). The functions can also convert between other grading scales, such as those used in other countries.
摘要:使用 Stata 處理學生成績比電子表格等方法更可靠,而且可以節省時間,尤其是在課程重複開設時。在本文中,我介紹了一些能自動完成常用成績計算的函數:這些函數可根據目標成績均值、最大值、標準差和百分位數截斷值的組合對成績進行曲線調整(curve);在數字成績與字母等級之間轉換;並在 0–100 分制與 0–4 分制(平均績點)之間轉換。這些函數還可以在其他評分制度之間轉換,例如其他國家/地區使用的評分制度。
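The article's own functions are not reproduced here, but the underlying arithmetic can be sketched in plain Stata under stated assumptions: the grade variable rawgrade, the target mean of 75, the target SD of 10, and the linear 0-100 to 0-4 mapping are all illustrative choices, not the author's.
    summarize rawgrade
    * Linear curve to an assumed target mean of 75 and SD of 10
    generate curved = 75 + (rawgrade - r(mean)) * (10 / r(sd))
    * A naive linear rescaling from the 0-100 scale to a 0-4 grade-point scale
    generate gpa4 = max(0, min(curved, 100)) / 25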
Abstract: A major challenge of outcomes research is measuring hospital performance using readily available administrative data. When the outcome measure is mortality or morbidity, rates are adjusted to account for preexisting conditions that may confound their assessment. However, the concept of “risk-adjusted” outcomes is frequently misunderstood. In this article, we try to clarify things, and we describe Stata tools for appropriately calculating and displaying risk-standardized outcome measures. We offer practical guidance and illustrate the application of these tools to an example based on real data (30-day mortality following acute myocardial infarction in Latvia).
摘要:結果研究的一大挑戰是利用現成的行政管理數據衡量醫院績效。當結果指標是死亡率或發病率時,需要對這些比率進行調整,以考慮可能干擾評估的既往病況。然而,“風險調整”結果的概念經常被誤解。在本文中,我們試圖澄清這些問題,並介紹用於恰當計算和展示風險標準化結果指標的 Stata 工具。我們提供了實用指南,並以真實數據(拉脫維亞急性心肌梗死後 30 天死亡率)為例,說明這些工具的應用。
第二卷
Abstract: In this article, we present new commands for modeling count data using marginalized zero-inflated distributions. While we mainly focus on presenting new commands for estimating count data, we also present examples that illustrate some of these new commands.
摘要:在本文中,我們提供了使用邊際化零膨脹分布對計數數據進(jìn)行建模的新命令。盡管我們主要致力于提供用于估算計數數據的新命令,但我們還提供了一些示例,以說(shuō)明其中的一些新命令。
Abstract: In August 2017, the National Center for Health Statistics (NCHS), part of the U.S. Federal Statistical System, published new standards for determining the reliability of proportions estimated using their data. These standards require one to take the Korn–Graubard confidence interval (CI), CI widths, sample size, and degrees of freedom to assess reliability of a proportion and determine whether it can be presented. The assessment itself involves determining whether several conditions are met. In this article, I present kg_nchs, a postestimation command that is used following svy: proportion. It allows Stata users to a) calculate the Korn–Graubard CI and associated statistics used in applying the NCHS presentation standards for proportions and b) display a series of three dichotomous flags that show whether the standards are met. I provide empirical examples to show how kg_nchs can be used to easily apply the standards and prevent Stata users from needing to perform manual calculations. While developed for NCHS survey data, this command can also be used with data that stem from any survey with a complex sample design.
摘要:2017 年 8 月,隸屬於美國聯邦統計系統的國家衛生統計中心(NCHS)發佈了用於判定以其數據估計的比例是否可靠的新標準。這些標準要求使用 Korn–Graubard 置信區間(CI)、CI 寬度、樣本量和自由度來評估比例的可靠性,並判定該比例能否被報告。評估本身包括判斷若干條件是否得到滿足。在本文中,我介紹 kg_nchs,這是一個在 svy: proportion 之後使用的後估計命令。它允許 Stata 用戶:a)計算 Korn–Graubard CI 以及應用 NCHS 比例報告標準時所用的相關統計量;b)顯示三個二分標誌,表明是否符合各項標準。我提供了一些實證示例,說明如何使用 kg_nchs 輕鬆應用這些標準,使 Stata 用戶無需手工計算。雖然該命令是為 NCHS 調查數據開發的,但它也適用於任何採用複雜抽樣設計的調查數據。
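A hedged sketch of the workflow implied by the abstract: the svyset specification and the variable names are placeholders, and kg_nchs is shown without options only because the abstract describes it as a postestimation command run after svy: proportion; its actual options should be taken from its help file.
    svyset psu [pweight = samplewt], strata(stratum)    // hypothetical complex design
    svy: proportion diabetes
    kg_nchs        // Korn-Graubard CIs and the NCHS presentation-standard flags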
Abstract: Statistical methods that quantify the discourse about causal inferences in terms of possible sources of biases are becoming increasingly important to many social-science fields such as public policy, sociology, and education. These methods are also known as “robustness or sensitivity analyses”. A series of recent works (Frank [2000, Sociological Methods and Research 29: 147–194]; Pan and Frank [2003, Journal of Educational and Behavioral Statistics 28: 315–337]; Frank and Min [2007, Sociological Methodology 37: 349–392]; and Frank et al. [2013, Educational Evaluation and Policy Analysis 35: 437–460]) on robustness analysis extends earlier methods. We implement these recent developments in Stata. In particular, we provide commands to quantify the percent bias necessary to invalidate an inference from a Rubin causal model framework and the robustness of causal inferences in terms of correlations associated with unobserved variables.
摘要:用可能的偏差來源來量化因果推斷論證的統計方法,在公共政策、社會學和教育等許多社會科學領域變得越來越重要。這些方法也稱為“穩健性分析或敏感性分析”。一系列關於穩健性分析的近期研究(Frank [2000, Sociological Methods and Research 29: 147–194];Pan 和 Frank [2003, Journal of Educational and Behavioral Statistics 28: 315–337];Frank 和 Min [2007, Sociological Methodology 37: 349–392];以及 Frank 等 [2013, Educational Evaluation and Policy Analysis 35: 437–460])拓展了早期的方法。我們在 Stata 中實現了這些最新進展。特別地,我們提供的命令可以量化在 Rubin 因果模型框架下使某一推斷失效所需的偏差百分比,並以與未觀測變量相關的相關係數來刻畫因果推斷的穩健性。
Abstract: In this article, we describe tvdiff, a community-contributed command that implements a generalization of the difference-in-differences estimator to the case of binary time-varying treatment with pre- and postintervention periods. tvdiff is flexible and can accommodate many actual situations, enabling the user to specify the number of pre- and postintervention periods and a graphical representation of the estimated coefficients. In addition, tvdiff provides two distinct tests for the necessary condition of the identification of causal effects, namely, two tests for the so-called parallel-trend assumption. tvdiff is intended to simplify applied works on program evaluation and causal inference when longitudinal data are available.
摘要:在本文中,我們描述了 tvdiff,這是一個社區貢獻的命令,將雙重差分估計量推廣到具有干預前後多期的二值時變處理情形。tvdiff 十分靈活,可以適應許多實際情況,允許用戶指定干預前後的期數,並對估計係數進行圖形展示。此外,tvdiff 為識別因果效應的必要條件(即所謂的平行趨勢假設)提供了兩種不同的檢驗。tvdiff 旨在簡化縱向數據可得時項目評估和因果推斷方面的應用研究。
Abstract: A recurring problem in statistics is estimating and visualizing nonlinear dependency between an effect and an effect modifier. One approach to handle this is polynomial regressions of some order. However, polynomials are known for fitting well only in limited ranges. In this article, I present a simple approach for estimating the effect as a contrast at selected values of the effect modifier. I implement this approach using the flexible restricted cubic splines for the point estimation in a new simple command, emc. I compare the approach with other classical approaches addressing the problem.
摘要:統計中經(jīng)常出現的問(wèn)題是估計和可視化效果與效果修改器之間的非線(xiàn)性相關(guān)性。處理此問(wèn)題的一種方法是某種程度的多項式回歸。但是,多項式僅在有限的范圍內擬合良好。在本文中,我提出了一種簡(jiǎn)單的方法,用于以效果修改器的選定值作為對比來(lái)估計效果。我在新的簡(jiǎn)單命令emc中使用靈活的受限三次樣條進(jìn)行點(diǎn)估計來(lái)實(shí)現此方法。我將這種方法與其他解決該問(wèn)題的經(jīng)典方法進(jìn)行了比較。
Abstract: We develop a command, weaktsiv, for two-sample instrumental-variables regression models with one endogenous regressor and potentially weak instruments. weaktsiv includes the classic two-sample two-stage least-squares estimator whose inference is valid only under the assumption of strong instruments. It also includes statistical tests and confidence sets with correct size and coverage probabilities even when the instruments are weak.
摘要:我們?yōu)閹в幸粋€(gè)內生回歸變量和潛在弱函數的兩樣本工具變量回歸模型開(kāi)發(fā)了一個(gè)命令,weaktsiv。weaktsiv包括經(jīng)典的兩樣本兩階段最小二乘估計器,其推論僅在強工具假設下才有效。它還包括具有正確大小和覆蓋率概率的統計檢驗和置信度集,即使工具較弱也是如此。
Abstract: An added-variable plot is an effective way to show the correlation between an independent variable and a dependent variable conditional on other independent variables. For multivariate estimation, a simple scatterplot showing x versus y is not adequate to show the partial correlation of x with y, because it ignores the impact of the other covariates. Added-variable plots are especially effective for showing the correlation of a dummy x variable with y because the dummy variable conditional on other covariates becomes a continuous variable, making the relationship easier to visualize. Added-variable plots are also useful for spotting influential outliers in the data that affect the estimated regression parameters. Stata provides added-variable plots after ordinary least-squares regressions with the avplot command. I present a new command, avciplot, that adds a confidence interval and other options to the avplot command.
摘要:附加變量圖是顯示自變量與因變量在控制其他自變量條件下相關性的有效方法。對於多元估計,簡單的 x 對 y 散點圖不足以顯示 x 與 y 的偏相關,因為它忽略了其他協變量的影響。附加變量圖對於顯示虛擬變量 x 與 y 的相關性特別有效,因為以其他協變量為條件後,虛擬變量變成了連續變量,使這種關係更易於可視化。附加變量圖還可用於發現數據中影響回歸參數估計的離群值。Stata 在普通最小二乘回歸之後可用 avplot 命令繪製附加變量圖。我介紹了一個新命令 avciplot,它在 avplot 命令的基礎上增加了置信區間和其他選項。
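For reference, the built-in avplot workflow on the auto data; according to the abstract, avciplot is called analogously after regress but adds a confidence interval (its exact options are not shown here).
    sysuse auto, clear
    regress price mpg weight foreign
    avplot foreign        // built-in added-variable plot, no confidence interval
    * avciplot foreign    // the article's command, which overlays a confidence interval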
Abstract: Receiver operating characteristic (ROC) analysis is used for comparing predictive models in both model selection and model evaluation. ROC analysis is often applied in clinical medicine and social science to assess the tradeoff between model sensitivity and specificity. After fitting a binary logistic or probit regression model with a set of independent variables, the predictive performance of this set of variables can be assessed by the area under the curve (AUC) from an ROC curve. An important aspect of predictive modeling (regardless of model type) is the ability of a model to generalize to new cases. Evaluating the predictive performance (AUC) of a set of independent variables using all cases from the original analysis sample often results in an overly optimistic estimate of predictive performance. One can use K-fold cross-validation to generate a more realistic estimate of predictive performance in situations with a small number of observations. AUC is estimated iteratively for k samples (the “test” samples) that are independent of the sample used to predict the dependent variable (the “training” sample). cvauroc implements k-fold cross-validation for the AUC for a binary outcome after fitting a logit or probit regression model, averaging the AUCs corresponding to each fold, and bootstrapping the cross-validated AUC to obtain statistical inference and 95% confidence intervals. Furthermore, cvauroc optionally provides the cross-validated fitted probabilities for the dependent variable or outcome, contained in a new variable named fit; the sensitivity and specificity for each of the levels of the predicted outcome, contained in two new variables named _sen and _spe; and the plot of the mean cross-validated AUC and k-fold ROC curves.
摘要:受試者工作特徵(ROC)分析用於在模型選擇和模型評估中比較預測模型。ROC 分析常用於臨床醫學和社會科學,以評估模型敏感性與特異性之間的權衡。在用一組自變量擬合二元 logistic 或 probit 回歸模型後,可以用 ROC 曲線下面積(AUC)評估這組變量的預測性能。預測建模(無論模型類型)的一個重要方面是模型對新樣本的泛化能力。使用原始分析樣本中的全部觀測來評估一組自變量的預測性能(AUC),往往會得到過於樂觀的估計。在觀測數較少的情況下,可以使用 K 折交叉驗證得到更符合實際的預測性能估計:對 k 個與用於預測因變量的樣本(“訓練”樣本)相互獨立的樣本(“測試”樣本)迭代估計 AUC。cvauroc 在擬合 logit 或 probit 回歸模型後,對二元結果的 AUC 進行 k 折交叉驗證,對各折對應的 AUC 取平均,並對交叉驗證的 AUC 進行自舉以獲得統計推斷和 95% 置信區間。此外,cvauroc 還可選擇性地提供:因變量(結果)的交叉驗證擬合概率,保存在名為 fit 的新變量中;預測結果各水平對應的敏感性和特異性,保存在名為 _sen 和 _spe 的兩個新變量中;以及平均交叉驗證 AUC 和 k 折 ROC 曲線的圖形。
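As a point of comparison, the apparent (in-sample) AUC can be obtained with built-in commands as below; cvauroc replaces this with a k-fold cross-validated, bootstrapped AUC, and its option names should be taken from its help file rather than from this sketch.
    sysuse auto, clear
    logit foreign mpg weight
    predict phat, pr
    roctab foreign phat    // apparent AUC, estimated on the full estimation sample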
Abstract: We introduce a command, fayherriot, that implements the Fay–Herriot model (Fay and Herriot, 1979, Journal of the American Statistical Association 74: 269–277), which is a small-area estimation technique (Rao and Molina, 2015, Small Area Estimation), in Stata. The Fay–Herriot model improves the precision of area-level direct estimates using area-level covariates. It belongs to the class of linear mixed models with normally distributed error terms. The fayherriot command encompasses options to a) produce out-of-sample predictions, b) adjust nonpositive random-effects variance estimates, and c) deal with the violation of model assumptions.
摘要:我們引入了一個命令 fayherriot,在 Stata 中實現 Fay–Herriot 模型(Fay 和 Herriot, 1979, Journal of the American Statistical Association 74: 269–277),這是一種小區域估計技術(Rao 和 Molina, 2015, Small Area Estimation)。Fay–Herriot 模型利用區域層面的協變量提高區域層面直接估計的精度,屬於誤差項服從正態分佈的線性混合模型。fayherriot 命令包含以下選項:a)產生樣本外預測;b)調整非正的隨機效應方差估計;c)處理模型假設被違反的情況。
Abstract: In this article, I describe a community-contributed command, intcount, that fits one of several regression models for count data observed in interval form. The models available are Poisson, negative binomial, and binomial, and they can be fit in standard or zero-inflated form. I illustrate the command with an application to analysis of data from the UK Understanding Society survey on the demand for healthcare services.
摘要:在本文中,我描述了一個社區貢獻的命令 intcount,用於對以區間形式觀測的計數數據擬合幾種回歸模型之一??蛇x的模型有泊松、負二項和二項模型,且都可以以標準形式或零膨脹形式擬合。我以英國 Understanding Society 調查中關於醫療服務需求的數據分析為例,說明該命令的使用。
Abstract: The parallel package allows parallel processing of tasks that are not interdependent. This allows all flavors of Stata to take advantage of multiprocessor machines. Even Stata/MP users can benefit because many community-contributed programs are not automatically parallelized but could be under our framework.
摘要:并行程序包允許并行處理不相互依賴(lài)的任務(wù)。這使Stata的所有形式都可以利用多處理器機器。甚至Stata / MP用戶(hù)也可以從中受益,因為許多社區貢獻的程序不會(huì )自動(dòng)并行化,但可以在我們的框架下進(jìn)行。
Abstract: In this article, we develop a command, xthenreg, that implements the first-differenced generalized method of moments estimation of the dynamic panel threshold model that Seo and Shin (2016, Journal of Econometrics 195: 169–186) proposed. Furthermore, we derive the asymptotic variance formula for a kink-constrained generalized method of moments estimator of the dynamic threshold model and provide an estimation algorithm. We also propose a fast bootstrap algorithm to implement the bootstrap for the linearity test. We illustrate the use of xthenreg through a Monte Carlo simulation and an economic application.
摘要:在本文中,我們開發了一個命令 xthenreg,實現了 Seo 和 Shin(2016, Journal of Econometrics 195: 169–186)提出的動態面板閾值模型的一階差分廣義矩估計(GMM)。此外,我們推導了動態閾值模型在拐點(kink)約束下 GMM 估計量的漸近方差公式,並給出了估計算法。我們還提出了一種快速自舉算法來實現線性檢驗的自舉。我們通過蒙特卡洛模擬和一個經濟學應用說明 xthenreg 的使用。
Abstract: In this article, we describe the gidm command for fitting generalized inflated discrete models that deal with multiple inflated values in a distribution. Based on the work of Cai, Xia, and Zhou (Forthcoming, Sociological Methods & Research: Generalized inflated discrete models: A strategy to work with multimodal discrete distributions), generalized inflated discrete models are fit via maximum likelihood estimation. Specifically, the gidm command fits Poisson, negative binomial, multinomial, and ordered outcomes with more than one inflated value. We illustrate this command through examples for count and categorical outcomes.
摘要:在本文中,我們描述了 gidm 命令,用於擬合處理分佈中多個膨脹取值的廣義膨脹離散模型。基於 Cai、Xia 和 Zhou 的工作(即將發表於 Sociological Methods & Research:Generalized inflated discrete models: A strategy to work with multimodal discrete distributions),廣義膨脹離散模型通過最大似然估計來擬合。具體而言,gidm 命令可擬合含一個以上膨脹取值的泊松、負二項、多項以及有序結果模型。我們通過計數結果和分類結果的示例來說明此命令。
Abstract: I discuss three related problems about getting the last day of the month in a new variable. Commentary ranges from the specifics of date and other functions to some generalities on developing code. Modular arithmetic belongs in every Stata user’s coding toolbox.
摘要:我討論了在新變量中獲取某月最後一天的三個相關問題。評述內容既涉及 date 等函數的具體用法,也涉及編寫代碼的一般原則。模運算是每個 Stata 用戶編碼工具箱中應有的工具。
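One standard idiom for the task (not necessarily the column's exact solution): shift the daily date to the monthly calendar, step forward one month, convert back to a daily date, and subtract one day.
    sysuse sp500, clear                        // contains a daily date variable named date
    generate eom = dofm(mofd(date) + 1) - 1    // last day of the month containing date
    format eom %td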
Abstract: In this article, I review the Stata Press publication Survey Weights: A Step-by-Step Guide to Calculation by Valliant and Dever (2018).
摘要:在本文中,我評介了 Stata Press 出版的 Survey Weights: A Step-by-Step Guide to Calculation(Valliant 和 Dever, 2018)。
Abstract: In this article, I review The Mata Book: A Book for Serious Programmers and Those Who Want to Be, by William Gould (2018, Stata Press).
摘要:在本文中,我評介了 William Gould 所著的 The Mata Book: A Book for Serious Programmers and Those Who Want to Be(2018, Stata Press)。