How to handle large datasets / memory intensive computations in Stata
Stata typically loads all data into memory (unlike SAS), and its ability
to perform computations is constrained by the amount of physical memory
it can access on your PC. There are two scenarios where this might be a
problem. First, if you are loading a really huge data set, you may not
be able to allocate enough memory to Stata to import the data. Second,
you may be working on a more modest data set but using a routine such
as xtabond2, which demands huge amounts of memory. The Stata website
discusses the memory problem in general here.
The typical solution in either case is to raise the allocation with the
set memory command. For example, "set memory 1000M" will allocate about
1GB to Stata. However, on a computer running a 32 bit operating system,
there is a software limit to how high you can set the memory. In most
cases, it is unlikely that you will be able to allocate more than about
1.4GB to Stata (i.e. set memory 1400M) regardless of how much physical
RAM your computer has. This constraint is due to the nature of a 32 bit
operating system: a 32 bit program can address at most 4GB in total, and
the O/S keeps part of that range for itself. It applies to Windows as
well as Linux. Most desktop computers running Windows XP or Vista are
running the 32 bit version, even though most desktops and laptops
purchased in recent years are actually fully capable of running a 64
bit O/S.
The solution:
1. Add more physical RAM - 4, 8, 16 GB - whatever you can, AND,
2. Run a 64 bit version of Stata in a 64 bit operating system.
Item 1 is pure economics, but most machines can be upgraded to 4GB RAM
for relatively little cost. Item 2 is a bit more daunting. You could
replace your O/S with a 64 bit version of Windows XP or Vista. This
could be expensive, could involve creating a dual boot scenario with
two partitions, and may annoy your in-house IT folks!
There is a simpler solution - install Ubuntu Linux using Wubi. Ubuntu
is a free Linux O/S, and is one of the most popular and best supported
versions. You could install it on your computer in a dual boot setup,
but this would involve creating a dedicated partition. While this is
not too hard, it can result in disaster in rare circumstances if you do
something stupid like delete the wrong partition. Wubi is an Ubuntu
installer that installs Ubuntu from inside Windows - just like you
would install any normal Windows program. Then, when you reboot, a
choice between Windows and Ubuntu pops up before the computer boots
into Windows. You select Ubuntu and you are now running Linux. The
version of Ubuntu that comes with Wubi is the 64 bit version.
Note that Ubuntu is not running on top of Windows. When you boot into
Ubuntu you are running a pure Linux setup. The only difference is that
the Ubuntu files are sitting in the Windows partition. To remove
Ubuntu, you just boot into Windows, go to add/remove programs and
remove Wubi - just as if you were removing any Windows software.
Finally, once you have Ubuntu up and running, you need to order your 64
bit version of Stata for Linux. This will cost about $400 for the SE
version if you are upgrading. You can only install this in Linux, as it
won't work in Windows. Installation is simple, just follow the
instructions, but you need to know a few basics of Linux before you
embark on this - so it is worth figuring out how to run commands from
the terminal and what a "superuser" is. A simple intro to Ubuntu is
here (well worth a read if you are a Linux newbie). If you have
questions you can often get them answered on the Ubuntu forum.
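As a first exercise in the terminal, it may be worth confirming that
Ubuntu really is running in 64 bit mode and seeing how much RAM it can
see before you order Stata. The sketch below is only an illustration:
uname, getconf and /proc/meminfo are standard on Linux, but the helper
name bits_for_arch is made up for this example and it only recognises
the usual PC (x86) machine names.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to a word size. Only common x86 names are handled; anything else
# is reported as "unknown".
bits_for_arch() {
    case "$1" in
        x86_64|amd64) echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *) echo unknown ;;
    esac
}

# Architecture of the running kernel.
arch=$(uname -m)
echo "Kernel architecture: $arch ($(bits_for_arch "$arch") bit)"

# Word size of the installed userland (prints 32 or 64).
echo "Userland word size: $(getconf LONG_BIT)"

# Total physical RAM visible to Linux, if /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    grep MemTotal /proc/meminfo
fi
```

If the first two lines do not both report 64, the 64 bit version of
Stata will not run on that installation.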
A couple of notes:
1. Wubi does it all - you don't need to create an Ubuntu disk or download
anything - just run the Wubi installer, answer a couple of simple
prompts and you are done.
2. Running 64 bit Stata in a 32 bit O/S won't work.
3. Windows 7 is rumored to ship with a 64 bit version as the default. Either way, your next O/S must be 64 bit!
4. You must follow the install instructions for Stata very closely. The "sudo su" command is important!
Best of luck and don't hesitate to email me if you have any comments on this.