Data Preparation 17.0
Improve Data Preparation for More Accurate Results
All researchers have to prepare their data prior to analysis. While data preparation tools are included in the SPSS Base product, sometimes you need more specialized techniques to get your data ready. With the SPSS Data Preparation add-on module, you gain new techniques to help you streamline the data preparation stage of the analytical process.
This module helps you to:
- Identify suspicious or invalid cases, variables, and data values
- View patterns of missing data
- Summarize variable distributions
- More accurately get your data ready for analysis
Learn how SPSS Data Preparation’s techniques can help you get ready for analysis faster and reach more accurate conclusions.
SPSS Data Preparation is available in English, Japanese, French, German, Italian, and Spanish.
Expand Your Data Preparation Techniques
The specialized techniques in SPSS Data Preparation streamline the preparation stage of the analytical process. Like all add-on modules, SPSS Data Preparation plugs into SPSS Statistics Base, so you can work seamlessly in the SPSS environment.
Perform data checks
Data validation has typically been a manual process. You might run a frequency on your data, print the frequencies, circle what needs to be fixed, and check for case IDs. Needless to say, this is time consuming. And since every analyst in your organization could use a slightly different method, maintaining consistency from project to project may be a challenge.
To eliminate manual checks, use the Validate Data procedure. This procedure enables you to apply rules to perform data checks based on each variable’s measurement level (categorical or continuous). For example, if you’re analyzing survey data that has variables on a five-point Likert scale, use the Validate Data procedure to apply a rule for five-point scales and flag all cases that have values outside the 1 to 5 range. You can receive reports of invalid cases as well as summaries of rule violations and the number of cases affected. You can specify validation rules for individual variables (such as range checks) and cross-variable checks (for example, “pregnant males”).
With this knowledge you can determine data validity and remove or correct suspicious cases at your discretion prior to analysis.
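To make the idea of rule-based checks concrete, here is a minimal Python sketch of single-variable range rules and one cross-variable rule. It is an illustration only, not SPSS syntax or the Validate Data procedure itself; the variable names (`q1`, `q2`, `sex`, `pregnant`), the rule names, and the record layout are all assumptions made for this example.

```python
# Illustrative only: rule names and record layout are hypothetical,
# not SPSS syntax or the Validate Data procedure.

def in_range(low, high):
    """Build a single-variable rule: the value must lie in [low, high]."""
    return lambda value: value is not None and low <= value <= high

# Single-variable rules, keyed by variable name.
single_rules = {
    "q1": in_range(1, 5),  # five-point Likert item
    "q2": in_range(1, 5),
}

# Cross-variable rules: (rule name, predicate that is True for a valid case).
cross_rules = [
    ("no_pregnant_males",
     lambda case: not (case["sex"] == "M" and case["pregnant"])),
]

def validate(cases):
    """Return (violated rules per case ID, number of cases affected per rule)."""
    by_case, by_rule = {}, {}
    for case in cases:
        broken = [f"{var}_range" for var, rule in single_rules.items()
                  if not rule(case.get(var))]
        broken += [name for name, ok in cross_rules if not ok(case)]
        if broken:
            by_case[case["id"]] = broken
            for name in broken:
                by_rule[name] = by_rule.get(name, 0) + 1
    return by_case, by_rule

cases = [
    {"id": 1, "sex": "F", "pregnant": True,  "q1": 4, "q2": 2},
    {"id": 2, "sex": "M", "pregnant": True,  "q1": 3, "q2": 5},
    {"id": 3, "sex": "F", "pregnant": False, "q1": 7, "q2": 0},
]
by_case, by_rule = validate(cases)
print(by_case)  # invalid cases and the rules each one violates
print(by_rule)  # number of cases affected per rule
```

In this sample, case 2 violates the cross-variable rule and case 3 violates both range rules, which mirrors the two kinds of report described above: a list of invalid cases and a per-rule count of affected cases.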
Quickly find multivariate outliers
Prevent outliers from skewing analyses when you use the Anomaly Detection procedure. This procedure searches for unusual cases based upon deviations from similar cases and gives reasons for such deviations. You can flag outliers by creating a new variable. Once you have identified unusual cases, you can further examine them and determine if they should be included in your analyses.
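The general idea of scoring a case by its deviation from comparable cases, and reporting which variable drives the deviation, can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the algorithm used by the Anomaly Detection procedure: it treats the whole dataset as a single peer group and uses mean squared z-scores as the index. The variable names and the flagging threshold of 2.0 are assumptions chosen for the example.

```python
from statistics import mean, pstdev

def anomaly_report(cases, variables):
    """Score each case by its mean squared z-score and name the variable
    that contributes most to that score (the 'reason')."""
    stats = {v: (mean(c[v] for c in cases), pstdev(c[v] for c in cases))
             for v in variables}
    report = []
    for case in cases:
        # Squared standardized deviation per variable (0 if the variable is constant).
        sq = {v: ((case[v] - m) / s) ** 2 if s else 0.0
              for v, (m, s) in stats.items()}
        index = sum(sq.values()) / len(variables)
        reason = max(sq, key=sq.get)
        report.append({"id": case["id"], "index": index, "reason": reason})
    # Most unusual cases first.
    return sorted(report, key=lambda r: r["index"], reverse=True)

cases = [
    {"id": 1, "age": 34, "income": 52},
    {"id": 2, "age": 36, "income": 55},
    {"id": 3, "age": 35, "income": 50},
    {"id": 4, "age": 33, "income": 400},
    {"id": 5, "age": 37, "income": 53},
]
report = anomaly_report(cases, ["age", "income"])

# Flag outliers as a new true/false variable; the 2.0 cutoff is hypothetical.
flags = {r["id"]: r["index"] > 2.0 for r in report}
flagged_ids = [case_id for case_id, flagged in flags.items() if flagged]
print(report[0], flagged_ids)
```

Here case 4 ranks first, with `income` reported as the reason. A flag variable like `flags` is what lets you examine those cases separately before deciding whether to keep them in the analysis.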
Preprocess data prior to model building
In order to use algorithms that are designed for nominal attributes (such as Naïve Bayes and logit models), you must bin your scale variables prior to model building. If scale variables aren’t binned, algorithms such as multinomial logistic regression will take an extremely long time to process or they might not converge. This is especially true if you have a large dataset. In addition, the results you receive may be difficult to read or interpret.
Optimal Binning, however, enables you to determine cutpoints to help you reach the best possible outcome for algorithms designed for nominal attributes.
With this procedure, you can select from three types of binning for preprocessing data prior to model building:
- Unsupervised: Create bins with equal counts
- Supervised: Take the target variable into account to determine cutpoints. This method is more accurate than unsupervised; however, it is also more computationally intensive.
- Hybrid approach: Combines the unsupervised and supervised approaches. This method is particularly useful if you have a large number of distinct values.
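As an illustration of the unsupervised option, the sketch below computes equal-count cutpoints for a scale variable and assigns each value to a bin. It is a minimal Python example under stated assumptions, not the Optimal Binning procedure: the function names and the sample `ages` data are hypothetical, and the supervised and hybrid methods, which also use the target variable, are not shown.

```python
from bisect import bisect_right

def equal_count_cutpoints(values, n_bins):
    """Return up to n_bins - 1 cutpoints that split the sorted values
    into bins of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    cutpoints = []
    for k in range(1, n_bins):
        cut = ordered[k * n // n_bins]  # first value of bin k
        # Skip repeated cutpoints that arise from tied values.
        if not cutpoints or cut > cutpoints[-1]:
            cutpoints.append(cut)
    return cutpoints

def assign_bin(value, cutpoints):
    """Bin index: values below the first cutpoint get 0, and so on."""
    return bisect_right(cutpoints, value)

ages = [22, 25, 31, 38, 41, 47, 52, 58, 63]
cutpoints = equal_count_cutpoints(ages, 3)
binned = [assign_bin(a, cutpoints) for a in ages]
counts = [binned.count(b) for b in range(3)]
print(cutpoints, counts)
```

With nine distinct values and three bins, each bin receives three cases. When many values are tied, fewer cutpoints may be returned, so the bins cannot always be exactly equal.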
System requirements
For SPSS Statistics Base 17.0 for Windows
Operating System
Microsoft Windows XP (32-bit versions) or Vista (32-bit or 64-bit versions)
Hardware
Intel or AMD x86 processor running at 1GHz or higher
Memory: 512MB RAM; 1GB recommended
Minimum free drive space: 450MB
CD-ROM drive
Super VGA (800x600) or a higher-resolution monitor
For connecting with SPSS Statistics Base Server, a network adapter running the TCP/IP network protocol
Software
Web browser: Internet Explorer 6 or above
For SPSS Statistics Base 17.0 for Mac OS X
Operating system: Apple Mac OS X 10.4 (Tiger) or Mac OS X 10.5 (Leopard)
Hardware
PowerPC or Intel processor
Memory: 512MB RAM; 1GB recommended
Minimum free drive space: 800MB
CD-ROM drive
Super VGA (800x600) or a higher-resolution monitor
Software
Safari 1.3.1, Mozilla Firefox 1.5 or higher, or Netscape 7.2 or higher
Java Standard Edition 5.0 (J2SE 5.0)
For SPSS Statistics Base 17.0 for Linux
Operating system*
Any Linux OS that meets the following requirements:
Kernel 2.6.9.42 or higher
glibc 2.3.4 or higher
XFree86-4.0 or higher
libstdc++5
Hardware
Processor: Intel or AMD x86 processor running at 1 GHz or higher
RAM: 512MB RAM; 1GB recommended
450 MB of available hard-disk space
CD-ROM drive
Super VGA (800x600) or a higher-resolution monitor
Software
Web browser: Konqueror 3.4.1 or higher, or Firefox 1.0.6 or higher, or Netscape 7.2 or higher
*Note: SPSS Statistics 17.0 was tested on and is supported only on Red Hat Enterprise Linux 4 Desktop and Debian 4.0
SPSS Statistics add-on modules
All SPSS Statistics 17.0 add-on modules require SPSS Statistics Base 17.0.
No other system requirements are necessary.
Amos 17.0
Operating system: Windows XP or Windows Vista
Hardware:
Memory: 256MB RAM minimum
125MB or more available hard-drive space
Web browser: Internet Explorer 6.0
SPSS Statistics Server 17.0
Operating system: Windows Server 2003 or Windows Server 2008 (32-bit or 64-bit); Sun Solaris (SPARC) 9 and later (64-bit only); IBM AIX 5.3 and later; Red Hat Enterprise Linux ES 4 and later (64-bit); or HP-UX 11i (64-bit Itanium)
Hardware
Minimum CPU: Two CPUs recommended, running at 1GHz or higher
Memory: 512MB RAM per expected concurrent user
Minimum free drive space: 300MB
Required temporary disk space: Calculate by multiplying 2.5 x number of users x expected size of dataset in megabytes
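As a worked example of the temporary disk space formula, the short Python sketch below applies it to a hypothetical deployment of 10 users working with a 200MB dataset. The function name and the sample figures are assumptions for illustration only.

```python
def temp_disk_space_mb(users, dataset_mb, factor=2.5):
    """Temporary disk space in MB: 2.5 x number of users x dataset size in MB."""
    return factor * users * dataset_mb

# Hypothetical deployment: 10 users, 200MB dataset.
required_mb = temp_disk_space_mb(users=10, dataset_mb=200)
print(required_mb)  # 5000.0
```

For these figures the server would need about 5,000MB of temporary space in addition to the minimum free drive space listed above.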
SPSS Statistics Adapter for SPSS Predictive Enterprise Services
Requires Statistics Base 17.0 and SPSS Predictive Enterprise Services