
100+ Free OpenEDG PCAD Practice Questions

Pass your OpenEDG PCAD — Certified Associate Data Analyst with Python (PCAD-31-02) exam on the first try — instant access, no signup required.

✓ No registration ✓ No credit card ✓ No hidden fees ✓ Start practicing immediately
~65-75% Pass Rate
100+ Questions
100% Free
2026 Statistics

Key Facts: OpenEDG PCAD Exam

  • Exam items: 48 (source: Python Institute)
  • Passing score: 75% (source: Python Institute)
  • Exam duration: 60 min (source: Python Institute)
  • Exam fee: starts at $195 (source: OpenEDG store)
  • Validity: lifetime (source: Python Institute)
  • Active version: PCAD-31-02 (released July 2025)

The OpenEDG PCAD-31-02 exam (the active version since July 2025) has 48 single-select and multiple-select items in 60 minutes, with a 75% cumulative passing score. The five syllabus blocks are weighted: Data Acquisition and Pre-Processing 29.2%, Programming and Database Skills 33.3%, Statistical Analysis 8.3%, Data Analysis and Modeling 18.8%, and Data Communication and Visualization 10.4%. Candidates need fluency with Python data structures, OOP, exception handling, parameterized SQL queries, Pandas (loc/iloc, groupby/agg/transform, pivot/melt, datetime indexing, merge/concat), NumPy (broadcasting, axis aggregation, reshape), scikit-learn pipelines (train_test_split, fit/predict/transform, GridSearchCV), statsmodels OLS, scipy.stats hypothesis tests, classification metrics (accuracy/precision/recall/F1/ROC AUC) and regression metrics (MSE/RMSE/R²), KMeans clustering, PCA, and Matplotlib/Seaborn visualization. Lifetime validity. Exam fee starts at $195 USD via the OpenEDG store.

Sample OpenEDG PCAD Practice Questions

Try these sample questions to test your OpenEDG PCAD exam readiness. Each question includes a detailed explanation. Start the interactive quiz above for the full 100+ question experience with AI tutoring.

1. Given `df = pd.read_csv('sales.csv')`, which method returns the first 5 rows by default?
A. df.first()
B. df.head()
C. df.top(5)
D. df.peek()
Explanation: `DataFrame.head(n=5)` returns the first n rows (default 5) and is the canonical inspection helper. `df.tail(n)` returns the last n. `df.top()` and `df.peek()` are not pandas methods. `df.first(offset)` did exist as a time-series helper that requires a date offset such as `'3D'` and was deprecated in pandas 2.1, so `df.first()` with no argument raises an error rather than returning five rows.
2. Which Pandas function loads a CSV file from disk into a DataFrame?
A. pd.load_csv
B. pd.read_csv
C. pd.open_csv
D. pd.csv
Explanation: `pd.read_csv(path, ...)` is the standard CSV reader. Common parameters: `sep`, `header`, `index_col`, `dtype`, `parse_dates`, `na_values`, `chunksize`. The other names do not exist in pandas.
3. What does `df.shape` return?
A. Number of rows only
B. A tuple `(n_rows, n_columns)`
C. Memory size in MB
D. Column names
Explanation: `DataFrame.shape` returns a 2-tuple `(rows, columns)`. For a Series it returns a 1-tuple. Use `len(df)` for row count, `df.columns` for column names, `df.memory_usage()` for memory.
4. Which method gives summary statistics (mean, std, min, max, quartiles) for numeric columns?
A. df.summary()
B. df.describe()
C. df.stats()
D. df.info()
Explanation: `df.describe()` returns count, mean, std, min, 25/50/75 percentiles, and max for numeric columns by default. Use `df.describe(include='all')` for non-numerics too. `df.info()` shows dtypes and non-null counts; the others don't exist.
5. How do you select rows where the `age` column is greater than 30?
A. df[df['age'] > 30]
B. df.where(age > 30)
C. df.select(age > 30)
D. df.filter('age > 30')
Explanation: Boolean indexing, `df[df['age'] > 30]`, is the idiomatic pandas filter. An alternative is `df.query('age > 30')`. `df.where()` keeps the original shape and replaces values where the condition is False with NaN. `df.filter()` filters by labels, not values; `df.select()` doesn't exist.
6. What is the difference between `df.loc[]` and `df.iloc[]`?
A. They are identical
B. `loc` uses labels (row/column names); `iloc` uses integer positions
C. `loc` is deprecated
D. `iloc` works only on Series
Explanation: `loc` selects by label: `df.loc[5, 'age']` selects row with index label 5 (which may or may not be position 5). `iloc` selects by integer position: `df.iloc[0, 2]` always selects row 0 column 2. Both support slicing, lists, and boolean arrays.
7. Given `s = pd.Series([1, 2, 3, 4, 5], index=['a','b','c','d','e'])`, what does `s.iloc[1:3]` return?
A. The values at positions 1 and 2 (i.e., 2 and 3, with labels 'b' and 'c')
B. The values at labels 1 and 3
C. An empty series
D. All five values
Explanation: `iloc` slicing uses integer positions and is half-open like Python slicing: `[1:3]` returns positions 1 and 2 (excluding 3). The labels of those positions in the result are 'b' and 'c'. By contrast, `s.loc['b':'c']` is label-based and INCLUSIVE on both ends.
8. Which method drops rows containing any NaN values?
A. df.dropna()
B. df.fillna()
C. df.isna()
D. df.removena()
Explanation: `df.dropna()` drops rows (default `axis=0`) with any NaN. Use `how='all'` to drop only fully-NaN rows; `subset=['col']` to consider only specific columns; `thresh=n` to keep rows with at least n non-null values. `fillna` fills NaNs; `isna` returns a mask; `removena` doesn't exist.
9. How do you fill all NaN values in a numeric DataFrame with the column mean?
A. df.fillna(df.mean())
B. df.replace(NaN, mean)
C. df.dropna(fill='mean')
D. df.mean(fillna=True)
Explanation: `df.fillna(df.mean())` aligns the Series of column means to the DataFrame and fills column-wise. Other strategies: median (`df.fillna(df.median())`), forward-fill (`df.ffill()`), backward-fill (`df.bfill()`), or domain-specific imputation. The older `fillna(method='ffill')` spelling is deprecated since pandas 2.1. Always document imputation choices.
10. Which method removes duplicate rows from a DataFrame?
A. df.unique()
B. df.distinct()
C. df.drop_duplicates()
D. df.dedup()
Explanation: `df.drop_duplicates()` returns a DataFrame with duplicate rows removed. Useful parameters: `subset=['col1','col2']` to compare only chosen columns; `keep='first'|'last'|False` to control which duplicate to retain (or drop them all). `unique()` exists on Series only and returns unique values, not a DataFrame.
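
The calls covered in the sample questions can be tried together on a small frame. The sketch below is only an illustration: the toy data stands in for `sales.csv`, and the column names and values are hypothetical, not taken from the exam.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for sales.csv (not from the exam).
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee", "Bob"],
        "age": [25.0, 41.0, np.nan, 33.0, 41.0],
        "spend": [10.0, 9.0, 7.5, np.nan, 9.0],
    },
    index=[10, 11, 12, 13, 14],  # labels deliberately differ from positions
)

first_rows = df.head()          # first 5 rows by default
n_rows, n_cols = df.shape       # (5, 3)

over_30 = df[df["age"] > 30]    # boolean indexing; NaN compares as False

by_label = df.loc[10, "age"]    # row labelled 10 -> 25.0
by_position = df.iloc[0, 1]     # row position 0, column position 1 -> 25.0

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
half_open = s.iloc[1:3]         # positions 1 and 2 -> labels 'b', 'c'
inclusive = s.loc["b":"c"]      # label slice includes both ends

complete = df.dropna()                            # rows without any NaN
filled = df.fillna(df.mean(numeric_only=True))    # fill with column means
deduped = df.drop_duplicates()                    # drops the repeated Bob row
```

Because the index labels start at 10, `df.loc[0, "age"]` would raise a `KeyError` here while `df.iloc[0, 1]` works, which is the distinction question 6 is probing.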

About the OpenEDG PCAD Exam

The OpenEDG PCAD certification (Associate level — more advanced than the entry-level PCED) validates practical, end-to-end data analysis skills using Python and SQL. The PCAD-31-02 exam covers the full data analysis lifecycle: acquisition (CSV/JSON/SQL/web), cleaning and validation, statistical analysis, supervised and unsupervised modeling with scikit-learn, hypothesis testing with scipy.stats and statsmodels, dimensionality reduction (PCA), clustering (KMeans), classification and regression metrics, and visualization with Matplotlib/Seaborn — all anchored in real Pandas/NumPy syntax.

Assessment

48 single-select and multiple-select items including scenario-based items; covers five blocks: Data Acquisition & Pre-Processing (29.2%), Programming & Database Skills (33.3%), Statistical Analysis (8.3%), Data Analysis & Modeling (18.8%), Data Communication & Visualization (10.4%).

Time Limit

60 minutes

Passing Score

75%

Exam Fee

$195 USD (OpenEDG Testing Service / Python Institute)

OpenEDG PCAD Exam Content Outline

33%

Programming and Database Skills

Python data structures and control flow; functions, modules, OOP (classes, __init__, encapsulation); list/dict comprehensions; exception handling and validation patterns; file I/O (csv, json); SQL SELECT/INSERT/UPDATE/DELETE; database connectivity via DB-API 2.0; parameterized queries to prevent injection; reading SQL into Pandas (pd.read_sql) and writing back (df.to_sql).
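
As a rough illustration of the parameterized-query item in this block, the sketch below uses the standard-library `sqlite3` module (a DB-API 2.0 driver) with an in-memory database. The `sales` table, its columns, and the hostile input string are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

rows = [("north", 120.0), ("south", 80.0), ("north", 45.5)]
# '?' placeholders: the driver binds the values, they are never spliced into the SQL text.
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
conn.commit()

# Hostile input that would match every row if concatenated into the query string.
user_input = "north' OR '1'='1"
safe_result = conn.execute(
    "SELECT amount FROM sales WHERE region = ?", (user_input,)
).fetchall()
# The whole string is treated as one literal region name, so nothing matches.

north_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("north",)
).fetchone()[0]

conn.close()
```

The same connection object could be handed to `pd.read_sql` with a `params` argument; that step is left out here to keep the sketch free of third-party imports.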

29%

Data Acquisition and Pre-Processing

Reading CSV (pd.read_csv with dtype, na_values, parse_dates, chunksize), Excel (pd.read_excel), JSON (json.load/loads, pd.read_json), SQL (pd.read_sql); data validation against schema/ranges/business rules; standardization of units, date formats, text case; missing data detection (isna) and handling (dropna, fillna with mean/median/ffill); deduplication (drop_duplicates); type conversion (pd.to_numeric, astype) with error coercion.
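
A minimal sketch of that pre-processing chain, assuming a tiny in-memory CSV in place of a real file. The column names, the `unknown` missing-value marker, and the choice of median imputation are illustrative assumptions.

```python
import io

import pandas as pd

# Hypothetical raw extract: one custom missing marker, one bad value, one duplicate row.
raw = io.StringIO(
    "order_id,order_date,qty,price\n"
    "1,2025-01-03,2,9.99\n"
    "2,2025-01-04,unknown,14.50\n"
    "3,2025-01-05,three,20.00\n"
    "3,2025-01-05,three,20.00\n"
    "4,2025-01-06,6,5.25\n"
)

# na_values maps the custom marker to NaN; parse_dates converts the date column.
df = pd.read_csv(raw, na_values=["unknown"], parse_dates=["order_date"])
missing_before = int(df["qty"].isna().sum())   # only the 'unknown' cell so far

df = df.drop_duplicates()                      # remove the repeated order 3

# errors='coerce' turns unparseable text such as 'three' into NaN instead of raising.
df["qty"] = pd.to_numeric(df["qty"], errors="coerce")
missing_after = int(df["qty"].isna().sum())

# Median imputation is one option among several; record whichever is chosen.
df["qty"] = df["qty"].fillna(df["qty"].median())
```

Coercion makes hidden problems visible: the count of missing `qty` values rises after `pd.to_numeric`, because the text value `three` was never a usable number in the first place.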

19%

Data Analysis and Modeling

Pandas groupby with agg/transform/filter; pivot/pivot_table/melt for reshaping; merge/concat/join; datetime indexing and resample; supervised models (LinearRegression, LogisticRegression, DecisionTree, RandomForest); unsupervised (KMeans, PCA); train_test_split with stratify; cross_val_score and GridSearchCV; sklearn Pipeline to prevent leakage; preprocessing (StandardScaler, MinMaxScaler, OneHotEncoder, RobustScaler).
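
The leakage-prevention point can be sketched as follows, assuming synthetic data from `make_classification` and a logistic regression as the model. Both are stand-ins chosen for illustration, not a prescribed exam workflow.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Split first; stratify keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Inside a Pipeline the scaler is re-fitted on each training fold only,
# so validation and test rows never influence the scaling statistics.
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression()),
    ]
)

cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)  # one accuracy per fold

pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)
```

Calling `StandardScaler().fit(X)` on the full dataset before splitting would be the leaky variant that the Pipeline avoids.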

10%

Data Communication and Visualization

Matplotlib (plt.plot, plt.bar, plt.scatter, plt.hist, plt.boxplot, plt.subplots, plt.savefig); Seaborn (barplot with aggregation, scatterplot with hue, heatmap for correlations, pairplot, boxplot, violinplot); audience-tailored chart selection; narrative construction with evidence; choosing the right chart for the message.
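
To make the aggregation point concrete, the sketch below computes per-group means with pandas and draws them with plain Matplotlib, which approximates what a Seaborn barplot shows by default (without its error bars). The data, the headless `Agg` backend, and the in-memory PNG buffer are assumptions made so the example runs anywhere.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales records with several rows per region.
df = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "sales": [10.0, 20.0, 6.0, 9.0, 12.0],
    }
)

# Aggregate first: one mean per category, not one bar per raw row.
means = df.groupby("region")["sales"].mean()

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))
ax_bar.bar(means.index, means.values)
ax_bar.set_title("Mean sales by region")
ax_bar.set_ylabel("sales")

ax_hist.hist(df["sales"], bins=5)   # distribution of one continuous variable
ax_hist.set_title("Distribution of sales")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")      # plt.savefig works the same way with a file path
plt.close(fig)
```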

8%

Statistical Analysis

Descriptive statistics (mean, median, mode, variance, std, quartiles, percentiles, IQR); distributions and central tendency; outlier detection (Tukey's 1.5*IQR fences, ±3σ rule); correlation (Pearson for linear, Spearman for monotonic); hypothesis testing with scipy.stats (ttest_ind, ttest_rel, chi-square, ANOVA); p-value interpretation; bootstrapping for non-parametric inference; OLS regression with statsmodels for inferential output.
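
Two of these items, Tukey's fences and an independent-samples t-test, can be sketched with NumPy and `scipy.stats`. The sample values below are made up for illustration, and α = 0.05 is an assumed significance level.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements with one suspicious value.
values = np.array([10, 12, 11, 13, 12, 11, 14, 12, 13, 40], dtype=float)

# Tukey's fences: anything beyond 1.5 * IQR from the quartiles is flagged.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]   # only 40.0 is flagged

# Independent two-sample t-test on two hypothetical groups.
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [6.1, 5.9, 6.0, 6.2, 5.8]
result = stats.ttest_ind(group_a, group_b)

alpha = 0.05  # assumed significance level
reject_h0 = result.pvalue < alpha   # True here: the group means clearly differ
```

When `reject_h0` is False the conclusion is "fail to reject H0", not "H0 is true".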

How to Pass the OpenEDG PCAD Exam

What You Need to Know

  • Passing score: 75%
  • Assessment: 48 single-select and multiple-select items including scenario-based items; covers five blocks: Data Acquisition & Pre-Processing (29.2%), Programming & Database Skills (33.3%), Statistical Analysis (8.3%), Data Analysis & Modeling (18.8%), Data Communication & Visualization (10.4%).
  • Time limit: 60 minutes
  • Exam fee: $195 USD

Keys to Passing

  • Complete 500+ practice questions
  • Score 80%+ consistently before scheduling
  • Focus on highest-weighted sections
  • Use our AI tutor for tough concepts

OpenEDG PCAD Study Tips from Top Performers

1. Master Pandas to fluency: loc vs iloc, boolean indexing, groupby (agg, transform, filter), pivot vs pivot_table vs melt, merge/concat, datetime indexing with resample, and the .str / .dt accessors. The Programming and Data Analysis blocks together account for over 50% of the exam.
2. Learn parameterized SQL queries: placeholders (? for sqlite, %s for psycopg2/MySQLdb) with a tuple/dict of parameters. Building queries with f-strings or string concatenation invites SQL injection, and the exam tests this. pd.read_sql and df.to_sql connect Pandas to databases.
3. Build a sklearn Pipeline mental model: train_test_split FIRST, then fit_transform on train and transform on test (or use Pipeline + cross_val_score to automate it). Fitting a scaler on the full dataset leaks test-set statistics, a common scenario question.
4. Memorize classification metrics: precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean of precision and recall, ROC AUC = ranking quality (0.5 = random, 1.0 = perfect). Accuracy is misleading for imbalanced classes; use F1 or ROC AUC instead.
5. Memorize regression metrics: MSE, RMSE (same units as target), MAE (less sensitive to outliers than MSE/RMSE), R² (proportion of variance explained, 1.0 = perfect, 0 = mean baseline, can be negative).
6. Practice scipy.stats.ttest_ind / ttest_rel and interpret p-values: if p < α, reject H0 (evidence of effect); we don't 'accept' H0, just fail to reject. Pair p-values with effect sizes and confidence intervals.
7. Drill exception handling: try / except ValueError when parsing user input, `with` blocks for files (`with open(path) as f`), and avoiding bare `except:`. PCAD tests robust real-world code patterns.
8. Pick the right chart for the message: scatter for two continuous variables, histogram for one continuous variable, bar for categorical comparisons, box plot for distribution comparison, heatmap for a correlation matrix. Seaborn barplot AGGREGATES (default mean), so it differs from plotting raw bars.
9. Build at least one end-to-end Jupyter notebook per week: ingest CSV/SQL/JSON, validate and clean, EDA with describe and viz, fit and evaluate a model, write up findings. The exam tests workflow understanding, not isolated trivia.
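
The metric formulas in tips 4 and 5 can be checked by hand. The sketch below uses made-up confusion-matrix counts and made-up regression targets; in practice `sklearn.metrics` provides equivalent functions.

```python
import math

# Hypothetical confusion-matrix counts for an imbalanced problem (6% positives).
tp, fp, fn, tn = 40, 10, 20, 930

accuracy = (tp + tn) / (tp + fp + fn + tn)            # 0.97, looks excellent
precision = tp / (tp + fp)                            # 0.80
recall = tp / (tp + fn)                               # about 0.667
f1 = 2 * precision * recall / (precision + recall)    # about 0.727

# Hypothetical regression targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

errors = [t - p for t, p in zip(y_true, y_pred)]
mse = sum(e ** 2 for e in errors) / len(errors)       # 2.75
rmse = math.sqrt(mse)                                 # same units as the target
mae = sum(abs(e) for e in errors) / len(errors)       # 1.25

mean_true = sum(y_true) / len(y_true)
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot                              # 0.45
```

With these counts accuracy is 97% even though a third of the positive cases are missed, which is why the tip warns against relying on accuracy for imbalanced classes.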

Frequently Asked Questions

What is the OpenEDG PCAD certification?

PCAD (Certified Associate Data Analyst with Python) is OpenEDG / Python Institute's associate-level certification — more advanced than the entry-level PCED — validating end-to-end data analysis skills with Python and SQL. The active version is PCAD-31-02 (released July 2025). Topics span data acquisition, cleaning, statistical analysis, supervised and unsupervised machine learning, and visualization. It is independent of any vendor and forms a stepping stone toward the professional-level PCPD.

How is PCAD different from PCED?

PCED (Certified Entry-Level Data Analyst with Python, PCED-30-01) is the entry-level credential — 30 questions in 40 minutes, $59, focused on Python recap, NumPy, basic Pandas, simple visualization, and descriptive statistics. PCAD is the next level up — 48 questions in 60 minutes, $195+, covering the full data analysis lifecycle including SQL, OOP, scikit-learn modeling pipelines, hypothesis testing, and dimensionality reduction. PCAD assumes PCED-level Python comfort plus more advanced data and ML skills.

How is the PCAD-31-02 exam structured?

48 items (single-select and multiple-select, including scenario-based) in 60 minutes. Five blocks with explicit weightings: Data Acquisition and Pre-Processing 29.2%, Programming and Database Skills 33.3%, Statistical Analysis 8.3%, Data Analysis and Modeling 18.8%, and Data Communication and Visualization 10.4%. Cumulative score must reach at least 75% to pass. Delivered remotely via the OpenEDG Testing Service (TestNow) with online proctoring.
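
As a quick arithmetic check, the block weightings can be converted into approximate item counts. This sketch assumes every item carries equal weight, which the description above does not state, so the per-block counts and the 36-item threshold are estimates rather than official figures.

```python
import math

TOTAL_ITEMS = 48
PASSING_FRACTION = 0.75

# Block weightings as listed above, in percent.
weights_pct = {
    "Data Acquisition and Pre-Processing": 29.2,
    "Programming and Database Skills": 33.3,
    "Statistical Analysis": 8.3,
    "Data Analysis and Modeling": 18.8,
    "Data Communication and Visualization": 10.4,
}

# Assumption: equal weight per item, so weight * 48 rounds to a whole item count.
items_per_block = {
    block: round(TOTAL_ITEMS * pct / 100) for block, pct in weights_pct.items()
}

# Under the same assumption, the fewest correct items that reach 75%.
min_correct = math.ceil(PASSING_FRACTION * TOTAL_ITEMS)
```

The rounded counts come to 14, 16, 4, 9, and 5 items, which sum to 48, and `min_correct` evaluates to 36.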

How much does the PCAD exam cost?

PCAD-31-02 voucher options via the OpenEDG store start at $195 (single-shot exam). Bundled options: $215 with retake, $225 with practice test, $245 with retake plus practice test. The voucher is valid for one year from purchase. Compared to other associate-level data analyst certifications, PCAD is competitively priced and includes optional bundled practice tests.

How hard is the PCAD exam?

PCAD is associate-level (intermediate). Candidates with PCED- or PCEP-level Python comfort plus 6-12 months of practical Pandas/SQL/scikit-learn experience usually pass with 60-100 hours of focused study. The Programming & Database block (33.3%) and Data Acquisition block (29.2%) together make up 62.5% of the exam, so practical fluency with Pandas and parameterized SQL gives the biggest study leverage. The statistics block is small (8.3%) but trips up candidates without scipy.stats / statsmodels exposure.

What study materials are recommended?

Free OpenEDG / Python Institute courses on edube.org cover the syllabus; supplement with the official Python documentation, the Pandas user guide, the scikit-learn user guide, and the statsmodels documentation. Build at least one end-to-end Jupyter notebook per study week — ingest data, clean and validate, EDA, model, evaluate, visualize, narrate. Practice 200+ questions across all five blocks; aim for 85%+ on practice exams before scheduling.

Does PCAD certification expire?

No — PCAD has lifetime validity, like other OpenEDG / Python Institute credentials (PCEP, PCED, PCPP). There is no recertification or continuing education requirement. The certification version (PCAD-31-02) reflects the syllabus you tested against; if OpenEDG releases a new version (e.g., PCAD-31-03), your existing credential remains valid even though new candidates would test against the newer syllabus.

Should I take PCED first?

Highly recommended but not required. PCED ($59) covers Python and basic data analysis at an entry level — solid foundation for PCAD. If you already have several months of practical Pandas/SQL/scikit-learn experience, you can go directly to PCAD. If you are coming from general programming with limited data work, take PCED first to validate the foundations and avoid stretching across two unfamiliar domains at once.