pyspark.pandas.DataFrame.info

DataFrame.info(verbose: Optional[bool] = None, buf: Optional[IO[str]] = None, max_cols: Optional[int] = None) → None

Print a concise summary of a DataFrame.
This method prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage.
Parameters
----------
verbose : bool, optional
    Whether to print the full summary.
buf : writable buffer, defaults to sys.stdout
    Where to send the output. By default the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.
max_cols : int, optional
    When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used (see the max_cols example below).
null_counts : bool, optional
    Whether to show the non-null counts.

    Deprecated since version 3.4.0.
Returns
-------
None
    This method prints a summary of a DataFrame and returns None.
See Also
--------
DataFrame.describe
    Generate descriptive statistics of DataFrame columns.
Examples
--------
>>> import pyspark.pandas as ps
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = ps.DataFrame(
...     {"int_col": int_values, "text_col": text_values, "float_col": float_values},
...     columns=['int_col', 'text_col', 'float_col'])
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00
Prints information for all columns:
>>> df.info(verbose=True)
<class 'pyspark.pandas.frame.DataFrame'>
Index: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
Prints a summary of the column count and dtypes, but no per-column information:
>>> df.info(verbose=False)
<class 'pyspark.pandas.frame.DataFrame'>
Index: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
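The max_cols parameter triggers the same truncated form automatically. A minimal sketch, assuming a hypothetical 10-column frame wide_df (the exact output may vary by version): because the frame has more columns than max_cols, the truncated summary is printed even though verbose is not set.

>>> wide_df = ps.DataFrame({f'col_{i}': [1, 2, 3] for i in range(10)})  # hypothetical wide frame
>>> wide_df.info(max_cols=5)  # 10 columns > max_cols, so the truncated output is used
<class 'pyspark.pandas.frame.DataFrame'>
Index: 3 entries, 0 to 2
Columns: 10 entries, col_0 to col_9
dtypes: int64(10)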
Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open('%s/info.txt' % path, "w",
...           encoding="utf-8") as f:
...     _ = f.write(s)
>>> with open('%s/info.txt' % path) as f:
...     f.readlines()
["<class 'pyspark.pandas.frame.DataFrame'>\n",
 'Index: 5 entries, 0 to 4\n',
 'Data columns (total 3 columns):\n',
 ' #   Column     Non-Null Count  Dtype  \n',
 '---  ------     --------------  -----  \n',
 ' 0   int_col    5 non-null      int64  \n',
 ' 1   text_col   5 non-null      object \n',
 ' 2   float_col  5 non-null      float64\n',
 'dtypes: float64(1), int64(1), object(1)']