Python PySpark: How to Resolve "TypeError: 'Column' object is not callable"

When working with PySpark DataFrames, you might encounter the TypeError: 'Column' object is not callable. This error signals a fundamental misunderstanding of how PySpark's DataFrame and Column objects interact. It typically occurs when you try to execute a method (like .show()) directly on a Column object, or when you attempt to apply a standard Python function to a column without registering it as a User-Defined Function (UDF).

This guide will explain the distinction between PySpark DataFrame and Column objects, walk through the common scenarios that trigger this error, and provide clear, correct code to resolve it.

Understanding the Error: DataFrame vs. Column Objects

In PySpark, these two objects have distinct roles:

  • DataFrame: A distributed, immutable collection of data organized into named columns. It is conceptually similar to a table in a relational database. It has actions (like .show(), .collect()) and transformations (like .select(), .filter(), .withColumn()).
  • Column: A reference to a column within a DataFrame. It represents a transformation to be applied to the data in that column but does not hold the data itself. You cannot call DataFrame actions like .show() on a Column object.

The TypeError: 'Column' object is not callable occurs because you are attempting to execute a function or method on a Column object as if it were a DataFrame or a simple Python data type.
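
A quick way to internalize the difference is to inspect the types directly. The sketch below assumes an active SparkSession; the exact class paths in the output can vary across PySpark versions.

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([Row(name="Tom", age=29)])

# Indexing a DataFrame returns a Column reference, not the data itself.
print(type(sdf['name']))         # e.g. <class 'pyspark.sql.column.Column'>

# .select() returns a new DataFrame, which supports actions like .show().
print(type(sdf.select('name')))  # e.g. <class 'pyspark.sql.dataframe.DataFrame'>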

Scenario 1: Calling a DataFrame Method on a Column

This is the most frequent cause of the error. You select a column and then try to call a DataFrame action like .show() on it directly.

Example of code causing the error:

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

sdf = spark.createDataFrame([
    Row(name="Tom", age=29),
    Row(name="Jane", age=26),
    Row(name="Lisa", age=22)
])

# sdf['name'] returns a Column object, which does not have a .show() method.
try:
    sdf['name'].show()
except TypeError as e:
    print(f"Error: {e}")

Output:

Error: 'Column' object is not callable

Solution: To display a specific column, you must use the DataFrame.select() transformation. This method returns a new DataFrame containing only the selected columns, on which you can then call .show().

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

sdf = spark.createDataFrame([
    Row(name="Tom", age=29),
    Row(name="Jane", age=26),
    Row(name="Lisa", age=22)
])

# ✅ Correct: Use .select() to get a new DataFrame, then call .show()
sdf.select('name').show()

Output:

+----+
|name|
+----+
| Tom|
|Jane|
|Lisa|
+----+

Scenario 2: Applying a Python Function to a Column Incorrectly

Another common mistake is passing a PySpark Column object to a standard Python function within a withColumn transformation. PySpark does not automatically apply the function to each row; instead, Python calls your function immediately, with the entire Column object as its argument.

Example of code causing the error:

from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([Row(name="Tom"), Row(name="Jane")])

# A standard Python function
def to_upper(text):
    return text.upper()  # This expects 'text' to be a string

try:
    # This passes the Column object to to_upper, not the string values.
    # On a Column, .upper resolves to a field-access Column, so calling
    # text.upper() raises the familiar TypeError.
    new_df = sdf.withColumn("name", to_upper(col('name')))
    new_df.show()
except TypeError as e:
    print(f"Error: {e}")

Output:

Error: 'Column' object is not callable

Solution: To apply a custom Python function element-wise, you must register it as a User-Defined Function (UDF) using the @udf decorator. This tells Spark to serialize the function and apply it to each row's value on the worker nodes.

from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([
    Row(name="Tom", age=29),
    Row(name="Jane", age=26),
    Row(name="Lisa", age=22)
])

# ✅ Correct: Register the function as a UDF
@udf(returnType=StringType())
def to_upper(text):
    if text:
        return text.upper()
    return None

new_df = sdf.withColumn("name", to_upper(col('name')))
new_df.show()

Output:

+----+---+
|name|age|
+----+---+
| TOM| 29|
|JANE| 26|
|LISA| 22|
+----+---+
Note: For better performance, prefer built-in PySpark functions (e.g., pyspark.sql.functions.upper) over UDFs whenever possible; built-ins run inside the JVM and avoid the per-row serialization overhead of Python UDFs.
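
For this example, the same uppercase transformation can be written with the built-in upper function, with no UDF involved. This sketch reuses the sdf DataFrame defined above:

from pyspark.sql.functions import col, upper

# upper() builds a Column expression that Spark evaluates inside the JVM,
# so no per-row Python serialization is needed.
new_df = sdf.withColumn("name", upper(col('name')))
new_df.show()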

Scenario 3: Using an Unavailable Column Method

This error can also appear if you try to use a Column method that doesn't exist, either because of a typo or because you are using an older version of PySpark that does not support it.

Example of code causing the error: The .contains() method was added in Spark 2.2, so running the code below on an older version such as Spark 2.1 raises this error.

# This code will fail on PySpark < 2.2 because the .contains() method
# does not exist on the Column object.

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

sdf = spark.createDataFrame([
    Row(name="Tom", age=29),
    Row(name="Jane", age=26),
    Row(name="Lisa", age=22)
])

try:
    sdf.filter(sdf.name.contains('a')).show()
except (TypeError, AttributeError) as e:
    print(f"Error on older Spark versions: {e}")

Output (on PySpark < 2.2):

Error on older Spark versions: 'Column' object is not callable

Solution: Ensure you are running a modern version of PySpark and that you are calling a valid, correctly spelled method on the Column object.

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

sdf = spark.createDataFrame([
    Row(name="Tom", age=29),
    Row(name="Jane", age=26),
    Row(name="Lisa", age=22)
])

# ✅ Correct: On a modern PySpark version (e.g., 3.x), .contains() is a
# valid Column method, so the filter works as expected.
sdf.filter(sdf.name.contains('a')).show()

Output:

+----+---+
|name|age|
+----+---+
|Jane| 26|
|Lisa| 22|
+----+---+
Warning: Always double-check method names against the official PySpark API documentation for your specific version to avoid typos and ensure the method is available.
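
If you are unsure which version you are running, you can check it at runtime before consulting the documentation; a minimal sketch:

import pyspark

# The installed PySpark version, e.g. '3.5.1'. Match this to the version
# selector in the official API documentation.
print(pyspark.__version__)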

Conclusion

The TypeError: 'Column' object is not callable in PySpark is a signal to revisit the distinction between DataFrames and Columns. To fix this error:

  1. Use DataFrame transformations like .select() or .filter() to work with columns before calling an action like .show(). Never call DataFrame actions directly on a Column.
  2. Wrap custom Python functions in a UDF (@udf) to apply them element-wise to a column's data.
  3. Verify Column methods against the official documentation to ensure they exist in your PySpark version and are spelled correctly.

By understanding that a Column represents a plan for a transformation, not the data itself, you can write effective and error-free PySpark code.