
How to Scrape TikTok with Python

Valentina Skakun
Last update: 6 Nov 2025

TikTok is a dynamic, JavaScript-heavy platform. Most of the data you see in the browser isn’t in the page’s initial HTML. Requests to the page return only partial information: no video list, no profile stats, sometimes not even a username.

To collect complete data, you need to look deeper — into scripts, API endpoints, and JSON responses that TikTok loads behind the scenes.

In this guide, we’ll walk you through several ways to extract profile data, video lists, comments, and search results. We’ll also explain how to handle common issues such as IP limits, expired links, and 403 errors.

TikTok Page Structure

The first idea that comes to mind when scraping a TikTok page is to find CSS selectors or XPath expressions and extract data with them.

But that data is incomplete, and extracting it this way is extremely time-consuming. There’s an easier, faster approach: in the HTML code, you’ll find a script with the ID __UNIVERSAL_DATA_FOR_REHYDRATION__ that contains all profile data in a structured JSON format. 
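As a minimal sketch of that idea (plain requests, no proxies; TikTok rate-limits bare requests quickly, and @tiktok here is just an example handle):

import requests
import json
from bs4 import BeautifulSoup

# Fetch a profile page and pull the embedded JSON out of the
# __UNIVERSAL_DATA_FOR_REHYDRATION__ script tag
html = requests.get(
    "https://www.tiktok.com/@tiktok",
    headers={"user-agent": "Mozilla/5.0"},
    timeout=15,
).text
soup = BeautifulSoup(html, "html.parser")
script = soup.find("script", id="__UNIVERSAL_DATA_FOR_REHYDRATION__")
if script and script.string:
    data = json.loads(script.string)
    # list the top-level sections TikTok embeds in the page
    print(list(data.get("__DEFAULT_SCOPE__", {}).keys()))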

This works for profile data and individual video pages. For other pages, it's often simpler to find the right API endpoint that returns the required data.

Scraping TikTok Profiles

Using standard HTML parsing on the profile page, you can extract only part of the data.

Instead, we can use the script content, which contains much more detailed, actionable information:

  • user.uniqueId: Public username (handle)
  • user.nickname: Display name
  • user.signature: Bio or profile description
  • user.avatarLarger: URL of the large profile picture
  • user.avatarMedium: URL of the medium profile picture
  • user.avatarThumb: URL of the thumbnail profile picture
  • user.verified: Boolean; indicates if the account is verified
  • followerCount: Number of followers
  • followingCount: Number of accounts the user follows
  • heart: Total likes received
  • videoCount: Number of videos posted
  • title: Page title for sharing
  • desc: Page description for sharing

The JSON includes extra metadata you can ignore. Apply filtering logic to extract only relevant fields before saving results.
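For example, here's a minimal filter over the user_detail dict that the script below extracts, keeping only the fields from the table above:

def filter_profile(user_detail):
    # Reduce the raw webapp.user-detail section to the fields we care about
    user = user_detail.get("userInfo", {}).get("user", {})
    stats = user_detail.get("userInfo", {}).get("stats", {})
    share = user_detail.get("shareMeta", {})
    return {
        "uniqueId": user.get("uniqueId"),
        "nickname": user.get("nickname"),
        "signature": user.get("signature"),
        "avatarLarger": user.get("avatarLarger"),
        "verified": user.get("verified"),
        "followerCount": stats.get("followerCount"),
        "followingCount": stats.get("followingCount"),
        "heart": stats.get("heart"),
        "videoCount": stats.get("videoCount"),
        "title": share.get("title"),
        "desc": share.get("desc"),
    }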

Install and import the required libraries:

  • Requests for fetching the page content
  • BeautifulSoup for parsing the HTML and finding the script
  • The built-in json module for processing the data
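Only the first two are third-party packages (json ships with Python), so a single pip command covers the installs:

pip install requests beautifulsoup4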

Before running the script, update the target URL. The following example uses the HasData Web Scraping API, which automatically handles proxies, rate limits, and CAPTCHAs, bypassing TikTok's native limit of roughly 100 requests per hour. You don't need to rotate IPs or manage sessions manually; HasData does it for you:

import requests
import json
from bs4 import BeautifulSoup

url = "https://www.tiktok.com/@funny_funny66066"

#Get your API key at https://hasdata.com/sign-up
api_key = "YOUR-API-KEY"
api_url = "https://api.hasdata.com/scrape/web"

payload = json.dumps({
    "url": url,
    "proxyType": "residential",
    "proxyCountry": "DE",
    "blockResources": False,
    "wait": 1000,
    "jsRendering": True
})

headers = {
    "Content-Type": "application/json",
    "x-api-key": api_key
}

# Send the request to HasData API
response = requests.post(api_url, headers=headers, data=payload)

# Parse the response as JSON
data = response.json()

# Extract content from the response
content_html = data.get("content")

# Parse HTML
soup = BeautifulSoup(content_html, "html.parser")

# Find the script with the JSON data
script = soup.find("script", id="__UNIVERSAL_DATA_FOR_REHYDRATION__")
if script and script.string:
    data = json.loads(script.string)

    # Extract user details from __DEFAULT_SCOPE__
    user_detail = data.get("__DEFAULT_SCOPE__", {}).get("webapp.user-detail")

    # Save only the user-detail section
    with open("tiktok_user_detail.json", "w", encoding="utf-8") as f:
        json.dump(user_detail, f, ensure_ascii=False, indent=4)
else:
    print("Rehydration script not found; the page may not have rendered fully.")

The output is a JSON file containing data like this:

{
    "userInfo": {
        "user": {
            "uniqueId": "funny_funny66066",
            "nickname": "Animals Funny Videos",
            "signature": "Follow me and good-looking things with you\nhttp//: guxw.net",
            "avatarLarger": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7343166051334488106~tplv-tiktokx-cropcenter:1080:1080.jpeg",
            "avatarMedium": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7343166051334488106~tplv-tiktokx-cropcenter:720:720.jpeg",
            "avatarThumb": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7343166051334488106~tplv-tiktokx-cropcenter:100:100.jpeg",
            "verified": false,
            "createTime": 1666686070,
            "secUid": "MS4wLjABAAAAyl5blLK97VXJ0vZ_MQ7y5tnFfgEvYxYdleqw1-JAnwRcoyAGGpkkXxUvlQMwG5Xc",
            "privateAccount": false,
            "language": "en"
        },
        "stats": {
            "followerCount": 18600,
            "followingCount": 808,
            "heart": 126300,
            "videoCount": 186,
            "friendCount": 345
        }
    },
    "shareMeta": {
        "title": "Animals Funny Videos on TikTok",
        "desc": "@funny_funny66066 18.6k Followers, 808 Following, 126.3k Likes - Watch awesome short videos created by Animals Funny Videos"
    }
}

If the profile contains contact details, they will appear in the signature field. Since this field holds the entire bio text, you'll need a regular expression to isolate the email address from the rest of it:

import json
import re

# Load JSON from file
with open("tiktok_user_detail.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Extract signature text
signature = data.get("userInfo", {}).get("user", {}).get("signature", "")

# Find email with regex (returns None if there is no match)
email = re.search(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", signature)
print(email.group(0) if email else "No email found")

Keep in mind that users rarely include contact details, so don’t rely on this field being present.

Scraping TikTok Videos

This part is a bit more complex, since the video list isn’t included in the same JSON we parsed earlier. Still, we can use that data to get videos from the profile. 

Get the video list

To get the video list, you can use one of two approaches. The first is a browser automation library that scrolls through the page like a real user.
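Here's a rough sketch of that scrolling approach with Playwright; the a[href*="/video/"] selector is an assumption about TikTok's current markup and may need adjusting:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # a visible window helps with bot checks
    page = browser.new_page()
    page.goto("https://www.tiktok.com/@funny_funny66066", timeout=60000)
    for _ in range(5):  # scroll a few screens so more videos lazy-load
        page.mouse.wheel(0, 3000)
        page.wait_for_timeout(1500)
    # collect links to individual videos from the rendered page
    links = page.eval_on_selector_all(
        'a[href*="/video/"]', "els => els.map(e => e.href)"
    )
    browser.close()

print(len(set(links)), "video links found")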

The second, faster approach is an endpoint that returns all the needed data:

https://www.tiktok.com/api/post/item_list/

Using an endpoint involves two main steps:

  1. Get the secUid value from the JSON we extracted in the previous section.
  2. Use it to call the endpoint and retrieve the videos.

Note that the API returns no more than 35 videos per request. To get more, use the cursor parameter (the position or token of the last video) to load the next page.

You can check the hasMore parameter; if it's true, more videos are available.

Finally, extract and save the clean, actionable data from the returned JSON, excluding metadata and unnecessary clutter.

import requests
import json
from bs4 import BeautifulSoup
import time

profile_url = "https://www.tiktok.com/@funny_funny66066"   # put the username page here
api_base = "https://www.tiktok.com/api/post/item_list/"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
    "accept": "*/*",
    "referer": profile_url,
}

# fetch profile page and extract secUid 
r = requests.get(profile_url, headers=headers, timeout=15)
r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")
script = soup.find("script", id="__UNIVERSAL_DATA_FOR_REHYDRATION__")
if not script or not script.string:
    raise SystemExit("Could not find rehydration script on profile page.")

raw = json.loads(script.string)

secuid = (
    raw.get("__DEFAULT_SCOPE__", {})
       .get("webapp.user-detail", {})
       .get("userInfo", {})
       .get("user", {})
       .get("secUid")
)

if not secuid:
    raise SystemExit("secUid not found in page JSON. Page structure may differ.")

print("secUid:", secuid)

# paginated fetch using secUid 
result = []
cursor = 0
has_more = True

while has_more:
    params = {
        "aid": "1988",
        "count": "35", 
        "cursor": cursor,
        "device_platform": "web_pc",
        "secUid": secuid,
    }

    resp = requests.get(api_base, headers=headers, params=params, timeout=15)
    # handle non-json responses
    try:
        data = resp.json()
    except ValueError:
        print("Non-JSON response, status:", resp.status_code)
        break


    for item in data.get("itemList", []):
        video_info = item.get("video", {})
        stats = item.get("stats", {})
        music = item.get("music", {})
        # keep only the clean, useful fields
        video = {
            "profileURL": profile_url,
            "id": item.get("id"),
            "desc": item.get("desc"),
            "createTime": item.get("createTime"),
            "duration": video_info.get("duration"),
            "cover": video_info.get("cover"),
            "originCover": video_info.get("originCover"),
            "dynamicCover": video_info.get("dynamicCover"),
            "playAddr": video_info.get("playAddr"),
            "downloadAddr": video_info.get("downloadAddr"),
            "videoID": video_info.get("videoID"),
            "width": video_info.get("width"),
            "height": video_info.get("height"),
            "size": video_info.get("size"),
            "definition": video_info.get("definition"),
            "ratio": video_info.get("ratio"),
            "videoQuality": video_info.get("videoQuality"),
            "stats": {
                "playCount": stats.get("playCount"),
                "diggCount": stats.get("diggCount"),
                "commentCount": stats.get("commentCount"),
                "shareCount": stats.get("shareCount"),
                "collectCount": stats.get("collectCount")
            },
            "music": {
                "authorName": music.get("authorName"),
                "title": music.get("title"),
                "playUrl": music.get("playUrl"),
                "duration": music.get("duration")
            },
            "textExtra": item.get("textExtra", []),
            "challenges": [ch.get("title") for ch in item.get("challenges", [])]
        }
        result.append(video)


    # pagination tokens
    cursor = data.get("cursor", 0)
    has_more = bool(data.get("hasMore", False))

    time.sleep(0.5)

print("Collected videos:", len(result))

# Save cleaned output
with open("videos_clean_full.json", "w", encoding="utf-8") as f:
    json.dump(result, f, ensure_ascii=False, indent=2)

Download TikTok Videos 

One of the fields in the API response (and in our saved JSON) is the download link, downloadAddr.

We could download videos directly during data extraction, but to keep things organized, we’ll do that in a separate script.

We’ll use the links from the saved JSON file. The downloadAddr links in TikTok’s API responses expire within about an hour or less; old links return 403 Access Denied errors.

import requests
import json
import os


# Folder for downloaded videos
download_folder = "tiktok_videos"
os.makedirs(download_folder, exist_ok=True)


# Download a single video from URL
def download_video(url, filename):
    try:
        # Stream download to avoid memory overload
        headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
            "accept": "*/*",
            "referer": "https://www.tiktok.com",
        }
        response = requests.get(url, headers=headers, stream=True)
        response.raise_for_status()
        filepath = os.path.join(download_folder, filename)
        with open(filepath, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
        print(f"[OK] {filename} downloaded")
    except Exception as e:
        print(f"[ERROR] {filename} failed: {e}")


# Load JSON with video info
with open("videos_clean_full.json", "r", encoding="utf-8") as f:
    videos = json.load(f)


# Iterate over videos and download
for video in videos:
    download_url = video.get("downloadAddr")
    video_id = video.get("id")
    if download_url:
        filename = f"{video_id}.mp4"
        download_video(download_url, filename)
    else:
        print(f"[WARN] No download link for video {video_id}")

Scraping TikTok Comments

The next step is extracting comments from videos. 

Use TikTok’s comment API endpoint to extract comments efficiently. Note that TikTok limits comment scraping to 5,000 per video; pagination stops afterward:

import requests
import json

# list of TikTok video IDs to fetch comments from
video_ids = ["7518505042194861325", "7501766490077678891"]

# number of comments to fetch per video
count = 30
# standard desktop browser User-Agent to avoid blocking
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/141.0.0.0 Safari/537.36"
}

comments = []

# iterate through each video ID and fetch comments
for vid in video_ids:
    url = f"https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id={vid}&count={count}"
    resp = requests.get(url, headers=headers)
    # attempt to parse the JSON response
    try:
        data = resp.json()
    except ValueError:
        print(f"Non-JSON response for video {vid}, status:", resp.status_code)
        continue

    # loop through each comment object (use .get throughout so a missing
    # field doesn't crash the loop)
    for c in data.get("comments", []):
        share = c.get("share_info", {})
        user = c.get("user", {})
        comments.append(
            {
                "video_id": c.get("aweme_id"),
                "video_desc": share.get("desc"),
                "video_title": share.get("title"),
                "video_url": share.get("url"),
                "comment_id": c.get("cid"),
                "text": c.get("text"),
                "likes": c.get("digg_count"),
                "language": c.get("comment_language"),
                "timestamp": c.get("create_time"),
                "author": {
                    "name": user.get("nickname"),
                    "username": user.get("unique_id"),
                    "user_id": user.get("uid"),
                    "avatar": (user.get("avatar_thumb", {}).get("url_list") or [None])[0],
                },
                "comment_url": share.get("url"),
            }
        )

# save all collected comments into a JSON file
with open("comments.json", "w", encoding="utf-8") as f:
    json.dump(comments, f, ensure_ascii=False, indent=2)

print(f"Saved {len(comments)} comments to comments.json")

Scraping TikTok Search Results

Search scraping is the hardest part, because TikTok dynamically generates request headers. Endpoints like:

api/search/general/full/

or:

api/search/item/full/

require valid, time-sensitive headers.

To capture the responses, use a headless browser (e.g., Playwright) and intercept API calls directly.

import json
from urllib.parse import quote
from playwright.sync_api import sync_playwright

keyword = "cat funny video"
search_url = f"https://www.tiktok.com/search?q={quote(keyword)}"

def extract_useful_data(api_response):
    # Extract only the fields we care about from TikTok API
    useful = []
    data_list = api_response.get("data", [])

    for entry in data_list:
        item = entry.get("item", {})
        video = item.get("video", {})
        author = item.get("author", {})
        music = item.get("music", {})
        stats = item.get("stats", {})
        hashtags = [
            tag.get("hashtagName")
            for tag in item.get("textExtra", [])
            if tag.get("hashtagName")
        ]

        # Build simplified dict with relevant info
        useful.append({
            "id": item.get("id"),
            "desc": item.get("desc"),
            "createTime": item.get("createTime"),
            "video_url": video.get("playAddr"),
            "cover": video.get("cover"),
            "author_name": author.get("nickname"),
            "author_id": author.get("id"),
            "author_uniqueId": author.get("uniqueId"),
            "author_avatar": author.get("avatarThumb"),
            "music_title": music.get("title"),
            "music_author": music.get("authorName"),
            "hashtags": hashtags,
            "stats": {
                "likes": stats.get("diggCount"),
                "comments": stats.get("commentCount"),
                "shares": stats.get("shareCount"),
                "plays": stats.get("playCount"),
                "favorites": stats.get("collectCount")
            }
        })
    return useful

def run(playwright):
    # Launch browser with visible window for debugging
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    api_response = None

    # Listen for all network responses and capture TikTok search API
    def handle_response(response):
        nonlocal api_response
        if "api/search/general/full/" in response.url:  # check for API endpoint
            try:
                api_response = response.json()  # store JSON for later parsing
            except Exception:
                pass  # ignore invalid responses

    page.on("response", handle_response)

    # Open TikTok search page
    page.goto(search_url, timeout=60000)
    page.wait_for_timeout(10000)  # wait a bit for the API to respond

    if api_response:
        # Parse API response into simplified structure
        useful_data = extract_useful_data(api_response)

        # Save to JSON file
        with open("search_clean.json", "w", encoding="utf-8") as f:
            json.dump(useful_data, f, ensure_ascii=False, indent=2)
    else:
        print("Search API response was not captured; try a longer wait.")

    # Close browser cleanly
    browser.close()

# Start Playwright and run scraping
with sync_playwright() as playwright:
    run(playwright)

If you prefer to stick with the Requests library instead of using Playwright, you can copy the headers manually from your browser and reuse them for a while, until they expire.
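A sketch of that approach: in your browser's DevTools (Network tab), find a request to api/search/general/full/, copy its full URL and headers, and paste them in below. All values here are placeholders you'd replace yourself, and they stop working once the signatures expire:

import requests

# Paste the real values from DevTools; shown here as placeholders
copied_url = "https://www.tiktok.com/api/search/general/full/?..."  # full signed URL
copied_headers = {
    "user-agent": "...",  # same browser you copied from
    "cookie": "...",      # session cookies
    "referer": "https://www.tiktok.com/search?q=cat%20funny%20video",
}

resp = requests.get(copied_url, headers=copied_headers, timeout=15)
print(resp.status_code)
if resp.ok:
    data = resp.json()
    print(len(data.get("data", [])), "search results")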

Final Thoughts

Now you have a working setup for scraping TikTok profiles and related data. You can improve it by adding error handling, managing rate limits, or handling CAPTCHAs to avoid being blocked.

The gist is that most of the data is already accessible through embedded JSON and API endpoints, so scraping it mostly comes down to collecting it reliably and integrating it with your other tools.

All examples are available in the repository.
