Introducing PttBrain

02/17/2019 · 3 Min Read · In PttBrain

Before jumping into the PttBrain product, I would like to briefly explain what PTT is.

About PTT

The PTT Bulletin Board System is the largest terminal-based bulletin board system (BBS) based in Taiwan. PTT has more than 1.5 million registered users, with over 150,000 users online during peak hours. The BBS has over 20,000 boards covering a multitude of topics, and more than 20,000 articles and 500,000 comments are posted every day. (Wikipedia)

Basically, PTT is Taiwan's "Reddit". The main user interface can be roughly divided into two parts: boards and articles.

Boards

In each board, users can browse articles of a specific "category", sorted by upload date. We will explain the number on the left side later.

Here is one example:
on the Soft_Job board, people talk about software jobs and share interview experiences.

Articles

An article can be published by any user who has the privilege to post (rules vary by board).
Let's take a look at an example first.

On the top nav bar, we can see the "author" [作者], "title" [標題], and "datetime" [時間].
Scrolling down, just below the main content, the green text shows the IP address the author posted the article from.
At the end, we can see the full list of comments.
For each comment, the sign at the front shows whether the user liked [推], hated [噓], or simply replied [→] to the article; it is followed by the user ID, the content, and the posted datetime.
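
The comment layout described above can be turned into structured data. Below is a minimal sketch, assuming each comment arrives as a single text line shaped like `<sign> <userId>: <content> <MM/DD HH:mm>`. That exact layout, the `PttComment` type, and the `parseComment` helper are assumptions made for illustration, not PttBrain's actual parser.

```typescript
// Hypothetical shape of one parsed comment; field names are illustrative.
type CommentKind = "like" | "hate" | "reply";

interface PttComment {
  kind: CommentKind;
  userId: string;
  content: string;
  postedAt: string; // kept as raw text, e.g. "02/17 12:34"
}

// Maps the sign at the front of a comment to its meaning.
const KIND_BY_SIGN: Record<string, CommentKind> = {
  "推": "like",
  "噓": "hate",
  "→": "reply",
};

// Assumed line layout: "<sign> <userId>: <content> <MM/DD HH:mm>"
const COMMENT_LINE =
  /^(推|噓|→)\s+([A-Za-z0-9]+)\s*:\s*(.*?)\s+(\d{2}\/\d{2} \d{2}:\d{2})$/;

// Returns null when the line does not match the assumed layout.
function parseComment(line: string): PttComment | null {
  const match = COMMENT_LINE.exec(line.trim());
  if (match === null) return null;
  const [, sign, userId, content, postedAt] = match;
  return { kind: KIND_BY_SIGN[sign], userId, content, postedAt };
}
```

For example, `parseComment("推 alice123: nice post 02/17 12:34")` yields a comment with `kind` set to `"like"` and `userId` set to `"alice123"`. The date is kept as a string because a line in this layout carries no year.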

Introducing PttBrain

Even though PTT is a fantastic social media platform full of rich information,
I still had questions like

  • Can we categorize the users of each board?
  • Where was an article posted from? (not just the IP)
  • What is the average number of articles posted daily per board?
  • Can we see a user's historical activities?
  • There should be a better search engine that searches article content, not just titles

At that moment I came up with an idea:

How about building a data analytics tool that scrapes PTT every day and then does grouping / visualization using modern technology?

So PttBrain was born on 2018/01/01.

Tech Stack

Since this is a personally funded project, I expect to spend less than $100 USD/month. The stack looks like this:

  • Front End: React / Redux / Semantic UI
  • Back End: Node.js / Firebase
  • Database: PostgreSQL (AWS EC2, m5.large), Heroku Redis
  • Website / Server Hosting: Heroku
  • CI/CD: CircleCI

Product Features

We use ZomboDB as our search engine plugin. It is a PostgreSQL extension that indexes the table data using Elasticsearch.

Below are sample screenshots; the search keyword is highlighted in the results.

  1. Searching articles
  2. Searching user IDs
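
Keyword highlighting like this can be done in several ways. Below is a minimal client-side sketch, assuming the result text and the keyword are both available in the browser; the `Segment` type and the `highlightSegments` helper are hypothetical names. It is not necessarily how PttBrain does it, since highlighting could just as well come from the search backend.

```typescript
// One piece of result text, flagged when it matches the search keyword.
interface Segment {
  text: string;
  highlighted: boolean;
}

// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(raw: string): string {
  return raw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Splits `text` into segments, marking case-insensitive keyword matches.
function highlightSegments(text: string, keyword: string): Segment[] {
  if (keyword === "") return [{ text, highlighted: false }];
  // With one capturing group, split() keeps the matches at odd indices.
  const pattern = new RegExp(`(${escapeRegExp(keyword)})`, "gi");
  return text
    .split(pattern)
    .map((part, index) => ({ text: part, highlighted: index % 2 === 1 }))
    .filter((segment) => segment.text !== "");
}
```

A React component could then render each highlighted segment inside a `<mark>` element and the others as plain text.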

Home Page

Here you can see statistics across all of PTT's popular boards.

  1. All boards that PttBrain is currently tracking
  2. Popular articles of the week
  3. Distribution of all user IPs

User Page

One of PttBrain's killer features is tracking each user's activities and IP locations.

As the sample page below shows, you can see several metrics for a user:

  1. Last visit datetime
  2. All posted articles
  3. All historical comments (pie chart -> distribution of boards, line chart -> # of comments in time series)
  4. IP distribution (articles + comments)

Board Page

Some very interesting metrics are shown at the top:

  1. Popular visiting times and the average # of daily articles
  2. Active users who posted the most popular articles (within a week)
  3. Users who posted the most "like" comments (within a month)
  4. Users who posted the most "hate" comments (within a month)
  5. All articles in the board
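
To make a couple of these metrics concrete, here is a minimal in-memory sketch of the average number of daily articles and the top "like" / "hate" commenters. The record shapes and function names are assumptions for illustration. The real site presumably computes these in the database, and the average here only counts days that have at least one article.

```typescript
// Hypothetical record shapes; the real PttBrain schema is not shown in this post.
interface ArticleRecord {
  board: string;
  postedOn: string; // calendar day, e.g. "2019-02-17"
}

interface CommentRecord {
  userId: string;
  kind: "like" | "hate" | "reply";
}

// Average articles per day, counting only days with at least one article.
function averageDailyArticles(articles: ArticleRecord[]): number {
  const days = new Set(articles.map((article) => article.postedOn));
  return days.size === 0 ? 0 : articles.length / days.size;
}

// Users with the most comments of the given kind, highest count first.
function topCommenters(
  comments: CommentRecord[],
  kind: CommentRecord["kind"],
  limit: number
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const comment of comments) {
    if (comment.kind !== kind) continue;
    counts.set(comment.userId, (counts.get(comment.userId) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```

The "within a week" and "within a month" windows would be applied by filtering the records by date before calling these helpers.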

Article Page

Currently this page is similar to the native PTT article page. The difference is that we improve the user experience with modern web components.

That's It

I hope you enjoyed the introduction.
We are looking for more talented engineers, data scientists, marketers, and designers to join this community.
Please follow us and send us any feedback.

© 2020 by Warren. All rights reserved.
Last build: 11/28/2021