Before jumping into the PttBrain product, I would like to briefly explain what PTT is.
PTT Bulletin Board System is the largest terminal-based bulletin board system (BBS) based in Taiwan. PTT has more than 1.5 million registered users, with over 150,000 users online during peak hours. The BBS has over 20,000 boards covering a multitude of topics, and more than 20,000 articles and 500,000 comments are posted every day. (Wikipedia)
Basically, PTT is the Taiwanese "Reddit". The main user interface can be roughly divided into two parts: boards and articles.
In each board, users can browse articles in a specific category, sorted by post date. We will explain the number on the left side later.
The following is one example:
In the Soft_Job board, people talk about software jobs and share interview experiences.
Articles can be published by users who have the required privileges (the rules vary by board).
Let's take a look at one example first.
In the top navigation bar, we can see the author [作者], title [標題], and datetime [時間].
Scrolling down, just below the main content, the green text shows the IP address the author posted the article from.
At the end, we can see the full list of comments.
For each comment, the sign at the front shows whether the user liked [推], hated [噓], or simply replied [→] to the article, followed by the user ID, the content, and the posted datetime.
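To make the comment layout concrete, here is a minimal Python sketch that parses one comment line into its parts. The plain-text layout "<sign> <user_id>: <content> <MM/DD HH:MM>", the `Comment` class, and the `parse_comment` function are assumptions made for illustration; this is not necessarily how PttBrain reads comments.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Map each leading sign to a reaction label (the labels are illustrative).
SIGN_TO_REACTION = {"推": "like", "噓": "hate", "→": "reply"}

# Assumed line layout: "<sign> <user_id>: <content> <MM/DD HH:MM>"
COMMENT_PATTERN = re.compile(
    r"^(?P<sign>[推噓→])\s+(?P<user_id>\w+)\s*:\s*(?P<content>.*?)\s+"
    r"(?P<posted_at>\d{2}/\d{2} \d{2}:\d{2})$"
)


@dataclass
class Comment:
    reaction: str   # "like", "hate" or "reply"
    user_id: str
    content: str
    posted_at: str  # kept as text because the assumed format has no year


def parse_comment(line: str) -> Optional[Comment]:
    """Return a Comment, or None when the line does not match the assumed layout."""
    match = COMMENT_PATTERN.match(line.strip())
    if match is None:
        return None
    return Comment(
        reaction=SIGN_TO_REACTION[match["sign"]],
        user_id=match["user_id"],
        content=match["content"],
        posted_at=match["posted_at"],
    )


# Hypothetical sample line, not taken from a real article.
print(parse_comment("推 alice123: great post, thanks 01/05 14:30"))
```

Returning None for lines that do not match keeps the caller in control of how to handle unexpected formats.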
Even though PTT is a fantastic social media platform full of rich information,
I still had questions like:
- Can we categorize the users in each board?
- Where was the article posted from? (a location, not just an IP)
- What is the average number of articles posted daily per board?
- Can we see a user's historical activity?
- Could there be a better search engine that searches article content, not just titles?
At that moment, I came up with an idea:
How about building a data analytics tool that scrapes PTT every day and then does grouping and visualization using modern technology?
Then PttBrain was born on 2018/01/01.
Since this is a personally funded project, I expect to spend less than $100 USD/month. The stack looks like this:
- Front End: React / Redux / Semantic UI
- Back End: NodeJS / Firebase
- Database: PostgreSQL (AWS EC2, m5.large), Heroku Redis
- Website / Server Hosting: Heroku
- CI/CD: CircleCI
We use ZomboDB, a PostgreSQL extension, as our search engine plugin; it indexes the data stored in PostgreSQL using Elasticsearch.
Below are sample screenshots; the search keyword is highlighted in the results.
- Searching articles
- Searching user IDs
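As a rough illustration of keyword highlighting, the Python sketch below wraps every case-insensitive match of a keyword in a `<mark>` tag. This is only an assumption about how highlighting could be done on the application side; in PttBrain the highlighting may well come from the search backend instead, and the `highlight` function is a hypothetical helper.

```python
import html
import re


def highlight(text: str, keyword: str) -> str:
    """Escape text for HTML and wrap each case-insensitive keyword match in <mark>."""
    if not keyword:
        return html.escape(text)
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches so user content cannot inject markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


# Hypothetical article title used as sample input.
print(highlight("Python jobs & python tips", "python"))
```

Escaping each piece separately keeps the original casing of the match while still producing safe HTML.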
Here you can see statistics across PTT's popular boards.
- See all the boards that PttBrain is currently tracking
- Popular articles in a week
- Distribution of all user IPs
One of PttBrain's killer features is tracking each user's activities and IP locations.
As the sample page below shows, you can see several metrics for a user:
- Last visit datetime
- All posted articles
- All historical comments (pie chart -> distribution across boards, line chart -> number of comments over time)
- IP distribution (article + comments)
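The data behind the pie chart and the line chart can be sketched with two small aggregations. The records below are made-up sample data, "Example_Board" is a placeholder board name, and `board_distribution` and `comments_per_day` are hypothetical helpers; this is a minimal Python sketch, not PttBrain's actual pipeline.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records for one user: (board, ISO timestamp) per comment.
comments = [
    ("Soft_Job", "2018-01-01T09:15:00"),
    ("Soft_Job", "2018-01-01T21:40:00"),
    ("Example_Board", "2018-01-02T08:05:00"),
    ("Soft_Job", "2018-01-03T12:00:00"),
]


def board_distribution(records):
    """Share of the user's comments per board (data for the pie chart)."""
    counts = Counter(board for board, _ in records)
    total = sum(counts.values())
    return {board: n / total for board, n in counts.items()}


def comments_per_day(records):
    """Comment count per calendar day, sorted by date (data for the line chart)."""
    counts = Counter(
        datetime.fromisoformat(ts).date().isoformat() for _, ts in records
    )
    return sorted(counts.items())


print(board_distribution(comments))
print(comments_per_day(comments))
```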
Some very interesting metrics are shown at the top:
- Most popular visiting times, average number of articles per day
- Active users who posted the most popular articles (within a week)
- Users who posted "like" comments the most (within a month)
- Users who posted "hate" comments the most (within a month)
- All articles in the board
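Two of these board metrics are simple to express in code. The Python sketch below computes the average number of articles per day and ranks users by the number of "like" comments; the input shapes, the sample values, and the function names `average_daily_articles` and `top_commenters` are assumptions for illustration only.

```python
from collections import Counter
from datetime import date


def average_daily_articles(post_dates):
    """Average articles per day over the inclusive span from first to last post."""
    if not post_dates:
        return 0.0
    span_days = (max(post_dates) - min(post_dates)).days + 1
    return len(post_dates) / span_days


def top_commenters(comments, reaction, limit=3):
    """Users ranked by how many comments of the given reaction they posted."""
    counts = Counter(user for user, r in comments if r == reaction)
    return counts.most_common(limit)


# Hypothetical sample data for one board.
post_dates = [date(2018, 1, 1), date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 4)]
comments = [("alice", "like"), ("bob", "like"), ("alice", "like"), ("carol", "hate")]

print(average_daily_articles(post_dates))
print(top_commenters(comments, "like"))
```

Dividing by the inclusive date span means that days without any article still count toward the average, which seems the more honest reading of "daily" here.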
Currently, this page is similar to the native PTT article page. The difference is that we improve the user experience by using modern web components.
I hope you have enjoyed the introduction so far.
We are looking for more talented engineers, data scientists, marketers, and designers to join this community.
Please follow us and send us any feedback.