Summarized News Application using TF-IDF

International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 05 Issue: 03 | Mar-2018
p-ISSN: 2395-0072
Badreesh Shetty1, Vinayak Shetty2, Rohit Shinde3, Prof. Torana Kamble4
1,2,3 Student,
Final Year Computer Engineering, Navi Mumbai
Dept. of Computer Engineering, Bharati Vidyapeeth College of Engineering, Maharashtra, India
A concise and succinct summarized news item assures maximum reader satisfaction.
Abstract - Modern handheld devices such as
smartphones and PDAs have become increasingly powerful
in recent years. Dramatic breakthroughs in processing
power, along with the number of extra features included in
these devices, have opened the doors to a wide range of
commercial possibilities. In particular, most cell phones
regularly include cameras, processors comparable to PCs
from only a few years ago, and internet access.
The primary objective of our News Application is
to provide quality summarized news to the users.
As mobile devices become more like PCs, they will come to
replace objects we tend to carry around, such as cameras,
MP3 players, credit cards, etc. In short, we will be using
them to accomplish our daily tasks. One application that
falls into this category is the Summarized News App.
To identify the top and trending news and provide
it to the users.
To categorize the news into different subcategories
and provide it to the users in an effective and easy manner.
The prime objective of “Summarized News Application” is to
create an Android application that gives a summarized
version of a long article, thereby reducing the reading time of
readers and providing only the crux of the article. Readers
can search the already loaded articles in the app that have
been summarized. Readers can bookmark articles for further
reference in the app. Articles are categorized for the
reader’s convenience, so that news can be accessed faster
through categories. Images are loaded according to
their relevance to the article.
Summarize the news so that user’s attention is
not diverted and keeping the news easy and
It’s a very useful and informative app. It keeps one
updated and saves a lot of time by providing news
in just 60 words that can be read in just 30 seconds.
The project is developed in Android Studio using Java, and in
Sublime for PHP, with Cmder as the command panel.
A. Generation of Summarized News
A dedicated Admin Panel holds the summarized news
generated by the TF-IDF algorithm. The admin posts it in
the relevant categories. The news is then visible to all
the users.
Key Words: News, TF-IDF, News Summary, Bookmark,
Push Notification.
The era of mobile technologies opens a window for
Android apps. Websites are losing ground as mobile
phones emerge. We are introducing an Android
application that lets us read news and
also provides news in a more summarized manner. It is
helpful for people who do not like to read the whole
news. It acts as an overview of the whole news.
The aim is to create an Android application that lets readers
minimize their reading time. Readers need to be updated
with the latest news, so the Android application should be
easy and efficient to use. Readers often have to read long
articles in which the most important details are not pointed out.
© 2018, IRJET
Impact Factor value: 6.171
Fig-1: Generation of Summary
ISO 9001:2008 Certified Journal
| Page 1282
B. Categorized News
Pre-categorized news is available to the user
based on relevance.
G. Sharing News
A summarized news article can be shared with other readers.
Before going through the algorithm, we first need to
understand how TF-IDF works. Term Frequency is a
count of how many times a word occurs in a given
document (synonymous with bag of words). Inverse
Document Frequency is based on the number of documents
in a corpus in which a word occurs: the more documents
contain the word, the lower its value. TF-IDF is used to weight
words according to how important they are in
their context. Words that are used frequently in many
documents will have a lower weighting, while infrequent
ones will have a higher weighting.
Fig-2: Admin Panel
C. Home Feed
News feed visible to user as per addition of
articles to the Admin Panel. News is updated
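Where, w is the weight of a word in the context of the article,
tf is the term frequency, i.e. the number of occurrences of the word in a document,
df is the document frequency, i.e. the number of documents in the corpus in which the word occurs, from which the inverse document frequency is derived,
N is the number of documents in the corpus.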
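The symbols above match the widely used TF-IDF weighting, so, as an assumption rather than a confirmed reading of this paper, the weight can be written in the standard form below. Here df is taken as the number of corpus documents containing the word, N as the total number of documents, and the base-10 logarithm is an arbitrary choice made only to keep the worked numbers simple.

```latex
w = tf \times \log_{10}\!\left(\frac{N}{df}\right)

% Worked example with assumed values: tf = 3, N = 100, df = 10
w = 3 \times \log_{10}\!\left(\frac{100}{10}\right) = 3 \times 1 = 3

% A word present in every document (df = N = 100) gets zero weight
w = 3 \times \log_{10}\!\left(\frac{100}{100}\right) = 3 \times 0 = 0
```

The second case illustrates why words that occur in many documents receive a low weighting.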
A. Import all Important Libraries
The libraries imported include math for mathematical functions, re
for regex functions, requests for requesting the URL of a
web page, Beautiful Soup for scraping important
details, stop words for flushing out stop words,
a corpus, which is a predefined collection of documents, and
sentence tokenize and word tokenize for tokenizing
sentences as well as words.
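As a rough illustration of what the tokenizing and stop-word libraries contribute, the sketch below mimics sentence tokenizing, word tokenizing and stop-word removal using only Python's standard `re` module. The regex-based splitters, the function names and the tiny `STOP_WORDS` set are stand-ins assumed here for illustration; they are not the tokenizers or the stop-word list the project itself uses.

```python
import re

# Tiny illustrative stop-word set (an assumption); a real list is far longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it"}

def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_words(sentence):
    """Naive word tokenizer: lowercase alphabetic runs (apostrophes allowed)."""
    return re.findall(r"[a-z']+", sentence.lower())

def content_words(sentence):
    """Word tokens with the stop words flushed out."""
    return [w for w in split_words(sentence) if w not in STOP_WORDS]

sample = "The app is free. It summarizes the news of the day!"
sentences = split_sentences(sample)
tokens = [content_words(s) for s in sentences]
```

A dedicated tokenizer handles abbreviations, numbers and quotations far better than these regular expressions; the sketch only shows the shape of the data passed on to the later steps.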
Fig-3: Home Feed of Device
D. Push Notification
A notification is sent to readers, who can
slide to dismiss it or tap on the news title to get
further article details.
B. Request of URL and Scraping of Webpage
Beautiful Soup is used for scraping web pages
that contain the important details of the news. All text
related to the news is scraped, leaving behind all
images as well as ads related to the news article.
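The scraping step can be sketched as follows. To stay free of third-party packages, this sketch swaps in the standard library's `html.parser` for Beautiful Soup and runs on an inline HTML string instead of a live request. The `ParagraphExtractor` class, the choice to keep only `<p>` text, and the sample markup are all assumptions made for illustration.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects text found inside <p> elements; anything outside them
    (ads, images, navigation) is ignored."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # > 0 while inside a <p> element
        self._buffer = []      # text pieces of the paragraph being read
        self.paragraphs = []   # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

def extract_article_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)

sample_html = """
<html><body>
  <div class="ad"><img src="banner.png">Buy now!</div>
  <p>Markets rose on <b>Monday</b>.</p>
  <img src="chart.png">
  <p>Analysts expect   more gains.</p>
</body></html>
"""
article_text = extract_article_text(sample_html)
```

In practice the HTML would come from the response to the URL request rather than from a string literal, and real news pages usually need site-specific rules to separate the article body from the surrounding markup.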
E. Bookmarked News
Favorite news can be bookmarked for the reader's
further reference. This function reduces
the time spent searching for favorite news in
the roster of articles.
C. Extracting Page Data
The article is extracted from the requested web page for
performing the algorithmic functions. The URL is parsed
according to the title of the website.
F. Comment on Article
Readers can give their view or opinion by
commenting on a news article. All readers can
view these comments on the articles they open.
D. Summarization of Article
Tokens of sentences and words are generated, and
stop words are not taken into account, as they
skew the weight calculations. The frequency of tokens is
found with respect to the TF-IDF algorithm.
From the sentence ratio, the best sentences, which hold the
relevant information, are selected. The generated
summary is then added to the Admin Panel in the
system for posting on the Android application.
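A minimal sketch of this step is shown below, under several assumptions: each sentence is scored by the sum of the TF-IDF weights of its non-stop words, the weight uses the common form tf × log(N / df), and the top-scoring sentences are returned in their original order. The function names, the `ratio` parameter, the small stop-word set and the toy corpus are hypothetical and stand in for whatever the project actually uses.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word set (an assumption).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "it"}

def tokenize(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def split_sentences(text):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tfidf_weights(article, corpus):
    """Weight each article word as tf * log(N / df).

    The article is counted as one of the N documents, so df is never zero.
    """
    documents = [set(tokenize(doc)) for doc in corpus] + [set(tokenize(article))]
    n_docs = len(documents)
    tf = Counter(tokenize(article))
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in documents if word in doc)
        weights[word] = count * math.log(n_docs / df)
    return weights

def summarize(article, corpus, ratio=0.3):
    """Keep the best-scoring fraction `ratio` of sentences, in original order."""
    sentences = split_sentences(article)
    weights = tfidf_weights(article, corpus)
    scores = [sum(weights.get(w, 0.0) for w in tokenize(s)) for s in sentences]
    keep = max(1, math.ceil(len(sentences) * ratio))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(best))

# Toy data, invented for illustration only.
corpus = [
    "The weather is sunny today.",
    "The match ended in a draw.",
]
article = (
    "The council approved the metro budget. "
    "The weather is sunny. "
    "Trains are late."
)
summary = summarize(article, corpus)
```

Summing weights favors longer sentences; dividing each score by the sentence length is a common alternative when that bias is unwanted.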
The application will help readers read
summarized news articles, which saves a lot of reading time,
and the reader can send important articles to other
readers. It is efficient and always available online.
Thus the application helps in keeping the reader updated
on daily news. It is beneficial as it has no cost and is always
available online. Also, you will have a clear idea and
understanding of what is happening in your country and
the whole world.
A. When something newsworthy happens anywhere in the
world, online news is updated regularly, whereas newspapers
are typically printed once or at most twice a day.
Online news is typically updated whenever
there is something worth reporting.
We take this opportunity to express our profound
gratitude and deep regards to our guide Mrs. Torana
Kamble for her exemplary guidance, monitoring and
constant encouragement throughout the course of this
B. Online news is better than regular
newspapers as it saves a lot of time and money. This news
doesn't need to be printed, and there is no need for anyone
to deliver it either. It is just published online, and
anyone in the world can view it with a few simple clicks.
C. News is available in a summarized manner for people
who are not interested in reading the whole news or want
just an overview of the news.
D. There is no limit to how many articles or news one can
read. With newspapers, people can only read the articles
or news contained in the newspaper.
7.1 Hardware Requirements
Any Snapdragon, Intel, Exynos chip
512 MB
7.2 Software Requirements
Operating System
Android Studio, Sublime
Features such as news recommender, news poll, news
based on location, dictionary.