-------------------
|
A discussion about the utility of the searchers' library
the library (09/12/03 11:18:43)
| |
what can we do with it?
i have printed a number of the articles, and they are of a very high
academic level.
let's say we'd want to implement a personalized spider that makes use
of one of the probabilistic algorithms explained in so many of these
pdfs. we would still need a huge amount of research, to even understand
the article.
well ok, depending on your background you may need more or less, varying
from a beginner's course in statistics and graph theory to just a quick
refresher on what the SAT (satisfiability) problem was again.
i'm trying to print the localsearch.pdf now, but i have a hard time convincing
the printer script that it should print it as size A4, not Letter :)
anyway, this is hard stuff. any ideas on how we can put such theory to
work for our searching needs?
just read it - ok, that's easy. slightly boring, even. but perhaps if you read
a large number of these essays you get a general feeling for how search
algorithms are developed, and that might prove very useful indeed.
pick one and work on it - you need to pick just one, because it's a lot of
work. you need to do background research to figure out the algos that are
referred to but that you don't understand, etc.
so. :)
- ritz
ritz
|
Re: the library (09/12/03 22:00:55)
| |
Yeah, or we can go one step further and try to use tools which are able to "summarize" those articles. Oh, first we need to find/set up/evaluate those...
I feel EVERY JOB, even the ones we're doin' for fun, is just chained together with a lot of other jobs, and so on... like a web :-)?
But the interesting question IMO is - in this wonderful technoworld of ours, where we have these supercomputers and everything from the net - what is the shortcut to knowledge? Translate everything to your own language, AND automatically summarize/index it? I am a librarian type, I love to read and learn, but I am able to read only a small part of the stuff I collect. THE ULTIMATE PERSONAL SOFTWARE IS some kind of secretary, I think.
And another nice but different path is reinventing things. Just build your own tool without reading anything, refine it, learn from its weaknesses.
You can try some pdf2txt tool, they come with ghostscript & TeX I think - if you don't need the fancy printing, only text.
have
|
Re: Re: the library (09/12/03 23:05:24)
| |
| Yeah, or we can go one step further and try to use tools which are able to "summarize" those articles. Oh, first we need to find/set up/evaluate those...
I feel EVERY JOB, even the ones we're doin' for fun, is just chained together with a lot of other jobs, and so on... like a web :-)? |
.. so we need to find a way to organize a huge web of knowledge.. hmm
i think i know some pdfs that discuss such a thing.. ^_^
and we're back to start..
perhaps we can put a bit of an inductive loop here somewhere.. ehm
bootstrapping anyone? :))))
| But the interesting question IMO is - in this wonderful technoworld of ours, where we have these supercomputers and everything from the net - what is the shortcut to knowledge? Translate everything to your own language, AND automatically summarize/index it? I am a librarian type, I love to read and learn, but I am able to read only a small part of the stuff I collect. THE ULTIMATE PERSONAL SOFTWARE IS some kind of secretary, I think.
And another nice but different path is reinventing things. Just build your own tool without reading anything, refine it, learn from its weaknesses.
You can try some pdf2txt tool, they come with ghostscript & TeX I think - if you don't need the fancy printing, only text. |
yeah i got pdf2txt, but have you tried to read some of those scientific
papers? they're full of formulas (LaTeX indeed) ..
no, my experience is that these are better read in a comfortable chair, on
paper.. or on the bus, or waiting for it.. or being somewhere else, pretending
you're reading something highly interesting (which it is) :)
i have printed localsearch.pdf, reading it whenever i have time (which is
not right now) .. looks interesting, it's about traversing a 'small world'
exponential linkage graph (like the web, or p2p networks) .. perhaps i can
implement it in a simple spider.. although i never got the spider example
on searchlores to work.. (didn't try really hard, though)..
if anyone else has read any of those library pdfs, i'd love to hear about
their thoughts..
perhaps write some small summary or whatever about it, to link on the
library page?
- ritz
ritz
|
"Just build your own tool without reading anything, refine it, learn from its weaknesses." (10/12/03 10:41:03)
| |
My sentiments exactly. I wanted to implement a proximity search algo and tried a couple of 'academic' papers which were just a load of dingo's kidneys, so I scrapped them and started from scratch. Had two minor rewrites since - to allow for fuzzy and exact string searching.
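A minimal sketch of what such a from-scratch proximity search might look like, in Python. This is an assumption for illustration only, not the tool described in this post: the names tokenize, proximity_hits, max_gap and min_ratio are made up, and the fuzzy mode simply leans on the similarity ratio from difflib in the standard library.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_matches(word, term, min_ratio=None):
    """Exact comparison by default; fuzzy when min_ratio (0..1) is given."""
    if min_ratio is None:
        return word == term
    return SequenceMatcher(None, word, term).ratio() >= min_ratio


def proximity_hits(text, term_a, term_b, max_gap=5, min_ratio=None):
    """Return (i, j) word-index pairs where term_a and term_b lie
    at most max_gap words apart."""
    words = tokenize(text)
    a, b = term_a.lower(), term_b.lower()
    pos_a = [i for i, w in enumerate(words) if word_matches(w, a, min_ratio)]
    pos_b = [i for i, w in enumerate(words) if word_matches(w, b, min_ratio)]
    return [(i, j) for i in pos_a for j in pos_b
            if i != j and abs(i - j) <= max_gap]
```

Usage: proximity_hits(page_text, "spider", "web", max_gap=3) for exact matching, or add min_ratio=0.8 to tolerate small typos in the query. The nested loop is quadratic in the number of matches, which is fine for single pages but not for a large index.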
On the other hand, for the exact string search I will (when I find time to complete it) use an academic paper (also 'crappy' in a way - the guy released three almost identical papers in three consecutive years... also, I'm sure that this is a standard algorithm that has already existed for some years). Well, it's actually for *substring* searching in a string, but the algo is just what I need.
The major problem I see with these papers is that they are made to sound pompous in order to impress the other empty 'academic' heads. For a person that just wants to make the damn thing work they are tedious to read, hard to quickly evaluate and can probably be replaced with an hour or so of serious thinking on the problem. Now, with the more avant-garde problems they may be the only source of reliable knowledge, which sadly means that one has to swallow tons of crap to find the important bits in there, but hey - nobody said it should be easy :)
The scholars just a decade ago had to battle with library catalogues, countless issues of scientific magazines etc. - we're just lucky :))
Mordred
|
Re: "Just build your own tool without reading anything, refine it, learn from its weaknesses." (10/12/03 15:31:47)
| |
The odd paper or two is actually readable. And this is the only way to popularize a topic. See how Paul Graham popularized naive Bayes by explaining the algo nicely in 'A Plan for Spam'.
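To show how little code the basic idea needs, here is a rough sketch of a textbook naive Bayes text classifier with add-one smoothing. It is an assumption-laden illustration, not the exact token-scoring scheme from Graham's essay; the class name NaiveBayes and its methods are made up for this example.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes classifier over whitespace-split words."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen
        self.doc_counts = Counter()  # label -> number of training texts

    def train(self, label, text):
        self.word_counts.setdefault(label, Counter()).update(text.lower().split())
        self.doc_counts[label] += 1

    def classify(self, text):
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # log prior + sum of log likelihoods, with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Train it with a handful of labelled texts per class and call classify() on new ones; working in log space avoids underflow when texts get long.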
Then there is Terry Welch, who did LZW and actually made it readable in '84, I think... this started off the mad LZW boom iirc. Before then no one implemented it coz it was locked up in academic-crap-papers.
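For reference, the core of LZW fits in a few lines once it is explained plainly. The sketch below is a simplified illustration with assumptions: it works on strings whose characters have code points below 256, emits a plain list of integer codes, and ignores bit packing and dictionary size limits that real implementations deal with. The function names are made up.

```python
def lzw_compress(data):
    """Turn a string into a list of integer codes."""
    table = {chr(i): i for i in range(256)}  # start with all single characters
    current = ""
    codes = []
    for ch in data:
        candidate = current + ch
        if candidate in table:
            current = candidate              # keep extending the known phrase
        else:
            codes.append(table[current])     # emit the longest known phrase
            table[candidate] = len(table)    # learn the new phrase
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original string from the list of codes."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # phrase that the compressor defined in the very same step
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code: %r" % code)
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)
```

The decompressor never receives the dictionary; it rebuilds the same table from the codes as it goes, which is the part that makes the algorithm pleasant to look at.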
There are a million other algorithms that are wonderful to behold and that make me shed a single tear. but most of these are also locked up.
I have a nice idea, for a popularized algo that still hasn't hit the masses, but is rather cool. It is for visualization, and there is a tool called 'spacemonger' that implements it on win9x. This algo, this visualization algo, has much merit. Check it out and think about how it can be used to represent search matches -- clustering, etc.
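The visualization in question is the treemap: nested rectangles whose areas are proportional to sizes. A minimal sketch follows, assuming the simplest "slice-and-dice" layout (which may well differ from the layout Spacemonger itself uses) and a made-up input format where a dict maps names to either a size or a nested dict.

```python
def total_size(node):
    """Size of a leaf, or the summed size of everything below a dict."""
    if isinstance(node, dict):
        return sum(total_size(child) for child in node.values())
    return node


def slice_and_dice(tree, x, y, width, height, horizontal=True, prefix=""):
    """Return (path, x, y, width, height) rectangles for all leaves.

    The split direction alternates at each nesting level."""
    rects = []
    total = total_size(tree)
    offset = 0.0  # fraction of the parent rectangle already used
    for name, child in tree.items():
        share = total_size(child) / total if total else 0.0
        if horizontal:
            cx, cy, cw, ch = x + offset * width, y, share * width, height
        else:
            cx, cy, cw, ch = x, y + offset * height, width, share * height
        path = prefix + name
        if isinstance(child, dict):
            rects.extend(slice_and_dice(child, cx, cy, cw, ch,
                                        not horizontal, path + "/"))
        else:
            rects.append((path, cx, cy, cw, ch))
        offset += share
    return rects
```

For search matches, the "sizes" could be hit counts per site or per cluster instead of bytes on disk, so the biggest clusters get the biggest rectangles.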
Enjoy and get cracking!
rai.jack
|
Re: Re: "Just build your own tool without reading anything, refine it, learn from its weaknesses." (10/12/03 19:41:25)
| |
It seems like many of the writers of the "old school" are much more readable than modern technical authors. Good technical writing is characterized by plain speaking and clear thinking. For instance, Claude Shannon's papers are among the most readable you could ask for, as are Knuth's and Henry Baker's. Also, Paul Graham, Ron Rivest and Jeremy Gibbons (functional programming guy) are all excellent technical writers.
There is a correlation between the writing ability and the quality of the ideas, I think. Overwrought writing is maybe a result of trying to inflate poor and undercooked ideas.
sonof
|
Re: Re: Re: Conclusion (10/12/03 22:13:39)
| |
"The major problem I see with these papers is that they are made to sound pompous in order to impress the other empty 'academic' heads. For a person that just wants to make the damn thing work they are tedious to read, hard to quickly evaluate and can probably be replaced with an hour or so serious thinking on the problem. Now, with the more avant-garde problems they may be the only source of reliable knowledge, which sadly means that one has to swallow the tons of crap to find the important bits in there, but hey - nobody said it should be easy :)" "There is a correlation between the writing ability and the quality of the ideas, I think. Overwrought writing is maybe a result of trying to inflate poor and undercooked ideas."
So there is a chance to write:
1. a parser, which classifies the document and rings the alarm if it is "Overwrought"? Make a diff between wordlists from "good" and "bad" articles, and fish the "stopwords" from the bad one.
2. Or a dumb summarizer/distiller which changes exuberant terms to simple ones (or, if they mean nothing, to nothing). A tool that works like this is Solvay's Newspeak.
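Idea 1 can be sketched in a few lines. This is only a toy under stated assumptions: the function names, the min_count threshold and the "fraction of alarm words" score are all invented here, and a real classifier would need far more training text than two short lists.

```python
import re
from collections import Counter


def word_freqs(texts):
    """Count lowercase alphabetic words over a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def alarm_words(good_texts, bad_texts, min_count=2):
    """Words seen at least min_count times in 'bad' articles, never in 'good'."""
    good = word_freqs(good_texts)
    bad = word_freqs(bad_texts)
    return {w for w, n in bad.items() if n >= min_count and w not in good}


def overwrought_score(text, alarms):
    """Fraction of the words in text that are alarm words (0.0 to 1.0)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in alarms for w in words) / len(words)
```

Ring the alarm when overwrought_score() goes above some threshold picked by trying it on papers you have already judged by hand.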
have
|
Alternatively, you could play bingo with them :) (11/12/03 16:34:21)
| |
http://www.hobotraveler.com/wankwordbingo.htm
Mordred
|
Re: Re: Spacemonger (10/12/03 21:48:34)
| |
I love Spacemonger, and use it. I even thought about making static pictures from the different levels of its output and using them like imagemaps on the front of a CD-navigating system. I made something like a software-encyclopedia (software organized by some logic). What's interesting is that with Spacemonger you can see that class "A" is half the size of class "B", and "C" is a tenth of the whole. Great program. In my collection, on the same level as Spacemonger, there is "Scanner" by Steffen Gerlach; check that one out if you want.
have
|
http://www.tigerbliss.com/disk_analysis.html (11/12/03 00:31:15)
| |
Can you explain your CD-navigating system and your software-encyclopedia? It sounds interesting. Thanks!
rai.jack
|
yet another: http://www.methylblue.com/filelight/ (n/t) (11/12/03 00:52:46)
| |
sonof
|
Re: explain... (12/12/03 00:13:37)
| |
Now I remember SequoiaView; I don't like it as much as Spacemonger.
The navigating system is nothing interesting. If you did not understand it from my post above, it is my fault. So, how you "navigate" your HD with Spacemonger is clear to you. Now, if you make a CD, that is a static thing. You can make screenshots of Spacemonger's output at every level of the directory tree, starting from your chosen (virtual) CD root, then turn those pictures into imagemaps on html pages, organize the pages into some structure, and burn them together with the original data. What you have in the end is Spacemonger's functionality through a browser without using the program itself (which makes the CD a bit more platform-independent).
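The imagemap step can be generated rather than typed by hand. A rough sketch, assuming you already know the pixel rectangle of each block in a screenshot; the function name image_map_html, the map name "levelmap" and the tuple layout are made up for illustration.

```python
from html import escape
from urllib.parse import quote


def image_map_html(image_file, rects, map_name="levelmap"):
    """Build an HTML snippet with a client-side image map.

    rects is a list of (target, x, y, width, height) tuples in pixels,
    where target is the page to open when that block is clicked."""
    lines = [
        '<img src="%s" usemap="#%s" alt="">' % (quote(image_file), map_name),
        '<map name="%s">' % map_name,
    ]
    for target, x, y, width, height in rects:
        # rect areas take left, top, right, bottom coordinates
        coords = "%d,%d,%d,%d" % (round(x), round(y),
                                  round(x + width), round(y + height))
        lines.append('  <area shape="rect" coords="%s" href="%s" alt="%s">'
                     % (coords, quote(target), escape(target, quote=True)))
    lines.append("</map>")
    return "\n".join(lines)
```

One such snippet per directory level, each linking down to the page for the next level, gives the static browser-only navigation described above.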
The software-encyclopedia follows the same line as above: I didn't see any solution I loved, so I worked on my own. The target (was) to pick the best/freeware tools for doing fundamental things on a PC and organize them in some reusable form for the younger ones, to teach them not to fear the machine, to find simple solutions to problems which look too difficult for a 'user/student', to encourage them. It is a sketch; I may wish to split it into a 'necessary' and a 'recommended' pack (or kid and adult). The future big thing may include docs and programs in each section, so you can read the iso doc and try a tool like isobuster. The main nodes are:
1 byteorganization ( partitions, fileformats, crc, backup )
2 packing ( data-in/out, program-in/out )
3 visualization ( viewers, editors )
4 analyze ( data, program, diffs )
5 search ( offline, online, database )
6 convert ( automatic-data/program, manual-sed,awk )
7 tools ( offline, online ), mostly the programs which clean up Windows's shit
8 advanced ( registry, filemanagers, inctrl )
9 progtools ( dummys, winshow-likes, PE-muckers )
A interne ( defense, offense, lowlevel )
B teaching ( geogr., radcarbon, ET-count, pi, calculators )
C security ( some crypto-tool )
AppendixA ( needed, missed dlls, maybe later with scripts )
AppendixB BOOKS ( comp-related )
AppendixC WEB_MIRROR
have
|
© 1952-2032 Fravia's searchlore, all rights reserved, all wrongs reversed
|